MapReduce Service
What's New
Function Overview
Product Bulletin
Vulnerability Notice
Guide for Fixing the Apache Log4j2 Remote Code Execution Vulnerability (CVE-2021-44228)
MRS Fastjson Vulnerability Remediation Guide
Overview
Impact
Remediating Manager Web
Remediating Manager Controller
Remediating Manager NodeAgent
Remediating Kafka
Remediating Flink
Version Support Bulletin
MRS Cluster Version Lifecycle
Service Overview
Infographics
What Is MRS?
Advantages
Application Scenarios
MRS Cluster Version Overview
List of MRS Component Versions
Components
Alluxio
CarbonData
ClickHouse
ClickHouse Basic Principles
Relationships Between ClickHouse and Other Components
ClickHouse Enhanced Open Source Features
CDL
CDL Basic Principles
DBService
DBService Basic Principles
Flink
Flink Basic Principles
Flink HA Solution
Relationships Between Flink and Other Components
Flink Enhanced Open Source Features
Flink Sliding Window Enhancement
Flink Job Pipeline Enhancement
Flink Stream SQL Join
Flink CEP in SQL
Flume
Flume Basic Principles
Relationships Between Flume and Other Components
Flume Enhanced Open Source Features
HBase
HBase Basic Principles
HBase HA Solution
Relationship with Other Components
HBase Enhanced Open Source Features
HDFS
HDFS Basic Principles
HDFS HA Solution
Relationship Between HDFS and Other Components
HDFS Enhanced Open Source Features
HetuEngine
HetuEngine Basic Principles
Relationships Between HetuEngine and Other Components
Hive
Hive Basic Principles
Hive CBO Principles
Relationships Between Hive and Other Components
Enhanced Open Source Feature
Hudi
Hue
Hue Basic Principles
Relationships Between Hue and Other Components
Hue Enhanced Open Source Features
Impala
IoTDB
IoTDB Basic Principles
IoTDB Enhanced Open Source Features
Kafka
Kafka Basic Principles
Relationships Between Kafka and Other Components
Kafka Enhanced Open Source Features
KafkaManager
KrbServer and LdapServer
KrbServer and LdapServer Principles
KrbServer and LdapServer Enhanced Open Source Features
Kudu
Loader
Loader Basic Principles
Relationship Between Loader and Other Components
Loader Enhanced Open Source Features
Manager
Manager Basic Principles
Manager Key Features
MapReduce
MapReduce Basic Principles
Relationship Between MapReduce and Other Components
MapReduce Enhanced Open Source Features
Oozie
Oozie Basic Principles
Oozie Enhanced Open Source Features
OpenTSDB
Presto
Ranger
Ranger Basic Principles
Relationships Between Ranger and Other Components
Spark
Spark Basic Principles
Spark HA Solution
Relationships Between Spark and Other Components
Spark Enhanced Open Source Features
Spark2x
Spark2x Basic Principles
Spark2x HA Solution
Spark2x Multi-active Instance
Spark2x Multi-tenant
Relationship Between Spark2x and Other Components
Spark2x Enhanced Open Source Features
Spark2x Open Source New Features
CarbonData Basic Principles
Optimizing Spark SQL Query of Data of Multiple Sources
Storm
Storm Basic Principles
Relationships Between Storm and Other Components
Storm Enhanced Open Source Features
Tez
YARN
YARN Basic Principles
YARN HA Solution
Relationships Between YARN and Other Components
YARN Enhanced Open Source Features
ZooKeeper
ZooKeeper Basic Principles
Relationships Between ZooKeeper and Other Components
ZooKeeper Enhanced Open Source Features
Functions
Job Management
Metadata Management
Enterprise Project Management
Managing Multi-Tenancy Resources
Easy Access to Web UIs of Components
Node Bootstrap Actions
Cluster Management
Cluster Lifecycle Management
Cluster Scaling
Creating Task Nodes
Auto Scaling
Isolating Nodes
Scaling Up Master Node Specifications
Managing Node Labels
Cluster O&M
Cluster Status Notification
MRS Security Hardening
MRS Reliability Enhancement
Security
Shared Responsibilities
Asset Identification and Management
Identity Authentication and Access Control
Data Protection Technologies
Audit and Logging
Service Resilience
Security Risk Monitoring
Update Management
Security Hardening
Constraints
Technical Support
Billing
Permissions Management
Related Services
Quota Description
Common Concepts
Released Versions
Version Overview
Release Notes
MRS 3.1.2-LTS.3 Version Description
MRS 3.1.5 Version Description
MRS 3.2.0-LTS.1 Version Description
Billing
Overview
Billing Modes
Overview
Yearly/Monthly Billing
Pay-per-Use Billing
Billing Items
Billing Examples
Billing Mode Changes
Introduction
Changing the Billing Mode from Pay-per-Use to Yearly/Monthly
Renewal
Introduction
Manually Renewing an MRS Cluster
Auto-renewing an MRS Cluster
Bills
Arrears
Stopping Billing
Managing Costs
FAQs
Why Is the Price Not Displayed During MRS Cluster Creation?
How Is the Task Node in an MRS Cluster Billed?
Why Does My Unsubscription from ECS Fail After I Unsubscribe from MRS?
Getting Started
Creating and Using a Hadoop Cluster for Offline Analysis
Creating and Using a Kafka Cluster for Stream Processing
Creating and Using an HBase Cluster for Offline Query
Creating and Using a ClickHouse Cluster for Columnar Store
Creating and Using an MRS Cluster Requiring Security Authentication
Best Practices for Beginners
User Guide
Preparations
Configuring MRS Cloud Service Authorization
Creating an IAM User and Granting MRS Permissions
Creating a Custom Policy for MRS
MRS Cluster Planning
Service Selection
MRS Cluster Types
MRS Cluster Node Types
MRS Cluster Node Specifications
MRS Cluster Deployment
Overview
Kerberos Authentication for MRS Clusters
ECS Specifications Supported by MRS Clusters
Buying MRS Clusters
Quickly Buying an MRS Cluster
Manually Buying an MRS Cluster
Installing an MRS Cluster Client
Installing a Client (MRS 3.x)
Installing a Client (MRS 2.x or Earlier)
Submitting an MRS Job
MRS Job Types
Uploading Application Data to an MRS Cluster
Running an MRS Job
Running a MapReduce Job
Running a SparkSubmit Job
Running a HiveSQL Job
Running a Spark SQL Job
Running a Flink Job
Running a HadoopStreaming Job
Viewing MRS Job Details and Logs
Managing Clusters
Overview
Introduction to MRS Manager
Accessing MRS Manager
Managing an MRS Cluster
Viewing Basic Information About an MRS Cluster
Checking the Running Status of an MRS Cluster
Starting and Stopping an MRS Cluster
Restarting an MRS Cluster
Exporting MRS Cluster Configuration Parameters
Synchronizing the MRS Cluster Configuration
Transforming a Pay-per-Use MRS Cluster to a Yearly/Monthly Cluster
Deleting an MRS Cluster
Changing the VPC Subnet of an MRS Cluster
Replacing the NTP Server for an MRS Cluster
Modifying the OMS Service Configuration
Modifying MRS Manager Routing Table
Managing MRS Cluster Components
Checking the Running Status of an MRS Cluster Component
Starting and Stopping an MRS Cluster Component
Restarting an MRS Cluster Component
Adding and Deleting an MRS Cluster Component
Modifying the Configuration Parameters of an MRS Cluster Component
Viewing the Modified Component Configuration Parameters of an MRS Cluster
Synchronizing MRS Component Configuration Parameters
Adding Custom MRS Component Parameters
Managing MRS Role Instances
Managing MRS Role Instance Groups
Modifying MRS Role Instance Parameters
Performing an Active/Standby Switchover for MRS Role Instances
Decommissioning and Recommissioning an MRS Role Instance
Enabling and Disabling Ranger Authentication for an MRS Component
Accessing Web Pages of Open Source Components Managed in MRS Clusters
Managing MRS Cluster Nodes
Checking the Running Status of an MRS Cluster Node
Starting and Stopping All Roles on an MRS Cluster Node
Isolating an MRS Cluster Node
Modifying the Rack Information of an MRS Cluster Node
Scaling Up Master Node Specifications in an MRS Cluster
Synchronizing Disk Information of an MRS Cluster Node
Adding a Tag to an MRS Cluster/Node
Configuring Bootstrap Actions for an MRS Cluster Node
MRS Bootstrap Action Overview
Preparing the Bootstrap Action Script for an MRS Node
Adding MRS Node Bootstrap Actions and Installing Third-Party Software
Viewing the Bootstrap Action Execution Records of an MRS Node
Managing the MRS Cluster Client
Updating the MRS Cluster Client After the Server Configuration Expires
Viewing the Installed MRS Cluster Client
Batch Upgrading MRS Cluster Clients
Managing MRS Cluster Jobs
Stopping and Deleting an MRS Cluster Job
Configuring Notification Rules for MRS Jobs
Managing MRS Cluster Tenants
Introduction to MRS Multi-Tenancy
Using MRS Multi-Tenancy
Configuring MRS Tenants
Creating an MRS Tenant
Creating an MRS Sub-Tenant
Binding a Tenant to an MRS Cluster User
Adding an MRS Tenant Resource Pool
Configuring the Queue Capacity Policy of a Resource Pool
Configuring the MRS Tenant Queue
Managing MRS Tenant Resources
Managing the MRS Tenant Resource Directory
Managing MRS Tenant Resource Pools
Clearing the MRS Tenant Queue Configuration
Restoring MRS Tenant Data After YARN Is Reinstalled
Deleting an MRS Tenant
Managing Global User Policies When Using Superior Scheduler
Clearing Tenant's Non-Associated Queues Using Capacity Scheduler
Switching the MRS Tenant Resource Scheduler
Managing MRS Cluster Users
Cluster User Permissions
MRS Cluster User Permission Model
MRS Cluster User Identity Authentication Policy
MRS Cluster User Permission Authentication Policy
Default Permissions of the MRS Cluster
Synchronizing IAM Users to MRS
MRS Cluster User Accounts
Managing MRS Cluster Roles
Managing MRS Cluster User Groups
Managing MRS Cluster Users
Creating an MRS Cluster User
Modifying MRS Cluster User Information
Locking an MRS Cluster User
Deleting an MRS Cluster User
Initializing MRS Cluster User Passwords
Downloading MRS Cluster User Credentials
Unlocking an MRS Cluster User
Unlocking an LDAP User in the MRS Cluster
Unlocking the LDAP Management Account of the MRS Cluster
Configuring Password Policies for MRS Cluster Users
Configuring the Private Attribute of MRS Cluster Users
Managing MRS Cluster Metadata
MRS Cluster Metadata Overview
Storing Ranger Metadata to RDS
Storing Hive Metadata to RDS
Configuring a LakeFormation Data Connection
LakeFormation Overview
Preparing for a LakeFormation Data Connection
Configuring a LakeFormation Data Connection During Cluster Creation
Managing MRS Cluster Data Connections
Managing Static Service Resources in an MRS Cluster
Overview of Static Service Resources
Configuring Static Resources for an MRS Cluster
Checking the Static Resources of an MRS Cluster
MRS Cluster O&M
Cluster O&M
Logging In to an MRS Cluster
Checking MRS Active/Standby Management Nodes
Logging In to an MRS Cluster Node
Viewing MRS Cluster Monitoring Metrics
Viewing MRS Cluster Resource Monitoring Metrics
Viewing MRS Cluster Component Monitoring Metrics
Viewing MRS Node Resource Monitoring Metrics
Dumping MRS Cluster Monitoring Data
Checking MRS Cluster Health
Performing a Health Check for an MRS Cluster
Performing Health Checks on MRS Cluster Nodes
Viewing and Exporting a Health Check Report
Adjusting the Capacity of an MRS Cluster
Scaling Out an MRS Cluster
Expanding a Data Disk of an MRS Cluster Node
Scaling In an MRS Cluster
Scaling In ClickHouseServer Nodes
Unsubscribing from a Specified Node in a Yearly/Monthly MRS Cluster
MRS Task Node Auto Scaling
Automatic Scaling of Task Nodes in an MRS Cluster
Adding an Auto Scaling Policy for MRS Task Nodes
Managing MRS Cluster Auto Scaling Policies
MRS Cluster Data Backup and Restoration
Backing Up and Restoring MRS Cluster Data
Enabling MRS Inter-Cluster Replication
Creating an MRS Cluster Data Backup Task
Creating an MRS Cluster Data Restoration Task
Backing Up MRS Cluster Component Data
Backing Up Manager Data (MRS 2.x and Earlier)
Backing Up Manager Data (MRS 3.x and Later Versions)
Backing Up CDL Service Data
Backing Up ClickHouse Metadata
Backing Up ClickHouse Service Data
Backing Up DBService Data
Backing Up Doris Data
Backing Up Flink Metadata
Backing Up HBase Metadata
Backing Up HBase Service Data
Backing Up HDFS NameNode Data
Backing Up HDFS Service Data
Backing Up Hive Service Data
Backing Up IoTDB Metadata
Backing Up IoTDB Service Data
Backing Up Kafka Metadata
Restoring MRS Cluster Component Data
Restoring Manager Data (MRS 2.x and Earlier)
Restoring Manager Data (MRS 3.x and Later Versions)
Restoring CDL Service Data
Restoring ClickHouse Metadata
Restoring ClickHouse Service Data
Restoring DBService Metadata
Restoring Doris Service Data
Restoring Flink Metadata
Restoring HBase Metadata
Restoring HBase Service Data
Restoring HDFS NameNode Metadata
Restoring HDFS Service Data
Restoring Hive Service Data
Restoring IoTDB Metadata
Restoring IoTDB Service Data
Restoring Kafka Metadata
Managing MRS Cluster Backup and Restoration Tasks
Using HDFS Snapshots to Quickly Restore Component Service Data
MRS Cluster Patching
Viewing Patch Information for an MRS Cluster
Patching an MRS Cluster
Applying Rolling Patches for an MRS Cluster
Patching Hosts Isolated in an MRS Cluster
MRS Cluster Patch Description
MRS 3.0.5.1 Patch Description
MRS 2.1.0.11 Patch Description
MRS 2.1.0.10 Patch Description
MRS 2.1.0.9 Patch Description
MRS 2.1.0.8 Patch Description
MRS 2.1.0.7 Patch Description
MRS 2.1.0.6 Patch Description
MRS 2.1.0.3 Patch Description
MRS 2.1.0.2 Patch Description
MRS 2.1.0.1 Patch Description
MRS 2.0.6.1 Patch Description
MRS 2.0.1.3 Patch Description
MRS 2.0.1.2 Patch Description
MRS 2.0.1.1 Patch Description
MRS 1.9.3.3 Patch Description
MRS 1.9.3.1 Patch Description
MRS 1.9.2.2 Patch Description
MRS 1.9.0.8, 1.9.0.9, and 1.9.0.10 Patch Description
MRS 1.9.0.7 Patch Description
MRS 1.9.0.6 Patch Description
MRS 1.9.0.5 Patch Description
MRS 1.8.10.1 Patch Description
Viewing Logs of an MRS Cluster
Overview of MRS Cluster Logs
Viewing MRS Operation Logs
Viewing MRS Cluster History
Viewing MRS Cluster Audit Logs
Viewing Role Instance Logs of MRS Components
Searching for MRS Cluster Logs Online
Downloading MRS Cluster Logs
Collecting MRS Cluster Service Stack Information
Configuring Default Log Level and Archive File Size for MRS Components
Configuring the Number of Local Backups of MRS Cluster Audit Logs
Configuring Dumping for MRS Cluster Audit Logs
MRS Cluster Security Configuration
Cluster Mutual Trust Management
Overview of Mutual Trust Between MRS Clusters
Changing the System Domain Name of an MRS Cluster
Configuring Mutual Trust Between MRS Clusters
Configuring User Permissions for Mutually Trusted MRS Clusters
Replacing MRS Cluster Certificates
Replacing the CA Certificate
Replacing an HA Certificate
MRS Cluster Security Hardening
MRS Cluster Security Hardening Policies
Configuring Hadoop Data Encryption During Transmission
Configuring Kafka Data Encryption During Transmission
Configuring HDFS Data Encryption During Transmission
Configuring Spark Data Encryption During Transmission
Configuring ZooKeeper Data Encryption During Transmission
Encrypting Data Transmission Between the Controller and Agent
Configuring a Trusted IP Address to Access LDAP
HFile and WAL Encryption
Configuring the IP Address Whitelist for Modifying Data in an HBase Read-Only Cluster
Configuring LDAP Output Audit Logs
Updating Encryption Keys of an MRS Cluster
Updating the SSH Key of User omm on MRS Cluster Nodes
Enabling and Disabling Permission Verification on MRS Cluster Components
Allowing External Users to Access MRS Clusters in Normal Mode
Configuring Secure Communication Authorization for an MRS Cluster
Changing the Passwords for System Users of an MRS Cluster
Changing or Resetting the Password for User admin of an MRS Cluster
Changing the Passwords for OS Users of an MRS Cluster Node
Changing the Password for the Kerberos Administrator of an MRS Cluster
Changing the Passwords for Manager Users of an MRS Cluster
Changing the Password for a Regular LDAP User of an MRS Cluster
Changing the LDAP Administrator Password for an MRS Cluster
Changing the Passwords for MRS Cluster Component Running Users
Changing the Passwords for Database Users of an MRS Cluster
Changing the Password for the OMS Database Administrator
Changing the Password for an OMS Database Access User
Changing the Passwords for Database Users of MRS Cluster Components
Resetting the MRS Component Database User Password
Resetting the Password for User omm in DBService
Changing the Password for User compdbuser of the DBService Database
Viewing and Configuring MRS Alarm Events
Viewing MRS Cluster Events
Viewing Alarms of an MRS Cluster
Configuring Alarm Thresholds for an MRS Cluster
Configuring Alarm Masking for an MRS Cluster
Connecting an MRS Cluster to SNMP to Report Alarms
Connecting an MRS Cluster to the Syslog Server to Report Alarms
Periodically Backing Up Alarm and Audit Information
Enabling the MRS Cluster Maintenance Mode to Disable Alarm Reporting
Configuring Notifications for MRS Cluster Alarms and Events
MRS Cluster Alarm Handling Reference
ALM-12001 Audit Log Dumping Failure
ALM-12004 OLdap Resource Abnormal
ALM-12005 OKerberos Resource Abnormal
ALM-12006 Node Fault
ALM-12007 Process Fault
ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes
ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes
ALM-12012 NTP Service Is Abnormal
ALM-12014 Partition Lost
ALM-12015 Partition Filesystem Readonly
ALM-12016 CPU Usage Exceeds the Threshold
ALM-12017 Insufficient Disk Capacity
ALM-12018 Memory Usage Exceeds the Threshold
ALM-12027 Host PID Usage Exceeds the Threshold
ALM-12028 Number of Processes in the D State and Z State on a Host Exceeds the Threshold
ALM-12033 Slow Disk Fault
ALM-12034 Periodical Backup Failure
ALM-12035 Unknown Data Status After Recovery Task Failure
ALM-12037 NTP Server Abnormal
ALM-12038 Monitoring Indicator Dumping Failure
ALM-12039 Active/Standby OMS Databases Not Synchronized
ALM-12040 Insufficient System Entropy
ALM-12041 Incorrect Permission on Key Files
ALM-12042 Incorrect Configuration of Key Files
ALM-12045 Read Packet Dropped Rate Exceeds the Threshold
ALM-12046 Write Packet Dropped Rate Exceeds the Threshold
ALM-12047 Read Packet Error Rate Exceeds the Threshold
ALM-12048 Write Packet Error Rate Exceeds the Threshold
ALM-12049 Network Read Throughput Rate Exceeds the Threshold
ALM-12050 Network Write Throughput Rate Exceeds the Threshold
ALM-12051 Disk Inode Usage Exceeds the Threshold
ALM-12052 TCP Temporary Port Usage Exceeds the Threshold
ALM-12053 Host File Handle Usage Exceeds the Threshold
ALM-12054 Invalid Certificate File
ALM-12055 Certificate File Is About to Expire
ALM-12057 Metadata Not Configured with the Task to Periodically Back Up Data to a Third-Party Server
ALM-12061 Process Usage Exceeds the Threshold
ALM-12062 OMS Parameter Configurations Mismatch with the Cluster Scale
ALM-12063 Unavailable Disk
ALM-12064 Host Random Port Range Conflicts with Cluster Used Port
ALM-12066 Trust Relationships Between Nodes Become Invalid
ALM-12067 Tomcat Resource Is Abnormal
ALM-12068 ACS Resource Exception
ALM-12069 AOS Resource Exception
ALM-12070 Controller Resource Is Abnormal
ALM-12071 Httpd Resource Is Abnormal
ALM-12072 FloatIP Resource Is Abnormal
ALM-12073 CEP Resource Is Abnormal
ALM-12074 FMS Resource Is Abnormal
ALM-12075 PMS Resource Is Abnormal
ALM-12076 GaussDB Resource Is Abnormal
ALM-12077 User omm Expired
ALM-12078 Password of User omm Expired
ALM-12079 User omm Is About to Expire
ALM-12080 Password of User omm Is About to Expire
ALM-12081 User ommdba Expired
ALM-12082 User ommdba Is About to Expire
ALM-12083 Password of User ommdba Is About to Expire
ALM-12084 Password of User ommdba Expired
ALM-12085 Service Audit Log Dump Failure
ALM-12087 System Is in the Upgrade Observation Period
ALM-12089 Inter-Node Network Is Abnormal
ALM-12091 Abnormal Disaster Resources
ALM-12099 Core Dump Occurred
ALM-12100 AD Service Connection Failed
ALM-12101 AZ Unhealthy
ALM-12102 AZ HA Component Is Not Deployed Based on DR Requirements
ALM-12103 Executor Resource Exception
ALM-12104 Abnormal Knox Resources
ALM-12110 Failed to Get ECS Temporary AK/SK
ALM-12172 Failed to Report Metrics to Cloud Eye
ALM-12180 Suspended Disk I/O
ALM-12186 CGroup Task Usage Exceeds the Threshold
ALM-12187 Failed to Expand Disk Partition Capacity
ALM-12188 diskmgt Disk Monitoring Unavailable
ALM-12190 Number of Knox Connections Exceeds the Threshold
ALM-12191 Disk I/O Usage Exceeds the Threshold
ALM-12192 Host Load Exceeds the Threshold
ALM-12200 Password Is About to Expire
ALM-12201 Process CPU Usage Exceeds the Threshold
ALM-12202 Process Memory Usage Exceeds the Threshold
ALM-12203 Process Full GC Duration Exceeds the Threshold
ALM-12204 Wait Duration of a Disk Read Exceeds the Threshold
ALM-12205 Wait Duration of a Disk Write Exceeds the Threshold
ALM-12206 Password Has Expired
ALM-12207 Slow Disk Processing Timeout
ALM-13000 ZooKeeper Service Unavailable
ALM-13001 Available ZooKeeper Connections Are Insufficient
ALM-13002 ZooKeeper Direct Memory Usage Exceeds the Threshold
ALM-13003 GC Duration of the ZooKeeper Process Exceeds the Threshold
ALM-13004 ZooKeeper Heap Memory Usage Exceeds the Threshold
ALM-13005 Failed to Set the Quota of Top Directories of ZooKeeper Components
ALM-13006 Znode Number or Capacity Exceeds the Threshold
ALM-13007 Available ZooKeeper Client Connections Are Insufficient
ALM-13008 ZooKeeper Znode Usage Exceeds the Threshold
ALM-13009 ZooKeeper Znode Capacity Usage Exceeds the Threshold
ALM-13010 Znode Usage of a Directory with Quota Configured Exceeds the Threshold
ALM-14000 HDFS Service Unavailable
ALM-14001 HDFS Disk Usage Exceeds the Threshold
ALM-14002 DataNode Disk Usage Exceeds the Threshold
ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold
ALM-14006 Number of HDFS Files Exceeds the Threshold
ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold
ALM-14008 DataNode Heap Memory Usage Exceeds the Threshold
ALM-14009 Number of Dead DataNodes Exceeds the Threshold
ALM-14010 NameService Service Is Abnormal
ALM-14011 DataNode Data Directory Is Not Configured Properly
ALM-14012 JournalNode Is Out of Synchronization
ALM-14013 Failed to Update the NameNode FsImage File
ALM-14014 NameNode GC Time Exceeds the Threshold
ALM-14015 DataNode GC Time Exceeds the Threshold
ALM-14016 DataNode Direct Memory Usage Exceeds the Threshold
ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold
ALM-14018 NameNode Non-heap Memory Usage Exceeds the Threshold
ALM-14019 DataNode Non-heap Memory Usage Exceeds the Threshold
ALM-14020 Number of Entries in the HDFS Directory Exceeds the Threshold
ALM-14021 NameNode Average RPC Processing Time Exceeds the Threshold
ALM-14022 NameNode Average RPC Queuing Time Exceeds the Threshold
ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
ALM-14024 Tenant Space Usage Exceeds the Threshold
ALM-14025 Tenant File Object Usage Exceeds the Threshold
ALM-14026 Blocks on DataNode Exceed the Threshold
ALM-14027 DataNode Disk Fault
ALM-14028 Number of Blocks to Be Supplemented Exceeds the Threshold
ALM-14029 Number of Blocks in a Replica Exceeds the Threshold
ALM-14030 HDFS Allows Write of Single-Replica Data
ALM-14031 DataNode Process Is Abnormal
ALM-14032 JournalNode Process Is Abnormal
ALM-14033 ZKFC Process Is Abnormal
ALM-14034 Router Process Is Abnormal
ALM-14035 HttpFS Process Is Abnormal
ALM-14036 NameNode Is In Safe Mode
ALM-14037 DataNodes Outside the Cluster
ALM-14038 Router Heap Memory Usage Exceeds the Threshold
ALM-14039 Slow DataNodes Exist in the Cluster
ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold
ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold
ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold
ALM-16003 Background Thread Usage Exceeds the Threshold
ALM-16004 Hive Service Unavailable
ALM-16005 The Heap Memory Usage of the Hive Process Exceeds the Threshold
ALM-16006 The Direct Memory Usage of the Hive Process Exceeds the Threshold
ALM-16007 Hive GC Time Exceeds the Threshold
ALM-16008 Non-Heap Memory Usage of the Hive Process Exceeds the Threshold
ALM-16009 Map Number Exceeds the Threshold
ALM-16045 Hive Data Warehouse Is Deleted
ALM-16046 Hive Data Warehouse Permission Is Modified
ALM-16047 HiveServer Has Been Deregistered from ZooKeeper
ALM-16048 Tez or Spark Library Path Does Not Exist
ALM-16051 Percentage of Sessions Connected to MetaStore Exceeds the Threshold
ALM-16052 Latency for MetaStore to Access the Meta Database During Table Creation Exceeds the Threshold
ALM-16053 Average HQL Submission Time of Hive in the Last 5 Minutes Exceeds the Threshold
ALM-17003 Oozie Service Unavailable
ALM-17004 Oozie Heap Memory Usage Exceeds the Threshold
ALM-17005 Oozie Non Heap Memory Usage Exceeds the Threshold
ALM-17006 Oozie Direct Memory Usage Exceeds the Threshold
ALM-17007 Garbage Collection (GC) Time of the Oozie Process Exceeds the Threshold
ALM-17008 Abnormal Connection Between Oozie and ZooKeeper
ALM-17009 Abnormal Connection Between Oozie and DBService
ALM-17010 Abnormal Connection Between Oozie and HDFS
ALM-17011 Abnormal Connection Between Oozie and Yarn
ALM-18000 Yarn Service Unavailable
ALM-18002 NodeManager Heartbeat Lost
ALM-18003 NodeManager Unhealthy
ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold
ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold
ALM-18010 ResourceManager GC Time Exceeds the Threshold
ALM-18011 NodeManager GC Time Exceeds the Threshold
ALM-18012 JobHistoryServer GC Time Exceeds the Threshold
ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold
ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold
ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold
ALM-18016 Non Heap Memory Usage of ResourceManager Exceeds the Threshold
ALM-18017 Non Heap Memory Usage of NodeManager Exceeds the Threshold
ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold
ALM-18019 Non Heap Memory Usage of JobHistoryServer Exceeds the Threshold
ALM-18020 Yarn Task Execution Timeout
ALM-18021 MapReduce Service Unavailable
ALM-18022 Insufficient Yarn Queue Resources
ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold
ALM-18024 Pending Yarn Memory Usage Exceeds the Threshold
ALM-18025 Number of Terminated Yarn Tasks Exceeds the Threshold
ALM-18026 Number of Failed Yarn Tasks Exceeds the Threshold
ALM-18027 JobHistoryServer Process Is Abnormal
ALM-18028 TimeLineServer Process Is Abnormal
ALM-19000 HBase Service Unavailable
ALM-19006 HBase Replication Sync Failed
ALM-19007 HBase GC Time Exceeds the Threshold
ALM-19008 Heap Memory Usage of the HBase Process Exceeds the Threshold
ALM-19009 Direct Memory Usage of the HBase Process Exceeds the Threshold
ALM-19011 RegionServer Region Number Exceeds the Threshold
ALM-19012 HBase System Table Directory or File Lost
ALM-19013 Duration of Regions in transaction State Exceeds the Threshold
ALM-19014 Capacity Quota Usage on ZooKeeper Exceeds the Threshold Severely
ALM-19015 Quantity Quota Usage on ZooKeeper Exceeds the Threshold
ALM-19016 Quantity Quota Usage on ZooKeeper Exceeds the Threshold Severely
ALM-19017 Capacity Quota Usage on ZooKeeper Exceeds the Threshold
ALM-19018 HBase Compaction Queue Size Exceeds the Threshold
ALM-19019 Number of HBase HFiles to Be Synchronized Exceeds the Threshold
ALM-19020 Number of HBase WAL Files to Be Synchronized Exceeds the Threshold
ALM-19021 Handler Usage of RegionServer Exceeds the Threshold
ALM-19022 HBase Hotspot Detection Is Unavailable
ALM-19023 Region Traffic Restriction for HBase
ALM-19024 RPC Requests P99 Latency on RegionServer Exceeds the Threshold
ALM-19025 Damaged StoreFile in HBase
ALM-19026 Damaged WAL Files in HBase
ALM-19030 P99 Latency of RegionServer RPC Request Exceeds the Threshold
ALM-19031 Number of RegionServer RPC Connections Exceeds the Threshold
ALM-19032 Number of Tasks in the RegionServer RPC Write Queue Exceeds the Threshold
ALM-19033 Number of Tasks in the RegionServer RPC Read Queue Exceeds the Threshold
ALM-19034 Number of RegionServer WAL Write Timeouts Exceeds the Threshold
ALM-19035 Size of the RegionServer Call Queue Exceeds the Threshold
ALM-19036 Bad Blocks Exist in HBase Key Directory Data
ALM-20002 Hue Service Unavailable
ALM-23001 Loader Service Unavailable
ALM-23003 Loader Task Execution Failure
ALM-23004 Loader Heap Memory Usage Exceeds the Threshold
ALM-23005 Loader Non-Heap Memory Usage Exceeds the Threshold
ALM-23006 Loader Direct Memory Usage Exceeds the Threshold
ALM-23007 Garbage Collection (GC) Time of the Loader Process Exceeds the Threshold
ALM-24000 Flume Service Unavailable
ALM-24001 Flume Agent Exception
ALM-24003 Flume Client Connection Interrupted
ALM-24004 Exception Occurs When Flume Reads Data
ALM-24005 Exception Occurs When Flume Transmits Data
ALM-24006 Heap Memory Usage of Flume Server Exceeds the Threshold
ALM-24007 Flume Server Direct Memory Usage Exceeds the Threshold
ALM-24008 Flume Server Non Heap Memory Usage Exceeds the Threshold
ALM-24009 Flume Server Garbage Collection (GC) Time Exceeds the Threshold
ALM-24010 Flume Certificate File Is Invalid or Damaged
ALM-24011 Flume Certificate File Is About to Expire
ALM-24012 Flume Certificate File Has Expired
ALM-24013 Flume MonitorServer Certificate File Is Invalid or Damaged
ALM-24014 Flume MonitorServer Certificate Is About to Expire
ALM-24015 Flume MonitorServer Certificate File Has Expired
ALM-25000 LdapServer Service Unavailable
ALM-25004 Abnormal LdapServer Data Synchronization
ALM-25005 nscd Service Exception
ALM-25006 Sssd Service Exception
ALM-25007 Number of SlapdServer Connections Exceeds the Threshold
ALM-25008 SlapdServer CPU Usage Exceeds the Threshold
ALM-25500 KrbServer Service Unavailable
ALM-25501 Too Many KerberosServer Requests
ALM-26051 Storm Service Unavailable
ALM-26052 Number of Available Supervisors of the Storm Service Is Less Than the Threshold
ALM-26053 Storm Slot Usage Exceeds the Threshold
ALM-26054 Nimbus Heap Memory Usage Exceeds the Threshold
ALM-27001 DBService Service Unavailable
ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes
ALM-27004 Data Inconsistency Between Active and Standby DBServices
ALM-27005 Database Connections Usage Exceeds the Threshold
ALM-27006 Disk Space Usage of the Data Directory Exceeds the Threshold
ALM-27007 Database Enters the Read-Only Mode
ALM-29000 Impala Service Unavailable
ALM-29004 Impalad Process Memory Usage Exceeds the Threshold
ALM-29005 Number of JDBC Connections to Impalad Exceeds the Threshold
ALM-29006 Number of ODBC Connections to Impalad Exceeds the Threshold
ALM-29010 Number of Queries Being Submitted by Impalad Exceeds the Threshold
ALM-29011 Number of Queries Being Executed by Impalad Exceeds the Threshold
ALM-29012 Number of Queries Being Waited by Impalad Exceeds the Threshold
ALM-29013 Impalad FGC Time Exceeds the Threshold
ALM-29014 Catalog FGC Time Exceeds the Threshold
ALM-29015 Catalog Process Memory Usage Exceeds the Threshold
ALM-29016 Impalad Instance in the Sub-healthy State
ALM-29100 Kudu Service Unavailable
ALM-29104 Tserver Process Memory Usage Exceeds the Threshold
ALM-29106 Tserver Process CPU Usage Exceeds the Threshold
ALM-29107 Tserver Process Memory Usage Exceeds the Threshold
ALM-38000 Kafka Service Unavailable
ALM-38001 Insufficient Kafka Disk Capacity
ALM-38002 Kafka Heap Memory Usage Exceeds the Threshold
ALM-38004 Kafka Direct Memory Usage Exceeds the Threshold
ALM-38005 GC Duration of the Broker Process Exceeds the Threshold
ALM-38006 Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold
ALM-38007 Status of Kafka Default User Is Abnormal
ALM-38008 Abnormal Kafka Data Directory Status
ALM-38009 Busy Broker Disk I/Os (Applicable to Versions Later Than MRS 3.1.0)
ALM-38009 Kafka Topic Overload (Applicable to MRS 3.1.0 and Earlier Versions)
ALM-38010 Topics with Single Replica
ALM-38011 User Connection Usage on Broker Exceeds the Threshold
ALM-38012 Number of Broker Partitions Exceeds the Threshold
ALM-38013 Produce Request Latency in the Request Queue Exceeds the Threshold
ALM-38014 Total Produce Request Latency Exceeds the Threshold
ALM-38015 Fetch Request Latency in the Request Queue Exceeds the Threshold
ALM-38016 Total Fetch Request Latency Exceeds the Threshold
ALM-38017 Partition Reassignment Duration Exceeds the Threshold
ALM-38018 Kafka Consumer Lag
ALM-43001 Spark2x Service Unavailable
ALM-43006 Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold
ALM-43007 Non-Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold
ALM-43008 The Direct Memory Usage of the JobHistory2x Process Exceeds the Threshold
ALM-43009 JobHistory2x Process GC Time Exceeds the Threshold
ALM-43010 Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
ALM-43011 Non-Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
ALM-43012 Direct Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
ALM-43013 JDBCServer2x Process GC Time Exceeds the Threshold
ALM-43017 JDBCServer2x Process Full GC Number Exceeds the Threshold
ALM-43018 JobHistory2x Process Full GC Number Exceeds the Threshold
ALM-43019 Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold
ALM-43020 Non-Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold
ALM-43021 Direct Memory Usage of the IndexServer2x Process Exceeds the Threshold
ALM-43022 IndexServer2x Process GC Time Exceeds the Threshold
ALM-43023 IndexServer2x Process Full GC Number Exceeds the Threshold
ALM-43028 JDBCServer Session Overflow
ALM-43029 JDBCServer Job Submission Timed Out
ALM-44000 Presto Service Unavailable
ALM-44004 Presto Coordinator Resource Group Queuing Tasks Exceed the Threshold
ALM-44005 Presto Coordinator Process GC Time Exceeds the Threshold
ALM-44006 Presto Worker Process GC Time Exceeds the Threshold
ALM-45000 HetuEngine Service Unavailable
ALM-45001 Faulty HetuEngine Compute Instances
ALM-45003 HetuEngine QAS Disk Capacity Is Insufficient
ALM-45004 Tasks Stacked on HetuEngine Compute Instance
ALM-45005 CPU Usage of HetuEngine Compute Instance Exceeded the Threshold
ALM-45006 Memory Usage of a HetuEngine Compute Instance Exceeded the Threshold
ALM-45007 Number of Workers of a HetuEngine Compute Instance Is Less Than the Threshold
ALM-45008 Query Latency of HetuEngine Compute Instances Exceeds the Threshold
ALM-45009 Task Failure Rate of HetuEngine Compute Instances Exceeds the Threshold
ALM-45175 Average Time for Calling OBS Metadata APIs Is Greater than the Threshold
ALM-45176 Success Rate of Calling OBS Metadata APIs Is Lower than the Threshold
ALM-45177 Success Rate of Calling OBS Data Read APIs Is Lower than the Threshold
ALM-45178 Success Rate of Calling OBS Data Write APIs Is Lower Than the Threshold
ALM-45179 Number of Failed OBS readFully API Calls Exceeds the Threshold
ALM-45180 Number of Failed OBS read API Calls Exceeds the Threshold
ALM-45181 Number of Failed OBS write API Calls Exceeds the Threshold
ALM-45182 Number of Throttled OBS Operations Exceeds the Threshold
ALM-45275 Ranger Service Unavailable
ALM-45276 Abnormal RangerAdmin Status
ALM-45277 RangerAdmin Heap Memory Usage Exceeds the Threshold
ALM-45278 RangerAdmin Direct Memory Usage Exceeds the Threshold
ALM-45279 RangerAdmin Non Heap Memory Usage Exceeds the Threshold
ALM-45280 RangerAdmin GC Duration Exceeds the Threshold
ALM-45281 UserSync Heap Memory Usage Exceeds the Threshold
ALM-45282 UserSync Direct Memory Usage Exceeds the Threshold
ALM-45283 UserSync Non Heap Memory Usage Exceeds the Threshold
ALM-45284 UserSync Garbage Collection (GC) Time Exceeds the Threshold
ALM-45285 TagSync Heap Memory Usage Exceeds the Threshold
ALM-45286 TagSync Direct Memory Usage Exceeds the Threshold
ALM-45287 TagSync Non Heap Memory Usage Exceeds the Threshold
ALM-45288 TagSync Garbage Collection (GC) Time Exceeds the Threshold
ALM-45289 PolicySync Heap Memory Usage Exceeds the Threshold
ALM-45290 PolicySync Direct Memory Usage Exceeds the Threshold
ALM-45291 PolicySync Non-Heap Memory Usage Exceeds the Threshold
ALM-45292 PolicySync GC Duration Exceeds the Threshold
ALM-45293 Ranger User Synchronization Exception
ALM-45294 RangerKMS Process Is Abnormal
ALM-45325 Presto Service Unavailable
ALM-45326 Number of Presto Coordinator Threads Exceeds the Threshold
ALM-45327 Presto Coordinator Process GC Time Exceeds the Threshold
ALM-45328 Presto Worker Process GC Time Exceeds the Threshold
ALM-45329 Presto Coordinator Resource Group Queuing Tasks Exceed the Threshold
ALM-45330 Number of Presto Worker Threads Exceeds the Threshold
ALM-45331 Number of Presto Worker1 Threads Exceeds the Threshold
ALM-45332 Number of Presto Worker2 Threads Exceeds the Threshold
ALM-45333 Number of Presto Worker3 Threads Exceeds the Threshold
ALM-45334 Number of Presto Worker4 Threads Exceeds the Threshold
ALM-45335 Presto Worker1 Process GC Time Exceeds the Threshold
ALM-45336 Presto Worker2 Process GC Time Exceeds the Threshold
ALM-45337 Presto Worker3 Process GC Time Exceeds the Threshold
ALM-45338 Presto Worker4 Process GC Time Exceeds the Threshold
ALM-45425 ClickHouse Service Unavailable
ALM-45426 ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold
ALM-45427 ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold
ALM-45428 ClickHouse Disk I/O Exception
ALM-45429 Table Metadata Synchronization Failed on the Added ClickHouse Node
ALM-45430 Permission Metadata Synchronization Failed on the Added ClickHouse Node
ALM-45431 Improper ClickHouse Instance Distribution for Topology Allocation
ALM-45432 ClickHouse User Synchronization Process Fails
ALM-45433 ClickHouse AZ Topology Exception
ALM-45434 A Single Replica Exists in the ClickHouse Data Table
ALM-45435 Inconsistent Metadata of ClickHouse Tables
ALM-45436 Skew ClickHouse Table Data
ALM-45437 Excessive Parts in the ClickHouse Table
ALM-45438 ClickHouse Disk Usage Exceeds 80%
ALM-45439 ClickHouse Node Enters the Read-Only Mode
ALM-45440 Inconsistency Between ClickHouse Replicas
ALM-45441 ZooKeeper Disconnected
ALM-45442 Too Many Concurrent SQL Statements
ALM-45443 Slow SQL Queries in the Cluster
ALM-45444 Abnormal ClickHouse Process
ALM-45445 Failed to Send Data Files to Remote Shards When ClickHouse Writes Data to a Distributed Table
ALM-45446 Mutation Task of ClickHouse Is Not Complete for a Long Time
ALM-45447 ClickHouse Table Read-Only
ALM-45448 Rapid Increase of Znodes Used by ClickHouse
ALM-45449 The Counter Number of zxid Used by ClickHouse Exceeds the Threshold
ALM-45450 ClickHouse Failed to Obtain a Temporary Agency Credential
ALM-45451 ClickHouse Failed to Access OBS
ALM-45452 ClickHouse's Local Disk Space Is Below the Cold-Hot Separation Threshold
ALM-45585 IoTDB Service Unavailable
ALM-45586 IoTDBServer Heap Memory Usage Exceeds the Threshold
ALM-45587 IoTDBServer GC Duration Exceeds the Threshold
ALM-45588 IoTDBServer Direct Memory Usage Exceeds the Threshold
ALM-45589 ConfigNode Heap Memory Usage Exceeds the Threshold
ALM-45590 ConfigNode GC Duration Exceeds the Threshold
ALM-45591 ConfigNode Direct Memory Usage Exceeds the Threshold
ALM-45592 IoTDBServer RPC Execution Duration Exceeds the Threshold
ALM-45593 IoTDBServer Flush Execution Duration Exceeds the Threshold
ALM-45594 IoTDBServer Intra-Space Merge Duration Exceeds the Threshold
ALM-45595 IoTDBServer Cross-Space Merge Duration Exceeds the Threshold
ALM-45596 Procedure Execution Failed
ALM-45615 CDL Service Unavailable
ALM-45616 CDL Job Execution Exception
ALM-45617 Data Queued in the CDL Replication Slot Exceeds the Threshold
ALM-45635 FlinkServer Job Execution Failure
ALM-45636 Flink Job Checkpoints Keep Failing
ALM-45636 Number of Consecutive Checkpoint Failures of a Flink Job Exceeds the Threshold
ALM-45637 FlinkServer Task Is Continuously Under Back Pressure
ALM-45638 Number of Restarts After FlinkServer Job Failures Exceeds the Threshold
ALM-45638 Number of Restarts After Flink Job Failures Exceeds the Threshold
ALM-45639 Checkpointing of a Flink Job Times Out
ALM-45640 FlinkServer Heartbeat Interruption Between the Active and Standby Nodes
ALM-45641 Data Synchronization Exception Between the Active and Standby FlinkServer Nodes
ALM-45642 RocksDB Continuously Triggers Write Traffic Limiting
ALM-45643 MemTable Size of RocksDB Continuously Exceeds the Threshold
ALM-45644 Number of SST Files at Level 0 of RocksDB Continuously Exceeds the Threshold
ALM-45645 Pending Flush Size of RocksDB Continuously Exceeds the Threshold
ALM-45646 Pending Compaction Size of RocksDB Continuously Exceeds the Threshold
ALM-45647 Estimated Pending Compaction Size of RocksDB Continuously Exceeds the Threshold
ALM-45648 RocksDB Frequently Encounters Write-Stopped
ALM-45649 P95 Latency of RocksDB Get Requests Continuously Exceeds the Threshold
ALM-45650 P95 Latency of RocksDB Write Requests Continuously Exceeds the Threshold
ALM-45652 Flink Service Unavailable
ALM-45653 Invalid Flink HA Certificate File
ALM-45654 Flink HA Certificate Is About to Expire
ALM-45655 Flink HA Certificate File Has Expired
ALM-45736 Guardian Service Unavailable
ALM-45737 TokenServer Heap Memory Usage Exceeds the Threshold
ALM-45738 TokenServer Direct Memory Usage Exceeds the Threshold
ALM-45739 TokenServer Non-Heap Memory Usage Exceeds the Threshold
ALM-45740 TokenServer GC Duration Exceeds the Threshold
ALM-45741 Failed to Call the ECS securitykey API
ALM-45742 Failed to Call the ECS Metadata API
ALM-45743 Failed to Call the IAM API
ALM-45744 Average RPC Processing Time of the Guardian TokenServer Exceeds the Threshold
ALM-45745 Average RPC Queuing Time of the Guardian TokenServer Exceeds the Threshold
ALM-47001 MemArtsCC Service Unavailable
ALM-47002 MemArtsCC Disk Fault
ALM-47003 Memory Usage of the MemArtsCC Worker Process Exceeds the Threshold
ALM-47004 Average Latency of MemArtsCC Worker Read Requests Exceeds the Threshold
ALM-50201 Doris Service Unavailable
ALM-50202 FE CPU Usage Exceeds the Threshold
ALM-50203 FE Memory Usage Exceeds the Threshold
ALM-50205 BE CPU Usage Exceeds the Threshold
ALM-50206 BE Memory Usage Exceeds the Threshold
ALM-50207 Ratio of Connections to the FE MySQL Port to the Maximum Connections Allowed Exceeds the Threshold
ALM-50208 Failures to Clear Historical Metadata Image Files Exceed the Threshold
ALM-50209 Failures to Generate Metadata Image Files Exceed the Threshold
ALM-50210 Maximum Compaction Score of All BE Nodes Exceeds the Threshold
ALM-50211 FE Queue Length of BE Periodic Report Tasks Exceeds the Threshold
ALM-50212 Accumulated Old-Generation GC Duration of the FE Process Exceeds the Threshold
ALM-50213 Number of Tasks Queuing in the FE Thread Pool for Interacting with BE Exceeds the Threshold
ALM-50214 Number of Tasks Queuing in the FE Thread Pool for Task Processing Exceeds the Threshold
ALM-50215 Longest Duration of RPC Requests Received by Each FE Thrift Method Exceeds the Threshold
ALM-50216 Memory Usage of the FE Node Exceeds the Threshold
ALM-50217 Heap Memory Usage of the FE Node Exceeds the Threshold
ALM-50219 Length of the Queue in the Thread Pool for Query Execution Exceeds the Threshold
ALM-50220 Error Rate of TCP Packet Receiving Exceeds the Threshold
ALM-50221 BE Data Disk Usage Exceeds the Threshold
ALM-50222 Disk Status of a Specified Data Directory on BE Is Abnormal
ALM-50223 Maximum Memory Required by BE Is Greater Than the Remaining Memory of the Machine
ALM-50224 Failures of a Certain Task Type on BE Are Increasing
ALM-50225 FE Instance Fault
ALM-50226 BE Instance Fault
ALM-50227 Number of Concurrent Doris Tenant Queries Exceeds the Threshold
ALM-50228 Memory Usage of a Doris Tenant Exceeds the Threshold
ALM-50229 Doris FE Failed to Connect to OBS
ALM-50230 Doris BE Cannot Connect to OBS
ALM-50231 Abnormal Tablets Exist in Doris
ALM-50232 Large Tablets in Doris
ALM-50401 Number of JobServer Jobs Waiting to Be Executed Exceeds the Threshold
ALM-50402 JobGateway Service Unavailable
ALM-12001 Audit Log Dump Failure (For MRS 2.x or Earlier)
ALM-12002 HA Resource Abnormal (For MRS 2.x or Earlier)
ALM-12004 OLdap Resource Abnormal (For MRS 2.x or Earlier)
ALM-12005 OKerberos Resource Abnormal (For MRS 2.x or Earlier)
ALM-12006 Node Fault (For MRS 2.x or Earlier)
ALM-12007 Process Fault (For MRS 2.x or Earlier)
ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes (For MRS 2.x or Earlier)
ALM-12011 Data Synchronization Exception Between the Active and Standby Manager Nodes (For MRS 2.x or Earlier)
ALM-12012 NTP Service Abnormal (For MRS 2.x or Earlier)
ALM-12014 Device Partition Lost (For MRS 2.x or Earlier)
ALM-12015 Device Partition File System Read-Only (For MRS 2.x or Earlier)
ALM-12016 CPU Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12017 Insufficient Disk Capacity (For MRS 2.x or Earlier)
ALM-12018 Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12027 Host PID Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12031 User omm or Password Is About to Expire (For MRS 2.x or Earlier)
ALM-12032 User ommdba or Password Is About to Expire (For MRS 2.x or Earlier)
ALM-12033 Slow Disk Fault (For MRS 2.x or Earlier)
ALM-12034 Periodic Backup Failure (For MRS 2.x or Earlier)
ALM-12035 Unknown Data Status After Recovery Task Failure (For MRS 2.x or Earlier)
ALM-12037 NTP Server Abnormal (For MRS 2.x or Earlier)
ALM-12038 Monitoring Indicator Dump Failure (For MRS 2.x or Earlier)
ALM-12039 GaussDB Data Is Not Synchronized (For MRS 2.x or Earlier)
ALM-12040 Insufficient System Entropy (For MRS 2.x or Earlier)
ALM-12041 Permission of Key Files Is Abnormal (For MRS 2.x or Earlier)
ALM-12042 Key File Configurations Are Abnormal (For MRS 2.x or Earlier)
ALM-12043 DNS Parsing Duration Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12045 Read Packet Dropped Rate Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12046 Write Packet Dropped Rate Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12047 Read Packet Error Rate Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12048 Write Packet Error Rate Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12049 Read Throughput Rate Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12050 Write Throughput Rate Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12051 Disk Inode Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12052 Usage of Temporary TCP Ports Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12053 File Handle Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-12054 Invalid Certificate File (For MRS 2.x or Earlier)
ALM-12055 Certificate File Is About to Expire (For MRS 2.x or Earlier)
ALM-12180 Disk Card I/O (For MRS 2.x or Earlier)
ALM-12357 Failed to Export Audit Logs to OBS (For MRS 2.x or Earlier)
ALM-13000 ZooKeeper Service Unavailable (For MRS 2.x or Earlier)
ALM-13001 Available ZooKeeper Connections Are Insufficient (For MRS 2.x or Earlier)
ALM-13002 ZooKeeper Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14000 HDFS Service Unavailable (For MRS 2.x or Earlier)
ALM-14001 HDFS Disk Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14002 DataNode Disk Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14004 Number of Damaged HDFS Blocks Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14006 Number of HDFS Files Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14009 Number of Faulty DataNodes Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-14010 NameService Is Abnormal (For MRS 2.x or Earlier)
ALM-14011 HDFS DataNode Data Directory Is Not Configured Properly (For MRS 2.x or Earlier)
ALM-14012 HDFS JournalNode Data Is Not Synchronized (For MRS 2.x or Earlier)
ALM-16000 Percentage of Sessions Connected to the HiveServer to the Maximum Number Allowed Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold (For MRS 2.x or Earlier)
ALM-16004 Hive Service Unavailable (For MRS 2.x or Earlier)
ALM-16005 Number of Failed Hive SQL Executions in the Last Period Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-18000 Yarn Service Unavailable (For MRS 2.x or Earlier)
ALM-18002 NodeManager Heartbeat Lost (For MRS 2.x or Earlier)
ALM-18003 NodeManager Unhealthy (For MRS 2.x or Earlier)
ALM-18004 NodeManager Disk Usability Ratio Is Lower Than the Threshold (For MRS 2.x or Earlier)
ALM-18006 MapReduce Job Execution Timeout (For MRS 2.x or Earlier)
ALM-18008 Heap Memory Usage of Yarn ResourceManager Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-18009 Heap Memory Usage of MapReduce JobHistoryServer Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-18010 Number of Pending Yarn Tasks Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-18011 Memory of Pending Yarn Tasks Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-18012 Number of Terminated Yarn Tasks in the Last Period Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-18013 Number of Failed Yarn Tasks in the Last Period Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-19000 HBase Service Unavailable (For MRS 2.x or Earlier)
ALM-19006 HBase Replication Sync Failed (For MRS 2.x or Earlier)
ALM-19007 HBase Merge Queue Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-20002 Hue Service Unavailable (For MRS 2.x or Earlier)
ALM-23001 Loader Service Unavailable (For MRS 2.x or Earlier)
ALM-24000 Flume Service Unavailable (For MRS 2.x or Earlier)
ALM-24001 Flume Agent Is Abnormal (For MRS 2.x or Earlier)
ALM-24003 Flume Client Connection Interrupted (For MRS 2.x or Earlier)
ALM-24004 Flume Fails to Read Data (For MRS 2.x or Earlier)
ALM-24005 Data Transmission by Flume Is Abnormal (For MRS 2.x or Earlier)
ALM-25000 LdapServer Service Unavailable (For MRS 2.x or Earlier)
ALM-25004 Abnormal LdapServer Data Synchronization (For MRS 2.x or Earlier)
ALM-25500 KrbServer Service Unavailable (For MRS 2.x or Earlier)
ALM-26051 Storm Service Unavailable (For MRS 2.x or Earlier)
ALM-26052 Number of Available Supervisors in Storm Is Lower Than the Threshold (For MRS 2.x or Earlier)
ALM-26053 Slot Usage of Storm Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-26054 Heap Memory Usage of Storm Nimbus Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-27001 DBService Unavailable (For MRS 2.x or Earlier)
ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes (For MRS 2.x or Earlier)
ALM-27004 Data Inconsistency Between Active and Standby DBServices (For MRS 2.x or Earlier)
ALM-28001 Spark Service Unavailable (For MRS 2.x or Earlier)
ALM-38000 Kafka Service Unavailable (For MRS 2.x or Earlier)
ALM-38001 Insufficient Kafka Disk Capacity (For MRS 2.x or Earlier)
ALM-38002 Heap Memory Usage of Kafka Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43001 Spark Service Unavailable (For MRS 2.x or Earlier)
ALM-43006 Heap Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43007 Non-Heap Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43009 JobHistory GC Time Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43010 Heap Memory Usage of the JDBCServer Process Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43011 Non-Heap Memory Usage of the JDBCServer Process Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43012 Direct Memory Usage of the JDBCServer Process Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-43013 JDBCServer GC Time Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-44004 Presto Coordinator Resource Group Queuing Tasks Exceed the Threshold (For MRS 2.x or Earlier)
ALM-44005 Presto Coordinator Process GC Time Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-44006 Presto Worker Process GC Time Exceeds the Threshold (For MRS 2.x or Earlier)
ALM-45325 Presto Service Unavailable (For MRS 2.x or Earlier)
Configuring Remote O&M for an MRS Cluster
Common Ports for MRS Cluster Services
Configuring Storage-Compute Decoupling for an MRS Cluster
Configuration Process
Interconnecting an MRS Cluster with OBS Using an IAM Agency
Interconnecting an MRS Cluster with OBS Using an IAM Agency
Configuring the Policy for Clearing Recycle Bin Directories of MRS Cluster Components
Example for Interconnecting a Cluster Service with OBS
Interconnecting Flink with OBS Using an IAM Agency
Interconnecting Flume with OBS Using an IAM Agency
Interconnecting HDFS with OBS Using an IAM Agency
Interconnecting Hive with OBS Using an IAM Agency
Interconnecting Hudi with OBS Using an IAM Agency
Interconnecting MapReduce with OBS Using an IAM Agency
Interconnecting Presto with OBS Using an IAM Agency
Interconnecting Spark with OBS Using an IAM Agency
Interconnecting Sqoop with OBS Using an IAM Agency
Configuring Fine-Grained OBS Access Permissions for MRS Cluster Users
FAQ About Decoupled Storage and Compute
How Do I Read Encrypted OBS Data When Running an MRS Job?
Example Application Development for Interconnecting HDFS with OBS
How Do I Connect an MRS Cluster Client to OBS Using an AK/SK Pair?
How Do I Access OBS Using an MRS Client Installed Outside a Cluster?
Accessing an MRS Cluster's Manager (Version 2.x or Earlier)
How Do I Handle Abnormal Status of Core Nodes in an MRS Cluster After Successful Expansion?
Component Operation Guide (LTS)
Using CarbonData
CarbonData Data Types
CarbonData Table User Permissions
Creating a CarbonData Table Using the Spark Client
CarbonData Data Analytics
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying a CarbonData Table
Loading Data to a CarbonData Table
Deleting Segments of a CarbonData Table
Compacting CarbonData Table Segments
CarbonData Performance Tuning
Tuning Approach
Typical Performance Tuning Parameters
Creating a CarbonData Table with High Query Performance
Typical CarbonData Configuration Parameters
CarbonData Syntax Reference
CREATE TABLE
CREATE TABLE AS SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Concurrent CarbonData Table Operations
CarbonData Segment API
CarbonData Tablespace Index
Common Issues About CarbonData
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How Do I Avoid Minor Compaction for Historical Data?
How Do I Change the Default Group Name for CarbonData Data Loading?
Why Does the INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Does Data Load Performance Decrease Due to Bad Records?
Why Does Data Loading Fail During Off-Heap?
Why Do I Fail to Create a Hive Table?
How Do I Logically Split Data Across Different Namespaces?
Why Does the Missing Privileges Exception Occur When the Database Is Dropped?
Why Can't the UPDATE Command Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Does CarbonData Become Abnormal After the Disk Space Quota of the HDFS Storage Directory Is Set?
Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
How Do I Restore the Latest tablestatus File That Has Been Lost or Damaged When TableStatus Versioning Is Enabled?
CarbonData Troubleshooting
Filter Result Is Not Consistent with Hive When a Big Double Type Value Is Used in Filter
Query Performance Deteriorated Due to Insufficient Executor Memory
Data Query or Loading Failed, and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Was Reported
Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are the Opened Tasks Fewer Than the Available Executors When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Using CDL
Integrating CDL Data
CDL User Permission Management
Creating a Data Synchronization Job with CDL
Preparing for Creating a CDL Job
Enabling Kafka High Reliability
Logging In to the CDLService WebUI
Uploading the Database Driver File
Creating a CDL Database Connection
Configuring CDL ENV Variables
Configuring the Source Data Heartbeat Table for Data Integrity Check
Creating a CDL Job
Creating a CDL Data Synchronization Job
Creating a CDL Data Comparison Job
Synchronizing Data from PgSQL to Kafka Using CDL
Synchronizing Data from PgSQL to Hudi Using CDL
Synchronizing Data from openGauss to Hudi Using CDL
Synchronizing Data from Hudi to DWS Using CDL
Synchronizing Data from Hudi to ClickHouse Using CDL
Synchronizing openGauss Data to Hudi Using CDL (ThirdKafka)
Synchronizing drs-oracle-json Database to Hudi Using CDL (ThirdKafka)
Synchronizing drs-oracle-avro Database to Hudi Using CDL (ThirdKafka)
CDL Job DDL Changes
CDL Log Overview
Common Issues About CDL
Hudi Does Not Receive Data After a CDL Job Is Executed
How Do I Capture Data from a Specified Location When a MySQL Link Task Is Started?
Why Can a User Still Perform Operations on the Tasks Created by Itself After All Permissions of the User Are Deleted from Ranger?
CDL Troubleshooting
Error 403 Is Reported When a CDL Job Is Stopped
Error 104 or 143 Is Reported After a CDL Job Runs for a Period of Time
Why Is the Value of Task Configured for the OGG Source Different from the Actual Number of Running Tasks When Data Is Synchronized from OGG to Hudi?
Why Are There Too Many Topic Partitions Corresponding to the CDL Synchronization Task Names?
What Should I Do If an Error Message Indicating That the Current User Does Not Have the Permission to Create Tables Is Displayed When a CDL Task Synchronizes Data to Hudi?
Error Is Reported When the Job of Capturing Data From PgSQL to Hudi Is Started
Using ClickHouse
ClickHouse Overview
ClickHouse User Permission Management
ClickHouse User Rights
Creating a ClickHouse Role
Configuring Interconnection Between ClickHouse and OpenLDAP Authentication System
ClickHouse Client Practices
ClickHouse Data Import
Interconnecting ClickHouse with RDS for MySQL
Interconnecting ClickHouse with OBS
Interconnecting ClickHouse with HDFS
Configuring Interconnection Between ClickHouse and Kafka
Connecting ClickHouse to Kafka Using a Username and Password
Interconnecting ClickHouse with Kafka Through Kerberos Authentication
Connecting ClickHouse to Kafka in Normal Mode
Synchronizing Kafka Data to ClickHouse
Importing DWS Table Data to ClickHouse
Importing ClickHouse Data in Batches
Using ClickHouse to Import and Export Data
Enterprise-Class Enhancements of ClickHouse
ClickHouse Multi-Tenancy
Overview
Configuring CPU Priority for ClickHouse Tenants
Creating a ClickHouse Tenant
Modifying the Memory Limit of ClickHouse on a ClickHouseServer Node
Checking Slow SQL Statements in ClickHouse
Checking Monitoring Metrics of ClickHouse Replication Table Data Synchronization
Configuring Strong Data Consistency Between ClickHouse Replicas
Configuring the Support for Transactions on ClickHouse
Accessing ClickHouse Through ELB
ClickHouse Performance Tuning
Optimizing ClickHouse Table Partitioning
Accelerating ClickHouse Merge
Accelerating ClickHouse TTL Operations
ClickHouse O&M Management
ClickHouse Log Overview
Collecting Dumping Logs of the ClickHouse System Tables
Enabling the Read-Only Mode for ClickHouse Tables
Migrating Data Between ClickHouseServer Nodes in a Cluster
Migrating ClickHouse Data from One MRS Cluster to Another
Expanding the Disk Capacity of the ClickHouse Node
Backing Up and Restoring ClickHouse Data Using a Data File
Configuring the Default ClickHouse User Password (MRS 3.1.2-LTS)
Configuring the Default ClickHouse User Passwords (MRS 3.3.0-LTS)
Clearing the Passwords of Default ClickHouse Users
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
DELETE: Lightweight Deletion of Table Data
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Structure
ALTER TABLE: Modifying Table Data
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
UPSERT: Writing Data
Common Issues About ClickHouse
What Should I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
How Do I Migrate Data from Hive/HDFS to ClickHouse?
How Do I Migrate Data from OBS/S3 to ClickHouse?
An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data
How Do I Grant the Select Permission at the Database Level to ClickHouse Users?
Using DBService
Configuring SSL for the HA Module
Restoring SSL for the HA Module
Configuring the Timeout Interval of DBService Backup Tasks
DBService Log Overview
Using Doris
Overview of the Doris Data Model
Managing Doris User Permissions
About Doris User Permissions
Creating a Doris Permission Role
Using the MySQL Client to Connect to Doris
Getting Started with Doris
Importing Doris Data
Importing Data to Doris with Broker Load
Importing Data to Doris with Stream Load
Analyzing Doris Data
Exporting Doris Data to HDFS
Exporting the Query Result Set
Enterprise-Class Enhancements of Doris
Configuring HA for the Doris Cluster
About Doris Cluster HA
Configuring Access to the Doris Cluster Through ELB
Configuring Multi-Source Data for Doris
About Doris Multi-Source Data
Interconnecting Doris with the Hive Data Source
Doris O&M Management
Doris Log Overview
Accessing the Doris Web UI for Component Status
Backing Up Doris Data
Restoring Doris Data
Typical SQL Syntax of Doris
CREATE DATABASE
CREATE TABLE
INSERT INTO
ALTER TABLE
DROP TABLE
Common Issues About Doris
What Should I Do If an Error Occasionally Occurs During Table Creation Due to the Configuration of the SSD and HDD Data Directories?
What Should I Do If an RPC Timeout Error Is Reported When Stream Load Is Used?
What Do I Do If the Error Message "plugin not enabled" Is Displayed When the MySQL Client Is Used to Connect to the Doris Database?
How Do I Handle the FE Startup Failure?
How Do I Handle the Startup Failure Due to Incorrect IP Address Matching for the BE Instance?
What Should I Do If the Error Message "Read timed out" Is Displayed When the MySQL Client Connects to Doris?
What Should I Do If an Error Is Reported When the BE Runs a Data Import or Query Task?
What Should I Do If a Timeout Error Is Reported When Broker Load Imports Data?
What Should I Do If an Error Message Is Displayed When Broker Load Is Used to Import Data?
Doris Troubleshooting
What Should I Do If a Query Is Performed on the BE Node Where Some Copies Are Lost or Damaged and an Error Is Reported?
How Do I Restore the FE Service from a Fault?
What Should I Do If the Data Volume of a Broker Load Import Task Exceeds the Threshold?
Using Flink
Flink Job Engine
Flink User Permission Management
Flink Security Authentication
Flink User Permissions
Creating a FlinkServer Role
Configuring Security Authentication for Interconnecting with Kafka
Configuring Flink Authentication and Encryption
Using the Flink Client
Preparing for Creating a FlinkServer Job
Accessing the FlinkServer Web UI
Creating a FlinkServer Application
Creating a FlinkServer Cluster Connection
Creating a FlinkServer Data Connection
Creating a FlinkServer Stream Table Source
Creating a FlinkServer Job
Creating a FlinkServer Job and Writing Data to a ClickHouse Table
Creating a FlinkServer Job to Interconnect with a GaussDB(DWS) Table
Creating a FlinkServer Job to Write Data to an HBase Table
Creating a FlinkServer Job to Write Data to HDFS
Creating a FlinkServer Job to Write Data to a Hive Table
Creating a FlinkServer Job to Write Data to a Hudi Table
Creating a FlinkServer Job to Write Data to a Kafka Message Queue
Managing FlinkServer Jobs
Viewing the Health Status of FlinkServer Jobs
Importing and Exporting FlinkServer Job Information
Configuring Automatic Clearing of FlinkServer Job Residuals
Configuring the FlinkServer Job Restart Policy
Adding Third-Party Dependency JAR Packages to a FlinkServer Job
Using UDFs in FlinkServer Jobs
Enterprise-Class Enhancements of Flink
Flink SQL Syntax Enhancement
Table-Level TTL for Stream Joins
Configuring Flink SQL Client to Support SQL Verification
Enhancing the Joins of Large and Small Tables in Flink Jobs
Flink O&M Management
Typical Flink Configuration Parameters
Flink Log Overview
Flink Performance Tuning
Flink Memory GC Optimization Parameters
Flink Job Concurrency
Flink Job Process Parameters
Flink Netty Network Communication Parameters
RocksDB State Backend of Flink Jobs
Separate Storage of Cold and Hot Data for Flink Job State Backend
Typical Commands of the Flink Client
Common Flink SQL Syntax
Common Issues About Flink
Flink Troubleshooting
Using Flume
Flume Log Collection Overview
Flume Service Model Configuration
Installing the Flume Client
Quickly Using Flume to Collect Node Logs
Configuring a Non-Encrypted Flume Data Collection Task
Generating Configuration Files for the Flume Server and Client
Using Flume Server to Collect Static Logs from Local Host to Kafka
Using Flume Server to Collect Static Logs from Local Host to HDFS
Using Flume Server to Collect Dynamic Logs from Local Host to HDFS
Using Flume Server to Collect Logs from Kafka to HDFS
Using Flume Client to Collect Logs from Kafka to HDFS
Using Cascaded Agents to Collect Static Logs from Local Host to HBase
Configuring an Encrypted Flume Data Collection Task
Using Cascaded Agents to Collect Static Logs from Local Host to HDFS
Enterprise-Class Enhancements of Flume
Using the Encryption Tool of the Flume Client
Configuring Flume to Connect to Kafka in Security Mode
Flume O&M Management
Flume Common Configuration Parameters
Flume Log Overview
Viewing Flume Client Logs
Viewing Flume Client Monitoring Information
Stopping or Uninstalling the Flume Client
Common Issues About Flume
How Do I View Flume Logs?
How Do I Use Environment Variables in the Flume Configuration File?
How Do I Develop a Third-Party Flume Plug-in?
How Do I Configure a Custom Flume Script?
Using HBase
Creating an HBase Permission Role
Using the HBase Client
Using HBase for Offline Data Analysis
Migrating Data to HBase Using BulkLoad
HBase Data Operations
Creating HBase Indexes for Data Query
Configuring the HBase Data Compression and Encoding Formats
Enterprise-Class Enhancements of HBase
Configuring HBase Global Secondary Indexes for Faster Queries
Introduction to HBase Global Secondary Indexes
Creating an HBase Global Secondary Index
Querying an HBase Global Secondary Index
Changing Status of HBase Global Secondary Indexes
Creating HBase Global Secondary Indexes in Batches
Checking HBase Global Secondary Index Data Consistency
Querying HBase Table Data with Global Secondary Indexes
Configuring HBase Local Secondary Indexes for Faster Queries
Introduction to HBase Local Secondary Indexes
Loading Index Data in Batches and Generating Local Secondary Indexes
Using TableIndexer to Generate a Local HBase Secondary Index
Improving HBase BulkLoad Data Migration
Importing HBase Data in Batches Using BulkLoad
Updating HBase Data in Batches Using BulkLoad
Deleting HBase Data in Batches Using BulkLoad
Counting Rows in an HBase Table Using BulkLoad
BulkLoad Configuration File
Configuring BulkLoad to Parse Customized Separators
Configuring Hot-Cold Data Separation in HBase
Configuring Separate Storage for HBase Cold and Hot Data
Cold-Hot Separation Commands
Configuring RSGroup to Manage RegionServer Resources
Checking Slow and Oversized HBase Requests
HBase Performance Tuning
Improving the Batch Loading Efficiency of HBase BulkLoad
Improving HBase Continuous Put Performance
Improving HBase Put and Scan Performance
Improving HBase Real-Time Write Efficiency
Improving HBase Real-Time Read Efficiency
Accelerating HBase Compaction During Off-Peak Hours
Tuning HBase JVM Parameters
HBase O&M Management
HBase Log Overview
Configuring Region In Transition Recovery Chore Service
Enabling Inter-Cluster Copy to Back Up Data
Configuring Automatic Data Backup for Active and Standby HBase Clusters
Configuring HBase Cluster HA and DR
Configuring HBase Active/Standby DR
Switching Between Active and Standby HBase Clusters
Configuring HBase Standby Cluster Information for Switchover
Common Issues About HBase
Operation Failures Occur When Stopping BulkLoad on the Client
How Do I Restore a Region in the RIT State for a Long Time?
Why Does HMaster Exit Due to Timeout When Waiting for the NameSpace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
When Are the RegionServers Listed Under "Dead Region Servers" on HMaster WebUI Cleared?
Insufficient Rights When Accessing Phoenix
Insufficient Rights When Using the HBase BulkLoad Function
How Do I Fix Region Overlapping?
Restrictions on Using the Phoenix BulkLoad Tool
Why Is a Message Displayed Indicating That the Permission Is Insufficient When CTBase Connects to the Ranger Plug-ins?
Introduction to HBase Global Secondary Index APIs
HBase Troubleshooting
Why Does a Client Keep Failing to Connect to a Server for a Long Time?
Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
Why Does the HBase BulkLoad Task Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
Why Can Modified and Deleted Data Still Be Queried by Using the Scan Command?
What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
How Do I Delete Residual Table Names in the table-lock Directory of ZooKeeper?
Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
HMaster Fails to Be Started After the OfflineMetaRepair Tool Is Used to Rebuild Metadata
Why Are Messages Containing FileNotFoundException Frequently Displayed in the HMaster Logs?
Why Does the ImportTsv Tool Display "Permission denied"?
Why Are Different Query Results Returned After I Use the Same Query Criteria to Query Data Successfully Imported by HBase BulkLoad?
HBase Fails to Recover a Task
Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed?
Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
How Do I View Regions in the CLOSED State in an ENABLED Table?
How Can I Quickly Recover the Service When HBase Files Are Damaged Due to a Cluster Power-Off?
Using HDFS
Overview of HDFS File System Directories
HDFS User Permission Management
Creating an HDFS Role
Configuring HDFS Directory Permission
Using the HDFS Client
Using Hadoop from Scratch
Configuring the Recycle Bin Mechanism
Configuring HDFS DataNode Data Balancing
Configuring HDFS DiskBalancer
Configuring HDFS Mover
Configuring HDFS NodeLabel
Configuring Memory Management
Configuring ulimit for HBase and HDFS
Configuring the Number of Files in a Single HDFS Directory
Enterprise-Class Enhancements of HDFS
Configuring the HDFS Quick File Close Function
Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring the Observer NameNode to Process Read Requests
Configuring the NameNode Blacklist
Configuring Encrypted Channels
HDFS Performance Tuning
Improving Write Performance
Improving Read Performance Using Client Metadata Cache
Improving the Connection Between the Client and NameNode Using Current Active Cache
Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Performing Concurrent Operations on HDFS Files
Configuring LZC Compression
HDFS O&M Management
Configuring HDFS Parameters
Introduction to HDFS Logs
Planning HDFS Capacity
Changing the DataNode Storage Directory
Configuring the Damaged Disk Volume
Setting the Maximum Lifetime and Renewal Interval of a Token
Running the DistCp Command
Configuring NFS
Common Commands of the HDFS Client
Common Issues About HDFS
Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
HDFS WebUI Cannot Properly Update Information About Damaged Data
The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
DataNode Is Normal but Cannot Report Data Blocks
Can I Delete or Modify the Data Storage Directory in DataNode?
Failed to Calculate the Capacity of a DataNode When Multiple data.dir Directories Are Configured in a Disk Partition
Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
Blocks Are Missing on the NameNode UI After a Successful Rollback
HDFS Troubleshooting
Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
NameNode Startup Is Slow
NameNode Fails to Be Restarted Due to EditLog Discontinuity
Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
Why Does an Array Out-of-Bounds Error Occur During FileInputFormat Split?
Using HetuEngine
Overview of HetuEngine Interactive Query
HetuEngine User Permission Management
HetuEngine User Permissions
Creating a HetuEngine Permission Role
Configuring Proxy User Authentication
Quickly Using HetuEngine to Access Hive Data Source
Creating a HetuEngine Compute Instance
Adding a HetuEngine Data Source
Using HetuEngine to Access Data Sources Across Sources and Domains
Adding a Hive Data Source
Adding a Hudi Data Source
Adding a ClickHouse Data Source
Adding a GaussDB Data Source
Adding an HBase Data Source
Adding a Cross-Cluster HetuEngine Data Source
Adding an IoTDB Data Source
Adding a MySQL Data Source
Configuring HetuEngine Materialized Views
Overview of HetuEngine Materialized Views
SQL Examples of HetuEngine Materialized Views
Rewriting of HetuEngine Materialized Views
HetuEngine Materialized View Recommendation
HetuEngine Materialized View Caching
Validity Period and Data Update of HetuEngine Materialized Views
HetuEngine Intelligent Materialized Views
Automatic Tasks of HetuEngine Materialized Views
HetuEngine SQL Diagnosis
Developing and Deploying HetuEngine UDFs
Developing and Deploying HetuEngine Function Plugins
Hive UDFs for Interconnecting with HetuEngine
Developing and Deploying HetuEngine UDFs
Managing a HetuEngine Data Source
Managing HetuEngine Compute Instances
Configuring HetuEngine Resource Groups
Configuring the Number of HetuEngine Worker Nodes
Configuring a HetuEngine Maintenance Instance
Configuring the Nodes on Which HetuEngine Coordinator Is Running
Importing and Exporting HetuEngine Compute Instance Configurations
Viewing the HetuEngine Instance Monitoring Page
Viewing HetuEngine Coordinator and Worker Logs
Configuring HetuEngine Query Fault Tolerance
HetuEngine Performance Tuning
Adjusting YARN Resource Allocation
Adjusting HetuEngine Cluster Node Resource Configurations
Optimizing HetuEngine INSERT Statements
Adjusting HetuEngine Metadata Caching
Enabling Dynamic Filtering in HetuEngine
Adjusting the Execution of Adaptive Queries in HetuEngine
Adjusting Timeout for Hive Metadata Loading
Tuning Hudi Data Source Performance
HetuEngine Log Overview
Common HetuEngine SQL Syntax
HetuEngine Data Type
DDL Syntax
CREATE SCHEMA
CREATE VIRTUAL SCHEMA
CREATE TABLE
CREATE TABLE AS
CREATE TABLE LIKE
CREATE VIEW
CREATE FUNCTION
CREATE MATERIALIZED VIEW
ALTER MATERIALIZED VIEW STATUS
ALTER MATERIALIZED VIEW
ALTER TABLE
ALTER VIEW
ALTER SCHEMA
DROP SCHEMA
DROP TABLE
DROP VIEW
DROP FUNCTION
DROP MATERIALIZED VIEW
REFRESH MATERIALIZED VIEW
TRUNCATE TABLE
COMMENT
VALUES
SHOW Syntax Overview
SHOW CATALOGS
SHOW SCHEMAS (DATABASES)
SHOW TABLES
SHOW TBLPROPERTIES TABLE|VIEW
SHOW TABLE/PARTITION EXTENDED
SHOW STATS
SHOW FUNCTIONS
SHOW SESSION
SHOW PARTITIONS
SHOW COLUMNS
SHOW CREATE TABLE
SHOW VIEWS
SHOW CREATE VIEW
SHOW MATERIALIZED VIEWS
SHOW CREATE MATERIALIZED VIEW
DML Syntax
INSERT
DELETE
UPDATE
LOAD
TCL Syntax
START TRANSACTION
COMMIT
ROLLBACK
DQL Syntax
SELECT
WITH
GROUP BY
HAVING
UNION | INTERSECT | EXCEPT
ORDER BY
OFFSET
LIMIT | FETCH FIRST
TABLESAMPLE
UNNEST
JOINS
Subqueries
SELECT VIEW CONTENT
REWRITE HINT
SQL Functions and Operators
Logical Operators
Comparison Functions and Operators
Condition Expression
Lambda Expression
Conversion Functions
Mathematical Functions and Operators
Bitwise Functions
Decimal Functions and Operators
String Functions and Operators
Regular Expressions
Binary Functions and Operators
JSON Functions and Operators
Date and Time Functions and Operators
Aggregate Functions
Window Functions
Array Functions and Operators
Map Functions and Operators
URL Function
Geospatial Function
HyperLogLog Functions
Color Function
Session Information
Teradata Function
Data Masking Functions
IP Address Functions
Quantile Digest Functions
T-Digest Functions
Set Digest Functions
Auxiliary Command Syntax
USE
SET SESSION
RESET SESSION
DESCRIBE
DESCRIBE FORMATTED COLUMNS
DESCRIBE DATABASE | SCHEMA
DESCRIBE INPUT
DESCRIBE OUTPUT
EXPLAIN
EXPLAIN ANALYZE
REFRESH CATALOG
REFRESH SCHEMA
REFRESH TABLE
ANALYZE
CALL
PREPARE
DEALLOCATE PREPARE
EXECUTE
VERIFY
Reserved Keywords
Implicit Data Type Conversion
Enabling Implicit Conversion
Disabling Implicit Conversion
Implicit Conversion Table
Data Preparation for the Sample Table in This Document
Syntax Compatibility of Common Data Sources
Common Issues About HetuEngine
What Should I Do After the HetuEngine Domain Name Is Changed?
What Can I Do If Starting the HetuEngine Cluster on the Client Times Out?
How Do I Handle Data Loss in a HetuEngine Data Source?
HetuEngine Troubleshooting
Python Does Not Exist When a HetuEngine Compute Instance Fails to Start
HetuEngine Compute Instance Is Faulty After Being Started
Using Hive
Hive User Permission Management
About Hive User Permissions
Creating a Hive Role
Granting Hive Permissions on Tables, Columns, or Databases
Granting Hive User Permissions to Use Other Components
Using the Hive Client
Using Hive for Data Analysis
Configuring Hive Data Storage and Encryption
Using HDFS Colocation to Store Hive Tables
Configuring Cold-Hot Separation for Hive Partition Metadata
Hive Supporting ZSTD Compression Formats
Compressing Hive ORC Tables Using ZSTD_JNI
Configuring the Hive Column Encryption
Hive on HBase
Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
Deleting Single-Row Records from Hive on HBase
Using Hive to Read Data in a Relational Database
Hive Supporting Reading Hudi Tables
Enterprise-Class Enhancements of Hive
Storing Hive Table Partitions to OBS and HDFS
Configuring Automatic Removal of Old Data in the Hive Directory to the Recycle Bin
Configuring Hive to Insert Data to a Directory That Does Not Exist
Forbidding Location Specification When Hive Internal Tables Are Created
Creating a Foreign Table in a Directory (Read and Execute Permission Granted)
Configuring HTTPS/HTTP-based REST APIs
Configuring Hive Transform
Switching the Hive Execution Engine to Tez
Hive Load Balancing
Configuring the Maximum Number of Maps for Hive Tasks
Configuring User Lease Isolation to Access HiveServer on a Specified Node
Configuring Component Isolation to Access Hive MetaStore
Configuring Load Balancing for HiveMetaStore Client Connections
Configuring Access Control Permission for the Dynamic View of a Hive Single Table
Allowing Users without ADMIN Permission to Create Temporary Functions
Allowing Users with Select Permission to View the Table Structure
Allowing Only the Hive Administrator to Create Databases and Tables in the Default Database
Configuring Hive to Support More Than 32 Roles
Creating User-Defined Hive Functions
Configuring High Reliability for Hive Beeline
Hive Performance Tuning
Creating Hive Table Partitions for Faster Queries
Optimizing Hive Joins
Optimizing the Hive Group By Statement
Optimizing Hive ORC Data Storage
Optimizing Hive SQL Logic
Optimizing the Multi-Table Queries with Hive CBO
Hive O&M Management
Hive Common Configuration Parameters
Hive Log Overview
Importing and Exporting Hive Databases
Importing and Exporting Table/Partition Data in Hive
Locating Abnormal Hive Files
Common Hive SQL Syntax
Extended Hive SQL Syntax
Customizing Row Separators in Hive Tables
Syntax of Traditional Relational Databases Supported by Hive
Common Issues About Hive
How Do I Delete UDFs on Multiple HiveServers?
Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
How Do I Perform Operations on Local Files with Hive User-Defined Functions?
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
Which Special Characters Are Not Supported by Hive in Complex Field Names?
How Do I Monitor the Hive Table Size?
How Do I Prevent Data Loss Caused by Misoperations of the insert overwrite Statement?
Why Does a Hive on Spark Task Freeze When HBase Is Not Installed?
Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Description of Hive Table Location (Either an OBS or HDFS Path)
Why Cannot Data Be Queried After the Tez Engine Is Used to Execute Union-related Statements and the Engine Is Then Switched to MapReduce?
Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
Does Hive Support Vectorized Query?
Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
How Do I Disable the Logging Function of Hive?
Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
Why Does an OBS Quick Deletion Directory Not Take Effect After Being Added to the Customized Hive Configuration?
Hive Troubleshooting
How Do I Optimize INSERT OVERWRITE for Reading and Writing in the Same Table?
Using Hudi
Hudi Table Overview
Creating a Hudi Table Using Spark Shell
Operating a Hudi Table Using hudi-cli.sh
Hudi Write Operation
Writing Data to Hudi Tables in Batches
Writing Data to Hudi Tables in Streams
Synchronizing Hudi Table Data to Hive
Hudi Read Operation
Hudi Read
Reading the Hudi COW Table View
Reading the Hudi MOR Table View
Data Management and Maintenance
Hudi Clustering
Hudi Cleaning
Hudi Compaction
Hudi Savepoint
Historical Hudi Data Deletion
Hudi Payload
Hudi SQL Syntax Reference
Restrictions on Using Hudi SQL
Hudi DDL Syntax
CREATE TABLE
CREATE TABLE AS SELECT
DROP TABLE
SHOW TABLE
ALTER RENAME TABLE
ALTER ADD COLUMNS
ALTER COLUMN
TRUNCATE TABLE
Hudi DML Syntax
INSERT INTO
MERGE INTO
UPDATE
DELETE
COMPACTION
SET/RESET
ARCHIVELOG
CLEAN
CLEANARCHIVE
Hudi CALL COMMAND Syntax
CHANGE_TABLE
CLEAN_FILE
SHOW_TIME_LINE
SHOW_HOODIE_PROPERTIES
SAVE_POINT
ROLL_BACK
CLUSTERING
CLEANING
COMPACTION
SHOW_COMMIT_FILES
SHOW_FS_PATH_DETAIL
SHOW_LOG_FILE
SHOW_INVALID_PARQUET
Hudi Schema Evolution
Evolution Introduction
Schema Evolution Scenarios
Configuring SparkSQL for Hudi Schema Evolution
Hudi Schema Evolution and Syntax
ADD COLUMNS
ALTER COLUMN
DROP COLUMN
RENAME
SET
RENAME COLUMN
Concurrency in the Hudi Schema Evolution
Configuring Default Values for Hudi Data Columns
Typical Hudi Configuration Parameters
Hudi Performance Tuning
Hudi Troubleshooting
"Parquet/Avro schema" Is Reported When Updated Data Is Written
UnsupportedOperationException Is Reported When Updated Data Is Written
SchemaCompatabilityException Is Reported When Updated Data Is Written
What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
Hudi Fails to Write Decimal Data with Lower Precision
Data in ro and rt Tables Cannot Be Synchronized to a MOR Table Recreated After Being Deleted Using Spark SQL
IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
HoodieException Is Reported When Data Is Collected
HoodieKeyException Is Reported When Data Is Collected
SQLException Is Reported During Hive Data Synchronization
HoodieHiveSyncException Is Reported During Hive Data Synchronization
SemanticException Is Reported During Hive Data Synchronization
Using Hue
Accessing the Hue Web UI
Creating a Hue Job
Using Hue to Execute HiveQL
Using Hue to Execute SparkSQL
Viewing Hive Metadata Using Hue
Managing HDFS Files Using Hue
Managing Oozie Jobs Using Hue
Managing HBase Tables Using Hue
Using Hue to Execute HetuEngine SQL Statements
Configuring HDFS Cold and Hot Data Migration
Typical Hue Parameters
Hue Log Overview
Common Issues About Hue
Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
How Do I Solve the Problem of Setting the Time Zone of the Oozie Editor on the Hue Web UI?
Hue Troubleshooting
Why Does the use database Statement Become Invalid in Hive?
Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
Why Do Large Files Fail to Upload on the Hue Page?
Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
What Should I Do If It Takes a Long Time to Access the Native Hue UI and the File Browser Reports "Read timed out"?
Using IoTDB
Data Types and Encodings Supported by IoTDB
IoTDB User Permission Management
IoTDB User Permission Description
Creating an IoTDB Permission Role
Using the IoTDB Client
Getting Started with IoTDB
IoTDB UDFs
IoTDB UDF Overview
IoTDB UDF Sample Code and Operations
IoTDB Performance Tuning
IoTDB O&M Management
IoTDB Common Configuration Parameters
IoTDB Log Overview
Planning IoTDB Capacity
Manually Importing IoTDB Data
Manually Exporting IoTDB Data
Using JobGateway
Configuring JobGateway Parameters
JobGateway Log Overview
Using Kafka
Kafka User Permission Management
Kafka User Permissions
Creating a Kafka Role
Configuring Token Authentication Information for Kafka Users
Using the Kafka Client
Using Kafka to Produce and Consume Data
Creating a Kafka Topic
Accessing Messages in Kafka Topics
Managing Kafka Topics
Viewing Kafka Topic Information
Modifying Kafka Topic Configurations
Adding Kafka Topic Partitions
Managing Messages in Kafka Topics
Viewing Kafka Data Production and Consumption Details
Enterprise-Class Enhancements of Kafka
Configuring Kafka HA and High Reliability
Configuring a Secure Transmission Protocol for Kafka Data
Configuring the Kafka Data Balancing Tool
Configuring the Path for Extranet Clients to Access Kafka Broker
Kafka Performance Tuning
Kafka O&M Management
Kafka Common Configuration Parameters
Kafka Log Overview
Changing the Broker Storage Directory
Migrating Data Between Kafka Nodes
Common Issues About Kafka
Kafka Specifications
Kafka Feature Description
Synchronizing Binlog-based MySQL Data to the MRS Cluster
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using Loader
Overview of Importing and Exporting Loader Data
Loader User Permission Management
Creating a Loader Role
Uploading the MySQL Database Connection Driver
Creating a Loader Data Import Job
Using Loader to Import Data to an MRS Cluster
Using Loader to Import Data from an SFTP Server to HDFS or OBS
Using Loader to Import Data from an SFTP Server to HBase
Using Loader to Import Data from an SFTP Server to Hive
Using Loader to Import Data from an FTP Server to HBase
Using Loader to Import Data from a Relational Database to HDFS or OBS
Using Loader to Import Data from a Relational Database to HBase
Using Loader to Import Data from a Relational Database to Hive
Using Loader to Import Data from HDFS or OBS to HBase
Using Loader to Import Data from a Relational Database to ClickHouse
Using Loader to Import Data from HDFS to ClickHouse
Creating a Loader Data Export Job
Using Loader to Export Data from an MRS Cluster
Using Loader to Export Data from HDFS or OBS to an SFTP Server
Using Loader to Export Data from HBase to an SFTP Server
Using Loader to Export Data from Hive to an SFTP Server
Using Loader to Export Data from HDFS or OBS to a Relational Database
Using Loader to Export Data from HDFS to MOTService
Using Loader to Export Data from HBase to a Relational Database
Using Loader to Export Data from Hive to a Relational Database
Using Loader to Export Data from HBase to HDFS or OBS
Using Loader to Export Data from HDFS to ClickHouse
Managing Loader Jobs
Migrating Loader Jobs in Batches
Deleting Loader Jobs in Batches
Importing Loader Jobs in Batches
Exporting Loader Jobs in Batches
Viewing Historical Information About a Loader Job
Purging Historical Loader Data
Managing Loader Links
Loader O&M Management
Loader Common Configuration Parameters
Loader Log Overview
Loader Operator Help
Loader Operator Description
Loader Input Operators
CSV File Input
Fixed File Input
Table Input
HBase Input
HTML Input
Hive Input
Spark Input
Loader Conversion Operators
Long Date Conversion
Null Value Conversion
Constant Field Addition
Random Value Conversion
Concat Fields
Extract Fields
Modulo Integer
String Cut
EL Operation
String Operations
String Reverse
String Trim
Filter Rows
Update Fields Operator
Loader Output Operators
Hive Output
Spark Output
Table Output
File Output
HBase Output
ClickHouse Output
Managing Loader Operator Configurations
Using Macro Definitions in Configuration Items
Operator Data Processing Rules
Loader Client Tools
Running a Loader Job by Using Commands
loader-tool Usage Guide
loader-tool Usage Example
schedule-tool Usage Guide
schedule-tool Usage Example
Using loader-backup to Back Up Job Data
Open Source sqoop-shell Tool Usage Guide
Importing Data to HDFS Using sqoop-shell
Importing Data to HDFS Using sqoop-shell
Common Issues About Loader
Data Cannot Be Saved When Loader Jobs Are Configured
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Why Is Data Not Imported to HDFS After All Data Types of SQL Server Are Selected?
An Error Is Reported When a Large Amount of Data Is Written to HDFS
Failed to Run Jobs Related to the sftp-connector Connector
Using MapReduce
Configuring the Distributed Cache
Configuring the MapReduce Shuffle Address
Configuring the MapReduce Cluster Administrator List
Transmitting MapReduce Tasks from Windows to Linux
Configuring the Archiving and Clearing Mechanism for MapReduce Task Logs
MapReduce Performance Tuning
MapReduce Optimization Configuration for Multiple CPU Cores
Determining the Job Baseline
MapReduce Shuffle Tuning
AM Optimization for Big MapReduce Tasks
Speculative Execution
Using Slow Start
Optimizing Performance for Committing MR Jobs
Reducing Client Application Failure Rate
MapReduce Log Overview
Common Issues About MapReduce
After an Active/Standby Switchover of ResourceManager Occurs, a Task Is Interrupted and Runs for a Long Time
Why Does a MapReduce Task Stay Unchanged for a Long Time?
Why Does the Client Hang During Job Running?
Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
MapReduce Job Fails in a Multiple NameService Environment
Why Is a Faulty MapReduce Node Not Blacklisted?
Using Oozie
Submitting a Job Using the Oozie Client
Oozie Client Configurations
Submitting a Hive Task Using the Oozie Client
Submitting a Spark2x Task Using the Oozie Client
Submitting a Loader Task Using the Oozie Client
Submitting a DistCp Task Using the Oozie Client
Submitting Other Tasks Using the Oozie Client
Using Hue to Submit an Oozie Job
Creating a Workflow Using Hue
Submitting an Oozie Hive2 Job Using Hue
Submitting an Oozie HQL Script Using Hue
Submitting an Oozie Spark2x Job Using Hue
Submitting an Oozie Java Job Using Hue
Submitting an Oozie Loader Job Using Hue
Submitting an Oozie MapReduce Job Using Hue
Submitting an Oozie Sub-workflow Job Using Hue
Submitting an Oozie Shell Job Using Hue
Submitting an Oozie HDFS Job Using Hue
Submitting an Oozie Streaming Job Using Hue
Submitting an Oozie DistCp Job Using Hue
Submitting an Oozie SSH Job Using Hue
Submitting a Coordinator Periodic Scheduling Job Using Hue
Submitting a Bundle Batch Processing Job Using Hue
Querying Oozie Job Results on the Hue Page
Configuring Mutual Trust Between Oozie Nodes
Enterprise-Class Enhancements of Oozie
Configuring Oozie High Availability (HA)
Checking Whether the JAR Package on Which Oozie Depends Is Correct Using Share Lib
Oozie Log Overview
Common Issues About Oozie
Oozie Scheduled Tasks Are Not Executed on Time
Why Does an Update of the share lib Directory of Oozie on HDFS Not Take Effect?
Common Oozie Troubleshooting Methods
Using Ranger
Enabling Ranger Authentication for MRS Cluster Services
Logging In to the Ranger Web UI
Adding a Ranger Permission Policy
Configuration Examples for Ranger Permission Policy
Adding a Ranger Access Permission Policy for CDL
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark2x
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for HetuEngine
Adding a Ranger Access Permission Policy for OBS
Hive Tables Supporting Cascading Authorization
Viewing Ranger Audit Information
Configuring Ranger Security Zone
Viewing Ranger User Permission Synchronization Information
Ranger Performance Tuning
Ranger Log Overview
Common Issues About Ranger
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
Ranger Troubleshooting
Ranger Fails to Be Started During Cluster Installation
Existing HBase Tables Cannot Be Searched Using Wildcards When HBase Permission Policies Are Configured
Using Spark/Spark2x
Spark Usage Instructions
Spark User Permission Management
Introduction to SparkSQL User Permissions
Creating a Spark SQL Role
Configuring User Permissions for Spark Tables, Columns, and Databases
Configuring Permissions for Spark SQL Service User
Configuring Spark Web UI ACLs
Permission Parameters of the Spark Client and Server
Using the Spark Client
Accessing the Spark Web UI
Submitting a Spark Job as a Proxy User
Configuring Spark to Read HBase Data
Configuring Spark Tasks Not to Obtain HBase Token Information
Spark Core Enterprise-Class Enhancements
Configuring Spark HA to Enhance Availability
Configuring Multi-active Instance Mode
Configuring the Spark Multi-Tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Spark Native Engine
Configuring the Size of the Spark Event Queue
Configuring the Compression Format of a Parquet Table
Adapting to the Third-party JDK When Ranger Is Used
Using the Spark Small File Combination Tool
Using the Spark Small File Combination Tool
Configuring Streaming Reading of Spark Driver Execution Results
Enabling a Spark Executor to Execute Custom Code When Exiting
Spark SQL Enterprise-Class Enhancements
Configuring Vector-based ORC Data Reading
Filtering Partitions Without Paths in a Partitioned Table
Configuring the Drop Partition Command to Support Batch Deletion
Configuring Dynamic Overwriting for Hive Table Partitions
Configuring Spark SQL to Enable the Adaptive Execution Feature
Spark Streaming Enterprise-Class Enhancements
Configuring the LIFO Function When Spark Streaming Interconnects with Kafka
Configuring Reliability of Interconnection Between Spark Streaming and Kafka
Configuring Structured Streaming to Use RocksDB for State Store
Spark Core Performance Tuning
Spark Core Data Serialization
Spark Core Memory Tuning
Setting Spark Core DOP
Configuring Spark Core Broadcasting Variables
Configuring Heap Memory Parameters for Spark Executor
Using the External Shuffle Service to Improve Spark Core Performance
Configuring Spark Dynamic Resource Scheduling in YARN Mode
Adjusting Spark Core Process Parameters
Spark DAG Design Specifications
Experience Summary
Spark SQL Performance Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the Spark INSERT SELECT Statement
Configuring Multiple Concurrent Clients to Connect to JDBCServer
Configuring the Default Number of Data Blocks Divided by SparkSQL
Optimizing Memory When Data Is Inserted into Spark Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Performance Tuning
Spark on OBS Performance Tuning
Spark O&M Management
Configuring Spark Parameters Rapidly
Spark Common Configuration Parameters
Spark Log Overview
Obtaining Container Logs of a Running Spark Application
Changing Spark Log Levels
Viewing Container Logs on the Web UI
Configuring the Number of Lost Executors Displayed on the Web UI
Configuring Local Disk Cache for JobHistory
Configuring Spark Event Log Rollback
Enhancing Stability in a Limited Memory Condition
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Broadening Support for Hive Partition Pruning Predicate Pushdown
Configuring the Column Statistics Histogram for Higher CBO Accuracy
Using CarbonData for First Query
Common Issues About Spark
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
Why Cannot the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How Do I Configure the Event Queue Size If the Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
"Failed to CREATE_FILE" Is Displayed When Data Is Inserted into the Dynamic Partitioned Table Again
Why Do Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
NodeManager OOM Occurs During Spark Application Execution
Spark SQL and DataFrame
What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
How to Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How to Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
How Do I Rectify the Exception That Occurs When I Perform an Operation on the Table Named table?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
Why Is a Job Run Before "Missing Privileges" Is Displayed When I Access a Parquet Table on Which I Do Not Have Permission?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Should I Do If I Accidentally Kill the JDBCServer Process During Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Cannot Functions Be Used When Different JDBCServers Are Connected?
Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
Spark Streaming
Same DAG Log Is Recorded Twice for a Streaming Task
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does the Spark Streaming Application Fail to Be Started from the Checkpoint When the Input Stream Has No Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
What Should I Do If Recycle Bin Version I Set on the Spark Client Does Not Take Effect?
How Do I Change the Log Level to INFO When Using Spark yarn-client?
Spark Troubleshooting
Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Apps Cannot Be Displayed on the JobHistory Page When an Empty Part File Is Loaded
Why Does Spark2x Fail to Export a Table with the Same Field Name?
Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
Native Spark2x UI Fails to Be Accessed or Is Incorrectly Displayed when Internet Explorer Is Used for Access
How Does Spark2x Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
Spark Shuffle Exception Handling
Why Cannot Common Users Log In to the Spark Client When There Are Multiple Service Scenarios in Spark?
Why Does the Cluster Port Fail to Connect When a Client Outside the Cluster Is Installed or Used?
How Do I Handle the Exception That Occurs When I Query Datasource Avro Formats?
What Should I Do If Statistics of Hudi or Hive Tables Created Using Spark SQLs Are Empty Before Data Is Inserted?
Failed to Query Table Statistics by Partition Using Non-Standard Time Format When the Partition Column in the Table Creation Statement Is timestamp
How Do I Use Special Characters with TIMESTAMP and DATE?
Using Tez
Accessing the Tez Web UI to View the Task Execution Result
Typical Tez Configuration Parameters
Tez Log Overview
Common Issues About Tez
Tez Task Details Cannot Be Displayed on the Tez Web UI
Failed to Access the Tez Web UI
YARN Logs Cannot Be Viewed on the Tez Web UI
Table Data Is Empty on the TezUI HiveQueries Page
Using YARN
Yarn User Permission Management
Creating Yarn Roles
Submitting a Task Using the Yarn Client
Configuring Container Log Aggregation
Enabling Yarn CGroups to Limit the Container CPU Usage
Configuring HA for TimelineServer
Enterprise-Class Enhancements of Yarn
Configuring the Yarn Permission Control
Specifying the User Who Runs Yarn Tasks
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring ApplicationMaster Work Preserving
Configuring the Access Channel Protocol
Configuring the Additional Scheduler WebUI
Configuring Resources for a NodeManager Role Instance
Configuring Yarn Restart
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
Yarn O&M Management
YARN Common Configuration Parameters
Yarn Log Overview
Configuring the Localized Log Levels
Configuring Memory Usage Detection
Changing NodeManager Storage Directories
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Job Is Completed When CGroups Are Used?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Twice?
Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
Why Does Yarn Not Release the Blacklist Even If All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature Is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Configuring the ZooKeeper Permissions
ZooKeeper Common Configuration Parameters
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Do Four-Letter Commands Not Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper Using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Accessing FusionInsight Manager
Using an MRS Client
Installing a Client
Updating a Client
Component Operation Guide (Normal)
Using Alluxio
Configuring an Underlying Storage System
Accessing Alluxio Using a Data Application
Common Operations of Alluxio
Using CarbonData (for Versions Earlier Than MRS 3.x)
Using CarbonData from Scratch
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Using CarbonData (for MRS 3.x or Later)
CarbonData Data Types
CarbonData Table User Permissions
Creating a CarbonData Table Using the Spark Client
CarbonData Data Analytics
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying the CarbonData Table
Loading Data to a CarbonData Table
Deleting CarbonData Table Segments
Compacting CarbonData Table Segments
CarbonData Performance Tuning
CarbonData Tuning Approach
Typical CarbonData Performance Tuning Parameters
Suggestions for Creating CarbonData Tables
Typical CarbonData Configuration Parameters
CarbonData Syntax Reference
DDL
CREATE TABLE
CREATE TABLE As SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
DML
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Concurrent CarbonData Table Operations
CarbonData Segment APIs
CarbonData Tablespace Index
CarbonData Troubleshooting
Filter Result Is Not Consistent with Hive When a Big Double Type Value Is Used in Filter
Query Performance Deteriorated Due to Insufficient Executor Memory
CarbonData FAQs
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How to Avoid Minor Compaction for Historical Data?
How to Change the Default Group Name for CarbonData Data Loading?
Why Does INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Does Data Load Performance Decrease Due to Bad Records?
Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are the Opened Tasks Fewer Than the Available Executors When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Why Does Data Loading Fail During Off Heap?
Why Do I Fail to Create a Hive Table?
How Do I Logically Split Data Across Different Namespaces?
Why Does the Missing Privileges Exception Occur When the Database Is Dropped?
Why Cannot the UPDATE Command Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Does CarbonData Become Abnormal After the Disk Space Quota of the HDFS Storage Directory Is Set?
Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
Using ClickHouse
ClickHouse Overview
ClickHouse User Permission Management
ClickHouse User and Permission Management
Interconnecting ClickHouse With OpenLDAP for Authentication
Using the ClickHouse Client
Creating a ClickHouse Table
ClickHouse Data Import
Interconnecting ClickHouse with RDS for MySQL
Interconnecting ClickHouse with OBS
Synchronizing Kafka Data to ClickHouse
Importing DWS Table Data to ClickHouse
Using ClickHouse to Import and Export Data
Enterprise-Class Enhancements of ClickHouse
Accessing ClickHouse Through ELB
Enabling the mysql_port Configuration for ClickHouse
ClickHouse Performance Tuning
Solution to the "Too many parts" Error in Data Tables
Accelerating Merge Operations
Accelerating TTL Operations
ClickHouse O&M Management
ClickHouse Log Overview
ClickHouse Cluster Management
ClickHouse Cluster Configuration
Expanding the Data Disk Capacity of a ClickHouse Node
Backing Up and Restoring ClickHouse Data Using a Data File
Migrating Data Between ClickHouseServer Nodes in a Cluster
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Structure
ALTER TABLE: Modifying Table Data
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
ClickHouse FAQ
What Should I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
How Do I Migrate Data from Hive/HDFS to ClickHouse?
An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data
How Do I Grant the Select Permission at the Database Level to ClickHouse Users?
Using DBService
DBService Log Overview
Using Flink
Flink Job Engine
Flink User Permission Management
Flink Security Authentication
Flink User Permissions
Creating a FlinkServer Role
Configuring Security Authentication for Interconnecting with Kafka
Configuring Flink Authentication and Encryption
Using the Flink Client
Preparing for Creating a FlinkServer Job
Accessing the FlinkServer Web UI
Creating a FlinkServer Application
Creating a FlinkServer Cluster Connection
Creating a FlinkServer Data Connection
Creating a FlinkServer Stream Table Source
Creating a FlinkServer Job
Managing FlinkServer Jobs
Configuring the FlinkServer Job Restart Policy
Using UDFs in FlinkServer Jobs
Flink O&M Management
Typical Flink Configuration Parameters
Flink Log Overview
Flink Performance Tuning
Flink Memory GC Optimization Parameters
Flink Job Concurrency
Flink Job Process Parameters
Flink Netty Network Communication Parameters
Typical Commands of the Flink Client
Common Issues About Flink
Example of Issuing a Certificate
Using Flume
Flume Log Collection Overview
Flume Service Model Configuration
Installing the Flume Client
Installing the Flume Client on Clusters of Versions Earlier Than MRS 3.x
Installing the Flume Client on MRS 3.x or Later Clusters
Quickly Using Flume to Collect Node Logs
Configuring a Non-Encrypted Flume Data Collection Task
Generating Configuration Files for the Flume Server and Client
Using Flume Server to Collect Static Logs from Local Host to Kafka
Using Flume Server to Collect Static Logs from Local Host to HDFS
Using Flume Server to Collect Dynamic Logs from Local Host to HDFS
Using Flume Server to Collect Logs from Kafka to HDFS
Using Flume Client to Collect Logs from Kafka to HDFS
Using Cascaded Agents to Collect Static Logs from Local Host to HBase
Configuring an Encrypted Flume Data Collection Task
Configuring the Encrypted Transmission
Using Cascaded Agents to Collect Static Logs from Local Host to HDFS
Enterprise-Class Enhancements of Flume
Using the Encryption Tool of the Flume Client
Configuring Flume to Connect to Kafka in Security Mode
Flume O&M Management
Typical Flume Configuration Parameters
Flume Service Configuration Guide
Flume Log Overview
Viewing Flume Client Logs
Viewing Flume Client Monitoring Information
Stopping or Uninstalling the Flume Client
Common Issues About Flume
How Do I View Flume Logs?
How Do I Use Environment Variables in the Flume Configuration File?
How Do I Develop a Third-Party Flume Plug-in?
How Do I Configure a Custom Flume Script?
Using HBase
Creating HBase Roles
Using the HBase Client
Quickly Using HBase for Offline Data Analysis
Migrating Data to HBase Using BulkLoad
HBase Data Operations
Creating HBase Indexes for Data Query
Configuring HBase Data Compression and Encoding Formats
Enterprise-Class Enhancements of HBase
Configuring HBase Local Secondary Indexes for Faster Queries
About HBase Local Secondary Indexes
Loading HBase Data in Batches and Generating Local Secondary Indexes
Using TableIndexer to Generate a Local HBase Secondary Index
Migrating HBase Index Data
Improving HBase BulkLoad Data Migration
Importing HBase Data in Batches Using BulkLoad
Updating HBase Data in Batches Using BulkLoad
Deleting HBase Data in Batches Using BulkLoad
Counting Rows in an HBase Table Using BulkLoad
BulkLoad Configuration File
Configuring RSGroup to Manage RegionServer Resources
HBase Performance Tuning
Improving HBase BulkLoad Performance
Improving HBase Continuous Put Performance
Optimizing Put and Scan Performance
Improving HBase Real-time Write Performance
Improving HBase Real-Time Read Efficiency
Tuning HBase JVM Parameters
HBase O&M Management
HBase Log Overview
HBase Common Configuration Parameters
Configuring Region In Transition Recovery Chore Service
Enabling Inter-Cluster Copy to Back Up Data
Configuring Automatic Data Backup for Active and Standby HBase Clusters
Configuring HBase Cluster HA and DR
Configuring HBase Active/Standby DR
Switching Between Active and Standby HBase Clusters
Switching Between Active and DR HBase Clusters
Common Issues About HBase
Operation Failures Occur in Stopping BulkLoad On the Client
How Do I Restore a Region in the RIT State for a Long Time?
What Should I Do If HMaster Exits Due to Timeout When Waiting for the Namespace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
What Should I Do If Error Message "java.lang.UnsatisfiedLinkError: Permission denied" Is Displayed When I Start the HBase Shell?
When Will the "Dead Region Servers" Information Displayed on the HMaster Web UI Be Cleared After a RegionServer Is Stopped?
What Can I Do If a Message Indicating Insufficient Permission Is Displayed When I Access HBase Phoenix?
What Can I Do If a Message Indicating Insufficient Permission Is Displayed When a Tenant Uses HBase BulkLoad?
How Do I Restore an HBase Region in Overlap State?
Phoenix BulkLoad Use Restrictions
Why Is a Message Displayed Indicating That the Permission Is Insufficient When CTBase Connects to the Ranger Plug-ins?
HBase Troubleshooting
The HBase Client Failed to Connect to the Server for a Long Time
An Exception Occurred When HBase Deletes and Creates a Table Consecutively
Other Services Are Unstable When Too Many HBase Connections Occupy the Network Ports
HBase BulkLoad Tasks of 210,000 Map Tasks and 10,000 Reduce Tasks Failed to Be Executed
Modified and Deleted Data Can Still Be Queried by the Scan Command
Failed to Create Tables When the Region Is in FAILED_OPEN State
How Do I Delete the Residual Table Name on the ZooKeeper table-lock Node After a Table Creation Failure?
HBase Becomes Faulty When I Set a Quota for the Directory Used by HBase in HDFS
HMaster Failed to Be Started After the OfflineMetaRepair Tool Is Used to Rebuild Metadata
FileNotFoundException Is Frequently Printed in HMaster Logs
"Permission denied" Was Displayed When the ImportTsv Tool Failed to Run
Data Is Successfully Imported Using HBase BulkLoad, but Different Results May Be Returned for the Same Query
HBase Data Restoration Task Failed to Be Rolled Back
RegionServer Failed to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB
When LoadIncrementalHFiles Is Used to Import Data in Batches on Cluster Nodes, the Insufficient Permission Error Is Reported
"import argparse" Is Reported When the Phoenix Sqlline Script Is Used
Using HDFS
Overview of HDFS File System Directories
HDFS User Permission Management
Creating an HDFS Role
Granting HDFS Users the Permission to Access HDFS Files
Using the HDFS Client
Using Hadoop
Configuring the Recycle Bin Mechanism
Configuring HDFS DataNode Data Balancing
Configuring HDFS Disk Balancing
Using HDFS Mover to Migrate Data
Configuring the Label Policy (NodeLabel) for HDFS File Directories
Configuring NameNode Memory Parameters
Setting the Number Limit of HBase and HDFS Handles
Configuring the Number of Files in a Single HDFS Directory
Enterprise-Class Enhancements of HDFS
Configuring Replica Replacement Policy for DataNodes with Inconsistent Capacity
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring the Observer NameNode to Process Read Operations
Enabling the NameNode Blacklist
Configuring Hadoop Data Encryption During Transmission
HDFS Performance Tuning
Improving HDFS Write Performance
Improving Read Performance By HDFS Client Metadata Caching
Improving the HDFS Client Connection Performance with Active NameNode Caching
Optimization for Unstable HDFS Network
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Performing Concurrent Operations on HDFS Files
Using the LZC Compression Algorithm to Store HDFS Files
HDFS O&M Management
HDFS Common Configuration Parameters
HDFS Log Overview
Viewing the HDFS Capacity
Changing the DataNode Storage Directory
Adjusting Parameters Related to Damaged DataNode Disk Volumes
Configuring the Maximum Lifetime of an HDFS Token
Using DistCp to Copy HDFS Data Across Clusters
Configuring the NFS Server to Store NameNode Metadata
Common Issues About HDFS
What Should I Do If an Error Is Reported When I Run DistCp Commands?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
What Should I Do If the HDFS Web UI Cannot Update the Information About the Damaged Data?
What Should I Do If the HDFS Client Is Irresponsive When the NameNode Is Overloaded for a Long Time?
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
Why Does DataNode Fail to Report Data Blocks?
Can I Modify the DataNode Data Storage Directory?
What Can I Do If the DataNode Capacity Is Incorrectly Calculated?
Why Is Data in the Cache Lost When Small Files Are Stored?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
Why Are Some Blocks Missing on the NameNode UI?
HDFS Troubleshooting
Why Is "java.net.SocketException" Reported When Data Is Written to HDFS?
It Takes a Long Time to Restart NameNode After a Large Number of Files Are Deleted
NameNode Fails to Be Restarted Due to EditLog Discontinuity
The Standby NameNode Fails to Be Started After It Is Powered Off During Metadata Storage
DataNode Fails to Be Started When the Number of Disks Defined in dfs.datanode.data.dir Equals the Value of dfs.datanode.failed.volumes.tolerated
"ArrayIndexOutOfBoundsException: 0" Occurs When HDFS Invokes getsplit of FileInputFormat
Using Hive
Hive User Permission Management
About Hive User Permissions
Creating a Hive Role
Granting Hive User Permissions on Tables, Columns, or Databases
Granting Hive User Permissions to Use Other Components
Using the Hive Client
Using Hive for Data Analysis
Configuring Hive Data Storage and Encryption
Using HDFS Colocation to Store Hive Tables
Configuring Cold-Hot Separation for Hive Partition Metadata
Hive Supporting ZSTD Compression Formats
Configuring the Hive Column Encryption
Hive on HBase
Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
Deleting Single-Row Records from Hive on HBase
Using Hive to Read Data in a Relational Database
Enterprise-Class Enhancement of Hive
Configuring Automatic Removal of Old Data in the Hive Directory to the Recycle Bin
Configuring Hive to Insert Data to a Directory That Does Not Exist
Forbidding Location Specification When Hive Internal Tables Are Created
Creating a Foreign Table in a Directory (Read and Execute Permission Granted)
Configuring HTTPS/HTTP-based REST APIs
Configuring Hive Transform
Switching the Hive Execution Engine to Tez
Hive Load Balancing
Configuring the Maximum Number of Maps for a Hive Task
Configuring User Lease Isolation to Access HiveServer on a Specified Node
Configuring Access Control Permission for the Dynamic View of a Hive Single Table
Allowing Users without ADMIN Permission to Create Temporary Functions
Allowing Users with Select Permission to View the Table Structure
Allowing Only the Hive Administrator to Create Databases and Tables in the Default Database
Configuring Hive to Support More Than 32 Roles
Creating User-Defined Hive Functions
Configuring High Reliability for Hive Beeline
Hive Performance Tuning
Creating Hive Table Partitions for Faster Queries
Hive Join Optimization
Optimizing the Hive Group By Statement
Optimizing Hive ORC Data Storage
Optimizing Hive SQL Logic
Optimizing Query Performance with Hive CBO
Hive O&M Management
Hive Common Configuration Parameters
Hive Log Overview
Common Hive SQL Syntax
Extended Hive SQL Syntax
Customizing Row Separators in Hive Tables
Syntax of Traditional Relational Databases Supported by Hive
Common Issues About Hive
How Do I Delete All Permanent Functions from HiveServer?
Why Cannot the DROP Operation Be Performed on a Backed Up Hive Table?
How to Perform Operations on Local Files with Hive User-Defined Functions
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
What Are the Special Characters Not Supported by Hive in Complex Field Names?
How Do I Monitor the Hive Table Size?
How Do I Prevent Data Loss Caused by Misoperations of the insert overwrite Statement?
How Do I Handle a Slow Hive on Spark Task When HBase Is Not Installed?
What Should I Do If an Error Is Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in Hive?
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Does the Location of a Hive Table Support Cross-OBS and Cross-HDFS Paths?
What Should I Do If the MapReduce Engine Cannot Query the Data Written by the Union Statement Running on Tez?
Does Hive Support Concurrent Data Writing to the Same Table or Partition?
Does Hive Support Vectorized Query?
What Should I Do If the Task Fails When the HDFS Data Directory of the Hive Table Is Deleted by Mistake, but the Metadata Still Exists?
How Do I Disable the Logging Function of Hive?
Why Is the OBS Quick Deletion Directory Not Applied After Being Added to the Custom Hive Configuration?
Hive Configuration Problems
Hive Troubleshooting
How Do I Optimize INSERT OVERWRITE for Reading and Writing in the Same Table?
How Do I Troubleshoot Slow Hive SQL Execution?
Using Hudi
Hudi Table Overview
Creating a Hudi Table Using Spark Shell
Operating a Hudi Table Using hudi-cli.sh
Hudi Write Operation
Writing Data to Hudi Tables in Batches
Writing Data to Hudi Tables in Streams
Synchronizing Hudi Table Data to Hive
Hudi Read Operation
Reading Hudi Data
Reading the Hudi COW Table View
Reading the Hudi MOR Table View
Data Management and Maintenance
Hudi Clustering
Hudi Cleaning
Hudi Compaction
Hudi Savepoint
Typical Hudi Configuration Parameters
Write Configuration
Configuration of Hive Table Synchronization
Index Configuration
Storage Configuration
Compaction and Cleaning Configurations
Single-Table Concurrency Control Configuration
Hudi Performance Tuning
Common Issues About Hudi
Data Write
Parquet/Avro Schema Is Reported When Updated Data Is Written
UnsupportedOperationException Is Reported When Updated Data Is Written
SchemaCompatabilityException Is Reported When Updated Data Is Written
What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
Hudi Fails to Write Decimal Data with Lower Precision
Data Collection
IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
HoodieException Is Reported When Data Is Collected
HoodieKeyException Is Reported When Data Is Collected
Hive Synchronization
SQLException Is Reported During Hive Data Synchronization
HoodieHiveSyncException Is Reported During Hive Data Synchronization
SemanticException Is Reported During Hive Data Synchronization
Using Hue (Versions Earlier Than MRS 3.x)
Accessing the Hue Web UI
Using Hue WebUI to Operate Hive Tables
Using HiveQL Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Typical Hue Configurations
Using Hue (MRS 3.x or Later)
Accessing the Hue Web UI
Using Hue WebUI to Operate Hive Tables
Creating a Hue Job
Using HiveQL Editor on the Hue Web UI
Using the SparkSql Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using HBase on the Hue Web UI
Typical Application Scenarios of the Hue Web UI
HDFS on Hue
Configuring HDFS Cold and Hot Data Migration
Hive on Hue
Oozie on Hue
Typical Hue Configurations
Hue Log Overview
Common Issues About Hue
Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
Why Does the use database Statement Become Invalid in Hive?
Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
Why Do Large Files Fail to Be Uploaded on the Hue Page?
Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
How Do I Solve the Problem of Setting the Time Zone of the Oozie Editor on the Hue Web UI?
What Should I Do If It Takes a Long Time to Access the Native Hue UI and the File Browser Reports "Read timed out"?
Using Impala
Using the Impala Client
Accessing the Impala Web UI
Using Impala to Operate Kudu Tables
Interconnecting Impala with External LDAP
Enabling and Configuring a Dynamic Resource Pool for Impala
Using the Impala Query Management Page
Typical Impala Configurations
Impala FAQ
Does Impala Support Disk Hot Swapping?
Using Kafka
Kafka Data Consumption
Kafka User Permission Management
Kafka User Permissions
Creating a Kafka Permission Role
Configuring Token Authentication Information for Kafka Users
Using the Kafka Client
Quickly Using Kafka to Produce and Consume Data
Creating a Kafka Topic
Checking the Consumption Information of Kafka Topics
Managing Kafka Topics
Viewing Kafka Topic Information
Modifying Kafka Topic Configurations
Adding Kafka Topic Partitions
Managing Messages in Kafka Topics
Viewing Kafka Data Production and Consumption Details
Enterprise-Class Enhancements of Kafka
Configuring Kafka HA and High Reliability
Configuring a Secure Transmission Protocol for Kafka Data
Configuring the Kafka Data Balancing Tool
Kafka Performance Tuning
Kafka O&M Management
Kafka Common Configuration Parameters
Kafka Log Overview
Changing the Broker Storage Directory
Migrating Data on a Kafka Node
Balancing Data After Kafka Capacity Expansion
Common Issues About Kafka
Kafka Specifications
Kafka Feature Description
Synchronizing Binlog-based MySQL Data to the MRS Cluster
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using KafkaManager
Introduction to KafkaManager
Accessing the KafkaManager Web UI
Managing Kafka Clusters
Kafka Cluster Monitoring Management
Using Loader
Using Loader from Scratch
How to Use Loader
Common Loader Parameters
Creating a Loader Role
Loader Link Configuration
Managing Loader Links (Versions Earlier Than MRS 3.x)
Managing Loader Links (MRS 3.x and Later Versions)
Source Link Configurations of Loader Jobs
Destination Link Configurations of Loader Jobs
Managing Loader Jobs
Preparing a Driver for MySQL Database Link
Importing Data
Overview
Importing Data Using Loader
Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
Typical Scenario: Importing Data from an SFTP Server to HBase
Typical Scenario: Importing Data from an SFTP Server to Hive
Typical Scenario: Importing Data from an FTP Server to HBase
Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
Typical Scenario: Importing Data from a Relational Database to HBase
Typical Scenario: Importing Data from a Relational Database to Hive
Typical Scenario: Importing Data from HDFS or OBS to HBase
Typical Scenario: Importing Data from a Relational Database to ClickHouse
Typical Scenario: Importing Data from HDFS to ClickHouse
Exporting Data
Overview
Using Loader to Export Data
Typical Scenario: Exporting Data from HDFS or OBS to an SFTP Server
Typical Scenario: Exporting Data from HBase to an SFTP Server
Typical Scenario: Exporting Data from Hive to an SFTP Server
Typical Scenario: Exporting Data from HDFS or OBS to a Relational Database
Typical Scenario: Exporting Data from HBase to a Relational Database
Typical Scenario: Exporting Data from Hive to a Relational Database
Typical Scenario: Importing Data from HBase to HDFS or OBS
Managing Jobs
Migrating Loader Jobs in Batches
Deleting Loader Jobs in Batches
Importing Loader Jobs in Batches
Exporting Loader Jobs in Batches
Viewing Historical Job Information
Operator Help
Overview
Input Operators
CSV File Input
Fixed File Input
Table Input
HBase Input
HTML Input
Hive Input
Spark Input
Conversion Operators
Long Date Conversion
Null Value Conversion
Constant Field Addition
Random Value Conversion
Concat Fields
Extract Fields
Modulo Integer
String Cut
EL Operation
String Operations
String Reverse
String Trim
Filter Rows
Update Fields Operator
Output Operators
Hive Output
Spark Output
Table Output
File Output
HBase Output
ClickHouse Output
Associating, Editing, Importing, or Exporting the Field Configuration of an Operator
Using Macro Definitions in Configuration Items
Operator Data Processing Rules
Client Tools
Running a Loader Job by Using Commands
loader-tool Usage Guide
loader-tool Usage Example
schedule-tool Usage Guide
schedule-tool Usage Example
Using loader-backup to Back Up Job Data
Open Source sqoop-shell Tool Usage Guide
Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
Loader Log Overview
Example: Using Loader to Import Data from OBS to HDFS
Common Issues About Loader
Data Cannot Be Saved in Internet Explorer 10 or 11
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Using Kudu
Using Kudu from Scratch
Accessing the Kudu Web UI
Using MapReduce
Configuring the Distributed Cache to Execute MapReduce Jobs
Configuring the MapReduce Shuffle Address
Configuring the MapReduce Cluster Administrator List
Submitting a MapReduce Task on Windows
Configuring the Archiving and Clearing Mechanism for MapReduce Task Logs
MapReduce Performance Tuning
MapReduce Optimization Configuration for Multiple CPU Cores
Configuring the Baseline Parameters for MapReduce Jobs
MapReduce Shuffle Tuning
AM Optimization for Big MapReduce Tasks
Configuring Speculative Execution for MapReduce Tasks
Tuning MapReduce Tasks Using Slow Start
Optimizing the Commit Phase of MapReduce Tasks
Improving MapReduce Client Task Reliability
MapReduce Log Overview
Common Issues About MapReduce
After an Active/Standby Switchover of ResourceManager Occurs, a Task Is Interrupted and Runs for a Long Time
How Do I Handle the Problem that a MapReduce Task Has No Progress for a Long Time?
Why Is the Client Unavailable When a Task Is Running?
What Should I Do If HDFS_DELEGATION_TOKEN Cannot Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
What Should I Do If MapReduce Job Information Cannot Be Opened Through Tracking URL on the ResourceManager Web UI?
Why Do MapReduce Tasks Fail in an Environment with Multiple NameServices?
What Should I Do If the Partition-based Task Blacklist Is Abnormal?
Using OpenTSDB
Using an MRS Client to Operate OpenTSDB Metric Data
Running the curl Command to Operate OpenTSDB
Using Oozie
Using Oozie Client to Submit an Oozie Job
Oozie Client Configurations
Submitting a Hive Task Using the Oozie Client
Submitting a Spark2x Task Using the Oozie Client
Submitting a Loader Task Using the Oozie Client
Submitting a DistCp Task Using the Oozie Client
Submitting Other Tasks Using the Oozie Client
Using Hue to Submit an Oozie Job
Creating a Workflow Using Hue
Submitting an Oozie Hive2 Job Using Hue
Submitting an Oozie HQL Script Using Hue
Submitting an Oozie Spark2x Job Using Hue
Submitting an Oozie Java Job Using Hue
Submitting an Oozie Loader Job Using Hue
Submitting an Oozie MapReduce Job Using Hue
Submitting an Oozie Sub Workflow Job Using Hue
Submitting an Oozie Shell Job Using Hue
Submitting an Oozie HDFS Job Using Hue
Submitting an Oozie Streaming Job Using Hue
Submitting an Oozie DistCp Job Using Hue
Submitting an Oozie SSH Job Using Hue
Submitting a Coordinator Periodic Scheduling Job Using Hue
Submitting a Bundle Batch Processing Job Using Hue
Querying Oozie Job Results on the Hue Web UI
Configuring Mutual Trust Between Oozie Nodes
Enabling Oozie High Availability (HA)
Oozie Log Overview
Common Issues About Oozie
Oozie Scheduled Tasks Are Not Executed on Time
Why Does an Update of the share lib Directory of Oozie on HDFS Not Take Effect?
Common Troubleshooting Methods for Oozie Job Execution Failures
Using Presto
Accessing the Presto Web UI
Using a Client to Execute Query Statements
Presto FAQ
How Do I Configure Multiple Hive Connections for Presto?
Using Ranger (MRS 1.9.2)
Creating a Ranger Cluster
Accessing the Ranger Web UI and Synchronizing Unix Users to the Ranger Web UI
Configuring Hive/Impala Access Permissions in Ranger
Configuring HBase Access Permissions in Ranger
Using Ranger (MRS 3.x)
Logging In to the Ranger Web UI
Enabling Ranger Authentication for MRS Cluster Services
Adding a Ranger Permission Policy
Configuration Examples for Ranger Permission Policy
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Impala
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark2x
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for Storm
Viewing Ranger Audit Information
Configuring Ranger Security Zone
Changing the Ranger Data Source to LDAP for a Normal Cluster
Viewing Ranger User Permission Synchronization Information
Ranger Log Overview
Common Issues About Ranger
Why Does Ranger Startup Fail During Cluster Installation?
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
Why Can't I View the Created MRS User on the Ranger Management Page?
What Should I Do If MRS Users Failed to Be Synchronized to the Ranger Web UI?
Using Spark (for Versions Earlier Than MRS 3.x)
Getting Started with Spark
Getting Started with Spark SQL
Using the Spark Client
Accessing the Spark Web UI
Interconnecting Spark with OpenTSDB
Creating a Table and Associating It with OpenTSDB
Inserting Data to the OpenTSDB Table
Querying an OpenTSDB Table
Modifying the Default Configuration Data
Using Spark2x (for MRS 3.x or Later)
Spark User Permission Management
Spark SQL Permissions
Creating a Spark SQL Role
Configuring User Permissions for Spark Tables, Columns, and Databases
Configuring Permissions for Spark SQL Service User
Configuring Spark2x Web UI ACLs
Permission Parameters of the Spark Client and Server
Using the Spark Client
Configuring Spark to Read HBase Data
Configuring Spark Tasks Not to Obtain HBase Token Information
Spark Core Enterprise-Class Enhancements
Configuring Spark HA to Enhance HA
Configuring Multi-active Instance Mode
Configuring the Spark Multi-Tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Size of the Spark Event Queue
Configuring the Compression Format of a Parquet Table
Adapting to the Third-party JDK When Ranger Is Used
Using the Spark Small File Combination Tool
Configuring Streaming Reading of Spark Driver Execution Results
Spark SQL Enterprise-Class Enhancements
Configuring Vector-based ORC Data Reading
Filtering Partitions without Paths in Partitioned Tables
Configuring Dynamic Overwriting for Hive Table Partitions
Configuring Spark SQL to Enable the Adaptive Execution Feature
Configuring the Default Number of Data Blocks Divided by SparkSQL
Spark Streaming Enterprise-Class Enhancements
Configuring LIFO for Kafka
Configuring Reliability for Connected Kafka
Spark Core Performance Tuning
Spark Core Data Serialization
Spark Core Memory Tuning
Configuring Spark Core Broadcasting Variables
Configuring Heap Memory Parameters for Spark Executor
Using the External Shuffle Service to Improve Spark Core Performance
Configuring Spark Dynamic Resource Scheduling in YARN Mode
Adjusting Spark Core Process Parameters
Spark DAG Design Specifications
Experience
Spark SQL Performance Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the Spark INSERT SELECT Statement
Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Performance Tuning
Spark O&M Management
Configuring Parameters Rapidly
Common Parameters
Spark2x Logs
Changing Spark Log Levels
Viewing Container Logs on the Web UI
Obtaining Container Logs of a Running Spark Application
Configuring Spark Event Log Rollback
Configuring the Number of Lost Executors Displayed in WebUI
Configuring Local Disk Cache for JobHistory
Enhancing Stability in a Limited Memory Condition
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Broadening Support for Hive Partition Pruning Predicate Pushdown
Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
Using CarbonData for First Query
Common Issues About Spark2x
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
Why Can't the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How Do I Configure the Event Queue Size If the Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
"Failed to CREATE_FILE" Is Displayed When Data Is Inserted into the Dynamic Partitioned Table Again
Why Do Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
NodeManager OOM Occurs During Spark Application Execution
Why Does the Realm Information Fail to Be Obtained When SparkBench Is Run on HiBench for the Cluster in Security Mode?
Spark SQL and DataFrame
What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
How Do I Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How Do I Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
How Do I Rectify the Exception That Occurs When I Perform an Operation on the Table Named table?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Should I Do If I Accidentally Kill the JDBCServer Process During a Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Does the --hivevar Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Can't Functions Be Used When Different JDBCServers Are Connected?
Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
Spark Streaming
Same DAG Log Is Recorded Twice for a Streaming Task
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does the Spark Streaming Application Fail to Be Started from the Checkpoint When the Input Stream Has No Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Why Is an Application Not Displayed When I Run the Application with the Empty Part File?
Why Does Spark2x Fail to Export a Table with the Same Field Name?
Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
Native Spark2x UI Fails to Be Accessed or Is Incorrectly Displayed when Internet Explorer Is Used for Access
How Does Spark2x Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
Spark Shuffle Exception Handling
Using Sqoop
Using Sqoop from Scratch
Adapting Sqoop 1.4.7 to MRS 3.x Clusters
Common Sqoop Commands and Parameters
Common Issues About Sqoop
What Should I Do If Class QueryProvider Is Unavailable?
What Should I Do If Method getHiveClient Does Not Exist?
What Should I Do If PostgreSQL or GaussDB Fails to Connect?
What Should I Do If Data Failed to Be Synchronized to a Hive Table on OBS Using hive-table?
What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
What Should I Do If Data Failed to Be Synchronized Using hive-table?
What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
Using Storm
Using Storm from Scratch
Using the Storm Client
Submitting Storm Topologies on the Client
Accessing the Storm Web UI
Managing Storm Topologies
Querying Storm Topology Logs
Storm Common Parameters
Configuring a Storm Service User Password Policy
Migrating Storm Services to Flink
Overview
Completely Migrating Storm Services
Performing Embedded Service Migration
Migrating Services of External Security Components Interconnected with Storm
Storm Log Introduction
Performance Tuning
Storm Performance Tuning
Using Tez
Accessing the Tez Web UI to View the Task Execution Result
Common Tez Parameters
Log Overview
Common Issues
TezUI Cannot Display Tez Task Execution Details
Error Occurs When a User Switches to the Tez Web UI
Yarn Logs Cannot Be Viewed on the TezUI Page
Table Data Is Empty on the TezUI HiveQueries Page
Using YARN
YARN User Permission Management
Creating Yarn Roles
Submitting a Task Using the Yarn Client
Configuring Container Log Aggregation
Enabling Yarn CGroups to Limit the Container CPU Usage
Enterprise-Class Enhancement of YARN
Configuring the Yarn Permission Control
Specifying the User Who Runs Yarn Tasks
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring ApplicationMaster Work Preserving
Configuring the Access Channel Protocol
Configuring the Additional Scheduler WebUI
Configuring Resources for a NodeManager Role Instance
Configuring Yarn Restart
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
YARN O&M Management
YARN Common Configuration Parameters
Yarn Log Overview
Configuring the Localized Log Levels
Configuring Memory Usage Detection
Changing NodeManager Storage Directories
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Job Completes When CGroups Is Used?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Configuring the ZooKeeper Permissions
Common ZooKeeper Parameters
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper Using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Accessing Manager
Accessing MRS Manager (Versions Earlier Than MRS 3.x)
Accessing FusionInsight Manager (MRS 3.x or Later)
Using an MRS Client
Installing a Client (MRS 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
Best Practices
Data Analytics
Using Spark2x to Analyze IoV Drivers' Driving Behavior
Using Hive to Load HDFS Data and Analyze Book Scores
Using Hive to Load OBS Data and Analyze Enterprise Employee Information
Using Flink Jobs to Process OBS Data
Consuming Kafka Data Using Spark Streaming Jobs
Using Flume to Collect Log Files from a Specified Directory to HDFS
Kafka-based WordCount Data Flow Statistics Case
Data Migration
Data Migration Solution
Making Preparations
Exporting Metadata
Copying Data
Restoring Data
Information Collection Before Data Migration to MRS
Preparing the Network Before Data Migration to MRS
Migrating Data from Hadoop to MRS
Migrating Data from HBase to MRS
Migrating Data from Hive to MRS
Using BulkLoad to Import Data to HBase in Batches
Migrating MySQL Data to MRS Hive with CDM
Migrating Data from MRS HDFS to OBS with CDM
Interconnection with Other Cloud Services
Using MRS Spark SQL to Access GaussDB(DWS)
Interconnecting Hive with CSS
Connecting to the OBS File System with an MRS Hive Table
Interconnection with Ecosystem Components
Using DBeaver to Access Phoenix
Using DBeaver to Access HetuEngine
Using Tableau to Access HetuEngine
Using Yonghong BI to Access HetuEngine
Interconnecting Hive with External Self-Built Relational Databases
Interconnecting Hive with External LDAP
Interconnecting MRS Kafka with Kafka Eagle
Using Jupyter Notebook to Connect to MRS Spark
MRS Cluster Management
Configuring Thresholds for Alarms
Submitting Spark Tasks to New Task Nodes
Configuring Auto Scaling for an MRS Cluster
Developer Guide
Developer Guide (LTS)
Introduction to MRS Application Development
Obtaining the MRS Application Development Sample Project
MRS Application Security Authentication Description
Preparing MRS Application Development User
Rapidly Developing MRS Component Applications
HBase Application Development
HDFS Application Development
Hive JDBC Application Development
Hive HCatalog Application Development
Kafka Application Development
Flink Application Development
ClickHouse Application Development
Spark Application Development
ClickHouse Development Guide (Security Mode)
ClickHouse Application Development Overview
ClickHouse Application Development Process
Preparing a ClickHouse Application Development Environment
Preparing a ClickHouse Application Development and Runtime Environment
Importing and Configuring ClickHouse Sample Projects
Developing a ClickHouse Application
ClickHouse Application Development Approach
Configuring ClickHouse Connection Properties
Establishing a Connection
Creating a ClickHouse Database
Creating a ClickHouse Table
Inserting ClickHouse Data
Querying ClickHouse Data
Deleting a ClickHouse Table
Commissioning a ClickHouse Application
Commissioning the ClickHouse Application in the Local Windows Environment
Commissioning the ClickHouse Application in a Linux Environment
ClickHouse Development Guide (Normal Mode)
ClickHouse Application Development Overview
ClickHouse Application Development Process
Preparing a ClickHouse Application Development Environment
Preparing the ClickHouse Development and Runtime Environment
Importing and Configuring ClickHouse Sample Projects
Developing a ClickHouse Application
ClickHouse Application Development Approach
Configuring ClickHouse Connection Properties
Establishing a ClickHouse Connection
Creating a ClickHouse Database
Creating a ClickHouse Table
Inserting ClickHouse Data
Querying ClickHouse Data
Deleting a ClickHouse Table
Commissioning a ClickHouse Application
Commissioning the ClickHouse Application in the Local Windows Environment
Commissioning the ClickHouse Application in a Linux Environment
Flink Development Guide (Security Mode)
Flink Application Development Overview
Flink Application Development Process
Environment Preparation
Preparing the Development and Operating Environment
Installing the Client and Preparing for Security Authentication
Configuring and Importing a Sample Project
Configuring a Spring Boot Sample Project
Developing a Flink Application
Flink DataStream Sample Program
Flink DataStream Sample Program Development Roadmap
Flink DataStream Sample Program (Java)
Flink DataStream Sample Program (Scala)
Flink Kafka Sample Program
Flink Kafka Sample Application Development Roadmap
Flink Kafka Sample Application (Java)
Flink Kafka Sample Application (Scala)
Sample Program for Starting Checkpoint on Flink
Flink Checkpoint Sample Program Development Roadmap
Sample Program for Starting Checkpoint on Flink (Java)
Sample Program for Starting Checkpoint on Flink (Scala)
Flink Job Pipeline Sample Program
Flink Job Pipeline Sample Program Development Roadmap
Flink Job Pipeline Sample Program (Java)
Flink Job Pipeline Sample Program (Scala)
Flink Join Sample Program
Flink Join Sample Program Development Roadmap
Flink Join Sample Program (Java)
Flink Join Sample Program (Scala)
Flink Jar Job Submission SQL Sample Program
Flink Jar Job Submission SQL Sample Program Development Roadmap
Flink Jar Job Submission SQL Sample Program (Java)
FlinkServer REST API Sample Program
FlinkServer REST API Sample Program Development Roadmap
FlinkServer REST API Sample Program (Java)
Using a Proxy User to Access the FlinkServer REST API Sample Program (Java)
Flink Sample Program for Reading HBase Tables
Flink HBase Sample Program Development Roadmap
Flink HBase Sample Program (Java)
Sample Program for Reading Hudi Tables on Flink
Flink Hudi Sample Program Development Roadmap
Flink Hudi Sample Program (Java)
PyFlink Sample Program
PyFlink Sample Program Development Roadmap
PyFlink Sample Program Code Description
Using Python to Submit a Common Flink Job
Using Python to Submit a Flink SQL Job
Commissioning the Flink Application
Compiling and Commissioning the Flink Application
Viewing the Flink Application Commissioning Result
Commissioning the Flink SpringBoot Sample Program
FAQs in Flink Application Development
Common Flink APIs
Flink Java APIs
Flink Scala APIs
Flink REST APIs
Flink Savepoints CLI
Flink Client CLI
What If the Chrome Browser Cannot Display the Title?
What If the Page Is Displayed Abnormally on Internet Explorer 10/11?
What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large?
What If yarn-session Start Fails When blob.storage.directory Is Set to /home?
Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
Why Cannot I Access the Apache Flink Dashboard?
How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
Incorrect GLIBC Version
Flink Development Guide (Normal Mode)
Overview
Application Development
Basic Concepts
Development Process
Environment Preparation
Preparing the Development and Operating Environment
Configuring and Importing a Sample Project
Creating a Project (Optional)
Configuring a Spring Boot Sample Project
Developing an Application
DataStream Application
Scenarios
Java Sample Code
Scala Sample Code
Interconnecting with Kafka
Scenarios
Java Sample Code
Scala Sample Code
Asynchronous Checkpoint Mechanism
Scenarios
Java Sample Code
Scala Sample Code
Job Pipeline Program
Scenario
Java Sample Code
Scala Sample Code
Stream SQL Join Program
Scenario
Java Sample Code
Flink Join Sample Program (Scala)
Flink Jar Job Submission SQL Sample Program
Flink Jar Job Submission SQL Sample Program Development Roadmap
Flink Jar Job Submission SQL Sample Program (Java)
FlinkServer REST API Sample Program
Using a Proxy User to Access the FlinkServer REST API Sample Program (Java)
Flink Reading Data from and Writing Data to HBase
Scenario Description
Java Sample Code
Sample Program for Reading Hudi Tables on Flink
Flink Hudi Sample Program Development Roadmap
Flink Hudi Sample Program (Java)
PyFlink Sample Program
Submitting a Regular Job Using Python
PyFlink Sample Program Development Roadmap
PyFlink Sample Program Code Description
Using Python to Submit a Common Flink Job
Submitting a SQL Job Using Python
Description
Python Sample Code
Using Python to Submit a Flink SQL Job
Debugging the Application
Compiling and Running the Application
Viewing the Debugging Result
Commissioning the Flink SpringBoot Sample Program
More Information
Introduction to Common APIs
Java
Scala
Overview of RESTful APIs
Overview of Savepoints CLI
Introduction to Flink Client CLI
FAQ
Savepoints-related Problems
What If the Chrome Browser Cannot Display the Title?
What If the Page Is Displayed Abnormally on Internet Explorer 10/11?
What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large?
What If yarn-session Start Fails When blob.storage.directory Is Set to /home?
Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
Why Cannot I Access the Apache Flink Dashboard?
How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
Incorrect GLIBC Version
HBase Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Preparing for Development and Operating Environment
Configuring and Importing Sample Projects
Preparing for Security Authentication
Preparing Authentication Mechanism Code
Multi-Instance Authentication in Mutual Trust Scenarios
Authentication for Accessing the HBase REST Service
Authentication for Accessing the ThriftServer Service
Authentication for Accessing Multiple ZooKeepers
Developing an Application
HBase Data Read/Write Sample Program
Typical Scenario Description
Development Idea
Creating Configuration
Creating Connection
Creating a Table
Deleting a Table
Inserting Data
Deleting Data
Modifying a Table
Reading Data Using Get
Reading Data Using Scan
Filtering Data
Creating a Secondary Index
Deleting an Index
Secondary Index-based Query
Multi-Point Region Division
Creating a Phoenix Table
Writing Data to the PhoenixTable
Reading the PhoenixTable
Using HBase Dual-Read
Configuring Log4j Log Output
HBase Rest API Invoking Sample Program
Querying Cluster Information Using REST
Obtaining All Tables Using REST
Operating Namespaces Using REST
Operating Tables Using REST
Accessing the HBase ThriftServer Sample Program
Accessing the ThriftServer Operation Table
Accessing ThriftServer to Write Data
Accessing ThriftServer to Read Data
Sample Program for HBase to Access Multiple ZooKeepers
Accessing Multiple ZooKeepers
Application Commissioning
Commissioning an Application in Windows
Compiling and Running an Application
Viewing Windows Commissioning Results
Commissioning an Application in Linux
Compiling and Running an Application When a Client Is Installed
Compiling and Running an Application When No Client Is Installed
Viewing Linux Commissioning Results
More Information
SQL Query
HBase Dual-Read Configuration Items
External Interfaces
Shell
Java API
Sqlline
JDBC APIs
WebUI
Phoenix Command Line
FAQs
How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
What Are the Application Scenarios of the Bulkload and put Data-loading Modes?
An Error Occurred When Building a JAR Package
HBase Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Preparing for Development and Operating Environment
Configuring and Importing Sample Projects
Developing an Application
HBase Data Read/Write Sample Program
Typical Scenario Description
Development Idea
Creating Configuration
Creating Connection
Creating a Table
Deleting a Table
Modifying a Table
Inserting Data
Deleting Data
Reading Data Using Get
Reading Data Using Scan
Filtering Data
Creating a Secondary Index
Deleting an Index
Secondary Index-based Query
Multi-Point Region Division
Creating a Phoenix Table
Writing Data to the PhoenixTable
Reading the PhoenixTable
Using HBase Dual-Read
Configuring Log4j Log Output
HBase Rest API Invoking Sample Program
Querying Cluster Information Using REST
Obtaining All Tables Using REST
Operating Namespaces Using REST
Operating Tables Using REST
Accessing the HBase ThriftServer Sample Program
Accessing the ThriftServer Operation Table
Accessing ThriftServer to Write Data
Accessing ThriftServer to Read Data
Sample Program for HBase to Access Multiple ZooKeepers
Accessing Multiple ZooKeepers
Application Commissioning
Commissioning an Application in Windows
Compiling and Running an Application
Viewing Windows Commissioning Results
Commissioning an Application in Linux
Compiling and Running an Application When a Client Is Installed
Compiling and Running an Application When No Client Is Installed
Viewing Linux Commissioning Results
More Information
SQL Query
HBase Dual-Read Configuration Items
External Interfaces
Shell
Java APIs
Sqlline
JDBC APIs
WebUI
Phoenix Command Line
FAQs
How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
What Are the Application Scenarios of the bulkload and put Data-loading Modes?
An Error Occurred When Building a JAR Package
HDFS Development Guide (Security Mode)
Introduction to HDFS
Development Process
HDFS Sample Project
Environment Preparation
Preparing Development and Operating Environment
Configuring and Importing Sample Projects
Preparing the Authentication Mechanism
Developing the Project
Development Idea
Initializing the HDFS
Creating Directories
Writing Data into a File
Appending Data to a File
Reading Data from a File
Deleting a File
Deleting Directories
Multi-Thread Tasks
Setting Storage Policies
Colocation
Commissioning the Application
Commissioning an Application in the Windows Environment
Commissioning an Application in the Linux Environment
More Information
HDFS Common API Introduction
Java API Introduction
C API Introduction
HTTP REST API Introduction
HDFS Shell Command Introduction
Access HDFS of the Cluster in Security Mode on Windows Using EIPs
HDFS Development Guide (Normal Mode)
Overview
Introduction to HDFS
Basic Concepts
Development Process
Environment Preparation
Development and Operating Environment
Configuring and Importing Sample Projects
Developing the Project
Scenario
Development Idea
Declaring the Example Code
Initializing the HDFS
Creating Directories
Writing Data into a File
Appending Data to a File
Reading Data from a File
Deleting a File
Deleting Directories
Multi-Thread Tasks
Setting Storage Policies
Colocation
Commissioning the Application
Commissioning an Application in the Windows Environment
Compiling and Running an Application
Checking the Commissioning Result
Commissioning an Application in the Linux Environment
Compiling and Running an Application with the Client Installed
Compiling and Running an Application with the Client Not Installed
Checking the Commissioning Result
More Information
Common API Introduction
Java API
C API
HTTP REST API
Shell Command Introduction
HDFS Access Configuration on Windows Using EIPs
HetuEngine Development Guide (Security Mode)
Overview
Introduction to HetuEngine
Concepts
Connection Modes
Development Process
Environment Preparation
Preparing Development and Running Environments
Configuring and Importing a Sample Project
Configuring the Python3 Sample Project
Preparing for Security Authentication
KeyTab File Authentication Using HSFabric
Username and Password Authentication Using HSFabric
Username and Password Authentication Using HSBroker
Application Development
Typical Application Scenario
Java Sample Code
KeyTab File Authentication Using HSFabric
Username and Password Authentication Using HSFabric
Querying the Execution Progress and Status of an SQL Statement Using JDBC
Username and Password Authentication Using HSBroker
Python3 Sample Code
Username and Password Authentication Using HSBroker
Username and Password Authentication Using HSFabric
KeyTab File Authentication Using HSFabric
Application Commissioning
Commissioning Applications on Windows
Commissioning Applications on Linux
Commissioning the Python3 Sample Project
HetuEngine Development Guide (Normal Mode)
Introduction to HetuEngine
Development Process
Preparing Environment
Preparing Development and Running Environments
Configuring and Importing a Sample Project
Configuring the Python3 Sample Project
Application Development
Typical Application Scenario
Java Sample Code
Accessing Hive Data Sources Using HSFabric
Accessing Hive Data Sources Using HSBroker
Querying the Execution Progress and Status of an SQL Statement Using JDBC
Python3 Sample Code
Accessing Hive Data Sources Using HSBroker
Accessing Hive Data Sources Using HSFabric
Application Commissioning
Commissioning Applications on Windows
Commissioning Applications on Linux
Commissioning the Python3 Sample Project
Hive Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Required Permissions
Development Process
Preparing the Environment
Preparing Development and Operating Environment
Configuring the JDBC Sample Project
Configuring the HCatalog Sample Project
Configuring the Python Sample Project
Configuring the Python3 Sample Project
Developing an Application
Typical Scenario Description
Example Codes
Creating a Table
Loading Data
Querying Data
UDF
Example Program Guide
Accessing Multiple ZooKeepers
Commissioning Applications
Running JDBC and Viewing Results
Running HCatalog and Viewing Results
Running Python and Viewing Results
Running Python3 and Viewing Results
More Information
Interface Reference
JDBC
Hive SQL
WebHCat
Hive of the Cluster in Security Mode Access Configuration on Windows Using EIPs
FAQ
A Message Is Displayed Stating "Unable to read HiveServer2 configs from ZooKeeper" During the Use of the Secondary Development Program
A Message Stating "Problem performing GSS wrap" Is Displayed Due to IBM JDK Exceptions
Hive SQL Is Incompatible with SQL2003 Standards
Hive Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Preparing the Environment
Preparing Development and Operating Environment
Configuring the JDBC Sample Project
Configuring the HCatalog Sample Project
Configuring the Python Sample Project
Configuring the Python3 Sample Project
Developing an Application
Typical Scenario Description
Example Codes
Creating a Table
Loading Data
Querying Data
UDF
Example Program Guide
Accessing Multiple ZooKeepers
Commissioning Applications
Running JDBC and Viewing Results
Running HCatalog and Viewing Results
Running Python and Viewing Results
Running Python3 and Viewing Results
More Information
Interface Reference
JDBC
Hive SQL
WebHCat
Hive of the Cluster in Normal Mode Access Configuration on Windows Using EIPs
FAQ
A Message Stating "Problem performing GSS wrap" Is Displayed Due to IBM JDK Exceptions
IoTDB Development Guide (Security Mode)
Overview
Application Development Overview
Basic Concepts
Development Process
IoTDB Sample Project
Environment Preparations
Preparing the Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing a Sample Project
Application Development
IoTDB JDBC
Java Example Code
Using the Keytab File for JDBC Authentication
IoTDB Session
Java Example Code
Using the Keytab File for Session Authentication
IoTDB Flink
FlinkIoTDBSink
FlinkIoTDBSource
IoTDB Kafka
Java Example Code
IoTDB UDF Program
IoTDB UDF Sample Code
Application Commissioning
Commissioning Applications on Windows
Compiling and Running Applications
Viewing Commissioning Results
Commissioning JDBC and Session Applications on Linux
Compiling and Running Applications
Viewing Commissioning Results
Commissioning Flink Applications on Flink Web UI and Linux
Compiling and Running Applications
Viewing Commissioning Results
Commissioning Kafka Applications on Linux
Compiling and Running Applications
Viewing Commissioning Results
Using a UDF
Registering a UDF
Querying a UDF
Deregistering a UDF
More Information
Common APIs
Java API
IoTDB Development Guide (Normal Mode)
Overview
Application Development Overview
Basic Concepts
Development Process
IoTDB Sample Project
Environment Preparations
Preparing the Development and Running Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing a Sample Project
Application Development
IoTDB JDBC
Java Example Code
IoTDB Session
Java Example Code
IoTDB Flink
FlinkIoTDBSink
FlinkIoTDBSource
IoTDB Kafka
Java Sample Code
IoTDB UDF Program
IoTDB UDF Sample Code
Application Commissioning
Commissioning Applications on Windows
Compiling and Running Applications
Viewing Commissioning Results
Commissioning JDBC and Session Applications on Linux
Compiling and Running Applications
Viewing Commissioning Results
Commissioning Flink Applications on Flink Web UI and Linux
Compiling and Running Applications
Viewing Commissioning Results
Commissioning Kafka Applications on Linux
Compiling and Running Applications
Viewing Commissioning Results
Using a UDF
Registering a UDF
Querying a UDF
Deregistering a UDF
More Information
Common APIs
Java API
Kafka Development Guide (Security Mode)
Overview
Development Environment Preparation
Common Concepts
Development Process
Kafka Sample Project
Environment Preparation
Preparing for Development and Operating Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing a Sample Project
Preparing for Security Authentication
SASL Kerberos Authentication
SASL/PLAINTEXT Authentication
Kafka Token Authentication
Developing an Application
Typical Scenario Description
Typical Scenario Sample Code Description
Producer API Sample
Consumer API Sample
Multi-thread Producer Sample
Multi-thread Consumer Sample
KafkaStreams Sample
Application Commissioning
Producer Sample Commissioning
Consumer Sample Commissioning
High Level Streams Sample Commissioning
Low Level Streams API Sample Usage Guide
Sample Code Running Guide for the Kafka Token Authentication Mechanism
More Information
External Interfaces
Shell
Java API
SSL Encryption Function Used by a Client
Kafka Access Configuration on Windows Using EIPs
FAQ
Topic Authentication Fails During Sample Running and "example-metric1=TOPIC_AUTHORIZATION_FAILED" Is Displayed
Running the Producer.java Sample to Obtain Metadata Fails and "ERROR fetching topic metadata for topics..." Is Displayed, Even the Access Permission for the Related Topic
Kafka Development Guide (Normal Mode)
Overview
Development Environment Preparation
Common Concepts
Development Process
Kafka Sample Project
Environment Preparation
Preparing for Development Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Developing an Application
Typical Scenario Description
Example Code Description
Producer API Usage Sample
Consumer API Usage Sample
Multi-thread Producer Sample
Multi-thread Consumer Sample
KafkaStreams Sample
Application Commissioning
Producer Sample Commissioning
Consumer Sample Commissioning
High Level Streams Sample Commissioning
Low Level Streams Sample Commissioning
More Information
External Interfaces
Shell
Java API
Kafka Access Configuration on Windows Using EIPs
FAQ
Running the Producer.java Sample to Obtain Metadata Fails and "ERROR fetching topic metadata for topics..." Is Displayed, Even the Access Permission for the Related Topic
MapReduce Development Guide (Security Mode)
Overview
MapReduce Overview
Basic Concepts
Development Process
Environment Preparation
Preparing for Development and Operating Environment
Configuring and Importing Sample Projects
Creating a New Project (Optional)
Preparing the Authentication Mechanism
Developing the Project
MapReduce Statistics Sample Project
Typical Scenarios
Example Code
MapReduce Accessing Multi-Component Example Project
Instance
Example Code
Commissioning the Application
Commissioning the Application in the Windows Environment
Compiling and Running the Application
Checking the Commissioning Result
Commissioning an Application in the Linux Environment
Compiling and Running the Application
Checking the Commissioning Result
More Information
Common APIs
Java API
REST API
FAQ
No Response from the Client When Submitting the MapReduce Application
When an Application Is Run, an Abnormality Occurs Due to Network Faults
How to Perform Remote Debugging During MapReduce Secondary Development?
MapReduce Development Guide (Normal Mode)
Overview
MapReduce Overview
Basic Concepts
Development Process
Environment Preparation
Preparing Development and Operating Environment
Configuring and Importing Sample Projects
Creating a New Project (Optional)
Developing the Project
MapReduce Statistics Sample Project
Typical Scenarios
Example Codes
MapReduce Accessing Multi-Component Example Project
Instance
Example Code
Commissioning the Application
Commissioning the Application in the Windows Environment
Compiling and Running the Application
Checking the Commissioning Result
Commissioning the Application in the Linux Environment
Compiling and Running the Application
Checking the Commissioning Result
More Information
Common APIs
Java API
REST API
FAQ
No Response from the Client When Submitting the MapReduce Application
How to Perform Remote Debugging During MapReduce Secondary Development?
Oozie Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Preparing Development and Operating Environment
Downloading and Importing Sample Projects
Preparing Authentication Mechanism Code
Developing the Project
Development of Configuration Files
Description
Development Procedure
Example Codes
job.properties
workflow.xml
Start Action
End Action
Kill Action
FS Action
MapReduce Action
coordinator.xml
Development of Java
Description
Sample Code
Scheduling Spark2x to Access HBase and Hive Using Oozie
Commissioning the Application
Commissioning an Application in the Windows Environment
Compiling and Running Applications
Checking the Commissioning Result
More Information
Common API Introduction
Shell
Java
REST
Oozie Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Development and Operating Environment
Downloading and Importing Sample Projects
Developing the Project
Development of Configuration Files
Description
Development Procedure
Example Codes
job.properties
workflow.xml
Start Action
End Action
Kill Action
FS Action
MapReduce Action
coordinator.xml
Development of Java
Description
Sample Code
Scheduling Spark2x to Access HBase and Hive Using Oozie
Commissioning the Application
Commissioning an Application in the Windows Environment
Compiling and Running Applications
Checking the Commissioning Result
More Information
Common API Introduction
Shell
Java
REST
Spark2x Development Guide (Security Mode)
Spark Application Development Overview
Spark Application Development Process
Preparing a Spark Application Development Environment
Preparing a Local Application Development Environment
Configuring Security Authentication for Spark Applications
Importing and Configuring Spark Sample Projects
(Optional) Creating Spark Sample Projects
Configuring the Spark Python3 Sample Project
Developing Spark Applications
Spark Core Sample Projects
Development Plan
Spark Core Sample Projects (Java)
Spark Core Sample Projects (Scala)
Spark Core Sample Projects (Python)
Spark SQL Sample Projects
Development Plan
Spark SQL Sample Projects (Java)
Spark SQL Sample Projects (Scala)
Spark SQL Sample Projects (Python)
Sample Projects for Accessing Spark SQL Through JDBC
Development Plan
Accessing Spark SQL Sample Projects Through JDBC (Java)
Accessing Spark SQL Sample Projects Through JDBC (Scala)
Sample Projects for Spark to Read HBase Tables
Operating Data in Avro Format
Performing Operations on the HBase Data Source
Using the BulkPut API
Using the BulkGet API
Using the BulkDelete API
Using the BulkLoad API
Using the foreachPartition API
Distributedly Scanning HBase Tables
Using the mapPartition API
Writing Data to HBase Tables In Batches Using Spark Streaming
Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
Development Plan
Implementing Bidirectional Data Exchange with HBase (Java)
Implementing Bidirectional Data Exchange with HBase (Scala)
Implementing Bidirectional Data Exchange with HBase (Python)
Sample Projects for Spark to Implement Data Transition Between Hive and HBase
Development Plan
Implementing Data Transition Between Hive and HBase (Java)
Implementing Data Transition Between Hive and HBase (Scala)
Implementing Data Transition Between Hive and HBase (Python)
Sample Projects for Connecting Spark Streaming to Kafka0-10
Development Plan
Connecting Spark Streaming to Kafka0-10 (Java)
Connecting Spark Streaming to Kafka0-10 (Scala)
Spark Structured Streaming Sample Projects
Development Plan
Spark Structured Streaming Sample Project (Java)
Spark Structured Streaming Sample Project (Scala)
Spark Structured Streaming Sample Project (Python)
Sample Project for Interconnecting Spark Structured Streaming with Kafka
Development Plan
Interconnecting Spark Structured Streaming with Kafka (Scala)
Sample Project for Spark Structured Streaming Status Operations
Development Plan
Sample Project for Spark Structured Streaming Status Operations (Scala)
Sample Project for Spark Concurrent Access to Two HBase Sample Projects
Development Plan
Spark Concurrent Access to Two HBase Sample Projects (Scala)
Sample Project for Spark to Synchronize HBase Data to CarbonData
Development Plan
Synchronizing HBase Data from Spark to CarbonData (Java)
Using Spark to Execute the Hudi Sample Project
Development Plan
Using Spark to Execute the Hudi Sample Project (Java)
Using Spark to Execute the Hudi Sample Project (Scala)
Using Spark to Execute the Hudi Sample Project (Python)
Sample Project for Customizing Configuration Items in Hudi
HoodieDeltaStreamer
User-defined Partitioner
Commissioning a Spark Application
Spark Access Configuration on Windows Using EIPs
Commissioning a Spark Application in a Local Windows Environment
Commissioning a Spark Application in a Linux Environment
FAQs About Spark Application Development
Common Spark APIs
Spark Java APIs
Spark Scala APIs
Spark Python APIs
Spark REST APIs
Spark Client CLI
Spark JDBCServer APIs
Structured Streaming Functions and Reliability
How to Add a User-Defined Library
How to Automatically Load JAR Packages?
Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
Privilege Control Mechanism of SparkSQL UDF Feature
Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
How Do I Perform Remote Debugging Using IDEA?
How Do I Submit the Spark Application Using Java Commands?
A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
Why Does the ApplicationManager Fail to Be Terminated When Data Is Being Processed in the Structured Streaming Cluster Mode?
Restrictions on Restoring the Spark Application from the Checkpoint
Support for Third-party JAR Packages on x86 and TaiShan Platforms
What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
What Should I Do If the Method of Submitting Structured Streaming Tasks Is Changed?
Common JAR File Conflicts
Spark2x Development Guide (Common Mode)
Spark Application Development Overview
Spark Application Development Process
Preparing a Spark Application Development Environment
Preparing a Local Application Development Environment
Importing and Configuring Spark Sample Projects
(Optional) Creating Spark Sample Projects
Configuring the Spark Python3 Sample Project
Developing Spark Applications
Spark Core Sample Projects
Development Plan
Spark Core Sample Projects (Java)
Spark Core Sample Projects (Scala)
Spark Core Sample Projects (Python)
Spark SQL Sample Projects
Development Plan
Spark SQL Sample Projects (Java)
Spark SQL Sample Projects (Scala)
Spark SQL Sample Projects (Python)
Sample Projects for Accessing Spark SQL Through JDBC
Development Plan
Accessing Spark SQL Sample Projects Through JDBC (Java)
Accessing Spark SQL Sample Projects Through JDBC (Scala)
Sample Projects for Spark to Read HBase Tables
Operating Data in Avro Format
Performing Operations on the HBase Data Source
Using the BulkPut API
Using the BulkGet API
Using the BulkDelete API
Using the BulkLoad API
Using the foreachPartition API
Distributedly Scanning HBase Tables
Using the mapPartition API
Writing Data to HBase Tables In Batches Using Spark Streaming
Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
Development Plan
Implementing Bidirectional Data Exchange with HBase (Java)
Implementing Bidirectional Data Exchange with HBase (Scala)
Implementing Bidirectional Data Exchange with HBase (Python)
Sample Projects for Spark to Implement Data Transition Between Hive and HBase
Development Plan
Implementing Data Transition Between Hive and HBase (Java)
Implementing Data Transition Between Hive and HBase (Scala)
Implementing Data Transition Between Hive and HBase (Python)
Sample Projects for Connecting Spark Streaming to Kafka0-10
Development Plan
Connecting Spark Streaming to Kafka0-10 (Java)
Connecting Spark Streaming to Kafka0-10 (Scala)
Spark Structured Streaming Sample Projects
Development Plan
Spark Structured Streaming Sample Project (Java)
Spark Structured Streaming Sample Project (Scala)
Spark Structured Streaming Sample Project (Python)
Sample Project for Interconnecting Spark Structured Streaming with Kafka
Development Plan
Interconnecting Spark Structured Streaming with Kafka (Scala)
Sample Project for Spark Structured Streaming Status Operations
Development Plan
Sample Project for Spark Structured Streaming Status Operations (Scala)
Sample Project for Spark to Synchronize HBase Data to CarbonData
Development Plan
Synchronizing HBase Data from Spark to CarbonData
Using Spark to Execute the Hudi Sample Project
Development Plan
Using Spark to Execute the Hudi Sample Project (Scala)
Using Spark to Execute the Hudi Sample Project (Python)
Using Spark to Execute the Hudi Sample Project (Java)
Sample Project for Customizing Configuration Items in Hudi
HoodieDeltaStreamer
User-defined Partitioner
Commissioning a Spark Application
Spark Access Configuration on Windows Using EIPs
Commissioning a Spark Application in a Local Windows Environment
Commissioning a Spark Application in a Linux Environment
FAQs About Spark Application Development
Common Spark APIs
Spark Java APIs
Spark Scala APIs
Spark Python APIs
Spark REST APIs
Spark Client CLI
Spark JDBCServer APIs
Structured Streaming Functions and Reliability
How to Add a User-Defined Library
How to Automatically Load JAR Packages?
Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
How Do I Perform Remote Debugging Using IDEA?
How Do I Submit the Spark Application Using Java Commands?
A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
Why Does the ApplicationManager Fail to Be Terminated When Data Is Being Processed in the Structured Streaming Cluster Mode?
Restrictions on Restoring the Spark Application from the Checkpoint
Support for Third-party JAR Packages on x86 and TaiShan Platforms
What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
What Should I Do If the Method of Submitting Structured Streaming Tasks Is Changed?
Common JAR File Conflicts
YARN Development Guide (Security Mode)
Overview
Interfaces
Command
Java API
REST API
REST APIs of Superior Scheduler
YARN Development Guide (Normal Mode)
Overview
Interfaces
Command
Java API
REST API
REST APIs of Superior Scheduler
Manager Management Development Guide
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Preparing Development and Running Environments
Configuring and Importing a Sample Project
Developing an Application
Typical Scenario Description
Development Guideline
Example Code Description
Login Authentication
Adding Users
Searching for Users
Modifying Users
Deleting Users
Exporting a User List
Application Commissioning
Commissioning an Application in the Windows OS
Compiling and Running an Application
Viewing Windows Commissioning Results
More Information
External Interfaces
Java API
FAQ
JDK1.6 Fails to Connect to the FusionInsight System Using JDK1.8
An Operation Fails and "authorize failed" Is Displayed in Logs
An Operation Fails and "log4j:WARN No appenders could be found for logger(basicAuth.Main)" Is Displayed in Logs
An Operation Fails and "illegal character in path at index 57" Is Displayed in Logs
Run the curl Command to Access REST APIs
Using Open-source JAR File Conflict Lists
HBase
HDFS
Kafka
Spark2x
Mapping Between Maven Repository JAR Versions and MRS Component Versions
Developer Guide (Normal_3.x)
Introduction to MRS Application Development
Obtaining the MRS Application Development Sample Project
Sample Projects of MRS Components
Using Open-source JAR File Conflict Lists
HBase
HDFS
Kafka
Spark2x
Mapping Between Maven Repository JAR Versions and MRS Component Versions
Security Authentication
Security Authentication Principles and Mechanisms
Preparing a Developer Account
Handling an Authentication Failure
ClickHouse Development Guide (Security Mode)
ClickHouse Application Development Overview
Introduction to ClickHouse
Common Concepts of ClickHouse Application Development
ClickHouse Application Development Process
ClickHouse Sample Project
Preparing a ClickHouse Application Development Environment
Preparing a ClickHouse Application Development Environment
Preparing a ClickHouse Application Running Environment
Importing and Configuring ClickHouse Sample Projects
Developing a ClickHouse Application
ClickHouse Application Development Approach
Configuring ClickHouse Connection Properties
Setting Up a ClickHouse Connection
Creating a ClickHouse Database
Creating a ClickHouse Table
Inserting ClickHouse Data
Querying ClickHouse Data
Deleting a ClickHouse Table
Commissioning a ClickHouse Application
Commissioning a ClickHouse Application in a Local Windows Environment
Commissioning a ClickHouse Application in a Linux Environment
ClickHouse Development Guide (Normal Mode)
ClickHouse Application Development Overview
Introduction to ClickHouse
Common Concepts of ClickHouse Application Development
ClickHouse Application Development Process
ClickHouse Sample Project
Preparing a ClickHouse Application Development Environment
Preparing a ClickHouse Application Development Environment
Preparing a ClickHouse Application Running Environment
Importing and Configuring ClickHouse Sample Projects
Developing a ClickHouse Application
ClickHouse Application Development Approach
Configuring ClickHouse Connection Properties
Setting Up a ClickHouse Connection
Creating a ClickHouse Database
Creating a ClickHouse Table
Inserting ClickHouse Data
Querying ClickHouse Data
Deleting a ClickHouse Table
Commissioning a ClickHouse Application
Commissioning a ClickHouse Application in a Local Windows Environment
Commissioning a ClickHouse Application in a Linux Environment
Flink Development Guide (Security Mode)
Overview
Application Development
Basic Concepts
Development Process
Flink Sample Project
Environment Preparation
Preparing the Development Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Creating a Project (Optional)
Preparing for Security Authentication
Developing an Application
DataStream Application
Scenarios
Java Sample Code
Scala Sample Code
Interconnecting with Kafka
Scenario
Java Sample Code
Scala Sample Code
Asynchronous Checkpoint Mechanism
Scenarios
Java Sample Code
Scala Sample Code
Job Pipeline Program
Scenario
Java Sample Code
Scala Sample Code
Stream SQL Join Program
Scenario
Java Sample Code
Debugging the Application
Compiling and Running the Application
Viewing the Debugging Result
More Information
Introduction to Common APIs
Java
Scala
Overview of RESTful APIs
Overview of Savepoints CLI
Introduction to Flink Client CLI
FAQ
Savepoints-related Problems
What If the Chrome Browser Cannot Display the Title
What If the Page Is Displayed Abnormally on Internet Explorer 10/11
What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large
What If yarn-session Start Fails When blob.storage.directory Is Set to /home
Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
Why Cannot I Access the Apache Flink Dashboard?
How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
Incorrect GLIBC Version
Flink Development Guide (Normal Mode)
Overview
Application Development
Basic Concepts
Development Process
Flink Sample Project
Environment Preparation
Preparing the Development and Operating Environment
Configuring and Importing Sample Projects
Creating a Project (Optional)
Developing an Application
DataStream Application
Scenarios
Java Sample Code
Scala Sample Code
Interconnecting with Kafka
Scenarios
Java Sample Code
Scala Sample Code
Asynchronous Checkpoint Mechanism
Scenarios
Java Sample Code
Scala Sample Code
Job Pipeline Program
Scenario
Java Sample Code
Scala Sample Code
Stream SQL Join Program
Scenario
Java Sample Code
Interconnecting Flink with Cloud Search Service
Scenario Description
Java Sample Code
Debugging the Application
Compiling and Running the Application
Viewing the Debugging Result
More Information
Introduction to Common APIs
Java
Scala
Overview of RESTful APIs
Overview of Savepoints CLI
Introduction to Flink Client CLI
FAQ
Savepoints-related Problems
What If the Chrome Browser Cannot Display the Title
What If the Page Is Displayed Abnormally on Internet Explorer 10/11
What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large
What If yarn-session Start Fails When blob.storage.directory Is Set to /home
Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
Why Cannot I Access the Apache Flink Dashboard?
How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
Incorrect GLIBC Version
HBase Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Development Process
HBase Sample Project
Environment Preparation
Preparing the Development Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing a Sample Project
Preparing for Security Authentication
Security Authentication for HBase Data Read and Write (Single-Cluster Scenario)
Security Authentication for HBase Data Read and Write (Multi-Cluster Mutual Trust Scenario)
Security Authentication for Accessing the HBase REST Service
Authentication for Accessing the ThriftServer Service
Authentication for Accessing Multiple ZooKeepers
Developing an Application
Reading/Writing Data
Typical Scenario Description
Development Idea
Creating Configuration
Creating Connection
Creating a Table
Deleting a Table
Modifying a Table
Inserting Data
Deleting Data
Reading Data Using Get
Reading Data Using Scan
Filtering Data
Creating a Secondary Index
Deleting an Index
Secondary Index-based Query
Multi-Point Region Division
Creating a Phoenix Table
Writing Data to the Phoenix Table
Reading the Phoenix Table
Using HBase Dual-Read Capability
Configuring Log4j Log Output
Calling REST Interfaces
Querying Cluster Information Using REST
Obtaining All Tables Using REST
Operating Namespaces Using REST
Operating Tables Using REST
Accessing HBase ThriftServer
Accessing the ThriftServer Operation Table
Accessing ThriftServer to Write Data
Accessing ThriftServer to Read Data
Accessing Multiple ZooKeepers with HBase
Accessing Multiple ZooKeepers
Application Commissioning
Commissioning an Application in Windows
Compiling and Running an Application
Viewing Windows Commissioning Results
Commissioning an Application in Linux
Compiling and Running an Application When a Client Is Installed
Compiling and Running an Application When No Client Is Installed
Viewing Linux Commissioning Results
More Information
SQL Query
HBase Dual-Read Configuration Items
External Interfaces
Shell
Java API
SQLLine
JDBC APIs
WebUI
HBase Access Configuration on Windows Using EIPs
Phoenix Command Line
FAQs
How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
What Are the Application Scenarios of the Bulkload and put Data-loading Modes?
An Error Occurred When Building a JAR Package
HBase Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
HBase Sample Project
Environment Preparation
Preparing the Development Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Developing an Application
Reading/Writing Data
Service Scenario Description
Application Development Approach
Creating Configuration
Creating Connection
Creating a Table
Deleting a Table
Modifying a Table
Inserting Data
Deleting Data
Reading Data Using Get
Reading Data Using Scan
Filtering Data
Creating a Secondary Index
Deleting an Index
Secondary Index-based Query
Multi-Point Region Division
Creating a Phoenix Table
Writing Data to the Phoenix Table
Reading the Phoenix Table
Using HBase Dual-Read Capability
Configuring Log4j Log Output
Calling REST Interfaces
Querying Cluster Information Using REST
Obtaining All Tables Using REST
Operating Namespaces Using REST
Operating Tables Using REST
Accessing HBase ThriftServer
Accessing the ThriftServer Operation Table
Accessing ThriftServer to Write Data
Accessing ThriftServer to Read Data
Accessing Multiple ZooKeepers with HBase
Accessing Multiple ZooKeepers
Application Commissioning
Commissioning an Application in Windows
Compiling and Running an Application
Viewing Windows Commissioning Results
Commissioning an Application in Linux
Compiling and Running an Application When a Client Is Installed
Compiling and Running an Application When No Client Is Installed
Viewing Linux Commissioning Results
More Information
SQL Query
HBase Dual-Read Configuration Items
External Interfaces
Shell
Java APIs
SQLLine
JDBC APIs
WebUI
HBase of the Cluster in Normal Mode Access Configuration on Windows Using EIPs
Phoenix Command Line
FAQs
How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
What Are the Application Scenarios of the bulkload and put Data-loading Modes?
An Error Occurred When Building a JAR Package
HDFS Development Guide (Security Mode)
HDFS Application Development Overview
HDFS Application Development Process
HDFS Sample Project
Preparing an HDFS Application Development Environment
Preparing an HDFS Application Development and Runtime Environment
Importing and Configuring HDFS Sample Projects
Preparing the HDFS Authentication Mechanism
Developing an HDFS Application
HDFS Application Development Approach
Initializing the HDFS
Creating HDFS Directories
Writing Data into an HDFS File
Appending Data to an HDFS File
Reading Data from an HDFS File
Deleting a File
Deleting Directories
Multi-Thread Tasks
Setting Storage Policies
Configuring the HDFS Colocation Policy
Commissioning an HDFS Application
Commissioning an HDFS Application in the Windows Environment
Commissioning an HDFS Application in the Linux Environment
FAQs in HDFS Application Development
Common API Introduction
HDFS Java APIs
HDFS C APIs
HDFS HTTP REST APIs
Introduction to HDFS Shell Commands
Accessing HDFS of the Cluster in Security Mode on Windows Using EIPs
HDFS Development Guide (Normal Mode)
HDFS Application Development Overview
HDFS Application Development Process
HDFS Sample Project
Preparing an HDFS Application Development Environment
Preparing an HDFS Application Development and Runtime Environment
Importing and Configuring HDFS Sample Projects
Developing an HDFS Application
HDFS Application Development Approach
Initializing the HDFS
Creating HDFS Directories
Writing Data into an HDFS File
Appending Data to an HDFS File
Reading Data from an HDFS File
Deleting a File
Deleting Directories
Multi-Thread Tasks
Setting Storage Policies
Configuring the HDFS Colocation Policy
Commissioning an HDFS Application
Commissioning an HDFS Application in the Windows Environment
Commissioning an HDFS Application in the Linux Environment
FAQs in HDFS Application Development
Common API Introduction
HDFS Java APIs
HDFS C APIs
HDFS HTTP REST APIs
Introduction to HDFS Shell Commands
Accessing HDFS of the Cluster in Normal Mode on Windows Using EIPs
Hive Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Hive Sample Project
Preparing the Environment
Preparations
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Configuring and Importing the JDBC/HCatalog Sample Project
Configuring the Python Sample Project
Configuring the Python3 Sample Project
Configuring Security Authentication for JDBC to Access Hive
Developing an Application
Accessing Hive with JDBC
Typical Scenario
Creating a Table
Loading Data
Querying Data
Accessing Multiple ZooKeepers
Using the JDBC Interface to Submit a Data Analysis Task
Accessing Hive with HCatalog
Accessing Hive Using Python
Accessing Hive Using Python 3
Debugging the Application
Debugging the Sample Application in Windows
Debugging the Sample Application in Linux
Debugging the HCatalog Sample Program
Debugging the Python Sample Program
Debugging the Python3 Sample Program
More Information
Interface Reference
JDBC
Hive SQL
WebHCat
How Do I Access Hive of the Cluster in Security Mode on Windows Using EIPs?
FAQ
A Message Is Displayed Stating "Unable to read HiveServer2 configs from ZooKeeper" During the Use of the Secondary Development Program
Problem performing GSS wrap Message Is Displayed Due to IBM JDK Exceptions
Hive SQL Is Incompatible with SQL2003 Standards
Hive Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Hive Sample Project
Preparing the Environment
Preparations
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Configuring and Importing the JDBC/HCatalog Sample Project
Configuring the Python Sample Project
Configuring the Python3 Sample Project
Developing an Application
Accessing Hive with JDBC
Typical Scenario Description
Creating a Table
Loading Data
Querying Data
Accessing Multiple ZooKeepers
Using the JDBC Interface to Submit a Data Analysis Task
Accessing Hive with HCatalog
Accessing Hive Using Python
Accessing Hive Using Python 3
Debugging the Application
Debugging the Sample Application in Windows
Debugging the Sample Application in Linux
Debugging the HCatalog Sample Program
Debugging the Python Sample Program
Debugging the Python3 Sample Program
More Information
Interface Reference
JDBC
Hive SQL
WebHCat
Hive of the Cluster in Normal Mode Access Configuration on Windows Using EIPs
FAQ
Problem performing GSS wrap Message Is Displayed Due to IBM JDK Exceptions
Impala Development Guide (Security Mode)
Overview
Application Development Overview
Basic Concepts
Application Development Process
Environment Preparation
Preparing the Development and Operating Environment
Application Development
Typical Application Scenario
Creating a Table
Loading Data
Querying Data
User-defined Functions
Sample Program Guide
Application Commissioning
Commissioning Applications on Windows
Commissioning Applications on Linux
Impala APIs
JDBC
Impala SQL
Development Specifications
Rules
Suggestions
Examples
Impala Development Guide (Normal Mode)
Overview
Application Development Overview
Basic Concepts
Application Development Process
Environment Preparation
Preparing the Development and Operating Environment
Configuring and Importing Sample Projects
Application Development
Typical Application Scenario
Creating a Table
Loading Data
Querying Data
User-defined Functions
Sample Program Guide
Application Commissioning
Commissioning Applications on Windows
Commissioning Applications on Linux
Impala APIs
JDBC
Impala SQL
Development Specifications
Rules
Suggestions
Examples
Kafka Development Guide (Security Mode)
Overview
Development Environment Preparation
Common Concepts
Development Process
Kafka Sample Project
Environment Preparation
Preparing the Development Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Preparing for Security Authentication
SASL Kerberos Authentication
Kafka Token Authentication
Developing an Application
Typical Application Scenarios
Typical Scenario Sample Code Description
Producer API Sample
Consumer API Sample
Multi-thread Producer Example
Multi-thread Consumer Sample
KafkaStreams Sample
Application Commissioning
Producer Sample Commissioning
Consumer Sample Commissioning
High Level Streams Sample Commissioning
Low Level Streams Sample Commissioning
Sample Code Running Guide for the Kafka Token Authentication Mechanism
More Information
External Interfaces
Shell
Java API
Security Ports
SSL Encryption Function Used by a Client
How Do I Access Kafka of the Cluster in Security Mode on Windows Using EIPs?
FAQ
Topic Authentication Fails During Sample Running and "example-metric1=TOPIC_AUTHORIZATION_FAILED" Is Displayed
Running the Producer.java Sample to Obtain Metadata Fails and "ERROR fetching topic metadata for topics..." Is Displayed Even Though Access Permission for the Related Topic Has Been Granted
Kafka Development Guide (Normal Mode)
Overview
Development Environment Preparation
Common Concepts
Development Process
Kafka Sample Project
Environment Preparation
Preparing the Development and Operating Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Developing an Application
Typical Scenario Description
Example Code Description
Producer API Usage Sample
Consumer API Usage Sample
Multi-thread Producer Sample
Multi-thread Consumer Sample
KafkaStreams Sample
Application Commissioning
Producer Sample Commissioning
Consumer Sample Commissioning
High Level Streams Sample Commissioning
Low Level Streams Sample Commissioning
More Information
External Interfaces
Shell
Java API
Kafka Access Configuration on Windows Using EIPs
FAQ
Running the Producer.java Sample to Obtain Metadata Fails and "ERROR fetching topic metadata for topics..." Is Displayed Even Though Access Permission for the Related Topic Has Been Granted
Kudu Development Guide (Security Mode)
Overview
Introduction to Kudu
Basic Concepts
Development Process
Environment Preparation
Preparing the Development and Running Environment
Preparing for Security Authentication
Developing an Application
Typical Application Scenario
Development Idea
Sample Code Description
Establishing Connections
Creating a Table
Opening a Table
Modifying a Table
Writing Data
Reading Data
Deleting a Table
Commissioning the Application
More Information
Common APIs
Java API
Kudu Development Guide (Normal Mode)
Overview
Introduction to Kudu
Basic Concepts
Development Process
Environment Preparation
Preparing the Development and Running Environment
Developing an Application
Typical Application Scenario
Development Idea
Sample Code Description
Establishing Connections
Creating a Table
Opening a Table
Modifying a Table
Writing Data
Reading Data
Deleting a Table
Commissioning the Application
More Information
Common APIs
Java API
MapReduce Development Guide (Security Mode)
Overview
MapReduce Overview
Basic Concepts
Development Process Description
MapReduce Sample Project
Environment Preparation
Preparing the Development Environment
Preparing the Configuration Files for Connecting to the Cluster
Configuring and Importing Sample Projects
Creating a New Project (Optional)
Preparing for Security Authentication
Developing the Project
MapReduce Statistics Sample Project
Typical Scenarios
Example Code
MapReduce Accessing Multi-Component Example Project
Instance
Example Code
Debugging the Application
Preparing Initial Data
Commissioning the Application in the Windows Environment
Compiling and Running Applications
Checking the Commissioning Result
Commissioning an Application in the Linux Environment
Compiling and Running Applications
Checking the Commissioning Result
More Information
Common APIs
Java API
REST API
FAQ
No Response from the Client When Submitting the MapReduce Application
When an Application Is Run, An Abnormality Occurs Due to Network Faults
How to Perform Remote Debugging During MapReduce Secondary Development?
MapReduce Development Guide (Normal Mode)
Overview
MapReduce Overview
Basic Concepts
Development Process
Environment Preparation
Preparing the Development and Operating Environment
Configuring and Importing Sample Projects
Creating a New Project (Optional)
Developing the Project
MapReduce Statistics Sample Project
Typical Scenarios
Example Code
MapReduce Accessing Multi-Component Example Project
Instance
Example Code
Commissioning the Application
Commissioning the Application in the Windows Environment
Compiling and Running the Application
Checking the Commissioning Result
Commissioning the Application in the Linux Environment
Compiling and Running the Application
Checking the Commissioning Result
More Information
Common APIs
Java API
REST API
FAQ
No Response from the Client When Submitting the MapReduce Application
How to Perform Remote Debugging During MapReduce Secondary Development?
Oozie Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Preparing the Development and Operating Environment
Downloading and Importing Sample Projects
Preparing Authentication Mechanism Code
Developing the Project
Development of Configuration Files
Description
Development Procedure
Example Code
job.properties
workflow.xml
Start Action
End Action
Kill Action
FS Action
MapReduce Action
coordinator.xml
Development of Java
Description
Sample Code
Scheduling Spark2x to Access HBase and Hive Using Oozie
Commissioning the Application
Commissioning an Application in the Windows Environment
Compiling and Running Applications
Checking the Commissioning Result
More Information
Introduction to Common APIs
Shell
Java
REST
Oozie Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Preparing the Development and Operating Environment
Downloading and Importing Sample Projects
Developing the Project
Development of Configuration Files
Description
Development Procedure
Example Code
job.properties
workflow.xml
Start Action
End Action
Kill Action
FS Action
MapReduce Action
coordinator.xml
Development of Java
Description
Sample Code
Scheduling Spark2x to Access HBase and Hive Using Oozie
Commissioning the Application
Commissioning an Application in the Windows Environment
Compiling and Running Applications
Checking the Commissioning Result
More Information
Introduction to Common APIs
Shell
Java
REST
Spark2x Development Guide (Security Mode)
Introduction to Spark Application Development
Spark Application Development Process
Spark2x Sample Project
Preparing a Spark Application Development Environment
Preparing a Local Application Development Environment
Preparing the Configuration File for Connecting Spark to the Cluster
Importing and Configuring Spark Sample Projects
(Optional) Creating Spark Sample Projects
Configuring Security Authentication for Spark Applications
Configuring the Spark Python3 Sample Project
Developing a Spark Application
Spark Core Sample Projects
Development Plan
Spark Core Sample Projects (Java)
Spark Core Sample Projects (Scala)
Spark Core Sample Projects (Python)
Spark SQL Sample Projects
Development Plan
Spark SQL Sample Projects (Java)
Spark SQL Sample Projects (Scala)
Spark SQL Sample Projects (Python)
Sample Projects for Accessing Spark SQL Through JDBC
Development Plan
Accessing Spark SQL Sample Projects Through JDBC (Java)
Accessing Spark SQL Sample Projects Through JDBC (Scala)
Sample Projects for Spark to Read HBase Tables
Performing Operations on Data in Avro Format
Performing Operations on the HBase Data Source
Using the BulkPut Interface
Using the BulkGet Interface
Using the BulkDelete Interface
Using the BulkLoad Interface
Using the foreachPartition Interface
Scanning HBase Tables in a Distributed Manner
Using the mapPartition Interface
Writing Data to HBase Tables in Batches Using SparkStreaming
Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
Development Plan
Implementing Bidirectional Data Exchange with HBase (Java)
Implementing Bidirectional Data Exchange with HBase (Scala)
Implementing Bidirectional Data Exchange with HBase (Python)
Sample Projects for Spark to Implement Data Transition Between Hive and HBase
Development Plan
Implementing Data Transition Between Hive and HBase (Java)
Implementing Data Transition Between Hive and HBase (Scala)
Implementing Data Transition Between Hive and HBase (Python)
Sample Projects for Connecting Spark Streaming to Kafka0-10
Development Plan
Connecting Spark Streaming to Kafka0-10 (Java)
Connecting Spark Streaming to Kafka0-10 (Scala)
Spark Structured Streaming Sample Projects
Development Plan
Spark Structured Streaming Sample Project (Java)
Spark Structured Streaming Sample Project (Scala)
Spark Structured Streaming Sample Project (Python)
Sample Project for Interconnecting Spark Structured Streaming with Kafka
Development Plan
Interconnecting Spark Structured Streaming with Kafka (Scala)
Sample Project for Spark Structured Streaming Status Operations
Development Plan
Sample Project for Spark Structured Streaming Status Operations (Scala)
Sample Project for Spark Concurrent Access to Two HBase Sample Projects
Development Plan
Spark Concurrent Access to Two HBase Sample Projects (Scala)
Sample Project for Spark to Synchronize HBase Data to CarbonData
Development Plan
Synchronizing HBase Data from Spark to CarbonData (Java)
Using Spark to Execute the Hudi Sample Project
Development Plan
Using Spark to Execute the Hudi Sample Project (Java)
Using Spark to Execute the Hudi Sample Project (Scala)
Using Spark to Execute the Hudi Sample Project (Python)
Sample Project for Customizing Configuration Items in Hudi
HoodieDeltaStreamer
User-defined Partitioner
Commissioning a Spark Application
Commissioning a Spark Application in a Local Windows Environment
Spark Access Configuration on Windows Using EIPs
Writing and Running the Spark Program in the Local Windows Environment
Viewing the Spark Program Debugging Result in the Local Windows Environment
Commissioning a Spark Application in a Linux Environment
Writing and Running the Spark Program in the Linux Environment
Viewing the Spark Program Commissioning Result in the Linux Environment
FAQs About Spark Application Development
Common Spark APIs
Spark Java APIs
Spark Scala APIs
Spark Python APIs
Spark REST APIs
Spark Client CLI
Spark JDBCServer APIs
Structured Streaming Functions and Reliability
How to Add a User-Defined Library
How to Automatically Load JAR Packages?
Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
Privilege Control Mechanism of SparkSQL UDF Feature
Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
How to Perform Remote Debugging Using IDEA?
How to Submit the Spark Application Using Java Commands?
A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
Application Fails When ApplicationManager Is Terminated During Data Processing in the Cluster Mode of Structured Streaming
Restrictions on Restoring the Spark Application from the checkpoint
Support for Third-party JAR Packages on x86 and TaiShan Platforms
What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
What Should I Do If the Structured Streaming Task Submission Method Is Changed?
Common JAR File Conflicts
Spark2x Development Guide (Normal Mode)
Introduction to Spark Application Development
Spark Application Development Process
Spark2x Sample Project
Preparing a Spark Application Development Environment
Preparing a Local Application Development Environment
Preparing the Configuration File for Connecting Spark to the Cluster
Importing and Configuring Spark Sample Projects
(Optional) Creating Spark Sample Projects
Configuring the Spark Python3 Sample Project
Developing Spark Applications
Spark Core Sample Projects
Development Plan
Spark Core Sample Projects (Java)
Spark Core Sample Projects (Scala)
Spark Core Sample Projects (Python)
Spark SQL Sample Projects
Development Plan
Spark SQL Sample Projects (Java)
Spark SQL Sample Projects (Scala)
Spark SQL Sample Projects (Python)
Sample Projects for Accessing Spark SQL Through JDBC
Development Plan
Accessing Spark SQL Sample Projects Through JDBC (Java)
Accessing Spark SQL Sample Projects Through JDBC (Scala)
Sample Projects for Spark to Read HBase Tables
Performing Operation on Data in Avro Format
Performing Operations on the HBase Data Source
Using the BulkPut Interface
Using the BulkGet Interface
Using the BulkDelete Interface
Using the BulkLoad Interface
Using the foreachPartition Interface
Scanning HBase Tables in a Distributed Manner
Using the mapPartition Interface
Writing Data to HBase Tables in Batches Using SparkStreaming
Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
Development Plan
Implementing Bidirectional Data Exchange with HBase (Java)
Implementing Bidirectional Data Exchange with HBase (Scala)
Implementing Bidirectional Data Exchange with HBase (Python)
Sample Projects for Spark to Implement Data Transition Between Hive and HBase
Development Plan
Implementing Data Transition Between Hive and HBase (Java)
Implementing Data Transition Between Hive and HBase (Scala)
Implementing Data Transition Between Hive and HBase (Python)
Sample Projects for Connecting Spark Streaming to Kafka0-10
Development Plan
Connecting Spark Streaming to Kafka0-10 (Java)
Connecting Spark Streaming to Kafka0-10 (Scala)
Spark Structured Streaming Sample Projects
Development Plan
Spark Structured Streaming Sample Project (Java)
Spark Structured Streaming Sample Project (Scala)
Spark Structured Streaming Sample Project (Python)
Sample Project for Interconnecting Spark Structured Streaming with Kafka
Development Plan
Interconnecting Spark Structured Streaming with Kafka (Scala)
Sample Project for Spark Structured Streaming Status Operations
Development Plan
Sample Project for Spark Structured Streaming Status Operations (Scala)
Sample Project for Spark to Synchronize HBase Data to CarbonData
Development Plan
Synchronizing HBase Data from Spark to CarbonData (Java)
Using Spark to Execute the Hudi Sample Project
Development Plan
Using Spark to Execute the Hudi Sample Project (Java)
Using Spark to Execute the Hudi Sample Project (Scala)
Using Spark to Execute the Hudi Sample Project (Python)
Sample Project for Customizing Configuration Items in Hudi
HoodieDeltaStreamer
User-defined Partitioner
Commissioning a Spark Application
Commissioning a Spark Application in a Local Windows Environment
Spark Access Configuration on Windows Using EIPs
Writing and Running the Spark Program in the Local Windows Environment
Viewing the Spark Program Debugging Result in the Local Windows Environment
Commissioning a Spark Application in a Linux Environment
Writing and Running the Spark Program in the Linux Environment
Viewing the Spark Program Commissioning Result in the Linux Environment
FAQs About Spark Application Development
Common Spark APIs
Spark Java APIs
Spark Scala APIs
Spark Python APIs
Spark Client CLI
Spark JDBCServer APIs
Structured Streaming Functions and Reliability
How to Add a User-Defined Library
How to Automatically Load JAR Packages?
Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
How to Perform Remote Debugging Using IDEA?
How to Submit the Spark Application Using Java Commands?
A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
Application Fails When ApplicationManager Is Terminated During Data Processing in the Cluster Mode of Structured Streaming
Restrictions on Restoring the Spark Application from the checkpoint
Support for Third-party JAR Packages on x86 and TaiShan Platforms
What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
What Should I Do If the Structured Streaming Task Submission Method Is Changed?
Common JAR File Conflicts
Storm Development Guide (Security Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Environment Preparation Overview
Preparing the Development and Operating Environment
Configuring and Importing Sample Projects
Developing an Application
Typical Scenario Description
Development Idea
Example Code Description
Creating a Spout
Creating a Bolt
Creating a Topology
Running an Application
Packaging IntelliJ IDEA Code
Packaging Services
Overview
Packaging Services on a Linux OS
Packaging Services on a Windows OS
Submitting a Topology
Submitting a Topology When a Client Is Installed on a Linux OS
Submitting a Topology When No Client Is Installed on a Linux OS
Submitting a Topology in IntelliJ IDEA Remotely
Viewing Results
More Information
Storm-Kafka Development Guideline
Storm-JDBC Development Guideline
Storm-HDFS Development Guideline
Storm-HBase Development Guideline
Flux Development Guideline
External Interfaces
FAQ
How Do I Use IDEA to Remotely Debug Services?
How Do I Handle the Error "Command line is too long" Reported When Main Is Executed for Remote Topology Submission in IntelliJ IDEA
Storm Development Guide (Normal Mode)
Overview
Application Development Overview
Common Concepts
Development Process
Environment Preparation
Environment Preparation Overview
Preparing the Development and Operating Environment
Configuring and Importing Sample Projects
Developing an Application
Typical Scenario Description
Development Idea
Example Code Description
Creating a Spout
Creating a Bolt
Creating a Topology
Running an Application
Packaging IntelliJ IDEA Code
Packaging Services
Overview
Packaging Services on a Linux OS
Packaging Services on a Windows OS
Submitting a Topology
Submitting a Topology When a Client Is Installed on a Linux OS
Submitting a Topology When No Client Is Installed on a Linux OS
Submitting a Topology in IntelliJ IDEA Remotely
Viewing Results
More Information
Storm-Kafka Development Guideline
Storm-JDBC Development Guideline
Storm-HDFS Development Guideline
Storm-HBase Development Guideline
Flux Development Guideline
External Interfaces
FAQ
How Do I Use IDEA to Remotely Debug Services?
How Do I Set the Offset Correctly When Using the Old Plug-in storm-kafka?
How Do I Handle the Error "Command line is too long" Reported When Main Is Executed for Remote Topology Submission in IntelliJ IDEA?
YARN Development Guide (Security Mode)
Overview
Interfaces
Command
Java API
REST API
REST APIs of Superior Scheduler
YARN Development Guide (Normal Mode)
Overview
Interfaces
Command
Java API
REST API
REST APIs of Superior Scheduler
Developer Guide (Normal_Earlier Than 3.x)
Introduction to MRS Application Development
Obtaining the MRS Application Development Sample Project
Sample Projects of MRS Components
Alluxio Development Guide
Alluxio Application Development Overview
Introduction to Alluxio Application Development
Common Concepts of Alluxio
Alluxio Application Development Process
Preparing an Alluxio Application Development Environment
Alluxio Development Environment
Preparing an Alluxio Application Development Environment
Importing and Configuring Alluxio Sample Projects
Developing an Alluxio Application
Alluxio Development Plan
Initializing Alluxio
Writing Data to an Alluxio File
Reading an Alluxio File
Commissioning an Alluxio Application
Alluxio APIs
Flink Development Guide
Flink Application Development Overview
Introduction to Flink Application Development
Common Concepts of Flink Application Development
Flink Application Development Process
Preparing a Flink Application Development Environment
Preparing a Local Application Development Environment
Preparing a Flink Application Development User
Installing the Flink Client
Configuring and Importing a Flink Sample Project
(Optional) Creating Flink Sample Projects
Preparing the Flink Application Security Authentication
Developing Flink Applications
DataStream Application
Flink DataStream Development Plan
Flink DataStream Java Sample Code
Flink DataStream Scala Sample Code
Application for Producing and Consuming Data in Kafka
Development Plan of Kafka Data Producing and Consuming
Java Sample Code of Kafka Data Producing and Consuming
Scala Sample Code of Kafka Data Producing and Consuming
Asynchronous Checkpoint Mechanism Application
Development Plan of Flink Asynchronous Checkpoint
Java Sample Code of Flink Asynchronous Checkpoint
Scala Sample Code of Flink Asynchronous Checkpoint
Stream SQL Join Application
Development Plan of Flink Stream SQL Join
Flink Stream SQL Join Java Sample Code
Commissioning a Flink Application
Compiling and Running a Flink Application
Viewing the Running Result of a Flink Application
FAQs About Flink Application Development
Flink Savepoints CLI
Flink Client CLI
Flink Performance Tuning Suggestions
Savepoints FAQs
What Should I Do If Running a Checkpoint Is Slow When RocksDBStateBackend Is Set for the Checkpoint and a Large Amount of Data Exists?
What Should I Do If yarn-session Failed to Be Started When blob.storage.directory Is Set to /home?
Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
When I Use the Newly-Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
Why Can't I Access the Flink Web Page?
HBase Development Guide
HBase Application Development Overview
Introduction to HBase Application Development
Common Concepts of HBase Application Development
HBase Application Development Process
Preparing an HBase Application Development Environment
Preparing a Local Application Development Environment
Preparing an HBase Application Development User
Importing and Configuring HBase Sample Projects
Developing an HBase Application
HBase Development Plan
Creating the Configuration Object
Creating a Connection Object
Creating an HBase Table
Deleting an HBase Table
Modifying an HBase Table
Inserting HBase Data
Deleting HBase Data
Reading HBase Data Using the GET Command
Reading HBase Data Using the Scan Command
Using an HBase Filter
Adding an HBase Secondary Index
Enabling or Disabling an HBase Secondary Index
Querying the HBase Secondary Index List
Using an HBase Secondary Index to Read Data
Deleting an HBase Secondary Index
HBase Multi-Point Region Splitting
Configuring HBase ACL Security Policies
Commissioning an HBase Application
Commissioning an HBase Application on Windows
Compiling and Running an HBase Application
Viewing the HBase Application Commissioning Result
Commissioning an HBase Application on Linux
Compiling and Running an HBase Application When a Client Is Installed
Compiling and Running an HBase Application When No Client Is Installed
Viewing the HBase Application Commissioning Result
Commissioning the HBase Phoenix Sample Program
Commissioning the HBase Python Sample Program
FAQs About HBase Application Development
HBase APIs
HBase Shell APIs
HBase Java APIs
HBase HFS Java APIs
HBase Phoenix APIs
HBase REST APIs
HBase SQL Query Sample Code
How Do I Configure HBase File Storage?
What Do I Do When There Is an HBase Application Running Exception?
Application Scenarios of HBase BulkLoad and Put
HDFS Development Guide
HDFS Application Development Overview
Introduction to HDFS Application Development
Common Concepts of HDFS Application Development
HDFS Application Development Process
Preparing an HDFS Application Development Environment
Preparing a Local Application Development Environment
Preparing an HDFS Application Development User
Preparing the Eclipse and JDK
Preparing an HDFS Application Running Environment
Importing and Configuring HDFS Sample Projects
Developing an HDFS Application
HDFS Development Plan
Initializing HDFS
Writing Data to an HDFS File
Appending HDFS File Content
Reading an HDFS File
Deleting an HDFS File
HDFS Colocation
Setting HDFS Storage Policies
Using HDFS to Access OBS
Commissioning an HDFS Application
Commissioning an HDFS Application on Linux
Viewing the HDFS Application Commissioning Result
FAQs About HDFS Application Development
HDFS Java APIs
HDFS C APIs
HDFS HTTP REST APIs
HDFS Shell Commands
Logging in to MRS Manager
Downloading an MRS Client
Hive Development Guide
Hive Application Development Overview
Introduction to Hive Application Development
Common Concepts of Hive Application Development
Hive Application Development Process
Preparing a Hive Application Development Environment
Hive Application Development Environment
Preparing a Local Application Development Environment
Preparing a Hive Application Development User
Preparing a Hive JDBC Development Environment
Preparing a Hive HCatalog Development Environment
Developing a Hive Application
Hive Development Plan
Creating a Hive Table
Loading Hive Data
Querying Hive Data
Analyzing Hive Data
Developing User-Defined Hive Functions
Commissioning a Hive Application
Commissioning a Hive JDBC Application on Windows
Commissioning a Hive JDBC Application on Linux
Commissioning a Hive HCatalog Application on Linux
FAQs About Hive Application Development
Hive JDBC APIs
HiveQL APIs
Hive WebHCat APIs
Impala Development Guide
Impala Application Development Overview
Introduction to Impala Application Development
Common Concepts of Impala Application Development
Impala Application Development Process
Preparing an Impala Application Development Environment
Impala Application Development Environment
Preparing a Local Application Development Environment
Preparing an Impala Application Development User
Preparing the Impala JDBC Client
Developing Impala Applications
Impala Development Plan
Creating an Impala Table
Loading Impala Data
Querying Impala Data
Analyzing Impala Data
Developing User-Defined Impala Functions
Commissioning an Impala Application
Commissioning an Impala JDBC Application on Windows
Commissioning an Impala JDBC Application on Linux
FAQs About Impala Application Development
Impala JDBC APIs
Impala SQL APIs
Kafka Development Guide
Kafka Application Development Overview
Introduction to Kafka Application Development
Common Concepts of Kafka Application Development
Kafka Application Development Process
Preparing a Kafka Application Development Environment
Kafka Application Development Environment
Preparing the Maven and JDK
Importing and Configuring Kafka Sample Projects
Preparing the Kafka Application Security Authentication
Developing a Kafka Application
Kafka Development Plan
Kafka Old Producer API Usage Sample
Kafka Old Consumer API Usage Sample
Kafka Producer API Usage Sample
Kafka Consumer API Usage Sample
Kafka Multi-Thread Producer API Usage Sample
Kafka Multi-Thread Consumer API Usage Sample
Kafka SimpleConsumer API Usage Sample
Kafka Configuration File
Commissioning a Kafka Application
FAQs About Kafka Application Development
Kafka APIs
Kafka Shell Commands
Kafka Java APIs
Kafka Security APIs
What Do I Do If Metadata Fails to Be Obtained by Running the Producer.java Sample?
MapReduce Development Guide
MapReduce Application Development Overview
Introduction to MapReduce Application Development
Common Concepts of MapReduce Application Development
MapReduce Application Development Process
Preparing a MapReduce Application Development Environment
MapReduce Application Development Environment
Preparing a MapReduce Application Development User
Preparing the Eclipse and JDK
Preparing a MapReduce Application Running Environment
Importing and Configuring MapReduce Sample Projects
Configuring Security Authentication for MapReduce Applications
Developing a MapReduce Application
MapReduce Development Plan
Development Plan of Accessing a Multi-Component Program
Commissioning a MapReduce Application
Compiling and Running Applications
Viewing the MapReduce Application Commissioning Result
FAQs About MapReduce Application Development
MapReduce APIs
MapReduce Java APIs
What Should I Do If the Client Has No Response After a MapReduce Job Is Submitted?
OpenTSDB Development Guide
OpenTSDB Application Development Overview
Introduction to OpenTSDB Application Development
Common Concepts of OpenTSDB Application Development
OpenTSDB Application Development Process
Preparing an OpenTSDB Application Development Environment
OpenTSDB Application Development Environment
Preparing an OpenTSDB Application Development Environment
Preparing an OpenTSDB Application Development User
Importing and Configuring an OpenTSDB Sample Project
Developing an OpenTSDB Application
OpenTSDB Development Plan
Configuring OpenTSDB Parameters
Writing Data into OpenTSDB
Querying OpenTSDB Data
Deleting OpenTSDB Data
Commissioning an OpenTSDB Application
Commissioning Applications on Windows
Commissioning an OpenTSDB Application
Viewing the OpenTSDB Application Commissioning Result
Commissioning Applications on Linux
Commissioning an OpenTSDB Application
Viewing the OpenTSDB Application Commissioning Result
FAQs About OpenTSDB Application Development
OpenTSDB CLI Tools
OpenTSDB HTTP APIs
Presto Development Guide
Presto Application Development Overview
Introduction to Presto Application Development
Common Concepts of Presto Application Development
Presto Application Development Process
Preparing a Presto Application Development Environment
Presto Application Development Environment
Preparing a Presto Application Development Environment
Preparing a Presto Application Development User
Preparing a Presto JDBC Application Development Environment
Preparing a Presto HCatalog Application Development Environment
Developing a Presto Application
Presto Development Plan
Presto JDBC Usage Example
Commissioning a Presto Application
Commissioning a Presto Application on Windows
Commissioning a Presto Application on Linux
FAQs About Presto Application Development
Presto APIs
No Certificate Is Available When PrestoJDBCExample Runs on a Node Outside the Cluster
When a Node Outside a Cluster Is Connected to a Cluster with Kerberos Authentication Enabled, HTTP Cannot Find the Corresponding Record in the Kerberos Database
Spark Development Guide
Spark Application Development Overview
Introduction to Spark Application Development
Basic Concepts
Spark Application Development Process
Preparing a Spark Application Development Environment
Spark Application Development Environment
Preparing a Spark Application Development User
Preparing a Java Development Environment for Spark
Preparing a Scala Development Environment for Spark
Preparing a Python Development Environment for Spark
Preparing a Spark Application Running Environment
Importing and Configuring Spark Sample Projects
(Optional) Creating a Spark Application Development Project
Configuring Security Authentication for Spark Applications
Developing a Spark Application
Spark Core Application
Scenario Description
Java Sample Code
Scala Sample Code
Python Sample Code
Spark SQL Application
Scenario Description
Java Sample Code
Scala Sample Code
Spark Streaming Application
Scenario Description
Java Sample Code
Scala Sample Code
Application for Accessing Spark SQL Through JDBC
Scenario Description
Java Sample Code
Scala Sample Code
Python Sample Code
Spark on HBase Application
Scenario Description
Java Sample Code
Scala Sample Code
Reading Data from HBase and Writing Data Back to HBase
Scenario Description
Java Sample Code
Scala Sample Code
Reading Data from Hive and Writing Data to HBase
Scenario Description
Java Sample Code
Scala Sample Code
Using Streaming to Read Data from Kafka and Write Data to HBase
Scenario Description
Java Sample Code
Scala Sample Code
Application for Connecting Spark Streaming to Kafka0-10
Scenario Description
Java Sample Code
Scala Sample Code
Structured Streaming Application
Scenario Description
Java Sample Code
Scala Sample Code
Commissioning a Spark Application
Compiling and Running a Spark Application
Viewing the Spark Application Commissioning Result
FAQs About Spark Application Development
Spark APIs
Spark Java APIs
Spark Scala APIs
Spark Python APIs
Spark REST APIs
Spark ThriftServer APIs
Common Spark Commands
Spark Application Tuning
Spark Core Tuning
Data Serialization
Memory Configuration Optimization
Setting a Degree of Parallelism
Using Broadcast Variables
Using the External Shuffle Service to Improve Performance
Configuring Dynamic Resource Scheduling in Yarn Mode
Configuring Process Parameters
Designing a Directed Acyclic Graph (DAG)
Experience Summary
SQL and DataFrame Tuning
Optimizing the Spark SQL Join Operation
Optimizing INSERT...SELECT Operation
Spark Streaming Tuning
Spark CBO Tuning
How Do I Add a Dependency Package with Customized Code?
How Do I Handle the Dependency Package That Is Automatically Loaded?
Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?
Why Is a Spark Core Application Suspended Instead of Exiting When Driver Memory Is Insufficient to Store Collected Intensive Data?
Why Does the Name of the Spark Application Submitted in Yarn-Cluster Mode Not Take Effect?
How Do I Submit the Spark Application Using Java Commands?
How Does the Permission Control Mechanism Work for the UDF Function in SparkSQL?
Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
How Do I Perform Remote Debugging Using IDEA?
A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
What Should I Do If FileNotFoundException Occurs When spark-submit Is Used to Submit a Job in Spark on Yarn Client Mode?
What Should I Do If the "had a not serializable result" Error Is Reported When a Spark Task Reads HBase Data?
How Do I Connect to Hive and HDFS of an MRS Cluster When the Spark Program Is Running on a Local Host?
Storm Development Guide
Storm Application Development Overview
Introduction to Storm Application Development
Common Concepts of Storm Application Development
Storm Application Development Process
Preparing a Storm Application Development Environment
Storm Application Development Environment
Preparing the Eclipse and JDK
Preparing a Linux Client Environment
Configuring and Importing a Project
Developing a Storm Application
Storm Development Plan
Creating a Storm Spout
Creating a Storm Bolt
Creating a Storm Topology
Commissioning a Storm Application
Generating the JAR Package of the Storm Application
Commissioning a Storm Application on Linux
Viewing the Storm Application Commissioning Result
FAQs About Storm Application Development
Storm APIs
Storm-Kafka Development Guideline
Storm-JDBC Development Guideline
Storm-HDFS Development Guideline
Storm-OBS Development Guideline
Storm-HBase Development Guideline
Flux Development Guideline
Component Development Specifications
ClickHouse
ClickHouse Application Development Rules
ClickHouse Application Development Suggestions
Doris
Table Creation Rules
Data Change
Naming Conventions
Data Query
Data Import
UDF Development
Connection and Running
Flink
Flink Specification Overview
FlinkSQL Connector Development
ClickHouse Table
Development Rules
Development Suggestions
Doris Table
Development Rules
Kafka Table
Development Rules
Development Suggestions
HBase Table
Development Rules
Development Suggestions
Flink on Hudi
Hudi Table Streaming Reads
Development Rules
Suggestions
Hudi Table Streaming Writes
Development Rules
Development Suggestions
Flink Job Parameters
Configuration Rules
Configuration Suggestions
Flink Jobs
Development Rules
Development Suggestions
Flink SQL Logic
Development Rules
Development Suggestions
Flink Performance Tuning
Performance Tuning Rules
Performance Tuning Suggestions
Typical Flink Parameters
Development Examples
HBase
HBase Application Development Rules
HBase Application Development Suggestions
HDFS
HDFS Application Development Rules
HDFS Application Development Suggestions
Hive
Hive Application Development Rules
Hive Application Development Suggestions
Hudi
Hudi Development Specifications Overview
Hudi Data Sheet Design Specification
Hudi Table Model Design Specifications
Hudi Table Index Design Specifications
Hudi Table Partition Design Specifications
Hudi Data Table Management Operation Specifications
Hudi Data Table Compaction Specifications
Hudi Data Table Clean Specifications
Hudi Data Table Archive Specifications
Spark on Hudi Development Specifications
Spark Read/Write Hudi Development Specifications
SparkSQL Table Creation Parameter Specifications
Specifications for Spark to Read Hudi Parameters in Incremental Mode
Specifications for Setting the Compaction Parameter in the Spark Asynchronous Task Execution Table
Spark Table Data Maintenance Specifications
Suggestions for Spark Concurrently Writing Hudi Data
Suggestions on Configuring Resources for Spark Reads and Writes to Hudi
Spark On Hudi Performance Optimization
Bucket Tuning Example
Creating a Bucket Index Table
Hudi Table Initialization
Real-time Task Access
Offline Compaction Configuration
IoTDB
IoTDB Application Development Rules
IoTDB Application Development Suggestions
Kafka
Kafka Application Development Rules
Kafka Application Development Suggestions
MapReduce
MapReduce Application Development Rules
MapReduce Application Development Suggestions
Spark
Spark Application Development Rules
Spark Application Development Suggestions
API Reference
Before You Start
API Overview
Selecting an API Type
Calling APIs
Making an API Request
Authentication
Response
Application Cases
Creating an MRS Cluster
Scaling Out a Cluster
Scaling In a Cluster
Creating a Job
Terminating a Job
Terminating a Cluster
API V2
Cluster Management APIs
Creating a Cluster
Changing a Cluster Name
Creating a Cluster and Submitting a Job
Scaling Out a Cluster
Scaling In a Cluster
Adding Components to a Cluster
Querying the Cluster Node List
Job Management APIs
Adding and Executing a Job
Querying Information About a Job
Querying a List of Jobs
Terminating a Job
Obtaining SQL Results
Deleting Jobs in Batches
Auto Scaling APIs
Viewing Auto Scaling Policies
Updating an Auto Scaling Policy
Deleting an AS Policy
Creating an AS Policy
Cluster HDFS File API
Obtaining the List of Files from a Specified Directory
SQL APIs
Submitting a SQL Statement
Querying SQL Results
Canceling a SQL Execution Task
Agency Management
Querying the Mapping Between a User (Group) and an IAM Agency
Updating the Mapping Between a User (Group) and an IAM Agency
Data Connection Management
Creating a Data Connection
Querying the Data Connection List
Updating a Data Connection
Deleting a Data Connection
Querying Version Metadata
Obtaining MRS Version List
Querying Available Specifications of an MRS Cluster Version
IAM Synchronization
Obtaining Synchronized IAM Users and User Groups
Synchronizing an IAM User and User Group
Cancelling Synchronization of Specified Users and User Groups
Tag Management APIs
Enabling or Disabling the Default Tag of a Cluster
Querying the Status of Default Cluster Tags
Querying Tag Quotas
API V1.1
Cluster Management APIs
Creating a Cluster and Executing a Job
Resizing a Cluster
Querying a Cluster List
Querying Cluster Details
Querying a Host List
Terminating a Cluster
Auto Scaling APIs
Configuring an Auto Scaling Rule
Tag Management APIs
Adding Tags to a Specified Cluster
Querying Tags of a Specified Cluster
Deleting Tags from a Specified Cluster
Adding Tags to a Cluster in Batches
Deleting Tags from a Cluster in Batches
Querying All Tags
Querying a List of Clusters with Specified Tags
Availability Zones
Querying AZ Information
Version Metadata
Querying the Metadata of a Cluster Version
Out-of-Date APIs
Job API Management (Deprecated)
Adding and Executing a Job (Deprecated)
Querying the exe Object List of Jobs (Deprecated)
Querying exe Object Details (Deprecated)
Deleting a Job Execution Object (Deprecated)
Permissions Policies and Supported Actions
Introduction
Appendix
ECS Specifications Used by MRS
BMS Specifications Used by MRS
Status Codes
Error Codes
Obtaining a Project ID
Obtaining an Account ID
Obtaining the MRS Cluster Information
Roles and Components Supported by MRS
SDK Reference
SDK Overview
FAQs
MRS Basics
What Is MRS Used For?
What Types of Distributed Storage Does MRS Support?
What Are Regions and AZs?
Can I Change the Network Segment of Nodes in an MRS Cluster?
Can I Downgrade the Specifications of an MRS Cluster Node?
Are Hive Components of Different Versions Compatible with Each Other?
What Are the Differences Between OBS and HDFS in Data Storage?
What Are the Solutions for Processing 1 Billion Data Records?
What Are the Advantages of the Compression Ratio of zstd?
Billing
Why Is the Price Not Displayed During MRS Cluster Creation?
How Is Auto Scaling Billed for an MRS Cluster?
How Are Task Nodes in an MRS Cluster Billed?
Why Does My Unsubscription from ECS Fail After I Unsubscribe from MRS?
Cluster Creation
How Do I Create an MRS Cluster Using a Custom Security Group?
What Should I Do If HDFS, Yarn, and MapReduce Components Are Unavailable When I Buy an MRS Cluster?
What Should I Do If the ZooKeeper Component Is Unavailable When I Buy an MRS Cluster?
What Should I Do If Invalid Authentication Is Reported When I Submit an Order for Purchasing an MRS Cluster?
Web Page Access
How Do I Change the Session Timeout Duration for an Open Source Component Web UI?
What Can I Do If the Dynamic Resource Plan Page in MRS Tenant Management Cannot Be Refreshed?
What Do I Do If the Kafka Topic Monitoring Tab Is Not Displayed on Manager?
What Can I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of Components Such as HDFS, Hue, Yarn, Flink, and HetuEngine?
How Do I Switch the Methods to Access MRS Manager?
Why Can't I Find the User Management Page on MRS Manager?
What Can I Do If the Excel File Downloaded by Hue Cannot Be Opened?
Authentication and Permission
What Is the User for Logging in to FusionInsight Manager?
How Do I Query and Change the Password Validity Period of a User in a Cluster?
Does an MRS Cluster Support Access Permission Control If Kerberos Authentication Is Not Enabled?
How Do I Add Tenant Management Permission to Users in a Cluster?
Does Hue Provide the Function of Configuring Account Permissions?
Why Can't I Submit Jobs on the Console After My IAM Account Is Assigned with MRS Permissions?
How Do I View the Hive Table Created by Another User?
How Do I Prevent Kerberos Authentication Expiration?
How Do I Enable or Disable Kerberos Authentication for an Existing MRS Cluster?
What Are the Ports of the Kerberos Authentication Service?
Client Usage
How Do I Disable SASL Authentication for ZooKeeper?
What Can I Do If the Error Message "Permission denied" Is Displayed When kinit Is Executed on a Client Outside the MRS Cluster?
What Should I Do If an Alarm Is Reported Indicating that the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
How Do I Connect to Spark Shell from MRS?
How Do I Connect to Spark Beeline from MRS?
What Should I Do If the Connection to the ClickHouse Server Fails and Error Code 516 Is Reported?
Component Configurations
Does MRS Support Running Hive on Kudu?
Does an MRS Cluster Support Hive on Spark?
Can I Change the IP address of DBService?
What Access Protocols Does Kafka Support?
What Python Versions Are Supported by Spark Tasks in an MRS Cluster?
What Are the Restrictions on the Storm Log Size in an MRS 2.1.0 Cluster?
How Do I Modify the HDFS fs.defaultFS of an Existing Cluster?
Can MRS Run Multiple Flume Tasks at a Time?
How Do I Change FlumeClient Logs to Standard Logs?
Where Are the JAR Files and Environment Variables of Hadoop Stored?
How Do I View HBase Logs?
How Do I Set the TTL for an HBase Table?
How Do I Change the Number of HDFS Replicas?
How Do I Modify the HDFS Active/Standby Switchover Class?
What Data Type in Hive Tables Is Recommended for the Number Type of DynamoDB?
Can I Export the Query Result of Hive Data?
What Should I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
What Do I Do If "over max user connections" Is Displayed When Hue Connects to HiveServer?
How Do I View MRS Hive Metadata?
How Do I Reset Kafka Data?
What Access Protocols Are Supported by Kafka?
What Should I Do If the Error Message "Not Authorized to access group XXX" Is Displayed When Kafka Topics Are Consumed?
What Compression Algorithms Does Kudu Support?
How Do I View Kudu Logs?
How Do I Handle the Kudu Service Exceptions Generated During Cluster Creation?
How Do I Configure Other Data Sources on Presto?
How Do I Update the Ranger Certificate in MRS 1.9.3?
How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
How Do I Check the ResourceManager Configuration of Yarn?
How Do I Modify the allow_drop_detached Parameter of ClickHouse?
How Do I Add a Periodic Deletion Policy to Prevent Large ClickHouse System Table Logs?
How Do I Change the Time Zone of the ClickHouse Service?
Cluster Management
How Do I View All Clusters?
How Do I View MRS Operation Logs?
How Do I View MRS Cluster Configuration Information?
How Do I Add Components to an MRS Cluster?
How Do I Cancel Message Notification for Cluster Alarms?
Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
What Is the Python Version Installed for an MRS Cluster?
How Do I Upload a Local File to a Node Inside a Cluster?
What Can I Do If the Time Information of an MRS Cluster Node Is Incorrect?
What Are the Differences and Relationships Between the MRS Management Console and MRS Manager?
How Do I Unbind an EIP from FusionInsight Manager of an MRS Cluster?
How Do I Stop the Firewall Service?
How Do I Switch the Login Mode of a Node in an MRS Cluster?
How Do I Access an MRS Cluster from a Node Outside the Cluster?
In an MRS Streaming Cluster, Can the Kafka Topic Monitoring Function Send Alarm Notifications?
Where Can I View the Running Resource Queues When ALM-18022 Insufficient Yarn Queue Resources Is Generated?
How Do I Understand the Multi-Level Chart Statistics in the HBase Operation Requests Metric?
Node Management
What Are the Operating Systems of Hosts in MRS Clusters of Different Versions?
Do I Need to Shut Down a Master Node Before Upgrading It?
Can I Change MRS Cluster Nodes on the MRS Console?
How Do I Query the Startup Time of an MRS Node?
What Do I Do If Trust Relationships Between Nodes Are Abnormal?
Can Master Node Specifications Be Adjusted in an MRS Cluster?
Can Sudo Logs of Nodes in an MRS Cluster Be Cleared?
How Do I Partition Disks in an MRS Cluster?
Does an MRS Cluster Support System Reinstallation?
Can I Change the OS of an MRS Cluster?
Component Management
Can I Delete Components Installed in an MRS Cluster?
How Do I View the Configuration File Directory of Each Component?
Will Upper-Layer Services Be Affected If the Hive Service Status Is Partially Healthy?
How Can I Obtain the IP Address and Port Number of a ZooKeeper Instance?
Job Management
What Types of Spark Jobs Can Be Submitted in a Cluster?
What Should I Do If Error 408 Is Reported When an MRS Node Accesses OBS?
How Do I Enable Different Service Programs to Use Different Yarn Queues?
What Should I Do If a Job Fails to Be Submitted and the Error Is Related to OBS?
Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Is Changed to 0?
What Should I Do If Job Parameters Separated By Spaces Cannot Be Identified?
What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
How Do I View MRS Job Logs?
What Can I Do If the System Displays a Message Indicating that the Current User Does Not Exist on Manager When I Submit a Job?
What Can I Do If LauncherJob Fails to Be Executed and the Error Message "jobPropertiesMap is null" Is Displayed?
What Should I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
What Can I Do If a SparkStreaming Job Fails After Running for Dozens of Hours and Error 403 Is Reported for OBS Access?
What Should I Do If Error Message "java.io.IOException: Connection reset by peer" Is Displayed During the Execution of a Spark Job?
What Should I Do If the Error Message "requestId=XXX" Is Displayed When a Spark Job Accesses OBS?
What Should I Do If the Error Message "UnknownScannerException" Is Displayed for Spark Jobs?
What Can I Do If DataArts Studio Occasionally Fails to Schedule Spark Jobs?
What Should I Do If a Flink Job Fails to Execute and the Error Message "java.lang.NoSuchFieldError: SECURITY_SSL_ENCRYPT_ENABLED" Is Displayed?
What Should I Do If Submitted Yarn Jobs Cannot Be Viewed on the Web UI?
What Can I Do If launcher-job Is Terminated by Yarn When a Flink Task Is Submitted?
What Should I Do If the Error Message "slot request timeout" Is Displayed When I Submit a Flink Job?
FAQs About Importing and Exporting Data Using DistCP Jobs
How Do I View SQL Statements of Hive Jobs on the Yarn Web UI?
How Do I View Logs of a Specified Yarn Task?
What Should I Do If a HiveSQL/HiveScript Job Fails to Be Submitted After Hive Is Added?
Where Are the Execution Logs of Spark Jobs Stored?
What Should I Do If an Alarm Indicating Insufficient Memory Is Reported During Spark Task Execution?
What Can I Do If an Alarm Is Generated Because the NameNode Is Not Restarted on Time After the hdfs-site.xml File Is Modified?
What Should I Do If It Takes a Long Time for Spark SQL to Access Hive Partitioned Tables Before a Job Starts?
Performance Tuning
How Do I Obtain the Hadoop Pressure Test Tool?
How Do I Improve the Resource Utilization of Core Nodes in a Cluster?
How Do I Configure the knox Memory?
How Do I Adjust the Memory Size of the manager-executor Process?
How Do I Configure Spark Jobs to Automatically Obtain More Resources During Execution?
What Should I Do If the spark.yarn.executor.memoryOverhead Setting Does Not Take Effect?
Application Development
How Do I Get My Data into OBS or HDFS?
Can MRS Write Data to HBase Through an HBase External Table of Hive?
Where Can I Download the Dependency Package (com.huawei.gaussc10) in the Hive Sample Project?
Does MRS Support Python Code?
Does OpenTSDB Support Python APIs?
How Do I Obtain a Spark JAR File?
How Do I Configure the node_id Parameter When Calling the API for Adjusting Cluster Nodes?
Peripheral Service Interconnection
Does MRS Support Read and Write Operations on DLI Service Tables?
Does OBS Support the ListObjectsV2 Protocol?
Can a Crawler Service Be Deployed on Nodes in an MRS Cluster?
Does MRS Support Secure Deletion?
How Do I Use PySpark to Connect MRS Spark?
Why Do Mapped Fields Not Exist in the Database After HBase Synchronizes Data to CSS?
Can MRS Connect to an External KDC?
What Can I Do If Jetty Is Incompatible with MRS When Open-Source Kylin 3.x Is Interconnected with MRS 1.9.3?
What Should I Do If Data Failed to Be Exported from MRS to an OBS Encrypted Bucket?
How Do I Interconnect MRS with LTS?
How Do I Install HSS on MRS Cluster Nodes?
How Do I Connect to HBase of MRS Through HappyBase?
Can the Hive Driver Be Interconnected with DBCP2?
Upgrade and Patching
How Do I Upgrade an MRS Cluster?
Does MRS Support Kernel Version Upgrade of Components in a Cluster?
Troubleshooting
Account Passwords
Resetting or Changing the Password of Manager User admin
Failed to Download Authentication Credentials If the Username Is Too Long
Account Permissions
A Message Is Displayed Indicating That the User Does Not Have the Permission to Obtain the MRS Cluster Hosts
Failed to View MRS Cluster Details
Common Exceptions in Logging In to the Cluster Manager
Failed to Access Manager of an MRS Cluster
Accessing the Web Pages
Error "502 Bad Gateway" Is Reported During the Access to MRS Manager
An Error Message Occurs Indicating that the VPC Request Is Incorrect During the Access
Error 503 Is Reported When Manager Is Accessed Through Direct Connect
Error Message "You have no right to access this page." Is Displayed When Users Log In to the Cluster Page
Error Message "Invalid credentials" Is Displayed When a User Logs In to Manager
Failed to Log In to the Manager After Timeout
Failed to Log In to MRS Manager After the Python Upgrade
Failed to Log In to MRS Manager After Changing the Domain Name
Manager Page Is Blank After a Successful Login
Cluster Login Fails Because Native Kerberos Is Installed on Cluster Nodes
Using Google Chrome to Access MRS Manager on macOS
How Do I Unlock a User Who Logs in to Manager?
Why Does the Manager Page Freeze?
Common Exceptions in Accessing the MRS Web UI
What Should I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of HDFS, Hue, YARN, HetuEngine, and Flink?
Error 500 Is Reported When a User Accesses the Component Web UI
[HBase WebUI] Users cannot switch from the HBase WebUI to the RegionServer WebUI
[HDFS WebUI] When users access the HDFS WebUI, an error message is displayed indicating that the number of redirections is too large
[HDFS WebUI] Failed to access the HDFS WebUI using Internet Explorer
[Hue Web UI] A "No Permission" Error Is Displayed When a User Logs In to the Hue Web UI
[Hue Web UI] Failed to Access the Hue Web UI
[Hue WebUI] The error "Proxy Error" is reported when a user accesses the Hue WebUI
[Hue WebUI] Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
Hue (Active) Cannot Open Web Pages
[Ranger WebUI] Why Cannot a New User Log In to Ranger After Changing the Password?
[Tez WebUI] Error 404 is reported when users access the Tez WebUI
[Spark WebUI] Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
[Spark WebUI] What Can I Do If an Error Occurs When I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
[Spark WebUI] Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
[Spark WebUI] The Spark2x WebUI fails to be accessed using Internet Explorer
[Yarn Web UI] Failed to Access the Yarn Web UI
APIs
Failed to Call an API to Create a Cluster
Cluster Management
Failed to Reduce Task Nodes
OBS Certificate in a Cluster Expired
Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
Replacing a Disk in an MRS Cluster (Applicable to 3.x)
Failed to Execute an MRS Backup Task
Inconsistency Between df and du Command Output on the Core Node
Disassociating a Subnet from a Network ACL
MRS Cluster Becomes Abnormal After the Hostname of a Node Is Changed
Processes Are Terminated Unexpectedly
Failed to Configure Cross-Cluster Mutual Trust for MRS
Network Is Unreachable When Python Is Installed on an MRS Cluster Node Using Pip3
Connecting the Open-Source confluent-kafka-go to an MRS Security Cluster
Failed to Execute the Periodic Backup Task of an MRS Cluster
Failed to Download the MRS Cluster Client
An Error Is Reported When a Flink Job Is Submitted in an MRS Cluster with Kerberos Authentication Enabled
An Error Is Reported When the Insert Command Is Executed on the Hive Beeline CLI
Upgrading the OS to Fix Vulnerabilities for an MRS Cluster Node
Failed to Migrate Data to MRS HDFS Using CDM
Alarms Indicating Heartbeat Interruptions Between Nodes Are Frequently Generated in the MRS Cluster
High Memory Usage of the PMS Process
High Memory Usage of the Knox Process
It Takes a Long Time to Access HBase from a Client Outside a Security Cluster
Failed to Submit Jobs
OS Disk Space Is Insufficient Due to Oversized HBase Log Files
OS Disk Space Is Insufficient Due to Oversized HDFS Log Files
An Exception Occurs During Specifications Upgrade of Nodes in an MRS Cluster
Failed to Delete a New Tenant on FusionInsight Manager
MRS Cluster Becomes Unavailable After the VPC Is Changed
Failed to Submit Jobs on the MRS Console
Error "symbol xxx not defined in file libcrypto.so.1.1" Is Displayed During HA Certificate Generation
Some Instances Fail to Be Started After Core Nodes Are Added to the MRS Cluster
Using Alluxio
Error Message "Does not contain a valid host:port authority" Is Reported When Alluxio Is in HA Mode
Using ClickHouse
ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
An Exception Occurs When ClickHouse Consumes Kafka Data
Using DBService
DBServer Instance Is in Abnormal Status
DBServer Instance Remains in the Restoring State
Default Port 20050 or 20051 of DBService Is Occupied
DBServer Instance Is Always in the Restoring State Because of Incorrect /tmp Directory Permissions
Failed to Execute a DBService Backup Task
Components Failed to Connect to DBService in Normal State
DBServer Failed to Start
DBService Backup Failed Because the Floating IP Address Is Unreachable
DBService Failed to Start Due to the Loss of the DBService Configuration File
Using Flink
Error Message "Error While Parsing YAML Configuration File: Security.kerberos.login.keytab" Is Displayed When a Command Is Executed on the Flink Client
Error Message "Error while parsing YAML configuration file : security.kerberos.login.principal:pippo" Is Displayed When a Command Is Executed on the Flink Client
Error Message "Could Not Connect to the Leading JobManager" Is Displayed When a Command Is Executed on the Flink Client
Failed to Create a Flink Cluster by Running yarn-session As Different Users
Flink Service Program Fails to Read Files on the NFS Disk
Failed to Customize the Flink Log4j Log Level
Using Flume
Class Cannot Be Found After Flume Submits Jobs to Spark Streaming
Failed to Install a Flume Client
A Flume Client Cannot Connect to the Server
Flume Data Fails to Be Written to the Component
Flume Server Process Fault
Flume Data Collection Is Slow
Failed to Start Flume
Using HBase
Slow Response to HBase Connection
Failed to Authenticate the HBase User
RegionServer Failed to Start Because the Port Is Occupied
HBase Failed to Start Due to Insufficient Node Memory
HBase Service Unavailable Due to Poor HDFS Performance
HBase Failed to Start Due to Inappropriate Parameter Settings
RegionServer Failed to Start Due to Residual Processes
HBase Failed to Start Due to a Quota Set on HDFS
HBase Failed to Start Due to Corrupted Version Files
High CPU Usage Caused by Zero-Loaded RegionServer
HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
RegionServer Instance Is in the Restoring State
HBase Failed to Start in a Newly Installed Cluster
HBase Failed to Start Due to the Loss of the ACL Table Directory
HBase Failed to Start After the Cluster Is Powered Off and On
Failed to Import HBase Data Due to Oversized File Blocks
Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
Failed to Run the hbase shell Command on the MRS Cluster Client
Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
HBase Failed to Start Due to Insufficient RegionServer Memory
Failed to Start HRegionServer on the Node Newly Added to the Cluster
Region in the RIT State for a Long Time Due to HBase File Loss
Using HDFS
HDFS NameNode Instances Become Standby After the RPC Port Is Changed
An Error Is Reported When the HDFS Client Is Connected Through a Public IP Address
Failed to Use Python to Remotely Connect to the Port of HDFS
HDFS Capacity Reaches 100%, Causing Unavailable Upper-Layer Services Such as HBase and Spark
Error Message "Permission denied" Is Displayed When HDFS and Yarn Are Started
HDFS Users Can Create or Delete Files in Directories of Other Users
A DataNode of HDFS Is Always in the Decommissioning State
HDFS NameNode Failed to Start Due to Insufficient Memory
A Large Number of Blocks Are Lost in HDFS Due to the Time Change Using ntpdate
CPU Usage of DataNodes Is Close to 100% Occasionally, Causing Node Loss
Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
Error "Failed to place enough replicas" Is Reported When HDFS Reads or Writes Files
Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
HDFS Client File Fails to Be Closed After Data Writing
File Fails to Be Uploaded to HDFS Due to File Errors
After dfs.blocksize Is Configured on the UI and Data Is Uploaded, the Block Size Does Not Change
HDFS File Fails to Be Read, and Error Message "FileNotFoundException" Is Displayed
Failed to Write Files to HDFS, and Error Message "item limit of xxx is exceeded" Is Displayed
Adjusting the Log Level of the HDFS Shell Client
HDFS File Read Fails, and Error Message "No common protection layer" Is Displayed
Failed to Write Files Because the HDFS Directory Quota Is Insufficient
Balancing Fails, and Error Message "Source and target differ in block-size" Is Displayed
Failed to Query or Delete HDFS Files
Uneven Data Distribution Due to Non-HDFS Data Residuals
Uneven Data Distribution Due to HDFS Client Installation on the DataNode
Unbalanced DataNode Disk Usages of a Node
Locating Common Balance Problems
HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
Error Message "error creating DomainSocket" Is Displayed When the HDFS Client Installed on the Core Node in a Normal Cluster Is Used
HDFS Files Fail to Be Uploaded When the Client Is Installed on a Node Outside the Cluster
Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
HDFS Client Failed to Delete Overlong Directories
An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
"ALM-12027 Host PID Usage Exceeds the Threshold" Is Generated for a NameNode
"ALM-14012 JournalNode Is Out of Synchronization" Is Generated in the Cluster
Failed to Decommission a DataNode Due to HDFS Block Loss
An Error Is Reported When DistCP Is Used to Copy an Empty Folder
Using Hive
Common Hive Logs
Failed to Start Hive
Error Message "Cannot modify xxx at runtime" Is Displayed When the set Command Is Executed in a Security Cluster
Specifying a Queue When Submitting a Hive Task
Setting the Map/Reduce Memory on the Client
Specifying the Output File Compression Format When Importing a Hive Table
Description of the Hive Table Is Too Long to Be Completely Displayed
NULL Is Displayed When Data Is Inserted After the Partition Column Is Added to a Hive Table
New User Created in the Cluster Does Not Have the Permission to Query Hive Data
An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
An Error Is Reported When the "load data inpath" Command Is Executed
An Error Is Reported When the "load data local inpath" Command Is Executed
An Error Is Reported When the create external table Command Is Executed
An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
Insufficient Permissions to Execute the set role admin Command
An Error Is Reported When a UDF Is Created on the Beeline Client
Hive Is Faulty
Hive Is Partially Healthy
Difference Between Hive Service Health Status and Hive Instance Health Status
"authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
Failed to Access ZooKeeper from the Client
"Invalid function" Is Displayed When a UDF Is Used
Hive Service Status Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Concerning
Garbled Characters Returned Upon a Query If Text Files Are Compressed Using ARC4
Hive Task Failed to Run on the Client but Successful on Yarn
Error Message "Execution Error return code 2" Is Displayed When the SELECT Statement Is Executed
Failed to Perform drop partition When There Are a Large Number of Partitions
Failed to Start the Local Task When the Join Operation Is Performed
WebHCat Fails to Be Started After the Hostname Is Changed
An Error Is Reported When the Hive Sample Program Is Running After the Domain Name of a Cluster Is Changed
Hive MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
Error Message "Failed to execute session hooks: over max connections" Is Displayed on the Beeline Client
Error Message "OutOfMemoryError" Is Displayed on the Beeline Client
Task Execution Fails Because the Input File Number Exceeds the Threshold
Hive Task Execution Fails Because of Stack Memory Overflow
Task Failed Due to Concurrent Writes to One Table or Partition
Hive Task Failed Due to a Lack of HDFS Directory Permission
Failed to Load Data to Hive Tables
Failed to Run the Application Developed Based on the Hive JDBC Code Case
HiveServer and HiveHCat Process Faults
Error Message "ConnectionLoss for hiveserver2" Is Displayed When MRS Hive Connects to ZooKeeper
An Error Is Reported When Hive Executes the insert into Statement
Timeout Reported When Adding the Hive Table Field
Failed to Restart Hive
Failed to Delete a Table Due to Excessive Hive Partitions
An Error Is Reported When msck repair table Is Executed on Hive
Insufficient User Permission for Running the insert into Command on Hive
Releasing Disk Space After Dropping a Table in Hive
Abnormal Hive Query Due to Damaged Data in the JSON Table
Connection Timed Out During SQL Statement Execution on the Hive Client
WebHCat Failed to Start Due to Abnormal Health Status
WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
An SQL Error Is Reported When the Number of MetaStore Dynamic Partitions Exceeds the Threshold
Using Hue
An Unknown Job Is Running on the Hue Page
HQL Fails to Be Executed on Hue Using Internet Explorer
Failed to Access the Hue Web UI
HBase Tables Cannot Be Loaded on the Hue Web UI
Chinese Characters Entered in the Hue Text Box Are Displayed Incorrectly
An Error Is Reported If the Query Result of an Impala SQL Statement Executed on Hue Contains Chinese Characters
Using Impala
Failed to Connect to impala-shell
Failed to Create a Kudu Table
Installing Python2 on the Impala Client
Using Kafka
An Error Is Reported When the Kafka Client Is Run to Obtain Topics
Using Python3.x to Connect to Kafka in a Security Cluster
Flume Normally Connects to Kafka but Fails to Send Messages
Producer Fails to Send Data and Error Message "NullPointerException" Is Displayed
Producer Fails to Send Data and Error Message "TOPIC_AUTHORIZATION_FAILED" Is Displayed
Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
Consumer Is Initialized Successfully, but the Specified Topic Message Cannot Be Obtained from Kafka
Consumer Fails to Consume Data and Remains in the Waiting State
SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
Consumer Fails to Consume Data in a Newly Created Cluster, and Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
SparkStreaming Fails to Consume Kafka Messages, and Message "Couldn't find leader offsets" Is Displayed
Consumer Fails to Consume Data and Message "SchemaException: Error reading field" Is Displayed
Kafka Consumer Loses Consumed Data
Failed to Start Kafka Due to Account Lockout
Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
Kafka Topics Cannot Be Deleted
Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
Consumer Repeatedly Consumes Data
Leader for the Created Kafka Topic Partition Is Displayed as none
Safety Instructions on Using Kafka
Obtaining Kafka Consumer Offset Information
Adding or Deleting Configurations for a Topic
Reading the Content of the __consumer_offsets Internal Topic
Configuring Logs for Shell Commands on the Kafka Client
Obtaining Topic Distribution Information
Kafka HA Usage Description
Failed to Manage a Kafka Cluster Using the Kafka Shell Command
Kafka Producer Writes Oversized Records
Kafka Consumer Reads Oversized Records
High Usage of Multiple Disks on a Kafka Cluster Node
Kafka Is Disconnected from the ZooKeeper Client
Using Oozie
Oozie Jobs Do Not Run When a Large Number of Jobs Are Submitted Concurrently
An Error Is Reported When Oozie Schedules HiveSQL Jobs
Oozie Tasks Cannot Be Submitted on a Client Outside the MRS Cluster or Can Be Submitted Only Two Hours Later
Using Presto
During sql-standard-with-group Configuration, a Schema Fails to Be Created and the Error Message "Access Denied" Is Displayed
Coordinator Process of Presto Cannot Be Started
When Presto Queries a Kudu Table, an Error Is Reported Indicating That the Table Cannot Be Found
No Data Is Found in the Hive Table Using Presto
Error Message "The node may have crashed or be under too much load" Is Displayed During MRS Presto Query
Accessing Presto from an MRS Cluster Through a Public Network
Using Spark
An Error Is Reported When the Split Size Is Changed for a Running Spark Application
Incorrect Parameter Format Is Displayed When a Spark Task Is Submitted
Spark, Hive, and Yarn Are Unavailable Due to Insufficient Disk Capacity
A Spark Job Fails to Run Due to Incorrect JAR File Import
Spark Job Suspended Due to Insufficient Memory or Lack of JAR Packages
Error "ClassNotFoundException" Is Reported When a Spark Task Is Submitted
Driver Displays a Message Indicating That the Running Memory Exceeds the Threshold When a Spark Task Is Submitted
Error "Can't get the Kerberos realm" Is Reported When a Spark Task Is Submitted in Yarn-Cluster Mode
Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
ApplicationMaster Fails to Start Twice When a Spark Task Is Submitted in Yarn-client Mode
Failed to Connect to ResourceManager When a Spark Task Is Submitted
DataArts Studio Failed to Schedule Spark Jobs
Job Status Is error After a Spark Job Is Submitted Through an API
ALM-43006 Is Repeatedly Reported for the MRS Cluster
Failed to Create or Delete a Table in Spark Beeline
Failed to Connect to the Driver When a Spark Job Is Submitted on a Node Outside the Cluster
Large Number of Shuffle Results Are Lost During Spark Task Execution
Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
Spark Task Submission Failure
Spark Task Execution Failure
JDBCServer Connection Failure
Failed to View Spark Task Logs
Spark Streaming Task Submission Issues
Authentication Fails When Spark Connects to Other Services
Authentication Fails When Spark Connects to Kafka
An Error Occurs When SparkSQL Reads the ORC Table
Failed to Switch to the Log Page from stderr and stdout on the Native Spark Web UI
An Error Is Reported When spark-beeline Is Used to Query a Hive View
Using Sqoop
Connecting Sqoop to MySQL
Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
An Error Is Reported When a Sqoop Task Is Created Using Hue to Import Data from HBase to HDFS
A Data Format Error Is Reported When Data Is Exported from Hive to MySQL 8.0 Using Sqoop
An Error Is Reported When the sqoop import Command Is Executed to Extract Data from PgSQL to Hive
Failed to Use Sqoop to Read MySQL Data and Write Parquet Files to OBS
An Error Is Reported When Database Data Is Migrated Using Sqoop
Using Storm
Invalid Hyperlink of Events on the Storm Web UI
Failed to Submit the Storm Topology
Failed to Submit the Storm Topology and Message "Failed to check principle for keytab" Is Displayed
Worker Logs Are Empty After the Storm Topology Is Submitted
Worker Runs Abnormally After the Storm Topology Is Submitted and Error "Failed to bind to XXX" Is Displayed
"well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
Data Cannot Be Written to Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Databases
Internal Server Error Is Displayed When the User Queries Information on the Storm UI
Using Ranger
After Ranger Authentication Is Enabled for Hive, Unauthorized Tables and Databases Can Be Viewed on the Hue Page
Using Yarn
A Large Number of Jobs Occupying Resources After Yarn Is Started in a Cluster
Error "GC overhead" Is Reported When Tasks Are Submitted Using the hadoop jar Command on the Client
Disk Space of a Node Is Used Up Due to Oversized Aggregated Logs of Yarn
Temporary Files Are Not Deleted When a MapReduce Job Is Abnormal
Incorrect Port Information of the Yarn Client Causes Error "connection refused" After a Task Is Submitted
"Could not access logs page!" Is Displayed When Job Logs Are Queried on the Yarn Web UI
Error "ERROR 500" Is Displayed When Queue Information Is Queried on the Yarn Web UI
Error "ERROR 500" Is Displayed When Job Logs Are Queried on the Yarn Web UI
An Error Is Reported When a Yarn Client Command Is Used to Query Historical Jobs
Number of Files in the TimelineServer Directory Reaches the Upper Limit
Using ZooKeeper
An Error Is Reported When the MRS Client Is Used to Connect to ZooKeeper
ZooKeeper Is Unavailable Because of Non-synchronized Time Between Active and Standby Master Nodes
Storage-Compute Decoupling
A User Without the Permission on the /tmp Directory Failed to Execute a Job for Accessing OBS
When the Hadoop Client Is Used to Delete Data from OBS, It Does Not Have the Permission for the .Trash Directory
An MRS Cluster Fails Authentication When Accessing OBS Because the NTP Time of Cluster Nodes Is Not Synchronized
Videos
Glossary
More Documents
User Guide (ME-Abu Dhabi Region)
Overview
What Is MRS?
Application Scenarios
Components
Alluxio
CarbonData
ClickHouse
DBService
DBService Basic Principles
Relationship Between DBService and Other Components
Flink
Flink Basic Principles
Flink HA Solution
Relationship with Other Components
Flink Enhanced Open Source Features
Window
Job Pipeline
Configuration Table
Stream SQL Join
Flink CEP in SQL
Flume
Flume Basic Principles
Relationship Between Flume and Other Components
Flume Enhanced Open Source Features
HBase
HBase Basic Principles
HBase HA Solution
Relationship with Other Components
HBase Enhanced Open Source Features
HDFS
HDFS Basic Principles
HDFS HA Solution
Relationship Between HDFS and Other Components
HDFS Enhanced Open Source Features
Hive
Hive Basic Principles
Hive CBO Principles
Relationship Between Hive and Other Components
Enhanced Open Source Feature
Hue
Hue Basic Principles
Relationship Between Hue and Other Components
Hue Enhanced Open Source Features
Impala
Kafka
Kafka Basic Principles
Relationship Between Kafka and Other Components
Kafka Enhanced Open Source Features
KafkaManager
KrbServer and LdapServer
KrbServer and LdapServer Principles
KrbServer and LdapServer Enhanced Open Source Features
Kudu
Loader
Loader Basic Principles
Relationship Between Loader and Other Components
Loader Enhanced Open Source Features
Manager
Manager Basic Principles
Manager Key Features
MapReduce
MapReduce Basic Principles
Relationship Between MapReduce and Other Components
MapReduce Enhanced Open Source Features
Oozie
Oozie Basic Principles
Oozie Enhanced Open Source Feature
Presto
Ranger
Ranger Basic Principles
Relationship Between Ranger and Other Components
Spark
Basic Principles of Spark
Spark HA Solution
Relationship Among Spark, HDFS, and Yarn
Spark Enhanced Open Source Feature: Optimized SQL Query of Cross-Source Data
Spark2x
Basic Principles of Spark2x
Spark2x HA Solution
Spark2x Multi-active Instance
Spark2x Multi-tenant
Relationship Between Spark2x and Other Components
Spark2x Open Source New Features
Spark2x Enhanced Open Source Features
CarbonData Overview
Enhanced SQL Query of Multi-sourced Data
Storm
Storm Basic Principles
Relationship Between Storm and Other Components
Storm Enhanced Open Source Features
Tez
Yarn
Yarn Basic Principles
Yarn HA Solution
Relationship Between Yarn and Other Components
Yarn Enhanced Open Source Features
ZooKeeper
ZooKeeper Basic Principle
Relationship Between ZooKeeper and Other Components
ZooKeeper Enhanced Open Source Features
Functions
Multi-tenant
Security Hardening
Easy Access to Web UIs of Components
Reliability Enhancement
Job Management
Bootstrap Actions
Metadata
Cluster Management
Cluster Lifecycle Management
Manually Scale Out/In a Cluster
Auto Scaling
Task Node Creation
Isolating a Host
Managing Tags
Cluster O&M
Message Notification
Constraints
Permissions Management
Related Services
IAM Permissions Management
Creating a User and Granting Permissions
Creating MRS Custom Policies
Synchronizing IAM Users to MRS
MRS Quick Start
How to Use MRS
Creating a Cluster
Uploading Data and Programs
Creating a Job
Terminating a Cluster
Configuring a Cluster
Overview
Cluster List
Methods of Creating MRS Clusters
Quick Creation of a Hadoop Analysis Cluster
Quick Creation of an HBase Analysis Cluster
Quick Creation of a Kafka Streaming Cluster
Quick Creation of a ClickHouse Cluster
Quick Creation of a Real-time Analysis Cluster
Creating a Custom Cluster
Customizing a Topology Cluster
Adding a Tag to a Cluster
Communication Security Authorization
Installing the Third-Party Software Using Bootstrap Actions
Introduction to Bootstrap Actions
Preparing the Bootstrap Action Script
Viewing Execution Records
Adding a Bootstrap Action
Managing an Existing Cluster
Managing and Monitoring a Cluster
Viewing Basic Cluster Information
Viewing Cluster Patch Information
Viewing and Customizing Cluster Monitoring Metrics
Managing Components and Monitoring Hosts
Manually Scaling Out a Cluster
Manually Scaling In a Cluster
Configuring an Auto Scaling Rule
Configuring Auto Scaling Rules When Creating a Cluster
Changing the Subnet of a Cluster
Configuring Message Notification
O&M
Authorizing O&M
Sharing Logs
Terminating a Cluster
Deleting a Failed Task
Job Management
Introduction to MRS Jobs
Running a MapReduce Job
Running a SparkSubmit Job
Running a HiveSQL Job
Running a SparkSql Job
Running a Flink Job
Running a Kafka Job
Viewing Job Configuration and Logs
Stopping a Job
Deleting a Job
Using Encrypted OBS Data for Job Running
Configuring Job Notification Rules
Importing and Exporting Data
Component Management
Object Management
Viewing Configuration
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configuration
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Managing a Host (Node)
Isolating a Host
Canceling Host Isolation
Starting and Stopping a Cluster
Synchronizing Cluster Configuration
Exporting Cluster Configuration
Performing Rolling Restart
Alarm Management
Viewing the Alarm List
Viewing the Event List
Viewing and Manually Clearing an Alarm
Patch Management
Patch Operation Guide for Versions Earlier Than MRS 3.x
Rolling Patches
Restoring Patches for the Isolated Hosts
Health Check Management
Before You Start
Performing a Health Check
Viewing and Exporting a Health Check Report
DBService Health Check Indicators
Flume Health Check Indicators
HBase Health Check Indicators
Host Health Check Indicators
HDFS Health Check Indicators
Hive Health Check Indicators
Kafka Health Check Indicators
KrbServer Health Check Indicators
LdapServer Health Check Indicators
Loader Health Check Indicators
MapReduce Health Check Indicators
OMS Health Check Indicators
Spark Health Check Indicators
Storm Health Check Indicators
Yarn Health Check Indicators
ZooKeeper Health Check Indicators
Tenant Management
Before You Start
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Backup and Restoration
Before You Start
Introduction
Backing Up Metadata
Restoring Metadata
Modifying Backup Tasks
Viewing Backup and Restoration Tasks
MRS Multi-User Permission Management
Users and Permissions of MRS Clusters
Default Users of Clusters with Kerberos Authentication Enabled
Creating a Role
Creating a User Group
Creating a User
Modifying User Information
Locking a User
Unlocking a User
Deleting a User
Changing the Password of an Operation User
Initializing the Password of a System User
Downloading a User Authentication File
Modifying a Password Policy
Configuring Cross-Cluster Mutual Trust Relationships
Configuring Users to Access Resources of a Trusted Cluster
Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
Managing Historical Clusters
Viewing Basic Information of a Historical Cluster
Viewing Operation Logs
Metadata
Configuring Data Connections
Configuring Ranger Data Connections
Connecting to Clusters
Logging In to a Cluster
Cluster Node Overview
Logging In to an ECS
Determining Active and Standby Management Nodes of Manager
Accessing Manager
Accessing Manager
Accessing FusionInsight Manager (MRS 3.x or Later)
Accessing Web Pages of Open Source Components Managed in MRS Clusters
Web UIs of Open Source Components
List of Open Source Component Ports
Access Through Direct Connect
EIP-based Access
Access Using a Windows ECS
Creating an SSH Channel for Connecting to an MRS Cluster and Configuring the Browser
Using an MRS Client
Installing a Client
Installing a Client (Version 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
Using the Client of Each Component
Using a ClickHouse Client
Using a Flink Client
Using a Flume Client
Using an HBase Client
Using an HDFS Client
Using a Hive Client
Using an Impala Client
Using a Kafka Client
Using a Kudu Client
Using the Oozie Client
Using a Storm Client
Using a Yarn Client
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
Introduction to MRS Manager
Checking Running Tasks
Monitoring Management
Dashboard
Managing Services and Monitoring Hosts
Managing Resource Distribution
Configuring Monitoring Metric Dumping
Alarm Management
Viewing and Manually Clearing an Alarm
Configuring an Alarm Threshold
Configuring Syslog Northbound Interface Parameters
Configuring SNMP Northbound Interface Parameters
Object Management
Managing Objects
Viewing Configurations
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configurations
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Managing a Host
Isolating a Host
Canceling Host Isolation
Starting or Stopping a Cluster
Synchronizing Cluster Configurations
Exporting Configuration Data of a Cluster
Log Management
About Logs
Manager Log List
Viewing and Exporting Audit Logs
Exporting Service Logs
Configuring Audit Log Exporting Parameters
Health Check Management
Performing a Health Check
Viewing and Exporting a Health Check Report
Configuring the Number of Health Check Reports to Be Reserved
Managing Health Check Reports
DBService Health Check Indicators
Flume Health Check Indicators
HBase Health Check Indicators
Host Health Check Indicators
HDFS Health Check Indicators
Hive Health Check Indicators
Kafka Health Check Indicators
KrbServer Health Check Indicators
LdapServer Health Check Indicators
Loader Health Check Indicators
MapReduce Health Check Indicators
OMS Health Check Indicators
Spark Health Check Indicators
Storm Health Check Indicators
Yarn Health Check Indicators
ZooKeeper Health Check Indicators
Static Service Pool Management
Viewing the Status of a Static Service Pool
Configuring a Static Service Pool
Tenant Management
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Backup and Restoration
Introduction
Backing Up Metadata
Restoring Metadata
Modifying a Backup Task
Viewing Backup and Restoration Tasks
Security Management
Default Users of Clusters with Kerberos Authentication Disabled
Default Users of Clusters with Kerberos Authentication Enabled
Changing the Password of an OS User
Changing the Password of User admin
Changing the Password of the Kerberos Administrator
Changing the Passwords of the LDAP Administrator and the LDAP User
Changing the Password of a Component Running User
Changing the Password of the OMS Database Administrator
Changing the Password of the Data Access User of the OMS Database
Changing the Password of a Component Database User
Updating Cluster Keys
Permissions Management
Creating a Role
Creating a User Group
Creating a User
Modifying User Information
Locking a User
Unlocking a User
Deleting a User
Changing the Password of an Operation User
Initializing the Password of a System User
Downloading a User Authentication File
Modifying a Password Policy
Patch Operation Guide
Patch Operation Guide for Versions
Supporting Rolling Patches
Restoring Patches for the Isolated Hosts
Rolling Restart
FusionInsight Manager Operation Guide (Applicable to 3.x)
Getting Started
FusionInsight Manager Introduction
Querying the FusionInsight Manager Version
Logging In to FusionInsight Manager
Logging In to the Management Node
Homepage
Overview
Managing the Monitoring Indicator Report
Cluster
Cluster Management
Overview
Performing a Rolling Restart of a Cluster
Managing Expired Configurations
Downloading the Client
Modifying Cluster Properties
Management Cluster Configuration
Static Service Pool
Static Service Resources
Configuring Cluster Static Resources
Viewing Cluster Static Resources
Client Management
Managing the Client
Batch Upgrading Clients
Updating the hosts File in Batches
Managing a Service
Overview
Other Service Management Operations
Service Details Page
Performing Active/Standby Switchover of a Role Instance
Resource Monitoring
Collecting Stack Information
Switching Ranger Authentication
Service Configuration
Modifying Service Configuration Parameters
Modifying Customized Configuration Parameters of a Service
Instance Management
Instance Management Overview
Decommissioning and Recommissioning an Instance
Managing Instance Configurations
Viewing the Instance Configuration File
Instance Group
Managing Instance Groups
Viewing Information About an Instance Group
Configuring Instance Group Parameters
Hosts
Host Management Page
Viewing the Host List
Viewing the Host Dashboard
Checking Processes and Resources on the Active Node
Host Maintenance Operations
Starting and Stopping All Instances on a Host
Performing a Host Health Check
Configuring Racks for Hosts
Isolating a Host
Exporting Host Information
Resource Overview
Distribution
Trend
Cluster
Host
O&M
Alarms
Overview of Alarms and Events
Configuring the Threshold
Configuring the Alarm Masking Status
Log
Online Log Searching
Log Downloading
Performing a Health Check
Viewing a Health Check Task
Managing Health Check Reports
Modifying Health Check Configuration
Configuring Backup and Backup Restoration
Creating a Backup Task
Creating a Backup Restoration Task
Managing Backup and Backup Restoration Tasks
Audit
Overview
Configuring Audit Log Dumping
Tenant Resources
Introduction to Multi-Tenant
Overview
Technical Principles
Multi-Tenant Management
Models Related to Multi-Tenant
Resource Overview
Dynamic Resources
Storage Resource
Multi-Tenant Use
Overview
Process Overview
Using the Superior Scheduler in Multi-Tenant Scenarios
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing a Tenant Directory
Restoring Tenant Data
Deleting a Tenant
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Managing Global User Policies
Using the Capacity Scheduler in Multi-Tenant Scenarios
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing a Tenant Directory
Restoring Tenant Data
Deleting a Tenant
Clearing Unassociated Queues of a Tenant in Capacity Scheduler Mode
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Switching the Scheduler
System Configuration
Configuring Permission
Managing Users
Creating a User
Modifying User Information
Exporting User Information
Locking a User
Unlocking a User
Deleting a User
Changing a User Password
Initializing a Password
Exporting an Authentication Credential File
Managing User Groups
Managing Roles
Security Policy
Configuring Password Policies
Configuring the Independent Attribute
Configuring Interconnections
Configuring SNMP Northbound Parameters
Configuring Syslog Northbound Parameters
Configuring Monitoring Indicator Data Dump
Importing a Certificate
OMS Management
Overview of the OMS Maintenance Page
Changing the OMS Database Password
Modifying OMS Service Configuration Parameters
Component Management
Viewing Component Packages
Cluster Management
Configuring Client
Installing a Client
Using a Client
Updating the Configuration of the Installed Client
Managing Mutual Trust Relationships Between Managers
Introduction to Mutual Trust Relationships Between Clusters
Changing Manager System Domain Name
Configuring Cross-Manager Cluster Mutual Trust Relationships
Assigning User Permissions After Cross-Cluster Mutual Trust Is Configured
Configuring Periodical Alarm and Audit Information Backup
Modifying the Manager Routing Table
Switching to Maintenance Mode
Routine Maintenance
Log Management
About Logs
Manager Log List
Configuring the Log Level and Log File Size
Configuring the Number of Local Backup Audit Log Files
Viewing Role Instance Logs
Backup and Recovery Management
Introduction
Enabling Cross-Cluster Replication
Backing Up Data
Backing Up Manager Data
Backing Up DBService Data
Backing Up HBase Metadata
Backing Up HBase Service Data
Backing Up NameNode Data
Backing Up HDFS Service Data
Backing Up Hive Service Data
Backing Up Kafka Metadata
Backing Up Yarn Resource Pool Data
Recovering Data
Recovering Manager Data
Recovering DBService Data
Recovering HBase Metadata
Recovering HBase Service Data
Recovering NameNode Data
Recovering HDFS Service Data
Recovering Hive Service Data
Recovering Kafka Metadata
Recovering Yarn Data
Managing Local Quick Recovery Tasks
Modifying a Backup Task
Viewing Backup and Recovery Tasks
Security Management
Security Overview
Rights Model
Rights Mechanism
Authentication Policies
Permission Verification Policies
User Information Overview
Definitions
FusionInsight Manager Security Functions
Account Management
Account Security Settings
Unlocking LDAP Users and Management Accounts
Unlocking an Internal System User
Enabling and Disabling Permission Verification on Cluster Components
Logging In to a Non-Cluster Node Using a Cluster User in Normal Mode
Changing the Password for a System User
Changing the Password for User admin
Changing the Password for an OS User
Changing the Password for a System Internal User
Changing the Password for the Kerberos Administrator
Changing the Password for the OMS Kerberos Administrator
Changing the Passwords of the LDAP Administrator and the LDAP User (Including OMS LDAP)
Changing the Password for the LDAP Administrator
Changing the Password for a Component Running User
Changing the Password for a Database User
Changing the Password for the OMS Database Administrator
Changing the Password for the OMS Database Data Access User
Changing the Password for a Component Database User
Changing the Password for User omm in DBService
Security Hardening
Hardening Policy
Configuring a Trusted IP Address to Access LDAP
HFile and WAL Encryption
Security Configuration
Configuring an IP Address Whitelist for Modifications Allowed by HBase
Updating a Key for a Cluster
Hardening the LDAP
Configuring Kafka Data Encryption During Transmission
Configuring HDFS Data Encryption During Transmission
Configuring Communication Authentication for Storm Processes
Encrypting the Communication Between Controller and Agent
Updating SSH Keys for User omm
Security Maintenance
Account Maintenance Suggestions
Password Maintenance Suggestions
Log Maintenance Suggestions
Security Statement
Data Backup and Restoration
HDFS Data
Hive Metadata
Hive Data
HBase Data
Kafka Data
Storage-Compute Decoupling Operation Guide
Configuring a Storage-Compute Decoupled Cluster (Agency)
Configuring a Storage-Compute Decoupled Cluster (AK/SK)
Using a Storage-Compute Decoupled Cluster
Interconnecting Hive with OBS
Interconnecting Flink with OBS
Interconnecting Spark2x with OBS
Interconnecting HDFS with OBS
Interconnecting MapReduce with OBS
Security
Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled
Security Authentication Principles and Mechanisms
High-Risk Operations Overview
FAQs
MRS Overview
What Is MRS Used For?
What Types of Distributed Storage Does MRS Support?
How Do I Create an MRS Cluster Using a Custom Security Group?
How Do I Use MRS?
How Does MRS Ensure Security of Data and Services?
Can I Configure a Phoenix Connection Pool?
Does MRS Support Change of the Network Segment?
Can I Downgrade the Specifications of an MRS Cluster Node?
What Is the Relationship Between Hive and Other Components?
Does an MRS Cluster Support Hive on Spark?
What Are the Differences Between Hive Versions?
Which MRS Cluster Version Supports Hive Connection and User Synchronization?
What Are the Differences Between OBS and HDFS in Data Storage?
How Do I Obtain the Hadoop Pressure Test Tool?
What Is the Relationship Between Impala and Other Components?
Statement About the Public IP Addresses in the Open-Source Third-Party SDK Integrated by MRS
What Is the Relationship Between Kudu and HBase?
Does MRS Support Running Hive on Kudu?
What Are the Solutions for Processing 1 Billion Data Records?
Can I Change the IP Address of DBService?
Can I Clear MRS sudo Logs?
Is the Storm Log Also Limited to 20 GB in MRS Cluster 2.1.0?
What Is Spark ThriftServer?
What Access Protocols Are Supported by Kafka?
What If Error 408 Is Reported When an MRS Node Accesses OBS?
What Is the Compression Ratio of zstd?
Why Are the HDFS, YARN, and MapReduce Components Unavailable When an MRS Cluster Is Created?
Why Is the ZooKeeper Component Unavailable When an MRS Cluster Is Created?
Which Python Versions Are Supported by Spark Tasks in an MRS 3.1.0 Cluster?
How Do I Enable Different Service Programs to Use Different YARN Queues?
Differences and Relationships Between the MRS Management Console and Cluster Manager
How Do I Unbind an EIP from an MRS Cluster Node?
Account and Password
What Is the Account for Logging In to Manager?
How Do I Query and Change the Password Validity Period of an Account?
Accounts and Permissions
Does an MRS Cluster Support Access Permission Control If Kerberos Authentication Is Not Enabled?
How Do I Assign Tenant Management Permission to a New Account?
How Do I Customize an MRS Policy?
Why Is the Manage User Function Unavailable on the System Page on MRS Manager?
Does Hue Support Account Permission Configuration?
Client Usage
How Do I Configure Environment Variables and Run Commands on a Component Client?
How Do I Disable ZooKeeper SASL Authentication?
An Error Is Reported When the kinit Command Is Executed on a Client Node Outside an MRS Cluster
Web Page Access
How Do I Change the Session Timeout Duration for an Open Source Component Web UI?
Why Can't I Refresh the Dynamic Resource Plan Page on the MRS Tenant Tab?
What Do I Do If the Kafka Topic Monitoring Tab Is Unavailable on Manager?
What Do I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of HDFS, Hue, YARN, and Flink?
Alarm Monitoring
In an MRS Streaming Cluster, Can the Kafka Topic Monitoring Function Send Alarm Notifications?
Where Can I View the Running Resource Queues When the Alarm "ALM-18022 Insufficient Yarn Queue Resources" Is Reported?
How Do I Understand the Multi-Level Chart Statistics in the HBase Operation Requests Metric?
Performance Tuning
Does an MRS Cluster Support System Reinstallation?
Can I Change the OS of an MRS Cluster?
How Do I Improve the Resource Utilization of Core Nodes in a Cluster?
How Do I Stop the Firewall Service?
Job Development
How Do I Get My Data into OBS or HDFS?
What Types of Spark Jobs Can Be Submitted in a Cluster?
Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Are Changed to 0?
What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
How Do I View MRS Job Logs?
What Do I Do If the Message "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page." Is Displayed?
LauncherJob Job Execution Fails and the Error Message "jobPropertiesMap is null." Is Displayed
What Do I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
What Do I Do If a SparkStreaming Job Fails After Running for Dozens of Hours and an OBS Access 403 Error Is Reported?
What Do I Do If an Alarm Is Reported Indicating That the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
What Do I Do If Error Message "java.io.IOException: Connection reset by peer" Is Displayed During the Execution of a Spark Job?
What Do I Do If Error Message "requestId=4971883851071737250" Is Displayed When a Spark Job Accesses OBS?
Why Does DataArts Studio Occasionally Fail to Schedule Spark Jobs and the Rescheduling Also Fail?
What Do I Do If a Flink Job Fails to Execute and the Error Message "java.lang.NoSuchFieldError: SECURITY_SSL_ENCRYPT_ENABLED" Is Displayed?
Why Can't a Submitted Yarn Job Be Viewed on the Web UI?
How Do I Modify the HDFS NameSpace (fs.defaultFS) of an Existing Cluster?
What Do I Do If the launcher-job Queue Is Stopped by YARN Due to Insufficient Heap Size When I Submit a Flink Job on the Management Plane?
What Do I Do If the Error Message "slot request timeout" Is Displayed When I Submit a Flink Job?
Data Import and Export of DistCP Jobs
Cluster Upgrade/Patching
Can I Upgrade an MRS Cluster?
Can I Change the MRS Cluster Version?
Cluster Access
Can I Switch Between the Two Login Modes of MRS?
How Can I Obtain the IP Address and Port Number of a ZooKeeper Instance?
How Do I Access an MRS Cluster from a Node Outside the Cluster?
Big Data Service Development
Can MRS Run Multiple Flume Tasks at a Time?
How Do I Change FlumeClient Logs to Standard Logs?
Where Are the .jar Files and Environment Variables of Hadoop Located?
What Compression Algorithms Does HBase Support?
Can MRS Write Data to HBase Through the HBase External Table of Hive?
How Do I View HBase Logs?
How Do I Set the TTL for an HBase Table?
How Do I Balance HDFS Data?
How Do I Change the Number of HDFS Replicas?
What Is the Port for Accessing HDFS Using Python?
How Do I Modify the HDFS Active/Standby Switchover Class?
What Is the Recommended Number Type of DynamoDB in Hive Tables?
Can the Hive Driver Be Interconnected with DBCP2?
How Do I View the Hive Table Created by Another User?
Can I Export the Query Result of Hive Data?
What Do I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
What Do I Do If a "hivesql/hivescript" Job Fails to Submit After Hive Is Added?
What If an Excel File Downloaded on Hue Failed to Open?
What Do I Do If Sessions Are Not Released After Hue Connects to HiveServer and the Error Message "over max user connections" Is Displayed?
How Do I Reset Kafka Data?
How Do I Obtain the Client Version of MRS Kafka?
What Access Protocols Are Supported by Kafka?
What Do I Do If Error Message "Not Authorized to access group xxx" Is Displayed When a Kafka Topic Is Consumed?
What Compression Algorithms Does Kudu Support?
How Do I View Kudu Logs?
How Do I Handle the Kudu Service Exceptions Generated During Cluster Creation?
Does OpenTSDB Support Python APIs?
How Do I Configure Other Data Sources on Presto?
How Do I Connect to Spark Shell from MRS?
How Do I Connect to Spark Beeline from MRS?
Where Are the Execution Logs of Spark Jobs Stored?
How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
How Do I Check Whether the ResourceManager Configuration of Yarn Is Correct?
How Do I Modify the allow_drop_detached Parameter of ClickHouse?
What Do I Do If an Alarm Indicating Insufficient Memory Is Reported During Spark Task Execution?
What Do I Do If ClickHouse Consumes Excessive CPU Resources?
How Do I Enable the Map Type on ClickHouse?
A Large Number of OBS APIs Are Called When Spark SQL Accesses Hive Partitioned Tables
API
How Do I Configure the node_id Parameter When Using the API for Adjusting Cluster Nodes?
Cluster Management
How Do I View All Clusters?
How Do I View Log Information?
How Do I View Cluster Configuration Information?
How Do I Install Kafka and Flume in an MRS Cluster?
How Do I Stop an MRS Cluster?
Can I Expand Data Disk Capacity for MRS?
Can I Add Components to an Existing Cluster?
Can I Delete Components Installed in an MRS Cluster?
Can I Change MRS Cluster Nodes on the MRS Console?
How Do I Shield Cluster Alarm/Event Notifications?
Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
How Do I Configure the knox Memory?
What Is the Python Version Installed for an MRS Cluster?
How Do I View the Configuration File Directory of Each Component?
What Do I Do If the Time on MRS Nodes Is Incorrect?
How Do I Query the Startup Time of an MRS Node?
What Do I Do If Trust Relationships Between Nodes Are Abnormal?
How Do I Adjust the Memory Size of the manager-executor Process?
Kerberos Usage
How Do I Change the Kerberos Authentication Status of a Created MRS Cluster?
What Are the Ports of the Kerberos Authentication Service?
How Do I Deploy the Kerberos Service in a Running Cluster?
How Do I Access Hive in a Cluster with Kerberos Authentication Enabled?
How Do I Access Presto in a Cluster with Kerberos Authentication Enabled?
How Do I Access Spark in a Cluster with Kerberos Authentication Enabled?
How Do I Prevent Kerberos Authentication Expiration?
Metadata Management
Where Can I View Hive Metadata?
Troubleshooting
Accessing the Web Pages
Failed to Access MRS Manager
Failed to Log In to MRS Manager After the Python Upgrade
Failed to Log In to MRS Manager After Changing the Domain Name
A Blank Page Is Displayed Upon Login to Manager
Failed to Download Authentication Credentials When the Username Is Too Long
Cluster Management
Failed to Reduce Task Nodes
OBS Certificate in a Cluster Expired
Adding a New Disk to an MRS Cluster
Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
Replacing a Disk in an MRS Cluster (Applicable to 3.x)
MRS Backup Failure
Inconsistency Between df and du Command Output on the Core Node
Disassociating a Subnet from the ACL Network
MRS Becomes Abnormal After hostname Modification
DataNode Restarts Unexpectedly
Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
Failed to Download the MRS Cluster Client
Failed to Scale Out an MRS Cluster
Error Occurs When MRS Executes the Insert Command Using Beeline
How Do I Upgrade EulerOS to Fix Vulnerabilities in an MRS Cluster?
Using CDM to Migrate Data to HDFS
Alarms Are Frequently Generated in the MRS Cluster
Memory Usage of the PMS Process Is High
High Memory Usage of the Knox Process
It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
How Do I Locate a Job Submission Failure?
OS Disk Space Is Insufficient Due to Oversized HBase Log Files
Failed to Delete a New Tenant on FusionInsight Manager
Using Alluxio
Error Message "Does not contain a valid host:port authority" Is Reported When Alluxio Is in HA Mode
Using ClickHouse
ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
Using DBService
DBServer Instance Is in Abnormal Status
DBServer Instance Remains in the Restoring State
Default Port 20050 or 20051 Is Occupied
DBServer Instance Is Always in the Restoring State Because of Incorrect /tmp Directory Permissions
DBService Backup Failure
Components Failed to Connect to DBService in Normal State
DBServer Failed to Start
DBService Backup Failed Because the Floating IP Address Is Unreachable
DBService Failed to Start Due to the Loss of the DBService Configuration File
Using Flink
"IllegalConfigurationException: Error while parsing YAML configuration file: 'security.kerberos.login.keytab'" Is Displayed When a Command Is Executed on an Installed Client
"IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
Flink Service Program Fails to Read Files on the NFS Disk
Failed to Customize the Flink Log4j Log Level
Using Flume
Class Cannot Be Found After Flume Submits Jobs to Spark Streaming
Failed to Install a Flume Client
A Flume Client Cannot Connect to the Server
Flume Data Fails to Be Written to the Component
Flume Server Process Fault
Flume Data Collection Is Slow
Failed to Start Flume
Using HBase
Slow Response to HBase Connection
Failed to Authenticate the HBase User
RegionServer Failed to Start Because the Port Is Occupied
HBase Failed to Start Due to Insufficient Node Memory
HBase Service Unavailable Due to Poor HDFS Performance
HBase Failed to Start Due to Inappropriate Parameter Settings
RegionServer Failed to Start Due to Residual Processes
HBase Failed to Start Due to a Quota Set on HDFS
HBase Failed to Start Due to Corrupted Version Files
High CPU Usage Caused by Zero-Loaded RegionServer
HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
RegionServer Instance Is in the Restoring State
HBase Failed to Start in a Newly Installed Cluster
HBase Failed to Start Due to the Loss of the ACL Table Directory
HBase Failed to Start After the Cluster Is Powered Off and On
Failed to Import HBase Data Due to Oversized File Blocks
Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
Failed to Run the hbase shell Command on the MRS Cluster Client
Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
HBase Failed to Start Due to Insufficient RegionServer Memory
Using HDFS
All NameNodes Become the Standby State After the NameNode RPC Port of HDFS Is Changed
An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
Failed to Use Python to Remotely Connect to the Port of HDFS
HDFS Capacity Usage Reaches 100%, Causing Unavailable Upper-layer Services Such as HBase and Spark
An Error Is Reported During HDFS and Yarn Startup
HDFS Permission Setting Error
A DataNode of HDFS Is Always in the Decommissioning State
HDFS Failed to Start Due to Insufficient Memory
A Large Number of Blocks Are Lost in HDFS due to the Time Change Using ntpdate
CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
Common File Read/Write Faults
Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
A Client File Fails to Be Closed After Data Writing
File Fails to Be Uploaded to HDFS Due to File Errors
After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
Failed to Read Files, and "FileNotFoundException" Is Displayed
Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
Adjusting the Log Level of the Shell Client
File Read Fails, and "No common protection layer" Is Displayed
Failed to Write Files Because the HDFS Directory Quota Is Insufficient
Balancing Fails, and "Source and target differ in block-size" Is Displayed
A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
Uneven Data Distribution Due to Non-HDFS Data Residuals
Uneven Data Distribution Due to the Client Installation on the DataNode
Handling Unbalanced DataNode Disk Usage on Nodes
Locating Common Balance Problems
HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
HDFS Client Failed to Delete Overlong Directories
An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
Using Hive
Content Recorded in Hive Logs
Causes of Hive Startup Failure
"Cannot modify xxx at runtime" Is Reported When the set Command Is Executed in a Security Cluster
How to Specify a Queue When Hive Submits a Job
How to Set Map and Reduce Memory on the Client
Specifying the Output File Compression Format When Importing a Table
desc Table Cannot Be Completely Displayed
NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
A Newly Created User Has No Query Permissions
An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
An Error Is Reported When the "load data inpath" Command Is Executed
An Error Is Reported When the "load data local inpath" Command Is Executed
An Error Is Reported When the "create external table" Command Is Executed
An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
Insufficient Permissions to Execute the set role admin Command
An Error Is Reported When UDF Is Created Using Beeline
Difference Between Hive Service Health Status and Hive Instance Health Status
Hive Alarms and Triggering Conditions
"authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
Failed to Access ZooKeeper from the Client
"Invalid function" Is Displayed When a UDF Is Used
Hive Service Status Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Concerning
Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
Hive Task Failed to Run on the Client But Succeeded on Yarn
An Error Is Reported When the select Statement Is Executed
Failed to Drop a Large Number of Partitions
Failed to Start a Local Task
Failed to Start WebHCat
Sample Code Error for Hive Secondary Development After Domain Switching
MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
"Failed to execute session hooks: over max connections" Reported by Beeline
Beeline Reports the "OutOfMemoryError" Error
Task Execution Fails Because the Input File Number Exceeds the Threshold
Task Execution Fails Because of Stack Memory Overflow
Task Failed Due to Concurrent Writes to One Table or Partition
Hive Task Failed Due to a Lack of HDFS Directory Permission
Failed to Load Data to Hive Tables
HiveServer and HiveHCat Process Faults
An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
Timeout Reported When Adding the Hive Table Field
Failed to Restart the Hive Service
Hive Failed to Delete a Table
An Error Is Reported When msck repair table table_name Is Run on Hive
How Do I Release Disk Space After Dropping a Table in Hive?
Connection Timeout During SQL Statement Execution on the Client
WebHCat Failed to Start Due to Abnormal Health Status
WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
Using Hue
A Job Is Running on Hue
HQL Fails to Be Executed on Hue Using Internet Explorer
Hue (Active) Cannot Open Web Pages
Failed to Access the Hue Web UI
HBase Tables Cannot Be Loaded on the Hue Web UI
Using Impala
Failed to Connect to impala-shell
Failed to Create a Kudu Table
Failed to Log In to the Impala Client
Using Kafka
An Error Is Reported When Kafka Is Run to Obtain a Topic
Flume Normally Connects to Kafka But Fails to Send Messages
Producer Failed to Send Data and Threw "NullPointerException"
Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
Consumer Fails to Consume Data and Remains in the Waiting State
SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
Checking Whether Data Consumed by a Consumer Is Lost
Failed to Start a Component Due to Account Lock
Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
Kafka Topics Cannot Be Deleted
Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
Consumer Repeatedly Consumes Data
Leader for the Created Kafka Topic Partition Is Displayed as none
Safety Instructions on Using Kafka
Obtaining Kafka Consumer Offset Information
Adding or Deleting Configurations for a Topic
Reading the Content of the __consumer_offsets Internal Topic
Configuring Logs for Shell Commands on the Client
Obtaining Topic Distribution Information
Kafka HA Usage Description
Kafka Producer Writes Oversized Records
Kafka Consumer Reads Oversized Records
High Usage of Multiple Disks on a Kafka Cluster Node
Using Oozie
Oozie Jobs Do Not Run When a Large Number of Jobs Are Submitted Concurrently
Using Presto
During sql-standard-with-group Configuration, a Schema Fails to Be Created and the Error Message "Access Denied" Is Displayed
The Presto Coordinator Cannot Be Started Properly
An Error Is Reported When Presto Is Used to Query a Kudu Table
No Data Is Found in the Hive Table Using Presto
Using Spark
An Error Occurs When the Split Size Is Changed in a Spark Application
An Error Is Reported When Spark Is Used
A Spark Job Fails to Run Due to Incorrect JAR File Import
A Spark Job Is Pending Due to Insufficient Memory
An Error Is Reported During Spark Running
"Executor Memory Reaches the Threshold" Is Displayed in Driver
Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
ApplicationMaster Failed to Start Twice in Yarn-client Mode
Failed to Connect to ResourceManager When a Spark Task Is Submitted
DataArts Studio Failed to Schedule Spark Jobs
Submission Status of the Spark Job API Is Error
Alarm 43006 Is Repeatedly Generated in the Cluster
Failed to Create or Delete a Table in Spark Beeline
Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
Large Number of Shuffle Results Are Lost During Spark Task Execution
Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
Spark Task Submission Failure
Spark Task Execution Failure
JDBCServer Connection Failure
Failed to View Spark Task Logs
Authentication Fails When Spark Connects to Other Services
An Error Occurs When Spark Connects to Redis
An Error Is Reported When spark-beeline Is Used to Query a Hive View
Using Sqoop
Connecting Sqoop to MySQL
Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
Failed to Export HBase Data to HDFS Through Hue's Sqoop Task
A Format Error Is Reported When Sqoop Is Used to Export Data from Hive to MySQL 8.0
An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
Using Storm
Invalid Hyperlink of Events on the Storm UI
Failed to Submit a Topology
Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
The Worker Log Is Empty After a Topology Is Submitted
Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
"well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
Data Cannot Be Written into Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts
The GC Parameter Configured for the Service Topology Does Not Take Effect
Internal Server Error Is Displayed When the User Queries Information on the UI
Using Ranger
After Ranger Authentication Is Enabled for Hive, Unauthorized Tables and Databases Can Be Viewed on the Hue Page
Using Yarn
Plenty of Jobs Are Found After Yarn Is Started
"GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
Temporary Files Are Not Deleted When an MR Job Is Abnormal
ResourceManager of Yarn (Port 8032) Throws Error "connection refused"
Failed to View Job Logs on the Yarn Web UI
An Error Is Reported When a Queue Name Is Clicked on the Yarn Page
Using ZooKeeper
Accessing ZooKeeper from an MRS Cluster
Accessing OBS
When Using the MRS Multi-user Access to OBS Function, a User Does Not Have the Permission to Access the /tmp Directory
When the Hadoop Client Is Used to Delete Data from OBS, It Does Not Have the Permission for the .Trash Directory
Appendix
Precautions for MRS 3.x
Component Operation Guide (ME-Abu Dhabi Region)
Using Alluxio
Configuring an Underlying Storage System
Accessing Alluxio Using a Data Application
Common Operations of Alluxio
Using CarbonData (for Versions Earlier Than MRS 3.x)
Using CarbonData from Scratch
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Using CarbonData (for MRS 3.x or Later)
Overview
CarbonData Overview
Main Specifications of CarbonData
Configuration Reference
CarbonData Operation Guide
CarbonData Quick Start
CarbonData Table Management
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying a CarbonData Table
CarbonData Table Data Management
Loading Data
Deleting Segments
Combining Segments
CarbonData Data Migration
Migrating Data on CarbonData from Spark 1.5 to Spark2x
CarbonData Performance Tuning
Tuning Guidelines
Suggestions for Creating CarbonData Tables
Configurations for Performance Tuning
CarbonData Access Control
CarbonData Syntax Reference
DDL
CREATE TABLE
CREATE TABLE As SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
DML
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Operation Concurrent Execution
API
Spatial Indexes
CarbonData Troubleshooting
Filter Result Is Not Consistent with Hive When a Big Double Type Value Is Used in Filter
Query Performance Deterioration
CarbonData FAQ
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How to Avoid Minor Compaction for Historical Data?
How to Change the Default Group Name for CarbonData Data Loading?
Why Does INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Does Data Load Performance Decrease Due to Bad Records?
Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and the Opened Tasks Are Less Than the Available Executors When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Why Does Data Loading Fail During Off Heap?
Why Do I Fail to Create a Hive Table?
Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-Owners?
How Do I Logically Split Data Across Different Namespaces?
Why Is a Missing Privileges Exception Reported When I Perform Drop Operation on Databases?
Why Cannot the UPDATE Command Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Does an Exception Occur in CarbonData When Disk Space Quota Is Set for Storage Directory in HDFS?
Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
Using ClickHouse
Using ClickHouse from Scratch
ClickHouse Table Engine Overview
Creating a ClickHouse Table
ClickHouse Data Type
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Structure
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
Migrating ClickHouse Data
Using ClickHouse to Import and Export Data
Synchronizing Kafka Data to ClickHouse
Using the ClickHouse Data Migration Tool
User Management and Authentication
ClickHouse User and Permission Management
Interconnecting ClickHouse With OpenLDAP for Authentication
Backing Up and Restoring ClickHouse Data Using a Data File
ClickHouse Log Overview
ClickHouse Performance Tuning
Solution to the "Too many parts" Error in Data Tables
Accelerating Merge Operations
Accelerating TTL Operations
ClickHouse FAQ
What Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
How Do I Migrate Data from Hive/HDFS to ClickHouse?
How Do I Migrate Data from OBS/S3 to ClickHouse?
An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data
How Do I Grant the Select Permission at the Database Level to ClickHouse Users?
Using DBService
DBService Log Overview
Using Flink
Using Flink from Scratch
Viewing Flink Job Information
Configuring Flink Service Parameters
Configuring Flink Security Features
Security Features
Authentication and Encryption
Configuring Kafka
Configuring Pipeline
Configuring and Developing a Flink Visualization Job
Introduction to Flink Web UI
Flink Web UI Permission Management
Creating a FlinkServer Role
Accessing the Flink Web UI
Creating an Application
Creating a Cluster Connection
Creating a Data Connection
Creating a Stream Table
Creating a Job
Flink Log Overview
Flink Performance Tuning
Memory Configuration Optimization
Configuring DOP
Configuring Process Parameters
Optimizing the Design of Partitioning Method
Configuring the Netty Network Communication
Experience Summary
Common Flink Shell Commands
Reference
Example of Issuing a Certificate
Flink Restart Policy
Using Flume
Using Flume from Scratch
Overview
Installing the Flume Client
Installing the Flume Client on Clusters of Versions Earlier Than MRS 3.x
Installing the Flume Client on MRS 3.x or Later Clusters
Viewing Flume Client Logs
Stopping or Uninstalling the Flume Client
Using the Encryption Tool of the Flume Client
Flume Service Configuration Guide
Flume Configuration Parameter Description
Using Environment Variables in the properties.properties File
Non-Encrypted Transmission
Configuring Non-encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
Encrypted Transmission
Configuring the Encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Viewing Flume Client Monitoring Information
Connecting Flume to Kafka in Security Mode
Connecting Flume with Hive in Security Mode
Configuring the Flume Service Model
Overview
Service Model Configuration Guide
Introduction to Flume Logs
Flume Client Cgroup Usage Guide
Secondary Development Guide for Flume Third-Party Plug-ins
Common Issues About Flume
Using HBase
Using HBase from Scratch
Using an HBase Client
Creating HBase Roles
Configuring HBase Replication
Configuring HBase Parameters
Enabling Cross-Cluster Copy
Using the ReplicationSyncUp Tool
Using HIndex
Introduction to HIndex
Loading Index Data in Batches
Using an Index Generation Tool
Migrating Index Data
Configuring HBase DR
Configuring HBase Data Compression and Encoding
Performing an HBase DR Service Switchover
Performing an HBase DR Active/Standby Cluster Switchover
Community BulkLoad Tool
Configuring the MOB
Configuring Secure HBase Replication
Configuring Region In Transition Recovery Chore Service
Using a Secondary Index
HBase Log Overview
HBase Performance Tuning
Improving the BulkLoad Efficiency
Improving Put Performance
Optimizing Put and Scan Performance
Improving Real-time Data Write Performance
Improving Real-time Data Read Performance
Optimizing JVM Parameters
Common Issues About HBase
Why Does a Client Keep Failing to Connect to a Server for a Long Time?
Operation Failures Occur in Stopping BulkLoad On the Client
Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
How Do I Restore a Region in the RIT State for a Long Time?
Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
Why Can Modified and Deleted Data Still Be Queried by Using the Scan Command?
Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
When Do the RegionServers Listed Under "Dead Region Servers" on HMaster WebUI Get Cleared?
Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Rebuilding Meta Using the OfflineMetaRepair Tool, Causing Startup to Fail?
Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as and a Different Kerberos User from the Region Server Are Used?
Insufficient Rights When a Tenant Accesses Phoenix
What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
How Do I Fix Region Overlapping?
Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
Why Is a Message Displayed Indicating That the Permission Is Insufficient When CTBase Connects to the Ranger Plug-ins?
Using HDFS
Using Hadoop from Scratch
Configuring Memory Management
Creating an HDFS Role
Using the HDFS Client
Running the DistCp Command
Overview of HDFS File System Directories
Changing the DataNode Storage Directory
Configuring HDFS Directory Permission
Configuring NFS
Planning HDFS Capacity
Configuring ulimit for HBase and HDFS
Balancing DataNode Capacity
Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
Configuring the Number of Files in a Single HDFS Directory
Configuring the Recycle Bin Mechanism
Setting Permissions on Files and Directories
Setting the Maximum Lifetime and Renewal Interval of a Token
Configuring the Damaged Disk Volume
Configuring Encrypted Channels
Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
Configuring the NameNode Blacklist
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring HDFS NodeLabel
Configuring HDFS Mover
Using HDFS AZ Mover
Configuring HDFS DiskBalancer
Configuring the Observer NameNode to Process Read Requests
Performing Concurrent Operations on HDFS Files
Introduction to HDFS Logs
HDFS Performance Tuning
Improving Write Performance
Improving Read Performance Using Client Metadata Cache
Improving the Connection Between the Client and NameNode Using Current Active Cache
FAQ
NameNode Startup Is Slow
DataNode Is Normal but Cannot Report Data Blocks
HDFS WebUI Cannot Properly Update Information About Damaged Data
Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
Why Does Array Border-crossing Occur During FileInputFormat Split?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
Can I Delete or Modify the Data Storage Directory in DataNode?
Blocks Miss on the NameNode UI After the Successful Rollback
Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
NameNode Fails to Be Restarted Due to EditLog Discontinuity
Using Hive
Using Hive from Scratch
Configuring Hive Parameters
Hive SQL
Permission Management
Hive Permission
Creating a Hive Role
Configuring Permissions for Hive Tables, Columns, or Databases
Configuring Permissions to Use Other Components for Hive
Using a Hive Client
Using HDFS Colocation to Store Hive Tables
Using the Hive Column Encryption Function
Customizing Row Separators
Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
Deleting Single-Row Records from Hive on HBase
Configuring HTTPS/HTTP-based REST APIs
Enabling or Disabling the Transform Function
Access Control of a Dynamic Table View on Hive
Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
Using Hive to Read Data in a Relational Database
Supporting Traditional Relational Database Syntax in Hive
Creating User-Defined Hive Functions
Enhancing beeline Reliability
Viewing Table Structures Using the show create Statement as Users with the select Permission
Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
Inserting Data to a Directory That Does Not Exist
Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
Disabling of Specifying the location Keyword When Creating an Internal Hive Table
Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
Authorizing Over 32 Roles in Hive
Restricting the Maximum Number of Maps for Hive Tasks
HiveServer Lease Isolation
Hive Supporting Transactions
Switching the Hive Execution Engine to Tez
Hive Materialized View
Hive Log Overview
Hive Performance Tuning
Creating Table Partitions
Optimizing Join
Optimizing Group By
Optimizing Data Storage
Optimizing SQL Statements
Optimizing the Query Function Using Hive CBO
Common Issues About Hive
How Do I Delete UDFs on Multiple HiveServers at the Same Time?
Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
How to Perform Operations on Local Files with Hive User-Defined Functions
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
Table Creation Fails Because Hive Complex Fields' Names Contain Special Characters
How Do I Monitor the Hive Table Size?
How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Description of Hive Table Location (Either Be an OBS or HDFS Path)
Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?
Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
Why Does Hive Not Support Vectorized Query?
Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
How Do I Disable the Logging Function of Hive?
Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
Hive Configuration Problems
Using Hudi
Getting Started
Basic Operations
Hudi Table Schema
Write
Before You Start
Batch Write
Stream Write
Synchronizing Hudi Table Data to Hive
Read
Overview
Reading COW Table Views
Reading MOR Table Views
Data Management and Maintenance
Clustering
Cleaning
Compaction
Savepoint
Single-Table Concurrency Control
Using the Hudi Client
Operating a Hudi Table Using hudi-cli.sh
Configuration Reference
Write Configuration
Configuration of Hive Table Synchronization
Index Configuration
Storage Configuration
Compaction and Cleaning Configurations
Single-Table Concurrency Control Configuration
Hudi Performance Tuning
Common Issues About Hudi
Data Write
Parquet/Avro schema Is Reported When Updated Data Is Written
UnsupportedOperationException Is Reported When Updated Data Is Written
SchemaCompatabilityException Is Reported When Updated Data Is Written
What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
Hudi Fails to Write Decimal Data with Lower Precision
Data Collection
IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
HoodieException Is Reported When Data Is Collected
HoodieKeyException Is Reported When Data Is Collected
Hive Synchronization
SQLException Is Reported During Hive Data Synchronization
HoodieHiveSyncException Is Reported During Hive Data Synchronization
SemanticException Is Reported During Hive Data Synchronization
Using Hue (Versions Earlier Than MRS 3.x)
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using Hue (MRS 3.x or Later)
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the SparkSql Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using HBase on the Hue Web UI
Typical Scenarios
HDFS on Hue
Hive on Hue
Oozie on Hue
Hue Log Overview
Common Issues About Hue
Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
Why Does the use database Statement Become Invalid in Hive?
Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
Why Do Large Files Fail to Upload on the Hue Page?
Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
Using Impala
Using Impala from Scratch
Common Impala Parameters
Accessing the Impala Web UI
Using Impala to Operate Kudu
Interconnecting Impala with External LDAP
Enabling and Configuring a Dynamic Resource Pool for Impala
Using Kafka
Using Kafka from Scratch
Managing Kafka Topics
Querying Kafka Topics
Managing Kafka User Permissions
Managing Messages in Kafka Topics
Synchronizing Binlog-based MySQL Data to the MRS Cluster
Creating a Kafka Role
Kafka Common Parameters
Safety Instructions on Using Kafka
Kafka Specifications
Using the Kafka Client
Configuring Kafka HA and High Reliability Parameters
Changing the Broker Storage Directory
Checking the Consumption Status of Consumer Group
Kafka Balancing Tool Instructions
Balancing Data After Kafka Node Scale-Out
Kafka Token Authentication Mechanism Tool Usage
Introduction to Kafka Logs
Performance Tuning
Kafka Performance Tuning
Kafka Feature Description
Migrating Data Between Kafka Nodes
Common Issues About Kafka
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using KafkaManager
Introduction to KafkaManager
Accessing the KafkaManager Web UI
Managing Kafka Clusters
Kafka Cluster Monitoring Management
Using Loader
Using Loader from Scratch
How to Use Loader
Common Loader Parameters
Creating a Loader Role
Loader Link Configuration
Managing Loader Links (Versions Earlier Than MRS 3.x)
Managing Loader Links (MRS 3.x and Later Versions)
Source Link Configurations of Loader Jobs
Destination Link Configurations of Loader Jobs
Managing Loader Jobs
Preparing a Driver for MySQL Database Link
Importing Data
Overview
Importing Data Using Loader
Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
Typical Scenario: Importing Data from an SFTP Server to HBase
Typical Scenario: Importing Data from an SFTP Server to Hive
Typical Scenario: Importing Data from an FTP Server to HBase
Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
Typical Scenario: Importing Data from a Relational Database to HBase
Typical Scenario: Importing Data from a Relational Database to Hive
Typical Scenario: Importing Data from HDFS or OBS to HBase
Typical Scenario: Importing Data from a Relational Database to ClickHouse
Typical Scenario: Importing Data from HDFS to ClickHouse
Exporting Data
Overview
Using Loader to Export Data
Typical Scenario: Exporting Data from HDFS or OBS to an SFTP Server
Typical Scenario: Exporting Data from HBase to an SFTP Server
Typical Scenario: Exporting Data from Hive to an SFTP Server
Typical Scenario: Exporting Data from HDFS or OBS to a Relational Database
Typical Scenario: Exporting Data from HBase to a Relational Database
Typical Scenario: Exporting Data from Hive to a Relational Database
Typical Scenario: Importing Data from HBase to HDFS or OBS
Managing Jobs
Migrating Loader Jobs in Batches
Deleting Loader Jobs in Batches
Importing Loader Jobs in Batches
Exporting Loader Jobs in Batches
Viewing Historical Job Information
Operator Help
Overview
Input Operators
CSV File Input
Fixed File Input
Table Input
HBase Input
HTML Input
Hive Input
Spark Input
Conversion Operators
Long Date Conversion
Null Value Conversion
Constant Field Addition
Random Value Conversion
Concat Fields
Extract Fields
Modulo Integer
String Cut
EL Operation
String Operations
String Reverse
String Trim
Filter Rows
Update Fields Operator
Output Operators
Hive Output
Spark Output
Table Output
File Output
HBase Output
ClickHouse Output
Associating, Editing, Importing, or Exporting the Field Configuration of an Operator
Using Macro Definitions in Configuration Items
Operator Data Processing Rules
Client Tools
Running a Loader Job by Using Commands
loader-tool Usage Guide
loader-tool Usage Example
schedule-tool Usage Guide
schedule-tool Usage Example
Using loader-backup to Back Up Job Data
Open Source sqoop-shell Tool Usage Guide
Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
Loader Log Overview
Example: Using Loader to Import Data from OBS to HDFS
Common Issues About Loader
Data Cannot Be Saved in Internet Explorer 10 or 11
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Using Kudu
Using Kudu from Scratch
Accessing the Kudu Web UI
Using MapReduce
Configuring the Log Archiving and Clearing Mechanism
Reducing Client Application Failure Rate
Transmitting MapReduce Tasks from Windows to Linux
Configuring the Distributed Cache
Configuring the MapReduce Shuffle Address
Configuring the Cluster Administrator List
Introduction to MapReduce Logs
MapReduce Performance Tuning
Optimization Configuration for Multiple CPU Cores
Determining the Job Baseline
Streamlining Shuffle
AM Optimization for Big Tasks
Speculative Execution
Using Slow Start
Optimizing Performance for Committing MR Jobs
Common Issues About MapReduce
Why Does a MapReduce Task Stay Unchanged for a Long Time?
Why Does the Client Hang During Job Running?
Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
MapReduce Job Failed in Multiple NameService Environment
Why Is a Faulty MapReduce Node Not Blacklisted?
Using OpenTSDB
Using an MRS Client to Operate OpenTSDB Metric Data
Running the curl Command to Operate OpenTSDB
Using Oozie
Using Oozie from Scratch
Using the Oozie Client
Using Oozie Client to Submit an Oozie Job
Submitting a Hive Job
Submitting a Spark2x Job
Submitting a Loader Job
Submitting a DistCp Job
Submitting Other Jobs
Using Hue to Submit an Oozie Job
Creating a Workflow
Submitting a Workflow Job
Submitting a Hive2 Job
Submitting a Spark2x Job
Submitting a Java Job
Submitting a Loader Job
Submitting a MapReduce Job
Submitting a Sub-workflow Job
Submitting a Shell Job
Submitting an HDFS Job
Submitting a Streaming Job
Submitting a DistCp Job
Example of Mutual Trust Operations
Submitting an SSH Job
Submitting a Hive Script
Submitting a Coordinator Periodic Scheduling Job
Submitting a Bundle Batch Processing Job
Querying Job Execution Results
Oozie Log Overview
Common Issues About Oozie
Oozie Scheduled Tasks Are Not Executed on Time
Why Does an Update of the share lib Directory of Oozie on HDFS Not Take Effect?
Common Oozie Troubleshooting Methods
Using Presto
Accessing the Presto Web UI
Using a Client to Execute Query Statements
Presto FAQ
How Do I Configure Multiple Hive Connections for Presto?
Using Ranger (MRS 1.9.2)
Creating a Ranger Cluster
Accessing the Ranger Web UI and Synchronizing Unix Users to the Ranger Web UI
Configuring Hive/Impala Access Permissions in Ranger
Configuring HBase Access Permissions in Ranger
Using Ranger (MRS 3.x)
Logging In to the Ranger Web UI
Enabling Ranger Authentication
Configuring Component Permission Policies
Viewing Ranger Audit Information
Configuring a Security Zone
Viewing Ranger Permission Information
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark2x
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for Storm
Ranger Log Overview
Common Issues About Ranger
Why Does Ranger Startup Fail During Cluster Installation?
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
Using Spark
Precautions
Getting Started with Spark
Getting Started with Spark SQL
Using the Spark Client
Accessing the Spark Web UI
Interconnecting Spark with OpenTSDB
Creating a Table and Associating It with OpenTSDB
Inserting Data to the OpenTSDB Table
Querying an OpenTSDB Table
Modifying the Default Configuration Data
Using Spark2x
Precautions
Basic Operation
Getting Started
Configuring Parameters Rapidly
Common Parameters
Spark on HBase Overview and Basic Applications
Spark on HBase V2 Overview and Basic Applications
SparkSQL Permission Management (Security Mode)
Spark SQL Permissions
Creating a Spark SQL Role
Configuring Permissions for SparkSQL Tables, Columns, and Databases
Configuring Permissions for SparkSQL to Use Other Components
Configuring the Client and Server
Scenario-Specific Configuration
Configuring Multi-active Instance Mode
Configuring the Multi-tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Size of the Event Queue
Configuring Executor Off-Heap Memory
Enhancing Stability in a Limited Memory Condition
Viewing Aggregated Container Logs on the Web UI
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Configuring the Default Number of Data Blocks Divided by SparkSQL
Configuring the Compression Format of a Parquet Table
Configuring the Number of Lost Executors Displayed in WebUI
Setting the Log Level Dynamically
Configuring Whether Spark Obtains HBase Tokens
Configuring LIFO for Kafka
Configuring Reliability for Connected Kafka
Configuring Streaming Reading of Driver Execution Results
Filtering Partitions without Paths in Partitioned Tables
Configuring Spark2x Web UI ACLs
Configuring Vector-based ORC Data Reading
Broadening Support for Hive Partition Pruning Predicate Pushdown
Hive Dynamic Partition Overwriting Syntax
Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
Configuring Local Disk Cache for JobHistory
Configuring Spark SQL to Enable the Adaptive Execution Feature
Configuring Event Log Rollover
Adapting to the Third-party JDK When Ranger Is Used
Spark2x Logs
Obtaining Container Logs of a Running Spark Application
Small File Combination Tools
Using CarbonData for First Query
Spark2x Performance Tuning
Spark Core Tuning
Data Serialization
Optimizing Memory Configuration
Setting the DOP
Using Broadcast Variables
Using the External Shuffle Service to Improve Performance
Configuring Dynamic Resource Scheduling in Yarn Mode
Configuring Process Parameters
Designing the Directed Acyclic Graph (DAG)
Experience
Spark SQL and DataFrame Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the INSERT...SELECT Operation
Multiple JDBC Clients Concurrently Connecting to JDBCServer
Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
Optimizing SQL Query of Data of Multiple Sources
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Tuning
Common Issues About Spark2x
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Can't the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How Do I Configure the Event Queue Size If the Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
Why Do Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
Spark SQL and DataFrame
What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
How Do I Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How Do I Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
Why Does the Operation Fail When the Table Name Is TABLE?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Do I Do If I Accidentally Kill the JDBCServer Process During Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Can't Functions Be Used When Different JDBCServers Are Connected?
Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
Spark Streaming
Same DAG Log Is Recorded Twice for a Streaming Task
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does Spark Streaming Application Fail to Restart from Checkpoint When It Creates an Input Stream Without Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Why Is an Application Not Displayed When I Run the Application with the Empty Part File?
Why Does Spark2x Fail to Export a Table with the Same Field Name?
Why Does a JRE Fatal Error Occur After Running a Spark Application Multiple Times?
Native Spark2x UI Fails to Be Accessed or Is Incorrectly Displayed when Internet Explorer Is Used for Access
How Does Spark2x Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
Spark Shuffle Exception Handling
Using Sqoop
Using Sqoop from Scratch
Adapting Sqoop 1.4.7 to MRS 3.x Clusters
Common Sqoop Commands and Parameters
Common Issues About Sqoop
What Should I Do If Class QueryProvider Is Unavailable?
What Do I Do If PostgreSQL or GaussDB Fails to Connect?
What Should I Do If Data Failed to Be Synchronized to a Hive Table on the OBS Using hive-table?
What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
What Should I Do If Data Failed to Be Synchronized Using hive-table?
What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
Using Storm
Using Storm from Scratch
Using the Storm Client
Submitting Storm Topologies on the Client
Accessing the Storm Web UI
Managing Storm Topologies
Querying Storm Topology Logs
Storm Common Parameters
Configuring a Storm Service User Password Policy
Migrating Storm Services to Flink
Overview
Completely Migrating Storm Services
Performing Embedded Service Migration
Migrating Services of External Security Components Interconnected with Storm
Storm Log Introduction
Performance Tuning
Storm Performance Tuning
Using Tez
Precautions
Common Tez Parameters
Accessing TezUI
Log Overview
Common Issues
TezUI Cannot Display Tez Task Execution Details
Error Occurs When a User Switches to the Tez Web UI
Yarn Logs Cannot Be Viewed on the TezUI Page
Table Data Is Empty on the TezUI HiveQueries Page
Using YARN
Common YARN Parameters
Creating Yarn Roles
Using the YARN Client
Configuring Resources for a NodeManager Role Instance
Changing NodeManager Storage Directories
Configuring Strict Permission Control for Yarn
Configuring Container Log Aggregation
Using CGroups with YARN
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring the Access Channel Protocol
Configuring Memory Usage Detection
Configuring the Additional Scheduler WebUI
Configuring Yarn Restart
Configuring ApplicationMaster Work Preserving
Configuring the Localized Log Levels
Configuring Users That Run Tasks
Yarn Log Overview
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Job Completes When CGroups Is Used?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature Is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Common ZooKeeper Parameters
Using a ZooKeeper Client
Configuring the ZooKeeper Permissions
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Do Four-Letter Commands Not Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Accessing Manager
Accessing MRS Manager (Versions Earlier Than MRS 3.x)
Accessing FusionInsight Manager (MRS 3.x or Later)
Using an MRS Client
Installing a Client (MRS 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
API Reference (ME-Abu Dhabi Region)
Before You Start
Overview
API Calling
Endpoints
Constraints
Concepts
Selecting an API Type
API Overview
Calling APIs
Making an API Request
Authentication
Response
Application Cases
Creating an MRS Cluster
Scaling Out a Cluster
Scaling in a Cluster
Creating a Job
Terminating a Job
Terminating a Cluster
API V2
Cluster Management APIs
Creating a Cluster
Changing a Cluster Name
Job Object APIs
Adding and Executing a Job
Querying Information About a Job
Querying a List of Jobs
Terminating a Job
Obtaining SQL Results
Deleting Jobs in Batches
Auto Scaling APIs
Viewing Auto Scaling Policies
Cluster HDFS File API
Obtaining the List of Files from a Specified Directory
SQL APIs
Submitting a SQL Statement
Querying SQL Results
Canceling a SQL Execution Task
Agency Management
Querying the Mapping Between a User (Group) and an IAM Agency
Updating the Mapping Between a User (Group) and an IAM Agency
API V1.1
Cluster Management APIs
Creating a Cluster and Executing a Job
Resizing a Cluster
Querying a Cluster List
Querying Cluster Details
Querying a Host List
Terminating a Cluster
Job Object APIs
Job Execution Object APIs
Auto Scaling APIs
Configuring an Auto Scaling Rule
Tag Management APIs
Adding Tags to a Specified Cluster
Querying Tags of a Specified Cluster
Deleting Tags from a Specified Cluster
Adding Tags to a Cluster in Batches
Deleting Tags from a Cluster in Batches
Querying All Tags
Querying a List of Clusters with Specified Tags
Out-of-Date APIs
Job API Management (Deprecated)
Adding and Executing a Job (Deprecated)
Querying the exe Object List of Jobs (Deprecated)
Querying exe Object Details (Deprecated)
Deleting a Job Execution Object (Deprecated)
Permissions Policies and Supported Actions
Introduction
Appendix
Status Codes
Error Codes
Obtaining a Project ID
Obtaining the MRS Cluster Information
Roles and Components Supported by MRS
Change History
User Guide (Paris Region)
Overview
What Is MRS?
Application Scenarios
Components
CarbonData
ClickHouse
DBService
DBService Basic Principles
Relationship Between DBService and Other Components
Flink
Flink Basic Principles
Flink HA Solution
Relationship with Other Components
Flink Enhanced Open Source Features
Window
Job Pipeline
Stream SQL Join
Flink CEP in SQL
Flume
Flume Basic Principles
Relationship Between Flume and Other Components
Flume Enhanced Open Source Features
HBase
HBase Basic Principles
HBase HA Solution
Relationship with Other Components
HBase Enhanced Open Source Features
HDFS
HDFS Basic Principles
HDFS HA Solution
Relationship Between HDFS and Other Components
HDFS Enhanced Open Source Features
Hive
Hive Basic Principles
Hive CBO Principles
Relationship Between Hive and Other Components
Enhanced Open Source Feature
Hudi
Hue
Hue Basic Principles
Relationship Between Hue and Other Components
Hue Enhanced Open Source Features
Impala
Kafka
Kafka Basic Principles
Relationship Between Kafka and Other Components
Kafka Enhanced Open Source Features
KafkaManager
KrbServer and LdapServer
KrbServer and LdapServer Principles
KrbServer and LdapServer Enhanced Open Source Features
Kudu
Loader
Loader Basic Principles
Relationship Between Loader and Other Components
Loader Enhanced Open Source Features
Manager
Manager Basic Principles
Manager Key Features
MapReduce
MapReduce Basic Principles
Relationship Between MapReduce and Other Components
MapReduce Enhanced Open Source Features
Oozie
Oozie Basic Principles
Oozie Enhanced Open Source Features
OpenTSDB
Presto
Ranger
Ranger Basic Principles
Relationship Between Ranger and Other Components
Spark
Basic Principles of Spark
Spark HA Solution
Relationship Among Spark, HDFS, and Yarn
Spark Enhanced Open Source Feature: Optimized SQL Query of Cross-Source Data
Spark2x
Basic Principles of Spark2x
Spark2x HA Solution
Spark2x Multi-active Instance
Spark2x Multi-tenant
Relationship Between Spark2x and Other Components
Spark2x Open Source New Features
Spark2x Enhanced Open Source Features
CarbonData Overview
Optimizing SQL Query of Data of Multiple Sources
Storm
Storm Basic Principles
Relationship Between Storm and Other Components
Storm Enhanced Open Source Features
Tez
Yarn
Yarn Basic Principles
Yarn HA Solution
Relationship Between YARN and Other Components
Yarn Enhanced Open Source Features
ZooKeeper
ZooKeeper Basic Principle
Relationship Between ZooKeeper and Other Components
ZooKeeper Enhanced Open Source Features
Functions
Multi-tenant
Security Hardening
Easy Access to Web UIs of Components
Reliability Enhancement
Job Management
Bootstrap Actions
Metadata
Cluster Management
Cluster Lifecycle Management
Manually Scaling Out/In a Cluster
Auto Scaling
Task Node Creation
Isolating a Host
Managing Tags
Cluster O&M
Message Notification
Constraints
Permissions Management
Related Services
Preparing a User
Creating an MRS User
Creating a Custom Policy
Synchronizing IAM Users to MRS
Configuring a Cluster
Methods of Creating MRS Clusters
Quick Creation of a Cluster
Quick Creation of a Hadoop Analysis Cluster
Quick Creation of an HBase Analysis Cluster
Quick Creation of a Kafka Streaming Cluster
Quick Creation of a ClickHouse Cluster
Quick Creation of a Real-time Analysis Cluster
Creating a Custom Cluster
Creating a Custom Topology Cluster
Adding a Tag to a Cluster
Communication Security Authorization
Configuring Auto Scaling Rules
Overview
Configuring Auto Scaling During Cluster Creation
Creating an Auto Scaling Policy for an Existing Cluster
Scenario 1: Using Auto Scaling Rules Alone
Scenario 2: Using Resource Plans Alone
Scenario 3: Using Both Auto Scaling Rules and Resource Plans
Modifying an Auto Scaling Policy
Deleting an Auto Scaling Policy
Enabling or Disabling an Auto Scaling Policy
Viewing an Auto Scaling Policy
Configuring Automation Scripts
Configuring Auto Scaling Metrics
Managing Data Connections
Configuring Data Connections
Configuring Ranger Data Connections
Configuring a Hive Data Connection
Installing Third-Party Software Using Bootstrap Actions
Viewing Failed MRS Tasks
Viewing Information of a Historical Cluster
Managing Clusters
Logging In to a Cluster
MRS Cluster Node Overview
Logging In to an ECS
Determining Active and Standby Management Nodes of Manager
Cluster Overview
Cluster List
Checking the Cluster Status
Viewing Basic Cluster Information
Viewing Cluster Patch Information
Viewing and Customizing Cluster Monitoring Metrics
Managing Components and Monitoring Hosts
Cluster O&M
Importing and Exporting Data
Changing the Subnet of a Cluster
Configuring Message Notification
Checking Health Status
Before You Start
Performing a Health Check
Viewing and Exporting a Health Check Report
Remote O&M
Authorizing O&M
Sharing Logs
Viewing MRS Operation Logs
Terminating a Cluster
Managing Nodes
Manually Scaling Out a Cluster
Manually Scaling In a Cluster
Managing a Host (Node)
Isolating a Host
Canceling Host Isolation
Scaling Up Master Node Specifications
Job Management
Introduction to MRS Jobs
Running a MapReduce Job
Running a SparkSubmit Job
Running a HiveSQL Job
Running a SparkSql Job
Running a Flink Job
Running a Kafka Job
Viewing Job Configuration and Logs
Stopping a Job
Deleting a Job
Using Encrypted OBS Data for Job Running
Configuring Job Notification Rules
Component Management
Object Management
Viewing Configuration
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configuration
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Starting and Stopping a Cluster
Synchronizing Cluster Configuration
Exporting Cluster Configuration
Performing Rolling Restart
Alarm Management
Viewing the Alarm List
Viewing the Event List
Viewing and Manually Clearing an Alarm
Patch Management
Patch Operation Guide for Versions Earlier than MRS 1.7.0
Patch Operation Guide for Versions from MRS 1.7.0 to MRS 2.0.1
Rolling Patches
Restoring Patches for the Isolated Hosts
Tenant Management
Before You Start
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Bootstrap Actions
Introduction to Bootstrap Actions
Preparing the Bootstrap Action Script
Viewing Execution Records
Adding a Bootstrap Action
Modifying a Bootstrap Action
Deleting a Bootstrap Action
Using an MRS Client
Installing a Client
Installing a Client (Version 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
Using the Client of Each Component
Using a ClickHouse Client
Using a Flink Client
Using a Flume Client
Using an HBase Client
Using an HDFS Client
Using a Hive Client
Using an Impala Client
Using a Kafka Client
Using a Kudu Client
Using the Oozie Client
Using a Storm Client
Using a Yarn Client
Configuring a Cluster with Storage and Compute Decoupled
Introduction to Storage-Compute Decoupling
Configuring a Storage-Compute Decoupled Cluster (Agency)
Configuring a Storage-Compute Decoupled Cluster (AK/SK)
Using a Storage-Compute Decoupled Cluster
Interconnecting Flink with OBS
Interconnecting Flume with OBS
Interconnecting HDFS with OBS
Interconnecting Hive with OBS
Interconnecting MapReduce with OBS
Interconnecting Spark2x with OBS
Interconnecting Sqoop with External Storage Systems
Interconnecting Hudi with OBS
Accessing Web Pages of Open Source Components Managed in MRS Clusters
Web UIs of Open Source Components
List of Open Source Component Ports
Access Through Direct Connect
EIP-based Access
Access Using a Windows ECS
Creating an SSH Channel for Connecting to an MRS Cluster and Configuring the Browser
Interconnecting Jupyter Notebook with MRS Using Custom Python
Overview
Installing a Client on a Node Outside the Cluster
Installing Python 3
Configuring the MRS Client
Installing Jupyter Notebook
Verifying that Jupyter Notebook Can Access MRS
FAQs
Accessing Manager
Accessing FusionInsight Manager (MRS 3.x or Later)
Accessing MRS Manager (MRS 2.x or Earlier)
FusionInsight Manager Operation Guide (Applicable to 3.x)
Getting Started
FusionInsight Manager Introduction
Querying the FusionInsight Manager Version
Logging In to FusionInsight Manager
Logging In to the Management Node
Homepage
Overview
Managing Monitoring Metric Reports
Cluster
Cluster Management
Overview
Performing a Rolling Restart of a Cluster
Managing Expired Configurations
Downloading the Client
Modifying Cluster Attributes
Managing Cluster Configurations
Managing Static Service Pools
Static Service Resources
Configuring Cluster Static Resources
Viewing Cluster Static Resources
Managing Clients
Managing a Client
Batch Upgrading Clients
Updating the hosts File in Batches
Managing a Service
Overview
Other Service Management Operations
Service Details Page
Performing Active/Standby Switchover of a Role Instance
Resource Monitoring
Collecting Stack Information
Switching Ranger Authentication
Service Configuration
Modifying Service Configuration Parameters
Modifying Custom Configuration Parameters of a Service
Instance Management
Overview
Decommissioning and Recommissioning an Instance
Managing Instance Configurations
Viewing the Instance Configuration File
Instance Group
Managing Instance Groups
Viewing Information About an Instance Group
Configuring Instance Group Parameters
Hosts
Host Management Page
Viewing the Host List
Viewing the Host Dashboard
Checking Host Processes and Resources
Host Maintenance Operations
Starting and Stopping All Instances on a Host
Performing a Host Health Check
Configuring Racks for Hosts
Isolating a Host
Exporting Host Information
Resource Overview
Distribution
Trend
Cluster
Host
O&M
Alarms
Overview of Alarms and Events
Configuring the Threshold
Configuring the Alarm Masking Status
Log
Log Online Search
Log Download
Performing a Health Check
Viewing a Health Check Task
Managing Health Check Reports
Modifying Health Check Configuration
Configuring Backup and Backup Restoration
Creating a Backup Task
Creating a Backup Restoration Task
Managing Backup and Backup Restoration Tasks
Audit
Overview
Configuring Audit Log Dumping
Tenant Resources
Multi-Tenancy
Overview
Technical Principles
Multi-Tenant Management
Multi-Tenant Model
Resource Overview
Dynamic Resources
Storage Resources
Multi-Tenancy Usage
Overview
Process Overview
Using the Superior Scheduler
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing Tenant Directories
Restoring Tenant Data
Deleting a Tenant
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Managing Global User Policies
Using the Capacity Scheduler
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing Tenant Directories
Restoring Tenant Data
Deleting a Tenant
Clearing Non-associated Queues of a Tenant
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Switching the Scheduler
System
Configuring Permissions
Managing Users
Creating a User
Modifying User Information
Exporting User Information
Locking a User
Unlocking a User
Deleting a User
Changing a User Password
Initializing a Password
Exporting an Authentication Credential File
Managing User Groups
Managing Roles
Security Policies
Configuring Password Policies
Configuring the Independent Attribute
Configuring Interconnections
Configuring SNMP Northbound Parameters
Configuring Syslog Northbound Parameters
Configuring Monitoring Metric Dumping
Importing a Certificate
OMS Management
Overview of the OMS Page
Modifying OMS Service Configuration Parameters
Component Management
Viewing Component Packages
Cluster Management
Configuring Client
Installing a Client
Using a Client
Updating the Configuration of an Installed Client
Cluster Mutual Trust Management
Overview of Mutual Trust Between Clusters
Changing Manager's Domain Name
Configuring Cross-Manager Mutual Trust Between Clusters
Assigning User Permissions After Cross-Cluster Mutual Trust Is Configured
Configuring Scheduled Backup of Alarm and Audit Information
Modifying the FusionInsight Manager Routing Table
Switching to the Maintenance Mode
Routine Maintenance
Log Management
About Logs
Manager Log List
Configuring the Log Level and Log File Size
Configuring the Number of Local Audit Log Backups
Viewing Role Instance Logs
Backup and Recovery Management
Introduction
Backing Up Data
Backing Up Manager Data
Backing Up CDL Data
Backing Up ClickHouse Metadata
Backing Up ClickHouse Service Data
Backing Up DBService Data
Backing Up HBase Metadata
Backing Up HBase Service Data
Backing Up NameNode Data
Backing Up HDFS Service Data
Backing Up Hive Service Data
Backing Up IoTDB Metadata
Backing Up IoTDB Service Data
Backing Up Kafka Metadata
Recovering Data
Restoring Manager Data
Restoring CDL Data
Restoring ClickHouse Metadata
Restoring ClickHouse Service Data
Restoring DBService Data
Restoring HBase Metadata
Restoring HBase Service Data
Restoring NameNode Data
Restoring HDFS Service Data
Restoring Hive Service Data
Restoring IoTDB Metadata
Restoring IoTDB Service Data
Restoring Kafka Metadata
Enabling Cross-Cluster Replication
Managing Local Quick Restoration Tasks
Modifying a Backup Task
Viewing Backup and Restoration Tasks
How Do I Configure the Environment When I Create a ClickHouse Backup Task on FusionInsight Manager and Set the Path Type to RemoteHDFS?
Security Management
Security Overview
Right Model
Right Mechanism
Authentication Policies
Permission Verification Policies
User Account List
Default Permission Information
FusionInsight Manager Security Functions
Account Management
Account Security Settings
Unlocking LDAP Users and Management Accounts
Unlocking an Internal System User
Enabling and Disabling Permission Verification on Cluster Components
Logging In to a Non-Cluster Node Using a Cluster User in Normal Mode
Changing the Password for a System User
Changing the Password for User admin
Changing the Password for an OS User
Changing the Password for a System Internal User
Changing the Password for the Kerberos Administrator
Changing the Password for the OMS Kerberos Administrator
Changing the Passwords of the LDAP Administrator and the LDAP User (Including OMS LDAP)
Changing the Password for the LDAP Administrator
Changing the Password for a Component Running User
Changing the Password for a Database User
Changing the Password of the OMS Database Administrator
Changing the Password for the Data Access User of the OMS Database
Changing the Password for a Component Database User
Resetting the Component Database User Password
Changing the Password for User compdbuser of the DBService Database
Security Hardening
Hardening Policies
Configuring a Trusted IP Address to Access LDAP
HFile and WAL Encryption
Configuring Hadoop Security Parameters
Configuring an IP Address Whitelist for Modification Allowed by HBase
Updating a Key for a Cluster
Hardening the LDAP
Configuring Kafka Data Encryption During Transmission
Configuring HDFS Data Encryption During Transmission
Encrypting the Communication Between the Controller and the Agent
Updating SSH Keys for User omm
Security Maintenance
Account Maintenance Suggestions
Password Maintenance Suggestions
Log Maintenance Suggestions
Security Statement
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
Introduction to MRS Manager
Checking Running Tasks
Monitoring Management
Dashboard
Managing Services and Monitoring Hosts
Managing Resource Distribution
Configuring Monitoring Metric Dumping
Alarm Management
Viewing and Manually Clearing an Alarm
Configuring an Alarm Threshold
Configuring Syslog Northbound Interface Parameters
Configuring SNMP Northbound Interface Parameters
Object Management
Managing Objects
Viewing Configurations
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configurations
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Managing a Host
Isolating a Host
Canceling Host Isolation
Starting or Stopping a Cluster
Synchronizing Cluster Configurations
Exporting Configuration Data of a Cluster
Log Management
About Logs
Manager Log List
Viewing and Exporting Audit Logs
Exporting Service Logs
Configuring Audit Log Dumping Parameters
Health Check Management
Performing a Health Check
Viewing and Exporting a Health Check Report
Configuring the Number of Health Check Reports to Be Reserved
Managing Health Check Reports
DBService Health Check Indicators
Flume Health Check Indicators
HBase Health Check Indicators
Host Health Check Indicators
HDFS Health Check Indicators
Hive Health Check Indicators
Kafka Health Check Indicators
KrbServer Health Check Indicators
LdapServer Health Check Indicators
Loader Health Check Indicators
MapReduce Health Check Indicators
OMS Health Check Indicators
Spark Health Check Indicators
Storm Health Check Indicators
Yarn Health Check Indicators
ZooKeeper Health Check Indicators
Static Service Pool Management
Viewing the Status of a Static Service Pool
Configuring a Static Service Pool
Tenant Management
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Backup and Restoration
Introduction
Backing Up Metadata
Restoring Metadata
Modifying a Backup Task
Viewing Backup and Restoration Tasks
Security Management
Default Users of Clusters with Kerberos Authentication Disabled
Default Users of Clusters with Kerberos Authentication Enabled
Changing the Password of an OS User
Changing the Password of User admin
Changing the Password of the Kerberos Administrator
Changing the Passwords of the LDAP Administrator and the LDAP User
Changing the Password of a Component Running User
Changing the Password of the OMS Database Administrator
Changing the Password of the Data Access User of the OMS Database
Changing the Password of a Component Database User
Replacing the HA Certificate
Updating Cluster Keys
Permissions Management
Creating a Role
Creating a User Group
Creating a User
Modifying User Information
Locking a User
Unlocking a User
Deleting a User
Changing the Password of an Operation User
Initializing the Password of a System User
Downloading a User Authentication File
Modifying a Password Policy
MRS Multi-User Permission Management
Users and Permissions of MRS Clusters
Default Users of Clusters with Kerberos Authentication Enabled
Creating a Role
Creating a User Group
Creating a User
Modifying User Information
Locking a User
Unlocking a User
Deleting a User
Changing the Password of an Operation User
Initializing the Password of a System User
Downloading a User Authentication File
Modifying a Password Policy
Configuring Cross-Cluster Mutual Trust Relationships
Configuring Users to Access Resources of a Trusted Cluster
Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
Patch Operation Guide
Patch Operation Guide for Versions Earlier than MRS 1.7.0
Patch Operation Guide for Versions from MRS 1.7.0 to MRS 2.0.1
Supporting Rolling Patches
Restoring Patches for the Isolated Hosts
Rolling Restart
Security Description
Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled
Security Authentication Principles and Mechanisms
High-Risk Operations
MRS Quick Start
How to Use MRS
Creating a Cluster
Uploading Data and Programs
Creating a Job
Using Clusters with Kerberos Authentication Enabled
Terminating a Cluster
Troubleshooting
Accessing the Web Pages
Failed to Access MRS Manager
Failed to Log In to MRS Manager After the Python Upgrade
Failed to Log In to MRS Manager After Changing the Domain Name
A Blank Page Is Displayed Upon Login to Manager
Failed to Download Authentication Credentials When the Username Is Too Long
Cluster Management
Failed to Reduce Task Nodes
Adding a New Disk to an MRS Cluster
Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
Replacing a Disk in an MRS Cluster (Applicable to 3.x)
MRS Backup Failure
Inconsistency Between df and du Command Output on the Core Node
Disassociating a Subnet from the ACL Network
MRS Becomes Abnormal After hostname Modification
DataNode Restarts Unexpectedly
Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
Failed to Download the MRS Cluster Client
Failed to Scale Out an MRS Cluster
Error Occurs When MRS Executes the Insert Command Using Beeline
How Do I Upgrade EulerOS to Fix Vulnerabilities in an MRS Cluster?
Using CDM to Migrate Data to HDFS
Alarms Are Frequently Generated in the MRS Cluster
Memory Usage of the PMS Process Is High
High Memory Usage of the Knox Process
It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
How Do I Locate a Job Submission Failure?
OS Disk Space Is Insufficient Due to Oversized HBase Log Files
Failed to Delete a New Tenant on FusionInsight Manager
Using Alluxio
Error Message "Does not contain a valid host:port authority" Is Reported When Alluxio Is in HA Mode
Using ClickHouse
ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
Using DBService
DBServer Instance Is in Abnormal Status
DBServer Instance Remains in the Restoring State
Default Port 20050 or 20051 Is Occupied
DBServer Instance Is Always in the Restoring State Because of Incorrect /tmp Directory Permissions
DBService Backup Failure
Components Failed to Connect to DBService in Normal State
DBServer Failed to Start
DBService Backup Failed Because the Floating IP Address Is Unreachable
DBService Failed to Start Due to the Loss of the DBService Configuration File
Using Flink
"IllegalConfigurationException: Error while parsing YAML configuration file: "security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
"IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
Flink Service Program Fails to Read Files on the NFS Disk
Failed to Customize the Flink Log4j Log Level
Using Flume
Class Cannot Be Found After Flume Submits Jobs to Spark Streaming
Failed to Install a Flume Client
A Flume Client Cannot Connect to the Server
Flume Data Fails to Be Written to the Component
Flume Server Process Fault
Flume Data Collection Is Slow
Failed to Start Flume
Using HBase
Slow Response to HBase Connection
Failed to Authenticate the HBase User
RegionServer Failed to Start Because the Port Is Occupied
HBase Failed to Start Due to Insufficient Node Memory
HBase Service Unavailable Due to Poor HDFS Performance
HBase Failed to Start Due to Inappropriate Parameter Settings
RegionServer Failed to Start Due to Residual Processes
HBase Failed to Start Due to a Quota Set on HDFS
HBase Failed to Start Due to Corrupted Version Files
High CPU Usage Caused by Zero-Loaded RegionServer
HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
RegionServer Instance Is in the Restoring State
HBase Failed to Start in a Newly Installed Cluster
HBase Failed to Start Due to the Loss of the ACL Table Directory
HBase Failed to Start After the Cluster Is Powered Off and On
Failed to Import HBase Data Due to Oversized File Blocks
Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
Failed to Run the hbase shell Command on the MRS Cluster Client
Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
HBase Failed to Start Due to Insufficient RegionServer Memory
Using HDFS
All NameNodes Become the Standby State After the NameNode RPC Port of HDFS Is Changed
An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
Failed to Use Python to Remotely Connect to the Port of HDFS
HDFS Capacity Usage Reaches 100%, Causing Unavailable Upper-layer Services Such as HBase and Spark
An Error Is Reported During HDFS and Yarn Startup
HDFS Permission Setting Error
A DataNode of HDFS Is Always in the Decommissioning State
HDFS Failed to Start Due to Insufficient Memory
A Large Number of Blocks Are Lost in HDFS Due to the Time Change Using ntpdate
CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
Common File Read/Write Faults
Maximum Number of File Handles Is Set Too Low, Causing File Reading and Writing Exceptions
A Client File Fails to Be Closed After Data Writing
File Fails to Be Uploaded to HDFS Due to File Errors
After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
Failed to Read Files, and "FileNotFoundException" Is Displayed
Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
Adjusting the Log Level of the Shell Client
File Read Fails, and "No common protection layer" Is Displayed
Failed to Write Files Because the HDFS Directory Quota Is Insufficient
Balancing Fails, and "Source and target differ in block-size" Is Displayed
A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
Uneven Data Distribution Due to Non-HDFS Data Residuals
Uneven Data Distribution Due to the Client Installation on the DataNode
Handling Unbalanced DataNode Disk Usage on Nodes
Locating Common Balance Problems
HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
HDFS Client Failed to Delete Overlong Directories
An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
Using Hive
Content Recorded in Hive Logs
Causes of Hive Startup Failure
"Cannot modify xxx at runtime" Is Reported When the set Command Is Executed in a Security Cluster
How to Specify a Queue When Hive Submits a Job
How to Set Map and Reduce Memory on the Client
Specifying the Output File Compression Format When Importing a Table
desc Table Cannot Be Completely Displayed
NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
A Newly Created User Has No Query Permissions
An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
An Error Is Reported When the "load data inpath" Command Is Executed
An Error Is Reported When the "load data local inpath" Command Is Executed
An Error Is Reported When the "create external table" Command Is Executed
An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
Insufficient Permissions to Execute the set role admin Command
An Error Is Reported When UDF Is Created Using Beeline
Difference Between Hive Service Health Status and Hive Instance Health Status
Hive Alarms and Triggering Conditions
"authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
Failed to Access ZooKeeper from the Client
"Invalid function" Is Displayed When a UDF Is Used
Hive Service Status Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Concerning
Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
Hive Task Failed to Run on the Client But Succeeded on Yarn
An Error Is Reported When the select Statement Is Executed
Failed to Drop a Large Number of Partitions
Failed to Start a Local Task
Failed to Start WebHCat
Sample Code Error for Hive Secondary Development After Domain Switching
MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
"Failed to execute session hooks: over max connections" Reported by Beeline
Beeline Reports the "OutOfMemoryError" Error
Task Execution Fails Because the Input File Number Exceeds the Threshold
Task Execution Fails Because of Stack Memory Overflow
Task Failed Due to Concurrent Writes to One Table or Partition
Hive Task Failed Due to a Lack of HDFS Directory Permission
Failed to Load Data to Hive Tables
HiveServer and HiveHCat Process Faults
An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
Timeout Reported When Adding the Hive Table Field
Failed to Restart the Hive Service
Hive Failed to Delete a Table
An Error Is Reported When msck repair table table_name Is Run on Hive
How Do I Release Disk Space After Dropping a Table in Hive?
Connection Timeout During SQL Statement Execution on the Client
WebHCat Failed to Start Due to Abnormal Health Status
WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
Using Hue
A Job Is Running on Hue
HQL Fails to Be Executed on Hue Using Internet Explorer
Hue (Active) Cannot Open Web Pages
Failed to Access the Hue Web UI
HBase Tables Cannot Be Loaded on the Hue Web UI
Using Impala
Failed to Connect to impala-shell
Failed to Create a Kudu Table
Failed to Log In to the Impala Client
Using Kafka
An Error Is Reported When Kafka Is Run to Obtain a Topic
Flume Normally Connects to Kafka But Fails to Send Messages
Producer Failed to Send Data and Threw "NullPointerException"
Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
Consumer Fails to Consume Data and Remains in the Waiting State
SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
Checking Whether Data Consumed by a Consumer Is Lost
Failed to Start a Component Due to Account Lock
Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
Kafka Topics Cannot Be Deleted
Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
Consumer Repeatedly Consumes Data
Leader for the Created Kafka Topic Partition Is Displayed as none
Safety Instructions on Using Kafka
Obtaining Kafka Consumer Offset Information
Adding or Deleting Configurations for a Topic
Reading the Content of the __consumer_offsets Internal Topic
Configuring Logs for Shell Commands on the Client
Obtaining Topic Distribution Information
Kafka HA Usage Description
Kafka Producer Writes Oversized Records
Kafka Consumer Reads Oversized Records
High Usage of Multiple Disks on a Kafka Cluster Node
Using Oozie
Oozie Jobs Do Not Run When a Large Number of Jobs Are Submitted Concurrently
Using Presto
During sql-standard-with-group Configuration, a Schema Fails to Be Created and the Error Message "Access Denied" Is Displayed
The Presto Coordinator Cannot Be Started Properly
An Error Is Reported When Presto Is Used to Query a Kudu Table
No Data Is Found in the Hive Table Using Presto
Using Spark
An Error Occurs When the Split Size Is Changed in a Spark Application
An Error Is Reported When Spark Is Used
A Spark Job Fails to Run Due to Incorrect JAR File Import
A Spark Job Is Pending Due to Insufficient Memory
An Error Is Reported During Spark Running
"Executor Memory Reaches the Threshold" Is Displayed in Driver
Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
ApplicationMaster Failed to Start Twice in Yarn-client Mode
Failed to Connect to ResourceManager When a Spark Task Is Submitted
DataArts Studio Failed to Schedule Spark Jobs
Submission Status of the Spark Job API Is Error
Alarm 43006 Is Repeatedly Generated in the Cluster
Failed to Create or Delete a Table in Spark Beeline
Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
Large Number of Shuffle Results Are Lost During Spark Task Execution
Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
Spark Task Submission Failure
Spark Task Execution Failure
JDBCServer Connection Failure
Failed to View Spark Task Logs
Authentication Fails When Spark Connects to Other Services
An Error Occurs When Spark Connects to Redis
An Error Is Reported When spark-beeline Is Used to Query a Hive View
Using Sqoop
Connecting Sqoop to MySQL
Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
Failed to Export HBase Data to HDFS Through Hue's Sqoop Task
A Format Error Is Reported When Sqoop Is Used to Export Data from Hive to MySQL 8.0
An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
Using Storm
Invalid Hyperlink of Events on the Storm UI
Failed to Submit a Topology
Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
The Worker Log Is Empty After a Topology Is Submitted
Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
"well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
Data Cannot Be Written to Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts
The GC Parameter Configured for the Service Topology Does Not Take Effect
Internal Server Error Is Displayed When the User Queries Information on the UI
Using Ranger
After Ranger Authentication Is Enabled for Hive, Unauthorized Tables and Databases Can Be Viewed on the Hue Page
Using Yarn
Plenty of Jobs Are Found After Yarn Is Started
"GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
Temporary Files Are Not Deleted When an MR Job Is Abnormal
ResourceManager of Yarn (Port 8032) Throws Error "connection refused"
Failed to View Job Logs on the Yarn Web UI
An Error Is Reported When a Queue Name Is Clicked on the Yarn Page
Using ZooKeeper
Accessing ZooKeeper from an MRS Cluster
Accessing OBS
When Using the MRS Multi-user Access to OBS Function, a User Does Not Have the Permission to Access the /tmp Directory
When the Hadoop Client Is Used to Delete Data from OBS, It Does Not Have the Permission for the .Trash Directory
Appendix
BMS Specifications Used by MRS
Data Migration Solution
Making Preparations
Exporting Metadata
Copying Data
Restoring Data
Precautions for MRS 3.x
Installing the Flume Client
Installing the Flume Client on Clusters of Versions Earlier Than MRS 3.x
Installing the Flume Client on MRS 3.x or Later Clusters
Change History
Component Operation Guide (Paris Region)
Using CarbonData (for Versions Earlier Than MRS 3.x)
Using CarbonData from Scratch
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Using CarbonData (for MRS 3.x or Later)
Overview
CarbonData Overview
Main Specifications of CarbonData
Configuration Reference
CarbonData Operation Guide
CarbonData Quick Start
CarbonData Table Management
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying a CarbonData Table
CarbonData Table Data Management
Loading Data
Deleting Segments
Combining Segments
CarbonData Data Migration
Migrating Data on CarbonData from Spark 1.5 to Spark2x
CarbonData Performance Tuning
Tuning Guidelines
Suggestions for Creating CarbonData Tables
Configurations for Performance Tuning
CarbonData Access Control
CarbonData Syntax Reference
DDL
CREATE TABLE
CREATE TABLE AS SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
DML
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Operation Concurrent Execution
API
Spatial Indexes
CarbonData Troubleshooting
Filter Result Is Not Consistent with Hive When a Big Double Type Value Is Used in Filter
Query Performance Deterioration
CarbonData FAQ
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How to Avoid Minor Compaction for Historical Data?
How to Change the Default Group Name for CarbonData Data Loading?
Why Does INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Does Data Load Performance Decrease Due to Bad Records?
Why Is the INSERT INTO/LOAD DATA Task Distribution Incorrect, with Fewer Opened Tasks Than Available Executors, When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Why Does Data Loading Fail During Off-Heap?
Why Do I Fail to Create a Hive Table?
Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-Owners?
How Do I Logically Split Data Across Different Namespaces?
Why Is a Missing Privileges Exception Reported When I Perform a Drop Operation on Databases?
Why Cannot the UPDATE Command Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
Using ClickHouse
Using ClickHouse from Scratch
ClickHouse Table Engine Overview
Creating a ClickHouse Table
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Structure
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
Migrating ClickHouse Data
Using ClickHouse to Import and Export Data
Synchronizing Kafka Data to ClickHouse
Using the ClickHouse Data Migration Tool
User Management and Authentication
ClickHouse User and Permission Management
Interconnecting ClickHouse With OpenLDAP for Authentication
Backing Up and Restoring ClickHouse Data Using a Data File
ClickHouse Log Overview
Using DBService
DBService Log Overview
Using Flink
Using Flink from Scratch
Viewing Flink Job Information
Flink Configuration Management
Configuring Parameter Paths
JobManager & TaskManager
Blob
Distributed Coordination (via Akka)
SSL
Network Communication (via Netty)
JobManager Web Frontend
File Systems
State Backend
Kerberos-based Security
HA
Environment
YARN
Pipeline
Security Configuration
Security Features
Configuring Kafka
Configuring Pipeline
Security Hardening
Authentication and Encryption
ACL Control
Web Security
Security Statement
Using the Flink Web UI
Overview
Introduction to Flink Web UI
Flink Web UI Application Process
FlinkServer Permissions Management
Overview
Authentication Based on Users and Roles
Accessing the Flink Web UI
Creating an Application on the Flink Web UI
Creating a Cluster Connection on the Flink Web UI
Creating a Data Connection on the Flink Web UI
Managing Tables on the Flink Web UI
Managing Jobs on the Flink Web UI
Flink Log Overview
Flink Performance Tuning
Optimizing DataStream
Memory Configuration Optimization
Configuring DOP
Configuring Process Parameters
Optimizing the Design of Partitioning Method
Configuring the Netty Network Communication
Experience Summary
Common Flink Shell Commands
Using Flume
Using Flume from Scratch
Overview
Installing the Flume Client
Installing the Flume Client on Clusters of Versions Earlier Than MRS 3.x
Installing the Flume Client on MRS 3.x or Later Clusters
Viewing Flume Client Logs
Stopping or Uninstalling the Flume Client
Using the Encryption Tool of the Flume Client
Flume Service Configuration Guide
Flume Configuration Parameter Description
Using Environment Variables in the properties.properties File
Non-Encrypted Transmission
Configuring Non-encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
Encrypted Transmission
Configuring the Encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Viewing Flume Client Monitoring Information
Connecting Flume to Kafka in Security Mode
Connecting Flume with Hive in Security Mode
Configuring the Flume Service Model
Overview
Service Model Configuration Guide
Introduction to Flume Logs
Flume Client Cgroup Usage Guide
Secondary Development Guide for Flume Third-Party Plug-ins
Common Issues About Flume
Using HBase
Using HBase from Scratch
Using an HBase Client
Creating HBase Roles
Configuring HBase Replication
Configuring HBase Parameters
Enabling Cross-Cluster Copy
Using the ReplicationSyncUp Tool
GeoMesa Command Line
Using HIndex
Introduction to HIndex
Loading Index Data in Batches
Using an Index Generation Tool
Migrating Index Data
Configuring HBase DR
Configuring HBase Data Compression and Encoding
Performing an HBase DR Service Switchover
Performing an HBase DR Active/Standby Cluster Switchover
Community BulkLoad Tool
Configuring the MOB
Configuring Secure HBase Replication
Configuring Region In Transition Recovery Chore Service
Using a Secondary Index
HBase Log Overview
HBase Performance Tuning
Improving the BulkLoad Efficiency
Improving Put Performance
Optimizing Put and Scan Performance
Improving Real-time Data Write Performance
Improving Real-time Data Read Performance
Optimizing JVM Parameters
Common Issues About HBase
Why Does a Client Keep Failing to Connect to a Server for a Long Time?
Operation Failures Occur in Stopping BulkLoad On the Client
Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
How Do I Restore a Region in the RIT State for a Long Time?
Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
Why Can Modified and Deleted Data Still Be Queried by Using the Scan Command?
Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
When Are the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Cleared?
Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Rebuilding Meta Using the OfflineMetaRepair Tool, Causing Startup to Fail?
Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as and a Different Kerberos User from the Region Server Are Used?
Insufficient Rights When a Tenant Accesses Phoenix
What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
How Do I Fix Region Overlapping?
Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
Why Is a Message Displayed Indicating That the Permission Is Insufficient When CTBase Connects to the Ranger Plug-ins?
Using HDFS
Using Hadoop from Scratch
Configuring Memory Management
Creating an HDFS Role
Using the HDFS Client
Running the DistCp Command
Overview of HDFS File System Directories
Changing the DataNode Storage Directory
Configuring HDFS Directory Permission
Configuring NFS
Planning HDFS Capacity
Configuring ulimit for HBase and HDFS
Balancing DataNode Capacity
Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
Configuring the Number of Files in a Single HDFS Directory
Configuring the Recycle Bin Mechanism
Setting Permissions on Files and Directories
Setting the Maximum Lifetime and Renewal Interval of a Token
Configuring the Damaged Disk Volume
Configuring Encrypted Channels
Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
Configuring the NameNode Blacklist
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring HDFS NodeLabel
Configuring HDFS Mover
Using HDFS AZ Mover
Configuring HDFS DiskBalancer
Configuring the Observer NameNode to Process Read Requests
Performing Concurrent Operations on HDFS Files
Introduction to HDFS Logs
HDFS Performance Tuning
Improving Write Performance
Improving Read Performance Using Client Metadata Cache
Improving the Connection Between the Client and NameNode Using Current Active Cache
FAQ
NameNode Startup Is Slow
DataNode Is Normal but Cannot Report Data Blocks
HDFS WebUI Cannot Properly Update Information About Damaged Data
Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
Why Does Array Border-crossing Occur During FileInputFormat Split?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
Can I Delete or Modify the Data Storage Directory in DataNode?
Blocks Are Missing on the NameNode UI After a Successful Rollback
Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
NameNode Fails to Be Restarted Due to EditLog Discontinuity
Using Hive
Using Hive from Scratch
Configuring Hive Parameters
Hive SQL
Permission Management
Hive Permission
Creating a Hive Role
Configuring Permissions for Hive Tables, Columns, or Databases
Configuring Permissions to Use Other Components for Hive
Using a Hive Client
Using HDFS Colocation to Store Hive Tables
Using the Hive Column Encryption Function
Customizing Row Separators
Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
Deleting Single-Row Records from Hive on HBase
Configuring HTTPS/HTTP-based REST APIs
Enabling or Disabling the Transform Function
Access Control of a Dynamic Table View on Hive
Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
Using Hive to Read Data in a Relational Database
Supporting Traditional Relational Database Syntax in Hive
Creating User-Defined Hive Functions
Enhancing beeline Reliability
Viewing Table Structures Using the show create Statement as Users with the select Permission
Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
Inserting Data to a Directory That Does Not Exist
Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
Disabling the location Keyword When Creating an Internal Hive Table
Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
Authorizing Over 32 Roles in Hive
Restricting the Maximum Number of Maps for Hive Tasks
HiveServer Lease Isolation
Hive Supporting Transactions
Switching the Hive Execution Engine to Tez
Hive Materialized View
Hive Log Overview
Hive Performance Tuning
Creating Table Partitions
Optimizing Join
Optimizing Group By
Optimizing Data Storage
Optimizing SQL Statements
Optimizing the Query Function Using Hive CBO
Common Issues About Hive
How Do I Delete UDFs on Multiple HiveServers at the Same Time?
Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
How to Perform Operations on Local Files with Hive User-Defined Functions
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
How Do I Monitor the Hive Table Size?
How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Description of Hive Table Location (Either an OBS or HDFS Path)
Why Cannot Data Be Queried After Switching to the MapReduce Engine When the Tez Engine Was Used to Execute Union-related Statements?
Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
Why Does Hive Not Support Vectorized Query?
Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
How Do I Disable the Logging Function of Hive?
Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
Hive Configuration Problems
Using Hudi
Getting Started
Basic Operations
Hudi Table Schema
Write
Batch Write
Stream Write
Synchronizing Hudi Table Data to Hive
Read
Reading COW Table Views
Reading MOR Table Views
Data Management and Maintenance
Clustering
Cleaning
Compaction
Savepoint
Single-Table Concurrent Write
Using the Hudi Client
Operating a Hudi Table Using hudi-cli.sh
Configuration Reference
Write Configuration
Configuration of Hive Table Synchronization
Index Configuration
Storage Configuration
Compaction and Cleaning Configurations
Single-Table Concurrent Write Configuration
Hudi Performance Tuning
Performance Tuning Methods
Recommended Resource Configuration
Common Issues About Hudi
Data Write
Parquet/Avro schema Is Reported When Updated Data Is Written
UnsupportedOperationException Is Reported When Updated Data Is Written
SchemaCompatabilityException Is Reported When Updated Data Is Written
What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
Hudi Fails to Write Decimal Data with Lower Precision
Data Collection
IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
HoodieException Is Reported When Data Is Collected
HoodieKeyException Is Reported When Data Is Collected
Hive Synchronization
SQLException Is Reported During Hive Data Synchronization
HoodieHiveSyncException Is Reported During Hive Data Synchronization
SemanticException Is Reported During Hive Data Synchronization
Using Hue (Versions Earlier Than MRS 3.x)
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using Hue (MRS 3.x or Later)
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the SparkSql Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using HBase on the Hue Web UI
Typical Scenarios
HDFS on Hue
Hive on Hue
Oozie on Hue
Hue Log Overview
Common Issues About Hue
How Do I Solve the Problem that HQL Fails to Be Executed in Hue Using Internet Explorer?
Why Does the use database Statement Become Invalid When Hive Is Used?
What Can I Do If HDFS Files Fail to Be Accessed Using Hue WebUI?
What Do I Do If a Large File Fails to Upload on the Hue Page?
Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
Using Impala
Using Impala from Scratch
Common Impala Parameters
Accessing the Impala Web UI
Using Impala to Operate Kudu
Interconnecting Impala with External LDAP
Enabling and Configuring a Dynamic Resource Pool for Impala
Using Kafka
Using Kafka from Scratch
Managing Kafka Topics
Querying Kafka Topics
Managing Kafka User Permissions
Managing Messages in Kafka Topics
Synchronizing Binlog-based MySQL Data to the MRS Cluster
Creating a Kafka Role
Kafka Common Parameters
Safety Instructions on Using Kafka
Kafka Specifications
Using the Kafka Client
Configuring Kafka HA and High Reliability Parameters
Changing the Broker Storage Directory
Checking the Consumption Status of Consumer Group
Kafka Balancing Tool Instructions
Balancing Data After Kafka Node Scale-Out
Kafka Token Authentication Mechanism Tool Usage
Introduction to Kafka Logs
Performance Tuning
Kafka Performance Tuning
Kafka Feature Description
Migrating Data Between Kafka Nodes
Common Issues About Kafka
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using KafkaManager
Introduction to KafkaManager
Accessing the KafkaManager Web UI
Managing Kafka Clusters
Kafka Cluster Monitoring Management
Using Loader
Using Loader from Scratch
How to Use Loader
Loader Link Configuration
Managing Loader Links (Versions Earlier Than MRS 3.x)
Source Link Configurations of Loader Jobs
Destination Link Configurations of Loader Jobs
Managing Loader Jobs
Preparing a Driver for MySQL Database Link
Loader Log Overview
Example: Using Loader to Import Data from OBS to HDFS
Common Issues About Loader
How Do I Resolve the Failure to Save Data When Using Internet Explorer 10 or Internet Explorer 11?
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Using Kudu
Using Kudu from Scratch
Accessing the Kudu Web UI
Using MapReduce
Configuring the Log Archiving and Clearing Mechanism
Reducing Client Application Failure Rate
Transmitting MapReduce Tasks from Windows to Linux
Configuring the Distributed Cache
Configuring the MapReduce Shuffle Address
Configuring the Cluster Administrator List
Introduction to MapReduce Logs
MapReduce Performance Tuning
Optimization Configuration for Multiple CPU Cores
Determining the Job Baseline
Streamlining Shuffle
AM Optimization for Big Tasks
Speculative Execution
Using Slow Start
Optimizing Performance for Committing MR Jobs
Common Issues About MapReduce
Why Does It Take a Long Time to Run a Task Upon ResourceManager Active/Standby Switchover?
Why Does a MapReduce Task Stay Unchanged for a Long Time?
Why Does the Client Hang During Job Running?
Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
MapReduce Job Failed in Multiple NameService Environment
Why Is a Faulty MapReduce Node Not Blacklisted?
Using OpenTSDB
Using an MRS Client to Operate OpenTSDB Metric Data
Running the curl Command to Operate OpenTSDB
Using Oozie
Using Oozie from Scratch
Using the Oozie Client
Using Oozie Client to Submit an Oozie Job
Submitting a Hive Job
Submitting a Spark2x Job
Submitting a Loader Job
Submitting a DistCp Job
Submitting Other Jobs
Using Hue to Submit an Oozie Job
Creating a Workflow
Submitting a Workflow Job
Submitting a Hive2 Job
Submitting a Spark2x Job
Submitting a Java Job
Submitting a Loader Job
Submitting a MapReduce Job
Submitting a Sub-workflow Job
Submitting a Shell Job
Submitting an HDFS Job
Submitting a Streaming Job
Submitting a DistCp Job
Example of Mutual Trust Operations
Submitting an SSH Job
Submitting a Hive Script
Submitting a Coordinator Periodic Scheduling Job
Submitting a Bundle Batch Processing Job
Querying the Operation Results
Oozie Log Overview
Common Issues About Oozie
Oozie Scheduled Tasks Are Not Executed on Time
Why Does an Update of the share lib Directory of Oozie on HDFS Not Take Effect?
Common Oozie Troubleshooting Methods
Using Presto
Accessing the Presto Web UI
Using a Client to Execute Query Statements
Using Ranger (MRS 3.x)
Logging In to the Ranger Web UI
Enabling Ranger Authentication
Configuring Component Permission Policies
Viewing Ranger Audit Information
Configuring a Security Zone
Changing the Ranger Data Source to LDAP for a Normal Cluster
Viewing Ranger Permission Information
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark2x
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for Storm
Ranger Log Overview
Common Issues About Ranger
Why Does Ranger Startup Fail During the Cluster Installation?
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
Using Spark
Precautions
Getting Started with Spark
Getting Started with Spark SQL
Using the Spark Client
Accessing the Spark Web UI
Interconnecting Spark with OpenTSDB
Creating a Table and Associating It with OpenTSDB
Inserting Data to the OpenTSDB Table
Querying an OpenTSDB Table
Modifying the Default Configuration Data
Using Spark2x
Precautions
Basic Operation
Getting Started
Configuring Parameters Rapidly
Common Parameters
Spark on HBase Overview and Basic Applications
Spark on HBase V2 Overview and Basic Applications
SparkSQL Permission Management (Security Mode)
Spark SQL Permissions
Creating a Spark SQL Role
Configuring Permissions for SparkSQL Tables, Columns, and Databases
Configuring Permissions for SparkSQL to Use Other Components
Configuring the Client and Server
Scenario-Specific Configuration
Configuring Multi-active Instance Mode
Configuring the Multi-tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Size of the Event Queue
Configuring Executor Off-Heap Memory
Enhancing Stability in a Limited Memory Condition
Viewing Aggregated Container Logs on the Web UI
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Configuring the Default Number of Data Blocks Divided by SparkSQL
Configuring the Compression Format of a Parquet Table
Configuring the Number of Lost Executors Displayed in WebUI
Setting the Log Level Dynamically
Configuring Whether Spark Obtains HBase Tokens
Configuring LIFO for Kafka
Configuring Reliability for Connected Kafka
Configuring Streaming Reading of Driver Execution Results
Filtering Partitions without Paths in Partitioned Tables
Configuring Spark2x Web UI ACLs
Configuring Vector-based ORC Data Reading
Broadening Support for Hive Partition Pruning Predicate Pushdown
Hive Dynamic Partition Overwriting Syntax
Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
Configuring Local Disk Cache for JobHistory
Configuring Spark SQL to Enable the Adaptive Execution Feature
Configuring Event Log Rollover
Adapting to the Third-party JDK When Ranger Is Used
Spark2x Logs
Obtaining Container Logs of a Running Spark Application
Small File Combination Tools
Using CarbonData for First Query
Spark2x Performance Tuning
Spark Core Tuning
Data Serialization
Optimizing Memory Configuration
Setting the DOP
Using Broadcast Variables
Using the External Shuffle Service to Improve Performance
Configuring Dynamic Resource Scheduling in Yarn Mode
Configuring Process Parameters
Designing the Directed Acyclic Graph (DAG)
Experience
Spark SQL and DataFrame Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the INSERT...SELECT Operation
Multiple JDBC Clients Concurrently Connecting to JDBCServer
Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
Optimizing SQL Query of Data of Multiple Sources
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Tuning
Common Issues About Spark2x
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
Why Cannot the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How to Configure Event Queue Size If Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
Why Do Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
Spark SQL and DataFrame
What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
How to Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How to Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Does 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
Why Does the Operation Fail When the Table Name Is TABLE?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Should I Do If the JDBCServer Process Is Mistakenly Killed During a Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Are Some Functions Not Available when Another JDBCServer Is Connected?
Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
Spark Streaming
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does a Spark Streaming Application Fail to Restart from Checkpoint When It Creates an Input Stream Without Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs When I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Why Is an Application Not Displayed When I Run the Application with the Empty Part File?
Why Does Spark2x Fail to Export a Table with the Same Field Name?
Why Does a JRE Fatal Error Occur After Running a Spark Application Multiple Times?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native Spark2x UI
How Does Spark2x Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
What Should I Do If the Native Page of an Application of Spark2x JobHistory Fails to Display During Access to the Page?
Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
Spark Shuffle Exception Handling
Using Sqoop
Using Sqoop from Scratch
Adapting Sqoop 1.4.7 to MRS 3.x Clusters
Common Sqoop Commands and Parameters
Common Issues About Sqoop
What Should I Do If Class QueryProvider Is Unavailable?
What Do I Do If PostgreSQL or GaussDB Fails to Connect?
What Should I Do If Data Failed to Be Synchronized to a Hive Table on the OBS Using hive-table?
What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
What Should I Do If Data Failed to Be Synchronized Using hive-table?
What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
Using Storm
Using Storm from Scratch
Using the Storm Client
Submitting Storm Topologies on the Client
Accessing the Storm Web UI
Managing Storm Topologies
Querying Storm Topology Logs
Storm Common Parameters
Configuring a Storm Service User Password Policy
Migrating Storm Services to Flink
Overview
Completely Migrating Storm Services
Performing Embedded Service Migration
Migrating Services of External Security Components Interconnected with Storm
Storm Log Introduction
Performance Tuning
Storm Performance Tuning
Using Tez
Precautions
Common Tez Parameters
Accessing TezUI
Log Overview
Common Issues
TezUI Cannot Display Tez Task Execution Details
Error Occurs When a User Switches to the Tez Web UI
Yarn Logs Cannot Be Viewed on the TezUI Page
Table Data Is Empty on the TezUI HiveQueries Page
Using Yarn
Common YARN Parameters
Creating Yarn Roles
Using the YARN Client
Configuring Resources for a NodeManager Role Instance
Changing NodeManager Storage Directories
Configuring Strict Permission Control for Yarn
Configuring Container Log Aggregation
Using CGroups with YARN
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring the Access Channel Protocol
Configuring Memory Usage Detection
Configuring the Additional Scheduler WebUI
Configuring Yarn Restart
Configuring ApplicationMaster Work Preserving
Configuring the Localized Log Levels
Configuring Users That Run Tasks
Yarn Log Overview
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Completion of the Job While Using CGroups?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature Is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Common ZooKeeper Parameters
Using a ZooKeeper Client
Configuring the ZooKeeper Permissions
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper Using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Accessing Manager
Accessing MRS Manager (Versions Earlier Than MRS 3.x)
Accessing FusionInsight Manager (MRS 3.x or Later)
Using an MRS Client
Installing a Client (Version 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
Component Operation Guide (LTS) (Paris Region)
Using CarbonData
Overview
CarbonData Overview
Main Specifications of CarbonData
Configuration Reference
CarbonData Operation Guide
CarbonData Quick Start
CarbonData Table Management
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying the CarbonData Table
CarbonData Table Data Management
Loading Data
Deleting Segments
Combining Segments
CarbonData Data Migration
Migrating Data on CarbonData from Spark1.5 to Spark2x
CarbonData Performance Tuning
Tuning Guidelines
Suggestions for Creating CarbonData Tables
Configurations for Performance Tuning
CarbonData Access Control
CarbonData Syntax Reference
DDL
CREATE TABLE
CREATE TABLE As SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
REFRESH INDEX
DML
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Operation Concurrent Execution
API
Spatial Indexes
CarbonData Troubleshooting
Filter Result Is Not Consistent with Hive When a Big Double Type Value Is Used in Filter
Query Performance Deterioration
CarbonData FAQ
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How to Avoid Minor Compaction for Historical Data?
How to Change the Default Group Name for CarbonData Data Loading?
Why Does INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Does Data Load Performance Decrease Due to Bad Records?
Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are the Opened Tasks Fewer Than the Available Executors When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Why Does Data Loading Fail During Off Heap?
Why Do I Fail to Create a Hive Table?
Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-owners?
How Do I Logically Split Data Across Different Namespaces?
Why Is the Missing Privileges Exception Reported When I Perform the Drop Operation on Databases?
Why Cannot the UPDATE Command Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
Using ClickHouse
Using ClickHouse from Scratch
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Structure
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
Importing and Exporting File Data
User Management and Authentication
ClickHouse User and Permission Management
Setting the ClickHouse Username and Password
ClickHouse Table Engine Overview
Creating a ClickHouse Table
Using the ClickHouse Data Migration Tool
Monitoring of Slow ClickHouse Query Statements and Replication Table Data Synchronization
Slow Query Statement Monitoring
Replication Table Data Synchronization Monitoring
Adaptive MV Usage in ClickHouse
ClickHouse Log Overview
Using DBService
Configuring SSL for the HA Module
Restoring SSL for the HA Module
Configuring the Timeout Interval of DBService Backup Tasks
DBService Log Overview
Using Flink
Using Flink from Scratch
Viewing Flink Job Information
Flink Configuration Management
Configuring Parameter Paths
JobManager & TaskManager
Blob
Distributed Coordination (via Akka)
SSL
Network Communication (via Netty)
JobManager Web Frontend
File Systems
State Backend
Kerberos-based Security
HA
Environment
Yarn
Pipeline
Security Configuration
Security Features
Configuring Kafka
Configuring Pipeline
Security Hardening
Authentication and Encryption
ACL Control
Web Security
Security Statement
Using the Flink Web UI
Overview
Introduction to Flink Web UI
Flink Web UI Application Process
FlinkServer Permissions Management
Overview
Authentication Based on Users and Roles
Accessing the Flink Web UI
Creating an Application on the Flink Web UI
Creating a Cluster Connection on the Flink Web UI
Creating a Data Connection on the Flink Web UI
Managing Tables on the Flink Web UI
Managing Jobs on the Flink Web UI
Managing UDFs on the Flink Web UI
Managing UDFs on the Flink Web UI
UDF Java and SQL Examples
UDAF Java and SQL Examples
UDTF Java and SQL Examples
Interconnecting FlinkServer with External Components
Interconnecting FlinkServer with ClickHouse
Interconnecting FlinkServer with HBase
Interconnecting FlinkServer with HDFS
Interconnecting FlinkServer with Hive
Interconnecting FlinkServer with Hudi
Interconnecting FlinkServer with Kafka
Deleting Residual Information About Flink Tasks
Flink Log Overview
Flink Performance Tuning
DataStream Optimization
Memory Configuration Optimization
Configuring DOP
Configuring Process Parameters
Optimizing the Design of Partitioning Method
Configuring the Netty Network Communication
Summarization
Common Flink Shell Commands
Using Flume
Using Flume from Scratch
Overview
Installing the Flume Client on Clusters
Viewing Flume Client Logs
Stopping or Uninstalling the Flume Client
Using the Encryption Tool of the Flume Client
Flume Service Configuration Guide
Flume Configuration Parameter Description
Using Environment Variables in the properties.properties File
Non-Encrypted Transmission
Configuring Non-encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
Encrypted Transmission
Configuring the Encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Viewing Flume Client Monitoring Information
Connecting Flume to Kafka in Security Mode
Connecting Flume with Hive in Security Mode
Configuring the Flume Service Model
Overview
Service Model Configuration Guide
Introduction to Flume Logs
Flume Client Cgroup Usage Guide
Secondary Development Guide for Flume Third-Party Plug-ins
Common Issues About Flume
Using HBase
Using HBase from Scratch
Creating HBase Roles
Using an HBase Client
Configuring HBase Replication
Enabling Cross-Cluster Copy
Supporting Full-Text Index
Using the ReplicationSyncUp Tool
Using HIndex
Introduction to HIndex
Loading Index Data in Batches
Using an Index Generation Tool
Configuring HBase DR
Performing an HBase DR Service Switchover
Configuring HBase Data Compression and Encoding
Performing an HBase DR Active/Standby Cluster Switchover
Community BulkLoad Tool
Configuring the MOB
Configuring Secure HBase Replication
Configuring Region In Transition Recovery Chore Service
Using a Secondary Index
HBase Log Overview
HBase Performance Tuning
Improving the BulkLoad Efficiency
Improving Put Performance
Optimizing Put and Scan Performance
Improving Real-time Data Write Performance
Improving Real-time Data Read Performance
Optimizing JVM Parameters
Common Issues About HBase
Why Does a Client Keep Failing to Connect to a Server for a Long Time?
Operation Failures Occur When Stopping BulkLoad on the Client
Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
How Do I Restore a Region in the RIT State for a Long Time?
Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
Why Can Modified and Deleted Data Still Be Queried by Using the Scan Command?
Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
When Do the RegionServers Listed Under "Dead Region Servers" on HMaster WebUI Get Cleared?
Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Rebuilding Meta Using the OfflineMetaRepair Tool, Causing Startup to Fail?
Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as but a Different Kerberos User from the Region Server Is Used?
Insufficient Rights When a Tenant Accesses Phoenix
What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
How Do I Fix Region Overlapping?
Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
Why Is a Message Displayed Indicating That the Permission Is Insufficient When CTBase Connects to the Ranger Plug-ins?
Using HDFS
Configuring Memory Management
Creating an HDFS Role
Using the HDFS Client
Running the DistCp Command
Overview of HDFS File System Directories
Changing the DataNode Storage Directory
Configuring HDFS Directory Permission
Configuring NFS
Planning HDFS Capacity
Configuring ulimit for HBase and HDFS
Balancing DataNode Capacity
Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
Configuring the Number of Files in a Single HDFS Directory
Configuring the Recycle Bin Mechanism
Setting Permissions on Files and Directories
Setting the Maximum Lifetime and Renewal Interval of a Token
Configuring the Damaged Disk Volume
Configuring Encrypted Channels
Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
Configuring the NameNode Blacklist
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring HDFS NodeLabel
Configuring HDFS DiskBalancer
Performing Concurrent Operations on HDFS Files
Introduction to HDFS Logs
HDFS Performance Tuning
Improving Write Performance
Improving Read Performance Using Client Metadata Cache
Improving the Connection Between the Client and NameNode Using Current Active Cache
FAQ
NameNode Startup Is Slow
Why Do MapReduce Tasks Fail in an Environment with Multiple NameServices?
DataNode Is Normal but Cannot Report Data Blocks
HDFS WebUI Cannot Properly Update Information About Damaged Data
Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?
Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
Why Does Array Border-crossing Occur During FileInputFormat Split?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
Can I Delete or Modify the Data Storage Directory in DataNode?
Blocks Miss on the NameNode UI After the Successful Rollback
Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
NameNode Fails to Be Restarted Due to EditLog Discontinuity
Using HetuEngine
Using HetuEngine from Scratch
HetuEngine Permission Management
HetuEngine Permission Management Overview
Creating a HetuEngine User
HetuEngine Ranger-based Permission Control
HetuEngine MetaStore-based Permission Control
Overview
Creating a HetuEngine Role
Configuring Permissions for Tables, Columns, and Databases
Permission Principles and Constraints
Creating HetuEngine Compute Instances
Configuring Data Sources
Before You Start
Configuring a Hive Data Source
Configuring a Co-deployed Hive Data Source
Configuring a Traditional Data Source
Configuring a Hudi Data Source
Configuring an HBase Data Source
Configuring a GaussDB Data Source
Configuring a HetuEngine Data Source
Configuring a ClickHouse Data Source
Managing Data Sources
Managing an External Data Source
Managing Compute Instances
Configuring Resource Groups
Adjusting the Number of Worker Nodes
Managing a HetuEngine Compute Instance
Importing and Exporting Compute Instance Configurations
Viewing the Instance Monitoring Page
Viewing Coordinator and Worker Logs
Using Resource Labels to Specify on Which Node Coordinators Should Run
Using the HetuEngine Client
Using the HetuEngine Cross-Source Function
Introduction to HetuEngine Cross-Source Function
Usage Guide of HetuEngine Cross-Source Function
Using HetuEngine Cross-Domain Function
Introduction to HetuEngine Cross-Source Function
HetuEngine Cross-Domain Function Usage
HetuEngine Cross-Domain Rate Limit Function
Using a Third-Party Visualization Tool to Access HetuEngine
Usage Instruction
Using DBeaver to Access HetuEngine
Using Tableau to Access HetuEngine
Using PowerBI to Access HetuEngine
Using Yonghong BI to Access HetuEngine
Function & UDF Development and Application
HetuEngine Function Plugin Development and Application
Hive UDF Development and Application
HetuEngine UDF Development and Application
Introduction to HetuEngine Logs
HetuEngine Performance Tuning
Adjusting the Yarn Service Configuration
Adjusting Cluster Node Resource Configurations
Adjusting Execution Plan Cache
Adjusting Metadata Cache
Modifying the CTE Configuration
Common Issues About HetuEngine
How Do I Perform Operations After the Domain Name Is Changed?
What Do I Do If Starting a Cluster on the Client Times Out?
How Do I Handle Data Source Loss?
How Do I Handle HetuEngine Alarms?
What Do I Do If Coordinators and Workers Cannot Be Started on the New Node?
Using Hive
Using Hive from Scratch
Configuring Hive Parameters
Hive SQL
Permission Management
Hive Permission
Creating a Hive Role
Configuring Permissions for Hive Tables, Columns, or Databases
Configuring Permissions to Use Other Components for Hive
Using a Hive Client
Using HDFS Colocation to Store Hive Tables
Using the Hive Column Encryption Function
Customizing Row Separators
Deleting Single-Row Records from Hive on HBase
Configuring HTTPS/HTTP-based REST APIs
Enabling or Disabling the Transform Function
Access Control of a Dynamic Table View on Hive
Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
Using Hive to Read Data in a Relational Database
Supporting Traditional Relational Database Syntax in Hive
Creating User-Defined Hive Functions
Enhancing beeline Reliability
Viewing Table Structures Using the show create Statement as Users with the select Permission
Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
Inserting Data to a Directory That Does Not Exist
Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
Disabling of Specifying the location Keyword When Creating an Internal Hive Table
Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
Authorizing Over 32 Roles in Hive
Restricting the Maximum Number of Maps for Hive Tasks
HiveServer Lease Isolation
Hive Supporting Transactions
Switching the Hive Execution Engine to Tez
Connecting Hive with External RDS
Redis-based CacheStore of HiveMetaStore
Hive Materialized View
Hive Supporting Reading Hudi Tables
Hive Supporting Cold and Hot Storage of Partitioned Metadata
Hive Supporting ZSTD Compression Formats
Hive Log Overview
Hive Performance Tuning
Creating Table Partitions
Optimizing Join
Optimizing Group By
Optimizing Data Storage
Optimizing SQL Statements
Optimizing the Query Function Using Hive CBO
Common Issues About Hive
How Do I Delete UDFs on Multiple HiveServers at the Same Time?
Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
How to Perform Operations on Local Files with Hive User-Defined Functions
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
How Do I Monitor the Hive Table Size?
How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Description of Hive Table Location (Either Be an OBS or HDFS Path)
Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?
Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
Why Does Hive Not Support Vectorized Query?
Hive Configuration Problems
Using Hudi
Quick Start
Basic Operations
Hudi Table Schema
Write
Batch Write
Stream Write
Bootstrapping
Synchronizing Hudi Table Data to Hive
Read
Reading COW Table Views
Reading MOR Table Views
Data Management and Maintenance
Metadata Table
Clustering
Cleaning
Compaction
Savepoint
Single-Table Concurrent Write
Using the Hudi Client
Operating a Hudi Table Using hudi-cli.sh
Configuration Reference
Write Configuration
Configuration of Hive Table Synchronization
Index Configuration
Storage Configuration
Compaction and Cleaning Configurations
Metadata Table Configuration
Single-Table Concurrent Write Configuration
Hudi Performance Tuning
Performance Tuning Methods
Recommended Resource Configuration
Hudi SQL Syntax Reference
Constraints
DDL
CREATE TABLE
CREATE TABLE AS SELECT
DROP TABLE
SHOW TABLE
ALTER RENAME TABLE
ALTER ADD COLUMNS
TRUNCATE TABLE
DML
INSERT INTO
MERGE INTO
UPDATE
DELETE
COMPACTION
SET/RESET
Common Issues About Hudi
Data Write
Parquet/Avro schema Is Reported When Updated Data Is Written
UnsupportedOperationException Is Reported When Updated Data Is Written
SchemaCompatabilityException Is Reported When Updated Data Is Written
What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
Data Collection
IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
HoodieException Is Reported When Data Is Collected
HoodieKeyException Is Reported When Data Is Collected
Hive Synchronization
SQLException Is Reported During Hive Data Synchronization
HoodieHiveSyncException Is Reported During Hive Data Synchronization
SemanticException Is Reported During Hive Data Synchronization
Using Hue
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using HBase on the Hue Web UI
Typical Scenarios
HDFS on Hue
Hive on Hue
Oozie on Hue
Hue Log Overview
Common Issues About Hue
How Do I Solve the Problem that HQL Fails to Be Executed in Hue Using Internet Explorer?
Why Does the use database Statement Become Invalid When Hive Is Used?
What Can I Do If HDFS Files Fail to Be Accessed Using Hue WebUI?
What Should I Do If a Large File Fails to Be Uploaded on the Hue Page?
Hue Page Cannot Be Displayed When the Hive Service Is Not Installed in a Cluster
Using Kafka
Using Kafka from Scratch
Managing Kafka Topics
Querying Kafka Topics
Managing Kafka User Permissions
Managing Messages in Kafka Topics
Creating a Kafka Role
Kafka Common Parameters
Safety Instructions on Using Kafka
Kafka Specifications
Using the Kafka Client
Configuring Kafka HA and High Reliability Parameters
Changing the Broker Storage Directory
Checking the Consumption Status of Consumer Group
Kafka Balancing Tool Instructions
Kafka Token Authentication Mechanism Tool Usage
Kafka Feature Description
Using Kafka UI
Accessing Kafka UI
Kafka UI Overview
Creating a Topic on Kafka UI
Migrating a Partition on Kafka UI
Managing Topics on Kafka UI
Viewing Brokers on Kafka UI
Viewing a Consumer Group on Kafka UI
Introduction to Kafka Logs
Performance Tuning
Kafka Performance Tuning
Common Issues About Kafka
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using Loader
Common Loader Parameters
Creating a Loader Role
Managing Loader Links
Importing Data
Overview
Importing Data Using Loader
Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
Typical Scenario: Importing Data from an SFTP Server to HBase
Typical Scenario: Importing Data from an SFTP Server to Hive
Typical Scenario: Importing Data from an SFTP Server to Spark
Typical Scenario: Importing Data from an FTP Server to HBase
Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
Typical Scenario: Importing Data from a Relational Database to HBase
Typical Scenario: Importing Data from a Relational Database to Hive
Typical Scenario: Importing Data from a Relational Database to Spark
Typical Scenario: Importing Data from HDFS or OBS to HBase
Typical Scenario: Importing Data from a Relational Database to ClickHouse
Typical Scenario: Importing Data from HDFS to ClickHouse
Exporting Data
Overview
Using Loader to Export Data
Typical Scenario: Exporting Data from HDFS/OBS to an SFTP Server
Typical Scenario: Exporting Data from HBase to an SFTP Server
Typical Scenario: Exporting Data from Hive to an SFTP Server
Typical Scenario: Exporting Data from Spark to an SFTP Server
Typical Scenario: Exporting Data from HDFS/OBS to a Relational Database
Typical Scenario: Exporting Data from HBase to a Relational Database
Typical Scenario: Exporting Data from Hive to a Relational Database
Typical Scenario: Exporting Data from Spark to a Relational Database
Typical Scenario: Importing Data from HBase to HDFS/OBS
Job Management
Migrating Loader Jobs in Batches
Deleting Loader Jobs in Batches
Importing Loader Jobs in Batches
Exporting Loader Jobs in Batches
Viewing Historical Job Information
Operator Help
Overview
Input Operators
CSV File Input
Fixed File Input
Table Input
HBase Input
HTML Input
Hive Input
Spark Input
Conversion Operators
Long Date Conversion
Null Value Conversion
Constant Field Addition
Random Value Conversion
Concat Fields
Extract Fields
Modulo Integer
String Cut
EL Operation
String Operations
String Reverse
String Trim
Filter Rows
Update Fields Operator
Output Operators
Hive Output
Spark Output
Table Output
File Output
HBase Output
ClickHouse Output
Associating, Editing, Importing, or Exporting the Field Configuration of an Operator
Using Macro Definitions in Configuration Items
Operator Data Processing Rules
Client Tool Description
Running a Loader Job by Using Commands
loader-tool Usage Guide
loader-tool Usage Example
schedule-tool Usage Guide
schedule-tool Usage Example
Using loader-backup to Back Up Job Data
Open Source sqoop-shell Tool Usage Guide
Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
Loader Log Overview
Common Issues About Loader
How Do I Resolve the Failure to Save Data When Using Internet Explorer 10 or Internet Explorer 11?
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Using MapReduce
Converting MapReduce from the Single Instance Mode to the HA Mode
Configuring the Log Archiving and Clearing Mechanism
Reducing Client Application Failure Rate
Transmitting MapReduce Tasks from Windows to Linux
Configuring the Distributed Cache
Configuring the MapReduce Shuffle Address
Configuring the Cluster Administrator List
Introduction to MapReduce Logs
MapReduce Performance Tuning
Optimization Configuration for Multiple CPU Cores
Determining the Job Baseline
Streamlining Shuffle
AM Optimization for Big Tasks
Speculative Execution
Using Slow Start
Optimizing Performance for Committing MR Jobs
Common Issues About MapReduce
Why Does It Take a Long Time to Run a Task Upon ResourceManager Active/Standby Switchover?
Why Does a MapReduce Task Stay Unchanged for a Long Time?
Why Does the Client Hang During Job Running?
Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
MapReduce Job Failed in Multiple NameService Environment
Why Is a Faulty MapReduce Node Not Blacklisted?
Using Oozie
Using Oozie from Scratch
Using the Oozie Client
Enabling Oozie High Availability (HA)
Using Oozie Client to Submit an Oozie Job
Submitting a Hive Job
Submitting a Spark2x Job
Submitting a Loader Job
Submitting a DistCp Job
Submitting Other Jobs
Using Hue to Submit an Oozie Job
Creating a Workflow
Submitting a Workflow Job
Submitting a Hive2 Job
Submitting a Spark2x Job
Submitting a Java Job
Submitting a Loader Job
Submitting a MapReduce Job
Submitting a Sub-workflow Job
Submitting a Shell Job
Submitting an HDFS Job
Submitting a DistCp Job
Example of Mutual Trust Operations
Submitting an SSH Job
Submitting a Hive Script
Submitting an Email Job
Submitting a Coordinator Periodic Scheduling Job
Submitting a Bundle Batch Processing Job
Querying the Operation Results
Oozie Log Overview
Common Issues About Oozie
How Do I Resolve the Problem that the Oozie Client Fails to Submit a MapReduce Job?
Oozie Scheduled Tasks Are Not Executed on Time
The Update of the share lib Directory of Oozie Does Not Take Effect
Using Ranger
Logging In to the Ranger Web UI
Enabling Ranger Authentication
Configuring Component Permission Policies
Viewing Ranger Audit Information
Configuring a Security Zone
Changing the Ranger Data Source to LDAP for a Normal Cluster
Viewing Ranger Permission Information
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark2x
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for HetuEngine
Ranger Log Overview
Common Issues About Ranger
Why Does Ranger Startup Fail During the Cluster Installation?
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
Using Spark2x
Basic Operation
Getting Started
Configuring Parameters Rapidly
Common Parameters
Spark on HBase Overview and Basic Applications
Spark on HBase V2 Overview and Basic Applications
SparkSQL Permission Management (Security Mode)
Spark SQL Permissions
Creating a Spark SQL Role
Configuring Permissions for SparkSQL Tables, Columns, and Databases
Configuring Permissions for SparkSQL to Use Other Components
Configuring the Client and Server
Scenario-Specific Configuration
Configuring Multi-active Instance Mode
Configuring the Multi-tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Size of the Event Queue
Configuring Executor Off-Heap Memory
Enhancing Stability in a Limited Memory Condition
Viewing Aggregated Container Logs on the Web UI
Configuring Whether to Display Spark SQL Statements Containing Sensitive Words
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Configuring the Default Number of Data Blocks Divided by SparkSQL
Configuring the Compression Format of a Parquet Table
Configuring the Number of Lost Executors Displayed in WebUI
Setting the Log Level Dynamically
Configuring Whether Spark Obtains HBase Tokens
Configuring LIFO for Kafka
Configuring Reliability for Connected Kafka
Configuring Streaming Reading of Driver Execution Results
Filtering Partitions without Paths in Partitioned Tables
Configuring Spark2x Web UI ACLs
Configuring Vector-based ORC Data Reading
Broaden Support for Hive Partition Pruning Predicate Pushdown
Hive Dynamic Partition Overwriting Syntax
Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
Configuring Local Disk Cache for JobHistory
Configuring Spark SQL to Enable the Adaptive Execution Feature
Configuring Event Log Rollover
Adapting to the Third-party JDK When Ranger Is Used
Spark2x Logs
Obtaining Container Logs of a Running Spark Application
Small File Combination Tools
Using CarbonData for First Query
Spark2x Performance Tuning
Spark Core Tuning
Data Serialization
Optimizing Memory Configuration
Setting the DOP
Using Broadcast Variables
Using the External Shuffle Service to Improve Performance
Configuring Dynamic Resource Scheduling in Yarn Mode
Configuring Process Parameters
Designing the Directed Acyclic Graph (DAG)
Experience
Spark SQL and DataFrame Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the INSERT...SELECT Operation
Multiple JDBC Clients Concurrently Connecting to JDBCServer
Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
Optimizing SQL Query of Data of Multiple Sources
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Tuning
Spark on OBS Tuning
Common Issues About Spark2x
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
Why Cannot the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How to Configure Event Queue Size If Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
Why Do Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
Spark SQL and DataFrame
What Do I have to Note When Using Spark SQL ROLLUP and CUBE?
Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
How to Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How to Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
Why Does the Operation Fail When the Table Name Is TABLE?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Should I Do If the JDBCServer Process is Mistakenly Killed During a Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Are Some Functions Not Available when Another JDBCServer Is Connected?
Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Spark Streaming
Streaming Task Prints the Same DAG Log Twice
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does a Spark Streaming Application Fail to Restart from Checkpoint When It Creates an Input Stream Without Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Why Is an Application Not Displayed When I Run the Application with the Empty Part File?
Why Does Spark2x Fail to Export a Table with the Same Field Name?
Why Does a JRE Fatal Error Occur After Running a Spark Application Multiple Times?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native Spark2x UI
How Does Spark2x Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
What Should I Do If the Native Page of an Application of Spark2x JobHistory Fails to Display During Access to the Page
Spark Shuffle Exception Handling
Using Tez
Common Tez Parameters
Accessing TezUI
Log Overview
Common Issues
TezUI Cannot Display Tez Task Execution Details
Error Occurs When a User Switches to the Tez Web UI
Yarn Logs Cannot Be Viewed on the TezUI Page
Table Data Is Empty on the TezUI HiveQueries Page
Using Yarn
Common Yarn Parameters
Creating Yarn Roles
Using the Yarn Client
Configuring Resources for a NodeManager Role Instance
Changing NodeManager Storage Directories
Configuring Strict Permission Control for Yarn
Configuring Container Log Aggregation
Using CGroups with YARN
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring the Access Channel Protocol
Configuring Memory Usage Detection
Configuring the Additional Scheduler WebUI
Configuring Yarn Restart
Configuring ApplicationMaster Work Preserving
Configuring the Localized Log Levels
Configuring Users That Run Tasks
Yarn Log Overview
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Job Completes While Using CGroups?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
Why Does Yarn Not Release the Blacklist Even If All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
What Is the Queue Replacement Policy?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Common ZooKeeper Parameters
Using a ZooKeeper Client
Configuring the ZooKeeper Permissions
Changing the ZooKeeper Storage Directory
Configuring the ZooKeeper Connection
Configuring ZooKeeper Response Timeout Interval
Binding the Client to an IP Address
Configuring the Port Range Bound to the Client
Performing Special Configuration on ZooKeeper Clients in the Same JVM
Configuring a Quota for a Znode
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Accessing FusionInsight Manager
Using an MRS Client
Using an MRS Client on Nodes Inside an MRS Cluster
Using an MRS Client on Nodes Outside an MRS Cluster
API Reference (Paris Region)
Before You Start
Overview
API Calling
Endpoints
Constraints
Concepts
Selecting an API Type
API Overview
Calling APIs
Making an API Request
Authentication
Response
Application Cases
Creating an MRS Cluster
Scaling Out a Cluster
Scaling in a Cluster
Creating a Job
Terminating a Job
Terminating a Cluster
API V2
Cluster Management APIs
Creating Clusters
Job Object APIs
Adding and Executing a Job
Querying Information About a Job
Querying a List of Jobs
Terminating a Job
Deleting Jobs in Batches
Obtaining the SQL Result
SQL APIs
Submitting an SQL Statement
Querying SQL Results
Canceling an SQL Execution Task
Cluster HDFS File API
Obtaining Files from a Specified Directory
Agency Management
Querying the Mapping Between a User (Group) and an IAM Agency
Updating the Mapping Between a User (Group) and an IAM Agency
API V1.1
Data Source APIs
Creating a Data Source
Updating a Data Source
Querying the Data Source List
Querying the Data Source Details
Deleting a Data Source
Cluster Management APIs
Creating a Cluster and Running a Job
Resizing a Cluster
Querying a Cluster List
Deleting a Cluster
Querying Cluster Details
Querying a Host List
Job Binary Object APIs
Creating a Job Binary Object
Updating a Job Binary Object
Querying the Binary Object List
Querying the Binary Object Details
Deleting a Job Binary Object
Job Object APIs
Creating a Job Object
Updating a Job Object
Executing a Job Object
Querying the Job Object List
Querying Job Object Details
Deleting a Job Object
Job Execution Object APIs
Querying the Job Execution Object List
Querying Job Execution Object Details
Canceling Job Execution
Auto Scaling APIs
Configuring an Auto Scaling Rule
Tag Management APIs
Adding a Tag to a Specified Cluster
Deleting a Tag of a Specified Cluster
Querying Tags of a Specified Cluster
Adding or Deleting Cluster Tags in Batches
Querying All Tags
Querying a List of Clusters with Specified Tags
Out-of-Date APIs
Job API Management (Deprecated)
Adding and Executing a Job (Deprecated)
Querying the exe Object List of Jobs (Deprecated)
Querying exe Object Details (Deprecated)
Deleting a Job Execution Object (Deprecated)
Permissions Policies and Supported Actions
Introduction
Appendix
ECS Specifications Used by MRS
Status Codes
Obtaining a Project ID
Obtaining an Account ID
Obtaining the MRS Cluster Information
Roles and Components Supported by MRS
Change History
User Guide (Kuala Lumpur Region)
Overview
What Is MRS?
Advantages of MRS Compared with Self-Built Hadoop
Application Scenarios
Components
Alluxio
CarbonData
ClickHouse
DBService
DBService Basic Principles
Relationship Between DBService and Other Components
Flink
Flink Basic Principles
Flink HA Solution
Relationship with Other Components
Flink Enhanced Open Source Features
Window
Job Pipeline
Configuration Table
Stream SQL Join
Flink CEP in SQL
Flume
Flume Basic Principles
Relationship Between Flume and Other Components
Flume Enhanced Open Source Features
HBase
HBase Basic Principles
HBase HA Solution
Relationship with Other Components
HBase Enhanced Open Source Features
HDFS
HDFS Basic Principles
HDFS HA Solution
Relationship Between HDFS and Other Components
HDFS Enhanced Open Source Features
Hive
Hive Basic Principles
Hive CBO Principles
Relationship Between Hive and Other Components
Enhanced Open Source Feature
Hue
Hue Basic Principles
Relationship Between Hue and Other Components
Hue Enhanced Open Source Features
Impala
Kafka
Kafka Basic Principles
Relationship Between Kafka and Other Components
Kafka Enhanced Open Source Features
KafkaManager
KrbServer and LdapServer
KrbServer and LdapServer Principles
KrbServer and LdapServer Enhanced Open Source Features
Kudu
Loader
Loader Basic Principles
Relationship Between Loader and Other Components
Loader Enhanced Open Source Features
Manager
Manager Basic Principles
Manager Key Features
MapReduce
MapReduce Basic Principles
Relationship Between MapReduce and Other Components
MapReduce Enhanced Open Source Features
Oozie
Oozie Basic Principles
Oozie Enhanced Open Source Features
OpenTSDB
Presto
Ranger
Ranger Basic Principles
Relationship Between Ranger and Other Components
Spark
Basic Principles of Spark
Spark HA Solution
Relationship Among Spark, HDFS, and Yarn
Spark Enhanced Open Source Feature: Optimized SQL Query of Cross-Source Data
Spark2x
Basic Principles of Spark2x
Spark2x HA Solution
Spark2x Multi-active Instance
Spark2x Multi-tenant
Relationship Between Spark2x and Other Components
Spark2x Open Source New Features
Spark2x Enhanced Open Source Features
CarbonData Overview
Optimizing SQL Query of Data of Multiple Sources
Data Skewness Optimization
Storm
Storm Basic Principles
Relationship Between Storm and Other Components
Storm Enhanced Open Source Features
Tez
Yarn
Yarn Basic Principles
Yarn HA Solution
Relationship Between YARN and Other Components
Yarn Enhanced Open Source Features
ZooKeeper
ZooKeeper Basic Principle
Relationship Between ZooKeeper and Other Components
ZooKeeper Enhanced Open Source Features
Functions
Multi-tenant
Security Hardening
Easy Access to Web UIs of Components
Reliability Enhancement
Job Management
Bootstrap Actions
Enterprise Project Management
Metadata
Cluster Management
Cluster Lifecycle Management
Manually Scale Out/In a Cluster
Auto Scaling
Task Node Creation
Scaling Up Master Node Specifications
Isolating a Host
Managing Tags
Cluster O&M
Message Notification
Constraints
Technical Support
Permissions Management
Related Services
Common Concepts
MRS Quick Start
How to Use MRS
Creating a Cluster
Uploading Data and Programs
Creating a Job
Using Clusters with Kerberos Authentication Enabled
Terminating a Cluster
Preparing a User
Creating an MRS User
Creating a Custom Policy
Synchronizing IAM Users to MRS
Configuring a Cluster
Methods of Creating MRS Clusters
Quick Creation of a Cluster
Quick Creation of a Hadoop Analysis Cluster
Quick Creation of an HBase Analysis Cluster
Quick Creation of a Kafka Streaming Cluster
Quick Creation of a ClickHouse Cluster
Quick Creation of a Real-time Analysis Cluster
Creating a Custom Cluster
Creating a Custom Topology Cluster
Adding a Tag to a Cluster
Communication Security Authorization
Configuring an Auto Scaling Rule
Managing Data Connections
Configuring Data Connections
Configuring Ranger Data Connections
Configuring a Hive Data Connection
Installing the Third-Party Software Using Bootstrap Actions
Introduction to Bootstrap Actions
Preparing the Bootstrap Action Script
Viewing Execution Records
Adding a Bootstrap Action
Viewing Failed MRS Tasks
Viewing Information of a Historical Cluster
Managing Clusters
Logging In to a Cluster
MRS Cluster Node Overview
Logging In to an ECS
Determining Active and Standby Management Nodes of Manager
Cluster Overview
Cluster List
Checking the Cluster Status
Viewing Basic Cluster Information
Viewing Cluster Patch Information
Viewing and Customizing Cluster Monitoring Metrics
Managing Components and Monitoring Hosts
Cluster O&M
Importing and Exporting Data
Changing the Subnet of a Cluster
Configuring Message Notification
Checking Health Status
Before You Start
Performing a Health Check
Viewing and Exporting a Health Check Report
Remote O&M
Authorizing O&M
Sharing Logs
Viewing MRS Operation Logs
Terminating a Cluster
Managing Nodes
Manually Scaling Out a Cluster
Manually Scaling In a Cluster
Managing a Host (Node)
Isolating a Host
Canceling Host Isolation
Scaling Up Master Node Specifications
Job Management
Introduction to MRS Jobs
Running a MapReduce Job
Running a SparkSubmit Job
Running a HiveSQL Job
Running a SparkSql Job
Running a Flink Job
Running a Kafka Job
Viewing Job Configuration and Logs
Stopping a Job
Deleting a Job
Using Encrypted OBS Data for Job Running
Configuring Job Notification Rules
Component Management
Object Management
Viewing Configuration
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configuration
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Starting and Stopping a Cluster
Synchronizing Cluster Configuration
Exporting Cluster Configuration
Performing Rolling Restart
Alarm Management
Viewing the Alarm List
Viewing the Event List
Viewing and Manually Clearing an Alarm
Patch Management
Patch Operation Guide for Versions Earlier Than MRS 3.x
Rolling Patches
Restoring Patches for the Isolated Hosts
Tenant Management
Before You Start
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Using an MRS Client
Installing a Client
Installing a Client (Version 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
Using the Client of Each Component
Using a ClickHouse Client
Using a Flink Client
Using a Flume Client
Using an HBase Client
Using an HDFS Client
Using a Hive Client
Using an Impala Client
Using a Kafka Client
Using a Kudu Client
Using the Oozie Client
Using a Storm Client
Using a Yarn Client
Configuring a Cluster with Storage and Compute Decoupled
Introduction to Storage-Compute Decoupling
Configuring a Storage-Compute Decoupled Cluster (Agency)
Configuring a Storage-Compute Decoupled Cluster (AK/SK)
Using a Storage-Compute Decoupled Cluster
Interconnecting Flink with OBS
Interconnecting Flume with OBS
Interconnecting HDFS with OBS
Interconnecting Hive with OBS
Interconnecting MapReduce with OBS
Interconnecting Spark2x with OBS
Interconnecting Sqoop with External Storage Systems
Accessing Web Pages of Open Source Components Managed in MRS Clusters
Web UIs of Open Source Components
List of Open Source Component Ports
Access Through Direct Connect
EIP-based Access
Access Using a Windows ECS
Creating an SSH Channel for Connecting to an MRS Cluster and Configuring the Browser
Accessing Manager
Accessing FusionInsight Manager (MRS 3.x or Later)
Accessing MRS Manager (MRS 2.1.0 or Earlier)
FusionInsight Manager Operation Guide (Applicable to 3.x)
Getting Started
FusionInsight Manager Introduction
Querying the FusionInsight Manager Version
Logging In to FusionInsight Manager
Logging In to the Management Node
Homepage
Overview
Managing the Monitoring Indicator Report
Cluster
Cluster Management
Overview
Performing a Rolling Restart of a Cluster
Managing Expired Configurations
Downloading the Client
Modifying Cluster Properties
Management Cluster Configuration
Static Service Pool
Static Service Resources
Configuring Cluster Static Resources
Viewing Cluster Static Resources
Client Management
Managing the Client
Batch Upgrading Clients
Updating the hosts File in Batches
Managing a Service
Overview
Other Service Management Operations
Service Details Page
Performing Active/Standby Switchover of a Role Instance
Resource Monitoring
Collecting Stack Information
Switching Ranger Authentication
Service Configuration
Modifying Service Configuration Parameters
Modifying Customized Configuration Parameters of a Service
Instance Management
Instance Management Overview
Decommissioning and Recommissioning an Instance
Managing Instance Configurations
Viewing the Instance Configuration File
Instance Group
Managing Instance Groups
Viewing Information About an Instance Group
Configuring Instance Group Parameters
Hosts
Host Management Page
Viewing the Host List
Viewing the Host Dashboard
Checking Processes and Resources on the Active Node
Host Maintenance Operations
Starting and Stopping All Instances on a Host
Performing a Host Health Check
Configuring Racks for Hosts
Isolating a Host
Exporting Host Information
Resource Overview
Distribution
Trend
Cluster
Host
O&M
Alarms
Overview of Alarms and Events
Configuring the Threshold
Configuring the Alarm Masking Status
Log
Online Log Searching
Log Downloading
Performing a Health Check
Viewing a Health Check Task
Managing Health Check Reports
Modifying Health Check Configuration
Configuring Backup and Backup Restoration
Creating a Backup Task
Creating a Backup Restoration Task
Managing Backup and Backup Restoration Tasks
Audit
Overview
Configuring Audit Log Dumping
Tenant Resources
Introduction to Multi-Tenant
Overview
Technical Principles
Multi-Tenant Management
Models Related to Multi-Tenant
Resource Overview
Dynamic Resources
Storage Resource
Multi-Tenant Use
Overview
Process Overview
Using the Superior Scheduler in Multi-Tenant Scenarios
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing a Tenant Directory
Restoring Tenant Data
Deleting a Tenant
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Managing Global User Policies
Using the Capacity Scheduler in Multi-Tenant Scenarios
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing a Tenant Directory
Restoring Tenant Data
Deleting a Tenant
Clearing Unassociated Queues of a Tenant in Capacity Scheduler Mode
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Switching the Scheduler
System Configuration
Configuring Permissions
Managing Users
Creating a User
Modifying User Information
Exporting User Information
Locking a User
Unlocking a User
Deleting a User
Changing a User Password
Initializing a Password
Exporting an Authentication Credential File
Managing User Groups
Managing Roles
Security Policy
Configuring Password Policies
Configuring the Independent Attribute
Configuring Interconnections
Configuring SNMP Northbound Parameters
Configuring Syslog Northbound Parameters
Configuring Monitoring Indicator Data Dump
Importing a Certificate
OMS Management
Overview of the OMS Maintenance Page
Modifying OMS Service Configuration Parameters
Component Management
Viewing Component Packages
Cluster Management
Configuring Client
Installing a Client
Using a Client
Updating the Configuration of the Installed Client
Managing Mutual Trust Relationships Between Managers
Introduction to Mutual Trust Relationships Between Clusters
Changing Manager System Domain Name
Configuring Cross-Manager Cluster Mutual Trust Relationships
Assigning User Permissions After Cross-Cluster Mutual Trust Is Configured
Configuring Periodical Alarm and Audit Information Backup
Modifying the Manager Routing Table
Switching to Maintenance Mode
Routine Maintenance
Log Management
About Logs
Manager Log List
Configuring the Log Level and Log File Size
Configuring the Number of Local Backup Audit Log Files
Viewing Role Instance Logs
Backup and Recovery Management
Introduction
Backing Up Data
Backing Up OMS Data
Backing Up DBService Data
Backing Up HBase Metadata
Backing Up HBase Service Data
Backing Up NameNode Data
Backing Up HDFS Service Data
Backing Up Hive Service Data
Backing Up Kafka Metadata
Recovering Data
Recovering OMS Data
Recovering DBService Data
Recovering HBase Metadata
Recovering HBase Service Data
Recovering NameNode Data
Recovering HDFS Service Data
Recovering Hive Service Data
Recovering Kafka Metadata
Enabling Cross-Cluster Replication
Managing Local Quick Recovery Tasks
Modifying a Backup Task
Viewing Backup and Recovery Tasks
Security Management
Security Overview
Rights Model
Rights Mechanism
Authentication Policies
Permission Verification Policies
User Account List
Default Permission Information
FusionInsight Manager Security Functions
Account Management
Account Security Settings
Unlocking LDAP Users and Management Accounts
Unlocking an Internal System User
Enabling and Disabling Permission Verification on Cluster Components
Logging In to a Non-Cluster Node Using a Cluster User in Normal Mode
Changing the Password for a System User
Changing the Password for User admin
Changing the Password for an OS User
Changing the Password for a System Internal User
Changing the Password for the Kerberos Administrator
Changing the Password for the OMS Kerberos Administrator
Changing the Passwords of the LDAP Administrator and the LDAP User (Including OMS LDAP)
Changing the Password for the LDAP Administrator
Changing the Password for a Component Running User
Changing the Password for a Database User
Changing the Password for the OMS Database Administrator
Changing the Password for the OMS Database Data Access User
Changing the Password for a Component Database User
Changing the Password for User omm in DBService
Security Hardening
Hardening Policy
Configuring a Trusted IP Address to Access LDAP
HFile and WAL Encryption
Security Configuration
Configuring an IP Address Whitelist for Modifications Allowed by HBase
Updating a Key for a Cluster
Hardening the LDAP
Configuring Kafka Data Encryption During Transmission
Configuring HDFS Data Encryption During Transmission
Encrypting the Communication Between Controller and Agent
Updating SSH Keys for User omm
Security Maintenance
Account Maintenance Suggestions
Password Maintenance Suggestions
Logs Maintenance Suggestions
Security Statement
Alarm Reference (Applicable to MRS 3.x)
ALM-12001 Audit Log Dumping Failure
ALM-12004 OLdap Resource Abnormal
ALM-12005 OKerberos Resource Abnormal
ALM-12006 Node Fault
ALM-12007 Process Fault
ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes
ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes
ALM-12014 Partition Lost
ALM-12015 Partition Filesystem Readonly
ALM-12016 CPU Usage Exceeds the Threshold
ALM-12017 Insufficient Disk Capacity
ALM-12018 Memory Usage Exceeds the Threshold
ALM-12027 Host PID Usage Exceeds the Threshold
ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold
ALM-12033 Slow Disk Fault
ALM-12034 Periodical Backup Failure
ALM-12035 Unknown Data Status After Recovery Task Failure
ALM-12038 Monitoring Indicator Dumping Failure
ALM-12039 Active/Standby OMS Databases Not Synchronized
ALM-12040 Insufficient System Entropy
ALM-12041 Incorrect Permission on Key Files
ALM-12042 Incorrect Configuration of Key Files
ALM-12045 Network Read Packet Dropped Rate Exceeds the Threshold
ALM-12046 Network Write Packet Dropped Rate Exceeds the Threshold
ALM-12047 Network Read Packet Error Rate Exceeds the Threshold
ALM-12048 Network Write Packet Error Rate Exceeds the Threshold
ALM-12049 Network Read Throughput Rate Exceeds the Threshold
ALM-12050 Network Write Throughput Rate Exceeds the Threshold
ALM-12051 Disk Inode Usage Exceeds the Threshold
ALM-12052 TCP Temporary Port Usage Exceeds the Threshold
ALM-12053 Host File Handle Usage Exceeds the Threshold
ALM-12054 Invalid Certificate File
ALM-12055 The Certificate File Is About to Expire
ALM-12057 Metadata Not Configured with the Task to Periodically Back Up Data to a Third-Party Server
ALM-12061 Process Usage Exceeds the Threshold
ALM-12062 OMS Parameter Configurations Mismatch with the Cluster Scale
ALM-12063 Unavailable Disk
ALM-12064 Host Random Port Range Conflicts with Cluster Used Port
ALM-12066 Trust Relationships Between Nodes Become Invalid
ALM-12067 Tomcat Resource Is Abnormal
ALM-12068 ACS Resource Is Abnormal
ALM-12069 AOS Resource Is Abnormal
ALM-12070 Controller Resource Is Abnormal
ALM-12071 Httpd Resource Is Abnormal
ALM-12072 FloatIP Resource Is Abnormal
ALM-12073 CEP Resource Is Abnormal
ALM-12074 FMS Resource Is Abnormal
ALM-12075 PMS Resource Is Abnormal
ALM-12076 GaussDB Resource Is Abnormal
ALM-12077 User omm Expired
ALM-12078 Password of User omm Expired
ALM-12079 User omm Is About to Expire
ALM-12080 Password of User omm Is About to Expire
ALM-12081 User ommdba Expired
ALM-12082 User ommdba Is About to Expire
ALM-12083 Password of User ommdba Is About to Expire
ALM-12084 Password of User ommdba Expired
ALM-12085 Service Audit Log Dump Failure
ALM-12087 System Is in the Upgrade Observation Period
ALM-12089 Inter-Node Network Is Abnormal
ALM-12101 AZ Unhealthy
ALM-12102 AZ HA Component Is Not Deployed Based on DR Requirements
ALM-12110 Failed to Get ECS Temporary AK/SK
ALM-13000 ZooKeeper Service Unavailable
ALM-13001 Available ZooKeeper Connections Are Insufficient
ALM-13002 ZooKeeper Direct Memory Usage Exceeds the Threshold
ALM-13003 GC Duration of the ZooKeeper Process Exceeds the Threshold
ALM-13004 ZooKeeper Heap Memory Usage Exceeds the Threshold
ALM-13005 Failed to Set the Quota of Top Directories of ZooKeeper Components
ALM-13006 Znode Number or Capacity Exceeds the Threshold
ALM-13007 Available ZooKeeper Client Connections Are Insufficient
ALM-13008 ZooKeeper Znode Usage Exceeds the Threshold
ALM-13009 ZooKeeper Znode Capacity Usage Exceeds the Threshold
ALM-13010 Znode Usage of a Directory with Quota Configured Exceeds the Threshold
ALM-14000 HDFS Service Unavailable
ALM-14001 HDFS Disk Usage Exceeds the Threshold
ALM-14002 DataNode Disk Usage Exceeds the Threshold
ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold
ALM-14006 Number of HDFS Files Exceeds the Threshold
ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold
ALM-14008 DataNode Heap Memory Usage Exceeds the Threshold
ALM-14009 Number of Dead DataNodes Exceeds the Threshold
ALM-14010 NameService Service Is Abnormal
ALM-14011 DataNode Data Directory Is Not Configured Properly
ALM-14012 JournalNode Is Out of Synchronization
ALM-14013 Failed to Update the NameNode FsImage File
ALM-14014 NameNode GC Time Exceeds the Threshold
ALM-14015 DataNode GC Time Exceeds the Threshold
ALM-14016 DataNode Direct Memory Usage Exceeds the Threshold
ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold
ALM-14018 NameNode Non-heap Memory Usage Exceeds the Threshold
ALM-14019 DataNode Non-heap Memory Usage Exceeds the Threshold
ALM-14020 Number of Entries in the HDFS Directory Exceeds the Threshold
ALM-14021 NameNode Average RPC Processing Time Exceeds the Threshold
ALM-14022 NameNode Average RPC Queuing Time Exceeds the Threshold
ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
ALM-14024 Tenant Space Usage Exceeds the Threshold
ALM-14025 Tenant File Object Usage Exceeds the Threshold
ALM-14026 Blocks on DataNode Exceed the Threshold
ALM-14027 DataNode Disk Fault
ALM-14028 Number of Blocks to Be Supplemented Exceeds the Threshold
ALM-14029 Number of Blocks in a Replica Exceeds the Threshold
ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold
ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold
ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold
ALM-16003 Background Thread Usage Exceeds the Threshold
ALM-16004 Hive Service Unavailable
ALM-16005 The Heap Memory Usage of the Hive Process Exceeds the Threshold
ALM-16006 The Direct Memory Usage of the Hive Process Exceeds the Threshold
ALM-16007 Hive GC Time Exceeds the Threshold
ALM-16008 Non-Heap Memory Usage of the Hive Process Exceeds the Threshold
ALM-16009 Map Number Exceeds the Threshold
ALM-16045 Hive Data Warehouse Is Deleted
ALM-16046 Hive Data Warehouse Permission Is Modified
ALM-16047 HiveServer Has Been Deregistered from ZooKeeper
ALM-16048 Tez or Spark Library Path Does Not Exist
ALM-17003 Oozie Service Unavailable
ALM-17004 Oozie Heap Memory Usage Exceeds the Threshold
ALM-17005 Oozie Non Heap Memory Usage Exceeds the Threshold
ALM-17006 Oozie Direct Memory Usage Exceeds the Threshold
ALM-17007 Garbage Collection (GC) Time of the Oozie Process Exceeds the Threshold
ALM-18000 Yarn Service Unavailable
ALM-18002 NodeManager Heartbeat Lost
ALM-18003 NodeManager Unhealthy
ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold
ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold
ALM-18010 ResourceManager GC Time Exceeds the Threshold
ALM-18011 NodeManager GC Time Exceeds the Threshold
ALM-18012 JobHistoryServer GC Time Exceeds the Threshold
ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold
ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold
ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold
ALM-18016 Non Heap Memory Usage of ResourceManager Exceeds the Threshold
ALM-18017 Non Heap Memory Usage of NodeManager Exceeds the Threshold
ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold
ALM-18019 Non Heap Memory Usage of JobHistoryServer Exceeds the Threshold
ALM-18020 Yarn Task Execution Timeout
ALM-18021 MapReduce Service Unavailable
ALM-18022 Insufficient Yarn Queue Resources
ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold
ALM-18024 Pending Yarn Memory Usage Exceeds the Threshold
ALM-18025 Number of Terminated Yarn Tasks Exceeds the Threshold
ALM-18026 Number of Failed Yarn Tasks Exceeds the Threshold
ALM-19000 HBase Service Unavailable
ALM-19006 HBase Replication Sync Failed
ALM-19007 HBase GC Time Exceeds the Threshold
ALM-19008 Heap Memory Usage of the HBase Process Exceeds the Threshold
ALM-19009 Direct Memory Usage of the HBase Process Exceeds the Threshold
ALM-19011 RegionServer Region Number Exceeds the Threshold
ALM-19012 HBase System Table Directory or File Lost
ALM-19013 Duration of Regions in Transition State Exceeds the Threshold
ALM-19014 Capacity Quota Usage on ZooKeeper Exceeds the Threshold Severely
ALM-19015 Quantity Quota Usage on ZooKeeper Exceeds the Threshold
ALM-19016 Quantity Quota Usage on ZooKeeper Exceeds the Threshold Severely
ALM-19017 Capacity Quota Usage on ZooKeeper Exceeds the Threshold
ALM-19018 HBase Compaction Queue Exceeds the Threshold
ALM-19019 Number of HBase HFiles to Be Synchronized Exceeds the Threshold
ALM-19020 Number of HBase WAL Files to Be Synchronized Exceeds the Threshold
ALM-20002 Hue Service Unavailable
ALM-24000 Flume Service Unavailable
ALM-24001 Flume Agent Exception
ALM-24003 Flume Client Connection Interrupted
ALM-24004 Exception Occurs When Flume Reads Data
ALM-24005 Exception Occurs When Flume Transmits Data
ALM-24006 Heap Memory Usage of Flume Server Exceeds the Threshold
ALM-24007 Flume Server Direct Memory Usage Exceeds the Threshold
ALM-24008 Flume Server Non-Heap Memory Usage Exceeds the Threshold
ALM-24009 Flume Server Garbage Collection (GC) Time Exceeds the Threshold
ALM-24010 Flume Certificate File Is Invalid or Damaged
ALM-24011 Flume Certificate File Is About to Expire
ALM-24012 Flume Certificate File Has Expired
ALM-24013 Flume MonitorServer Certificate File Is Invalid or Damaged
ALM-24014 Flume MonitorServer Certificate Is About to Expire
ALM-24015 Flume MonitorServer Certificate File Has Expired
ALM-25000 LdapServer Service Unavailable
ALM-25004 Abnormal LdapServer Data Synchronization
ALM-25005 nscd Service Exception
ALM-25006 Sssd Service Exception
ALM-25500 KrbServer Service Unavailable
ALM-26051 Storm Service Unavailable
ALM-26052 Number of Available Supervisors of the Storm Service Is Less Than the Threshold
ALM-26053 Storm Slot Usage Exceeds the Threshold
ALM-26054 Nimbus Heap Memory Usage Exceeds the Threshold
ALM-27001 DBService Service Unavailable
ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes
ALM-27004 Data Inconsistency Between Active and Standby DBServices
ALM-27005 Database Connections Usage Exceeds the Threshold
ALM-27006 Disk Space Usage of the Data Directory Exceeds the Threshold
ALM-27007 Database Enters the Read-Only Mode
ALM-29000 Impala Service Unavailable
ALM-29004 Impalad Process Memory Usage Exceeds the Threshold
ALM-29005 Number of JDBC Connections to Impalad Exceeds the Threshold
ALM-29006 Number of ODBC Connections to Impalad Exceeds the Threshold
ALM-29100 Kudu Service Unavailable
ALM-29104 Tserver Process Memory Usage Exceeds the Threshold
ALM-29106 Tserver Process CPU Usage Exceeds the Threshold
ALM-29107 Tserver Process Memory Usage Exceeds the Threshold
ALM-38000 Kafka Service Unavailable
ALM-38001 Insufficient Kafka Disk Capacity
ALM-38002 Kafka Heap Memory Usage Exceeds the Threshold
ALM-38004 Kafka Direct Memory Usage Exceeds the Threshold
ALM-38005 GC Duration of the Broker Process Exceeds the Threshold
ALM-38006 Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold
ALM-38007 Status of Kafka Default User Is Abnormal
ALM-38008 Abnormal Kafka Data Directory Status
ALM-38009 Busy Broker Disk I/Os
ALM-38010 Topics with Single Replica
ALM-43001 Spark2x Service Unavailable
ALM-43006 Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold
ALM-43007 Non-Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold
ALM-43008 The Direct Memory Usage of the JobHistory2x Process Exceeds the Threshold
ALM-43009 JobHistory2x Process GC Time Exceeds the Threshold
ALM-43010 Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
ALM-43011 Non-Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
ALM-43012 Direct Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
ALM-43013 JDBCServer2x Process GC Time Exceeds the Threshold
ALM-43017 JDBCServer2x Process Full GC Number Exceeds the Threshold
ALM-43018 JobHistory2x Process Full GC Number Exceeds the Threshold
ALM-43019 Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold
ALM-43020 Non-Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold
ALM-43021 Direct Memory Usage of the IndexServer2x Process Exceeds the Threshold
ALM-43022 IndexServer2x Process GC Time Exceeds the Threshold
ALM-43023 IndexServer2x Process Full GC Number Exceeds the Threshold
ALM-44004 Presto Coordinator Resource Group Queuing Tasks Exceed the Threshold
ALM-44005 Presto Coordinator Process GC Time Exceeds the Threshold
ALM-44006 Presto Worker Process GC Time Exceeds the Threshold
ALM-45175 Average Time for Calling OBS Metadata APIs Is Greater than the Threshold
ALM-45176 Success Rate of Calling OBS Metadata APIs Is Lower than the Threshold
ALM-45177 Success Rate of Calling OBS Data Read APIs Is Lower than the Threshold
ALM-45178 Success Rate of Calling OBS Data Write APIs Is Lower Than the Threshold
ALM-45275 Ranger Service Unavailable
ALM-45276 Abnormal RangerAdmin Status
ALM-45277 RangerAdmin Heap Memory Usage Exceeds the Threshold
ALM-45278 RangerAdmin Direct Memory Usage Exceeds the Threshold
ALM-45279 RangerAdmin Non Heap Memory Usage Exceeds the Threshold
ALM-45280 RangerAdmin GC Duration Exceeds the Threshold
ALM-45281 UserSync Heap Memory Usage Exceeds the Threshold
ALM-45282 UserSync Direct Memory Usage Exceeds the Threshold
ALM-45283 UserSync Non Heap Memory Usage Exceeds the Threshold
ALM-45284 UserSync Garbage Collection (GC) Time Exceeds the Threshold
ALM-45285 TagSync Heap Memory Usage Exceeds the Threshold
ALM-45286 TagSync Direct Memory Usage Exceeds the Threshold
ALM-45287 TagSync Non Heap Memory Usage Exceeds the Threshold
ALM-45288 TagSync Garbage Collection (GC) Time Exceeds the Threshold
ALM-45425 ClickHouse Service Unavailable
ALM-45426 ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold
ALM-45427 ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold
ALM-45736 Guardian Service Unavailable
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
Introduction to MRS Manager
Checking Running Tasks
Monitoring Management
Dashboard
Managing Services and Monitoring Hosts
Managing Resource Distribution
Configuring Monitoring Metric Dumping
Alarm Management
Viewing and Manually Clearing an Alarm
Configuring an Alarm Threshold
Configuring Syslog Northbound Interface Parameters
Configuring SNMP Northbound Interface Parameters
Object Management
Managing Objects
Viewing Configurations
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configurations
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Managing a Host
Isolating a Host
Canceling Host Isolation
Starting or Stopping a Cluster
Synchronizing Cluster Configurations
Exporting Configuration Data of a Cluster
Log Management
About Logs
Manager Log List
Viewing and Exporting Audit Logs
Exporting Service Logs
Configuring Audit Log Exporting Parameters
Health Check Management
Performing a Health Check
Viewing and Exporting a Health Check Report
Configuring the Number of Health Check Reports to Be Reserved
Managing Health Check Reports
DBService Health Check Indicators
Flume Health Check Indicators
HBase Health Check Indicators
Host Health Check Indicators
HDFS Health Check Indicators
Hive Health Check Indicators
Kafka Health Check Indicators
KrbServer Health Check Indicators
LdapServer Health Check Indicators
Loader Health Check Indicators
MapReduce Health Check Indicators
OMS Health Check Indicators
Spark Health Check Indicators
Storm Health Check Indicators
Yarn Health Check Indicators
ZooKeeper Health Check Indicators
Static Service Pool Management
Viewing the Status of a Static Service Pool
Configuring a Static Service Pool
Tenant Management
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Backup and Restoration
Introduction
Backing Up Metadata
Restoring Metadata
Modifying a Backup Task
Viewing Backup and Restoration Tasks
Security Management
Default Users of Clusters with Kerberos Authentication Disabled
Default Users of Clusters with Kerberos Authentication Enabled
Changing the Password of an OS User
Changing the Password of User admin
Changing the Password of the Kerberos Administrator
Changing the Passwords of the LDAP Administrator and the LDAP User
Changing the Password of a Component Running User
Changing the Password of the OMS Database Administrator
Changing the Password of the Data Access User of the OMS Database
Changing the Password of a Component Database User
Updating Cluster Keys
Permissions Management
Creating a Role
Creating a User Group
Creating a User
Modifying User Information
Locking a User
Unlocking a User
Deleting a User
Changing the Password of an Operation User
Initializing the Password of a System User
Downloading a User Authentication File
Modifying a Password Policy
MRS Multi-User Permission Management
Users and Permissions of MRS Clusters
Default Users of Clusters with Kerberos Authentication Enabled
Creating a Role
Creating a User Group
Creating a User
Modifying User Information
Locking a User
Unlocking a User
Deleting a User
Changing the Password of an Operation User
Initializing the Password of a System User
Downloading a User Authentication File
Modifying a Password Policy
Configuring Cross-Cluster Mutual Trust Relationships
Configuring Users to Access Resources of a Trusted Cluster
Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
Patch Operation Guide
Patch Operation Guide for Versions
Supporting Rolling Patches
Restoring Patches for the Isolated Hosts
Rolling Restart
MRS Cluster Component Operation Guide
Using Alluxio
Configuring an Underlying Storage System
Accessing Alluxio Using a Data Application
Common Operations of Alluxio
Using CarbonData (for Versions Earlier Than MRS 3.x)
Using CarbonData from Scratch
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Using CarbonData (for MRS 3.x or Later)
Overview
CarbonData Overview
Main Specifications of CarbonData
Configuration Reference
CarbonData Operation Guide
CarbonData Quick Start
CarbonData Table Management
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying a CarbonData Table
CarbonData Table Data Management
Loading Data
Deleting Segments
Combining Segments
CarbonData Data Migration
Migrating Data on CarbonData from Spark 1.5 to Spark2x
CarbonData Performance Tuning
Tuning Guidelines
Suggestions for Creating CarbonData Tables
Configurations for Performance Tuning
CarbonData Access Control
CarbonData Syntax Reference
DDL
CREATE TABLE
CREATE TABLE As SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
DML
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Operation Concurrent Execution
API
Spatial Indexes
CarbonData Troubleshooting
Filter Result Is not Consistent with Hive when a Big Double Type Value Is Used in Filter
Query Performance Deterioration
CarbonData FAQ
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How to Avoid Minor Compaction for Historical Data?
How to Change the Default Group Name for CarbonData Data Loading?
Why Does INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Data Load Performance Decreases due to Bad Records?
Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are Fewer Tasks Opened Than the Available Executors When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Why Does Data Loading Fail When Off-Heap Memory Is Used?
Why Do I Fail to Create a Hive Table?
Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-Owners?
How Do I Logically Split Data Across Different Namespaces?
Why Missing Privileges Exception is Reported When I Perform Drop Operation on Databases?
Why the UPDATE Command Cannot Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Exception Occurs in CarbonData When Disk Space Quota is Set for Storage Directory in HDFS?
Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
Using ClickHouse
Using ClickHouse from Scratch
ClickHouse Table Engine Overview
Creating a ClickHouse Table
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Structure
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
Migrating ClickHouse Data
Using ClickHouse to Import and Export Data
Synchronizing Kafka Data to ClickHouse
Using the ClickHouse Data Migration Tool
User Management and Authentication
ClickHouse User and Permission Management
Interconnecting ClickHouse With OpenLDAP for Authentication
Backing Up and Restoring ClickHouse Data Using a Data File
ClickHouse Log Overview
Using DBService
DBService Log Overview
Using Flink
Using Flink from Scratch
Viewing Flink Job Information
Flink Configuration Management
Configuring Parameter Paths
JobManager & TaskManager
Blob
Distributed Coordination (via Akka)
SSL
Network Communication (via Netty)
JobManager Web Frontend
File Systems
State Backend
Kerberos-based Security
HA
Environment
Yarn
Pipeline
Security Configuration
Security Features
Configuring Kafka
Configuring Pipeline
Security Hardening
Authentication and Encryption
ACL Control
Web Security
Security Statement
Using the Flink Web UI
Overview
Introduction to Flink Web UI
Flink Web UI Application Process
FlinkServer Permissions Management
Overview
Authentication Based on Users and Roles
Accessing the Flink Web UI
Creating an Application on the Flink Web UI
Creating a Cluster Connection on the Flink Web UI
Creating a Data Connection on the Flink Web UI
Managing Tables on the Flink Web UI
Managing Jobs on the Flink Web UI
Flink Log Overview
Flink Performance Tuning
Optimizing DataStream
Memory Configuration Optimization
Configuring DOP
Configuring Process Parameters
Optimizing the Design of Partitioning Method
Configuring the Netty Network Communication
Experience Summary
Common Flink Shell Commands
Reference
Example of Issuing a Certificate
Using Flume
Using Flume from Scratch
Overview
Installing the Flume Client
Installing the Flume Client on Clusters of Versions Earlier Than MRS 3.x
Installing the Flume Client on Clusters of MRS 3.x or a Later Version
Viewing Flume Client Logs
Stopping or Uninstalling the Flume Client
Using the Encryption Tool of the Flume Client
Flume Service Configuration Guide
Flume Configuration Parameter Description
Using Environment Variables in the properties.properties File
Non-Encrypted Transmission
Configuring Non-encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
Encrypted Transmission
Configuring the Encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Viewing Flume Client Monitoring Information
Connecting Flume to Kafka in Security Mode
Connecting Flume with Hive in Security Mode
Configuring the Flume Service Model
Overview
Service Model Configuration Guide
Introduction to Flume Logs
Flume Client Cgroup Usage Guide
Secondary Development Guide for Flume Third-Party Plug-ins
Common Issues About Flume
Using HBase
Using HBase from Scratch
Using an HBase Client
Creating HBase Roles
Configuring HBase Replication
Configuring HBase Parameters
Enabling Cross-Cluster Copy
Using the ReplicationSyncUp Tool
Using HIndex
Introduction to HIndex
Loading Index Data in Batches
Using an Index Generation Tool
Migrating Index Data
Configuring HBase DR
Configuring HBase Data Compression and Encoding
Performing an HBase DR Service Switchover
Performing an HBase DR Active/Standby Cluster Switchover
Community BulkLoad Tool
Configuring the MOB
Configuring Secure HBase Replication
Configuring Region In Transition Recovery Chore Service
Using a Secondary Index
HBase Log Overview
HBase Performance Tuning
Improving the BulkLoad Efficiency
Improving Put Performance
Optimizing Put and Scan Performance
Improving Real-time Data Write Performance
Improving Real-time Data Read Performance
Optimizing JVM Parameters
Common Issues About HBase
Why Does a Client Keep Failing to Connect to a Server for a Long Time?
Operation Failures Occur in Stopping BulkLoad On the Client
Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
Why Other Services Become Unstable If HBase Sets up A Large Number of Connections over the Network Port?
Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
How Do I Restore a Region in the RIT State for a Long Time?
Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
Why Modified and Deleted Data Can Still Be Queried by Using the Scan Command?
Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
When Are the RegionServers Listed Under "Dead Region Servers" on HMaster WebUI Cleared?
Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned and Fail to Start After Meta Is Rebuilt Using the OfflineMetaRepair Tool?
Why Messages Containing FileNotFoundException and no lease Are Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as and a Different Kerberos User from the Region Server Are Used?
Insufficient Rights When a Tenant Accesses Phoenix
What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
How Do I Fix Region Overlapping?
Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
Why a Message Is Displayed Indicating that the Permission is Insufficient When CTBase Connects to the Ranger Plug-ins?
Using HDFS
Using Hadoop from Scratch
Configuring Memory Management
Creating an HDFS Role
Using the HDFS Client
Running the DistCp Command
Overview of HDFS File System Directories
Changing the DataNode Storage Directory
Configuring HDFS Directory Permission
Configuring NFS
Planning HDFS Capacity
Configuring ulimit for HBase and HDFS
Balancing DataNode Capacity
Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
Configuring the Number of Files in a Single HDFS Directory
Configuring the Recycle Bin Mechanism
Setting Permissions on Files and Directories
Setting the Maximum Lifetime and Renewal Interval of a Token
Configuring the Damaged Disk Volume
Configuring Encrypted Channels
Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
Configuring the NameNode Blacklist
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring HDFS NodeLabel
Configuring HDFS Mover
Using HDFS AZ Mover
Configuring HDFS DiskBalancer
Configuring the Observer NameNode to Process Read Requests
Performing Concurrent Operations on HDFS Files
Introduction to HDFS Logs
HDFS Performance Tuning
Improving Write Performance
Improving Read Performance Using Client Metadata Cache
Improving the Connection Between the Client and NameNode Using Current Active Cache
FAQ
NameNode Startup Is Slow
DataNode Is Normal but Cannot Report Data Blocks
HDFS WebUI Cannot Properly Update Information About Damaged Data
Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?
Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
Why Does Array Border-crossing Occur During FileInputFormat Split?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
Can I Delete or Modify the Data Storage Directory in DataNode?
Blocks Miss on the NameNode UI After the Successful Rollback
Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
NameNode Fails to Be Restarted Due to EditLog Discontinuity
Using Hive
Using Hive from Scratch
Configuring Hive Parameters
Hive SQL
Permission Management
Hive Permission
Creating a Hive Role
Configuring Permissions for Hive Tables, Columns, or Databases
Configuring Permissions to Use Other Components for Hive
Using a Hive Client
Using HDFS Colocation to Store Hive Tables
Using the Hive Column Encryption Function
Customizing Row Separators
Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
Deleting Single-Row Records from Hive on HBase
Configuring HTTPS/HTTP-based REST APIs
Enabling or Disabling the Transform Function
Access Control of a Dynamic Table View on Hive
Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
Using Hive to Read Data in a Relational Database
Supporting Traditional Relational Database Syntax in Hive
Creating User-Defined Hive Functions
Enhancing beeline Reliability
Viewing Table Structures Using the show create Statement as Users with the select Permission
Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
Inserting Data to a Directory That Does Not Exist
Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
Disabling the location Keyword When Creating an Internal Hive Table
Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
Authorizing Over 32 Roles in Hive
Restricting the Maximum Number of Maps for Hive Tasks
HiveServer Lease Isolation
Hive Supporting Transactions
Switching the Hive Execution Engine to Tez
Hive Materialized View
Hive Log Overview
Hive Performance Tuning
Creating Table Partitions
Optimizing Join
Optimizing Group By
Optimizing Data Storage
Optimizing SQL Statements
Optimizing the Query Function Using Hive CBO
Common Issues About Hive
How Do I Delete UDFs on Multiple HiveServers at the Same Time?
Why Cannot the DROP operation Be Performed on a Backed-up Hive Table?
How to Perform Operations on Local Files with Hive User-Defined Functions
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
How Do I Monitor the Hive Table Size?
How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Description of Hive Table Location (Either Be an OBS or HDFS Path)
Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?
Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
Why Does Hive Not Support Vectorized Query?
Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
How Do I Disable the Logging Function of Hive?
Why Hive Tables in the OBS Directory Fail to Be Deleted?
Hive Configuration Problems
Using Hue (Versions Earlier Than MRS 3.x)
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using Hue (MRS 3.x or Later)
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the SparkSql Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using HBase on the Hue Web UI
Typical Scenarios
HDFS on Hue
Configuring HDFS Cold and Hot Data Migration
Hive on Hue
Oozie on Hue
Hue Log Overview
Common Issues About Hue
How Do I Solve the Problem that HQL Fails to Be Executed in Hue Using Internet Explorer?
Why Does the use database Statement Become Invalid When Hive Is Used?
What Can I Do If HDFS Files Fail to Be Accessed Using Hue WebUI?
What Can I Do If a Large File Fails to Be Uploaded on the Hue Page?
Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
Using Impala
Using Impala from Scratch
Accessing the Impala Web UI
Using Impala to Operate Kudu
Interconnecting Impala with External LDAP
Using Kafka
Using Kafka from Scratch
Managing Kafka Topics
Querying Kafka Topics
Managing Kafka User Permissions
Managing Messages in Kafka Topics
Synchronizing Binlog-based MySQL Data to the MRS Cluster
Creating a Kafka Role
Kafka Common Parameters
Safety Instructions on Using Kafka
Kafka Specifications
Using the Kafka Client
Configuring Kafka HA and High Reliability Parameters
Changing the Broker Storage Directory
Checking the Consumption Status of Consumer Group
Kafka Balancing Tool Instructions
Balancing Data After Kafka Node Scale-Out
Kafka Token Authentication Mechanism Tool Usage
Introduction to Kafka Logs
Performance Tuning
Kafka Performance Tuning
Kafka Feature Description
Migrating Data Between Kafka Nodes
Common Issues About Kafka
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using KafkaManager
Introduction to KafkaManager
Accessing the KafkaManager Web UI
Managing Kafka Clusters
Kafka Cluster Monitoring Management
Using Kudu
Using Kudu from Scratch
Accessing the Kudu Web UI
Using Loader
Using Loader from Scratch
How to Use Loader
Loader Link Configuration
Managing Loader Links (Versions Earlier Than MRS 3.x)
Source Link Configurations of Loader Jobs
Destination Link Configurations of Loader Jobs
Managing Loader Jobs
Preparing a Driver for MySQL Database Link
Loader Log Overview
Example: Using Loader to Import Data from OBS to HDFS
Common Issues About Loader
How Do I Resolve the Problem that Data Fails to Be Saved When Using Internet Explorer 10 or Internet Explorer 11?
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Using MapReduce
Configuring the Log Archiving and Clearing Mechanism
Reducing Client Application Failure Rate
Transmitting MapReduce Tasks from Windows to Linux
Configuring the Distributed Cache
Configuring the MapReduce Shuffle Address
Configuring the Cluster Administrator List
Introduction to MapReduce Logs
MapReduce Performance Tuning
Optimization Configuration for Multiple CPU Cores
Determining the Job Baseline
Streamlining Shuffle
AM Optimization for Big Tasks
Speculative Execution
Using Slow Start
Optimizing Performance for Committing MR Jobs
Common Issues About MapReduce
Why Does It Take a Long Time to Run a Task Upon ResourceManager Active/Standby Switchover?
Why Does a MapReduce Task Stay Unchanged for a Long Time?
Why the Client Hangs During Job Running?
Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Physical Memory Overflow Occurs If a MapReduce Task Fails?
After the Address of MapReduce JobHistoryServer Is Changed, Why the Wrong Page is Displayed When I Click the Tracking URL on the ResourceManager WebUI?
MapReduce Job Failed in Multiple NameService Environment
Why a Fault MapReduce Node Is Not Blacklisted?
Using Oozie
Using Oozie from Scratch
Using the Oozie Client
Using Oozie Client to Submit an Oozie Job
Submitting a Hive Job
Submitting a Spark2x Job
Submitting a Loader Job
Submitting a DistCp Job
Submitting Other Jobs
Using Hue to Submit an Oozie Job
Creating a Workflow
Submitting a Workflow Job
Submitting a Hive2 Job
Submitting a Spark2x Job
Submitting a Java Job
Submitting a Loader Job
Submitting a MapReduce Job
Submitting a Sub-workflow Job
Submitting a Shell Job
Submitting an HDFS Job
Submitting a Streaming Job
Submitting a DistCp Job
Example of Mutual Trust Operations
Submitting an SSH Job
Submitting a Hive Script
Submitting a Coordinator Periodic Scheduling Job
Submitting a Bundle Batch Processing Job
Querying the Operation Results
Oozie Log Overview
Common Issues About Oozie
Oozie Scheduled Tasks Are Not Executed on Time
Why Update of the share lib Directory of Oozie on HDFS Does Not Take Effect?
Common Oozie Troubleshooting Methods
Using Presto
Accessing the Presto Web UI
Using a Client to Execute Query Statements
Using Ranger (MRS 3.x)
Logging In to the Ranger Web UI
Enabling Ranger Authentication
Configuring Component Permission Policies
Viewing Ranger Audit Information
Configuring a Security Zone
Changing the Ranger Data Source to LDAP for a Normal Cluster
Viewing Ranger Permission Information
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark2x
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for Storm
Ranger Log Overview
Common Issues About Ranger
Why Ranger Startup Fails During the Cluster Installation?
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
Using Spark
Precautions
Getting Started with Spark
Getting Started with Spark SQL
Using the Spark Client
Accessing the Spark Web UI
Interconnecting Spark with OpenTSDB
Creating a Table and Associating It with OpenTSDB
Inserting Data to the OpenTSDB Table
Querying an OpenTSDB Table
Modifying the Default Configuration Data
Using Spark2x
Precautions
Basic Operation
Getting Started
Configuring Parameters Rapidly
Common Parameters
Spark on HBase Overview and Basic Applications
Spark on HBase V2 Overview and Basic Applications
SparkSQL Permission Management (Security Mode)
Spark SQL Permissions
Creating a Spark SQL Role
Configuring Permissions for SparkSQL Tables, Columns, and Databases
Configuring Permissions for SparkSQL to Use Other Components
Configuring the Client and Server
Scenario-Specific Configuration
Configuring Multi-active Instance Mode
Configuring the Multi-tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Size of the Event Queue
Configuring Executor Off-Heap Memory
Enhancing Stability in a Limited Memory Condition
Viewing Aggregated Container Logs on the Web UI
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Configuring the Default Number of Data Blocks Divided by SparkSQL
Configuring the Compression Format of a Parquet Table
Configuring the Number of Lost Executors Displayed in WebUI
Setting the Log Level Dynamically
Configuring Whether Spark Obtains HBase Tokens
Configuring LIFO for Kafka
Configuring Reliability for Connected Kafka
Configuring Streaming Reading of Driver Execution Results
Filtering Partitions without Paths in Partitioned Tables
Configuring Spark2x Web UI ACLs
Configuring Vector-based ORC Data Reading
Broaden Support for Hive Partition Pruning Predicate Pushdown
Hive Dynamic Partition Overwriting Syntax
Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
Configuring Local Disk Cache for JobHistory
Configuring Spark SQL to Enable the Adaptive Execution Feature
Configuring Event Log Rollover
Adapting to the Third-party JDK When Ranger Is Used
Spark2x Logs
Obtaining Container Logs of a Running Spark Application
Small File Combination Tools
Using CarbonData for First Query
Spark2x Performance Tuning
Spark Core Tuning
Data Serialization
Optimizing Memory Configuration
Setting the DOP
Using Broadcast Variables
Using the External Shuffle Service to Improve Performance
Configuring Dynamic Resource Scheduling in Yarn Mode
Configuring Process Parameters
Designing the Directed Acyclic Graph (DAG)
Experience
Spark SQL and DataFrame Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the INSERT...SELECT Operation
Multiple JDBC Clients Concurrently Connecting to JDBCServer
Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
Optimizing SQL Query of Data of Multiple Sources
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Tuning
Common Issues About Spark2x
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
Why Cannot the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How to Configure Event Queue Size If Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
Why Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
Spark SQL and DataFrame
What Do I have to Note When Using Spark SQL ROLLUP and CUBE?
Why Spark SQL Is Displayed as a Temporary Table in Different Databases?
How to Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How to Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Does 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
Why the Operation Fails When the Table Name Is TABLE?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Should I Do If the JDBCServer Process is Mistakenly Killed During a Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Are Some Functions Not Available when Another JDBCServer Is Connected?
Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
Spark Streaming
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does a Spark Streaming Application Fail to Restart from Checkpoint When It Creates an Input Stream Without Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Why Is an Application Not Displayed When I Run the Application with the Empty Part File?
Why Does Spark2x Fail to Export a Table with the Same Field Name?
Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native Spark2x UI
How Does Spark2x Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
What Should I Do If the Native Page of an Application of Spark2x JobHistory Fails to Display During Access to the Page?
Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
Spark Shuffle Exception Handling
Using Sqoop
Using Sqoop from Scratch
Adapting Sqoop 1.4.7 to MRS 3.x Clusters
Common Sqoop Commands and Parameters
Common Issues About Sqoop
What Should I Do If Class QueryProvider Is Unavailable?
What Should I Do If PostgreSQL or GaussDB Failed to Be Connected?
What Should I Do If Data Failed to Be Synchronized to a Hive Table on OBS Using hive-table?
What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
What Should I Do If Data Failed to Be Synchronized Using hive-table?
What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
Using Storm
Using Storm from Scratch
Using the Storm Client
Submitting Storm Topologies on the Client
Accessing the Storm Web UI
Managing Storm Topologies
Querying Storm Topology Logs
Storm Common Parameters
Configuring a Storm Service User Password Policy
Migrating Storm Services to Flink
Overview
Completely Migrating Storm Services
Performing Embedded Service Migration
Migrating Services of External Security Components Interconnected with Storm
Storm Log Introduction
Performance Tuning
Storm Performance Tuning
Using Tez
Precautions
Common Tez Parameters
Accessing TezUI
Log Overview
Common Issues
TezUI Cannot Display Tez Task Execution Details
Error Occurs When a User Switches to the Tez Web UI
Yarn Logs Cannot Be Viewed on the TezUI Page
Table Data Is Empty on the TezUI HiveQueries Page
Using Yarn
Common YARN Parameters
Creating Yarn Roles
Using the YARN Client
Configuring Resources for a NodeManager Role Instance
Changing NodeManager Storage Directories
Configuring Strict Permission Control for Yarn
Configuring Container Log Aggregation
Using CGroups with YARN
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring the Access Channel Protocol
Configuring Memory Usage Detection
Configuring the Additional Scheduler WebUI
Configuring Yarn Restart
Configuring ApplicationMaster Work Preserving
Configuring the Localized Log Levels
Configuring Users That Run Tasks
Yarn Log Overview
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Job Completes While CGroups Is Used?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature Is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Common ZooKeeper Parameters
Using a ZooKeeper Client
Configuring the ZooKeeper Permissions
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Do Four-Letter Commands Not Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper Using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Accessing Manager
Accessing MRS Manager (Versions Earlier Than MRS 3.x)
Accessing FusionInsight Manager (MRS 3.x or Later)
Using an MRS Client
Installing a Client (Version 3.x or Later)
Installing a Client (Versions Earlier Than 3.x)
Updating a Client (Version 3.x or Later)
Updating a Client (Versions Earlier Than 3.x)
Security Description
Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled
Security Authentication Principles and Mechanisms
High-Risk Operations Overview
FAQs
MRS Overview
What Is MRS Used For?
What Types of Distributed Storage Does MRS Support?
How Do I Create an MRS Cluster Using a Custom Security Group?
How Do I Use MRS?
How Does MRS Ensure Security of Data and Services?
Can I Configure a Phoenix Connection Pool?
Does MRS Support Change of the Network Segment?
Can I Downgrade the Specifications of an MRS Cluster Node?
What Is the Relationship Between Hive and Other Components?
Does an MRS Cluster Support Hive on Spark?
What Are the Differences Between Hive Versions?
Which MRS Cluster Version Supports Hive Connection and User Synchronization?
What Are the Differences Between OBS and HDFS in Data Storage?
How Do I Obtain the Hadoop Pressure Test Tool?
What Is the Relationship Between Impala and Other Components?
Statement About the Public IP Addresses in the Open-Source Third-Party SDK Integrated by MRS
What Is the Relationship Between Kudu and HBase?
Does MRS Support Running Hive on Kudu?
What Are the Solutions for Processing 1 Billion Data Records?
Can I Change the IP Address of DBService?
Can I Clear MRS sudo Logs?
Is the Storm Log Also Limited to 20 GB in MRS Cluster 2.1.0?
What Is Spark ThriftServer?
What Access Protocols Are Supported by Kafka?
What Is the Compression Ratio of zstd?
Why Are the HDFS, YARN, and MapReduce Components Unavailable When an MRS Cluster Is Created?
Why Is the ZooKeeper Component Unavailable When an MRS Cluster Is Created?
Which Python Versions Are Supported by Spark Tasks in an MRS 3.1.0 Cluster?
How Do I Enable Different Service Programs to Use Different YARN Queues?
Differences and Relationships Between the MRS Management Console and Cluster Manager
How Do I Unbind an EIP from an MRS Cluster Node?
Account and Password
What Is the Account for Logging In to Manager?
How Do I Query and Change the Password Validity Period of an Account?
Accounts and Permissions
Does an MRS Cluster Support Access Permission Control If Kerberos Authentication Is Not Enabled?
How Do I Assign Tenant Management Permission to a New Account?
How Do I Customize an MRS Policy?
Why Is the Manage User Function Unavailable on the System Page on MRS Manager?
Does Hue Support Account Permission Configuration?
Client Usage
How Do I Configure Environment Variables and Run Commands on a Component Client?
How Do I Disable ZooKeeper SASL Authentication?
An Error Is Reported When the kinit Command Is Executed on a Client Node Outside an MRS Cluster
Web Page Access
How Do I Change the Session Timeout Duration for an Open Source Component Web UI?
Why Cannot I Refresh the Dynamic Resource Plan Page on the MRS Tenant Tab?
What Do I Do If the Kafka Topic Monitoring Tab Is Unavailable on Manager?
What Do I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of HDFS, Hue, YARN, and Flink?
Alarm Monitoring
In an MRS Streaming Cluster, Can the Kafka Topic Monitoring Function Send Alarm Notifications?
Where Can I View the Running Resource Queues When the Alarm "ALM-18022 Insufficient Yarn Queue Resources" Is Reported?
How Do I Understand the Multi-Level Chart Statistics in the HBase Operation Requests Metric?
Performance Tuning
Does an MRS Cluster Support System Reinstallation?
Can I Change the OS of an MRS Cluster?
How Do I Improve the Resource Utilization of Core Nodes in a Cluster?
How Do I Stop the Firewall Service?
Job Development
How Do I Get My Data into OBS or HDFS?
What Types of Spark Jobs Can Be Submitted in a Cluster?
Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Are Changed to 0?
What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
How Do I View MRS Job Logs?
What Do I Do If the Message "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page." Is Displayed?
LauncherJob Job Execution Fails and the Error Message "jobPropertiesMap is null." Is Displayed
What Do I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
What Do I Do If a SparkStreaming Job Fails After Running for Dozens of Hours and the OBS Access 403 Error Is Reported?
What Do I Do If an Alarm Is Reported Indicating that the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
What Do I Do If Error Message "java.io.IOException: Connection reset by peer" Is Displayed During the Execution of a Spark Job?
What Do I Do If Error Message "requestId=4971883851071737250" Is Displayed When a Spark Job Accesses OBS?
Why Does DataArts Studio Occasionally Fail to Schedule Spark Jobs and Then Fail to Reschedule Them?
What Do I Do If a Flink Job Fails to Execute and the Error Message "java.lang.NoSuchFieldError: SECURITY_SSL_ENCRYPT_ENABLED" Is Displayed?
Why Cannot a Submitted Yarn Job Be Viewed on the Web UI?
How Do I Modify the HDFS NameSpace (fs.defaultFS) of an Existing Cluster?
What Do I Do If the launcher-job Queue Is Stopped by YARN Due to Insufficient Heap Size When I Submit a Flink Job on the Management Plane?
What Do I Do If the Error Message "slot request timeout" Is Displayed When I Submit a Flink Job?
Data Import and Export of DistCP Jobs
Cluster Upgrade/Patching
Can I Upgrade an MRS Cluster?
Can I Change the MRS Cluster Version?
Cluster Access
Can I Switch Between the Two Login Modes of MRS?
How Can I Obtain the IP Address and Port Number of a ZooKeeper Instance?
How Do I Access an MRS Cluster from a Node Outside the Cluster?
Big Data Service Development
Can MRS Run Multiple Flume Tasks at a Time?
How Do I Change FlumeClient Logs to Standard Logs?
Where Are the .jar Files and Environment Variables of Hadoop Located?
What Compression Algorithms Does HBase Support?
Can MRS Write Data to HBase Through the HBase External Table of Hive?
How Do I View HBase Logs?
How Do I Set the TTL for an HBase Table?
How Do I Balance HDFS Data?
How Do I Change the Number of HDFS Replicas?
What Is the Port for Accessing HDFS Using Python?
How Do I Modify the HDFS Active/Standby Switchover Class?
What Is the Recommended Number Type of DynamoDB in Hive Tables?
Can the Hive Driver Be Interconnected with DBCP2?
How Do I View the Hive Table Created by Another User?
Can I Export the Query Result of Hive Data?
What Do I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
What Do I Do If a "hivesql/hivescript" Job Fails to Submit After Hive Is Added?
What If an Excel File Downloaded on Hue Failed to Open?
What Do I Do If Sessions Are Not Released After Hue Connects to HiveServer and the Error Message "over max user connections" Is Displayed?
How Do I Reset Kafka Data?
How Do I Obtain the Client Version of MRS Kafka?
What Access Protocols Are Supported by Kafka?
What Do I Do If Error Message "Not Authorized to access group xxx" Is Displayed When a Kafka Topic Is Consumed?
What Compression Algorithms Does Kudu Support?
How Do I View Kudu Logs?
How Do I Handle the Kudu Service Exceptions Generated During Cluster Creation?
Does OpenTSDB Support Python APIs?
How Do I Configure Other Data Sources on Presto?
How Do I Connect to Spark Shell from MRS?
How Do I Connect to Spark Beeline from MRS?
Where Are the Execution Logs of Spark Jobs Stored?
How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
How Do I Check Whether the ResourceManager Configuration of Yarn Is Correct?
How Do I Modify the allow_drop_detached Parameter of ClickHouse?
What Do I Do If an Alarm Indicating Insufficient Memory Is Reported During Spark Task Execution?
What Do I Do If ClickHouse Consumes Excessive CPU Resources?
How Do I Enable the Map Type on ClickHouse?
A Large Number of OBS APIs Are Called When Spark SQL Accesses Hive Partitioned Tables
API
How Do I Configure the node_id Parameter When Using the API for Adjusting Cluster Nodes?
Cluster Management
How Do I View All Clusters?
How Do I View Log Information?
How Do I View Cluster Configuration Information?
How Do I Install Kafka and Flume in an MRS Cluster?
How Do I Stop an MRS Cluster?
Can I Expand Data Disk Capacity for MRS?
Can I Add Components to an Existing Cluster?
Can I Delete Components Installed in an MRS Cluster?
Can I Change MRS Cluster Nodes on the MRS Console?
How Do I Shield Cluster Alarm/Event Notifications?
Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
How Do I Configure the knox Memory?
What Is the Python Version Installed for an MRS Cluster?
How Do I View the Configuration File Directory of Each Component?
What Do I Do If the Time on MRS Nodes Is Incorrect?
How Do I Query the Startup Time of an MRS Node?
What Do I Do If Trust Relationships Between Nodes Are Abnormal?
How Do I Adjust the Memory Size of the manager-executor Process?
Kerberos Usage
How Do I Change the Kerberos Authentication Status of a Created MRS Cluster?
What Are the Ports of the Kerberos Authentication Service?
How Do I Deploy the Kerberos Service in a Running Cluster?
How Do I Access Hive in a Cluster with Kerberos Authentication Enabled?
How Do I Access Presto in a Cluster with Kerberos Authentication Enabled?
How Do I Access Spark in a Cluster with Kerberos Authentication Enabled?
How Do I Prevent Kerberos Authentication Expiration?
Metadata Management
Where Can I View Hive Metadata?
Troubleshooting
Accessing the Web Pages
Failed to Access MRS Manager
Failed to Log In to MRS Manager After the Python Upgrade
Failed to Log In to MRS Manager After Changing the Domain Name
A Blank Page Is Displayed Upon Login to Manager
Failed to Download Authentication Credentials When the Username Is Too Long
Cluster Management
Failed to Reduce Task Nodes
Adding a New Disk to an MRS Cluster
Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
Replacing a Disk in an MRS Cluster (Applicable to 3.x)
MRS Backup Failure
Inconsistency Between df and du Command Output on the Core Node
Disassociating a Subnet from the ACL Network
MRS Becomes Abnormal After hostname Modification
DataNode Restarts Unexpectedly
Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
Failed to Download the MRS Cluster Client
Failed to Scale Out an MRS Cluster
Error Occurs When MRS Executes the Insert Command Using Beeline
How Do I Upgrade EulerOS to Fix Vulnerabilities in an MRS Cluster?
Using CDM to Migrate Data to HDFS
Alarms Are Frequently Generated in the MRS Cluster
Memory Usage of the PMS Process Is High
High Memory Usage of the Knox Process
It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
How Do I Locate a Job Submission Failure?
OS Disk Space Is Insufficient Due to Oversized HBase Log Files
Failed to Delete a New Tenant on FusionInsight Manager
Using Alluxio
Error Message "Does not contain a valid host:port authority" Is Reported When Alluxio Is in HA Mode
Using ClickHouse
ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
Using DBService
DBServer Instance Is in Abnormal Status
DBServer Instance Remains in the Restoring State
Default Port 20050 or 20051 Is Occupied
DBServer Instance Is Always in the Restoring State Because of the Incorrect /tmp Directory Permission
DBService Backup Failure
Components Failed to Connect to DBService in Normal State
DBServer Failed to Start
DBService Backup Failed Because the Floating IP Address Is Unreachable
DBService Failed to Start Due to the Loss of the DBService Configuration File
Using Flink
"IllegalConfigurationException: Error while parsing YAML configuration file: "security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
"IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
Flink Service Program Fails to Read Files on the NFS Disk
Failed to Customize the Flink Log4j Log Level
Using Flume
Class Cannot Be Found After Flume Submits Jobs to Spark Streaming
Failed to Install a Flume Client
A Flume Client Cannot Connect to the Server
Flume Data Fails to Be Written to the Component
Flume Server Process Fault
Flume Data Collection Is Slow
Failed to Start Flume
Using HBase
Slow Response to HBase Connection
Failed to Authenticate the HBase User
RegionServer Failed to Start Because the Port Is Occupied
HBase Failed to Start Due to Insufficient Node Memory
HBase Service Unavailable Due to Poor HDFS Performance
HBase Failed to Start Due to Inappropriate Parameter Settings
RegionServer Failed to Start Due to Residual Processes
HBase Failed to Start Due to a Quota Set on HDFS
HBase Failed to Start Due to Corrupted Version Files
High CPU Usage Caused by Zero-Loaded RegionServer
HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
RegionServer Instance Is in the Restoring State
HBase Failed to Start in a Newly Installed Cluster
HBase Failed to Start Due to the Loss of the ACL Table Directory
HBase Failed to Start After the Cluster Is Powered Off and On
Failed to Import HBase Data Due to Oversized File Blocks
Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
Failed to Run the hbase shell Command on the MRS Cluster Client
Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
HBase Failed to Start Due to Insufficient RegionServer Memory
Using HDFS
All NameNodes Become the Standby State After the NameNode RPC Port of HDFS Is Changed
An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
Failed to Use Python to Remotely Connect to the Port of HDFS
HDFS Capacity Usage Reaches 100%, Causing Unavailable Upper-layer Services Such as HBase and Spark
An Error Is Reported During HDFS and Yarn Startup
HDFS Permission Setting Error
A DataNode of HDFS Is Always in the Decommissioning State
HDFS Failed to Start Due to Insufficient Memory
A Large Number of Blocks Are Lost in HDFS Due to the Time Change Using ntpdate
CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
Common File Read/Write Faults
Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
A Client File Fails to Be Closed After Data Writing
File Fails to Be Uploaded to HDFS Due to File Errors
After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
Failed to Read Files, and "FileNotFoundException" Is Displayed
Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
Adjusting the Log Level of the Shell Client
File Read Fails, and "No common protection layer" Is Displayed
Failed to Write Files Because the HDFS Directory Quota Is Insufficient
Balancing Fails, and "Source and target differ in block-size" Is Displayed
A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
Uneven Data Distribution Due to Non-HDFS Data Residuals
Uneven Data Distribution Due to the Client Installation on the DataNode
Handling Unbalanced DataNode Disk Usage on Nodes
Locating Common Balance Problems
HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
HDFS Client Failed to Delete Overlong Directories
An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
Using Hive
Content Recorded in Hive Logs
Causes of Hive Startup Failure
"Cannot modify xxx at runtime" Is Reported When the set Command Is Executed in a Security Cluster
How to Specify a Queue When Hive Submits a Job
How to Set Map and Reduce Memory on the Client
Specifying the Output File Compression Format When Importing a Table
desc Table Cannot Be Completely Displayed
NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
A Newly Created User Has No Query Permissions
An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
An Error Is Reported When the "load data inpath" Command Is Executed
An Error Is Reported When the "load data local inpath" Command Is Executed
An Error Is Reported When the "create external table" Command Is Executed
An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
Insufficient Permissions to Execute the set role admin Command
An Error Is Reported When UDF Is Created Using Beeline
Difference Between Hive Service Health Status and Hive Instance Health Status
Hive Alarms and Triggering Conditions
"authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
Failed to Access ZooKeeper from the Client
"Invalid function" Is Displayed When a UDF Is Used
Hive Service Status Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Concerning
Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
Hive Task Failed to Run on the Client But Successful on Yarn
An Error Is Reported When the select Statement Is Executed
Failed to Drop a Large Number of Partitions
Failed to Start a Local Task
Failed to Start WebHCat
Sample Code Error for Hive Secondary Development After Domain Switching
MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
"Failed to execute session hooks: over max connections" Reported by Beeline
beeline Reports the "OutOfMemoryError" Error
Task Execution Fails Because the Input File Number Exceeds the Threshold
Task Execution Fails Because of Stack Memory Overflow
Task Failed Due to Concurrent Writes to One Table or Partition
Hive Task Failed Due to a Lack of HDFS Directory Permission
Failed to Load Data to Hive Tables
HiveServer and HiveHCat Process Faults
An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
Timeout Reported When Adding the Hive Table Field
Failed to Restart the Hive Service
Hive Failed to Delete a Table
An Error Is Reported When msck repair table table_name Is Run on Hive
How Do I Release Disk Space After Dropping a Table in Hive?
Connection Timeout During SQL Statement Execution on the Client
WebHCat Failed to Start Due to Abnormal Health Status
WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
Using Hue
A Job Is Running on Hue
HQL Fails to Be Executed on Hue Using Internet Explorer
Hue (Active) Cannot Open Web Pages
Failed to Access the Hue Web UI
HBase Tables Cannot Be Loaded on the Hue Web UI
Using Impala
Failed to Connect to impala-shell
Failed to Create a Kudu Table
Failed to Log In to the Impala Client
Using Kafka
An Error Is Reported When Kafka Is Run to Obtain a Topic
Flume Normally Connects to Kafka But Fails to Send Messages
Producer Failed to Send Data and Threw "NullPointerException"
Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
Consumer Fails to Consume Data and Remains in the Waiting State
SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
Checking Whether Data Consumed by a Consumer Is Lost
Failed to Start a Component Due to Account Lock
Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
Kafka Topics Cannot Be Deleted
Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
Consumer Repeatedly Consumes Data
Leader for the Created Kafka Topic Partition Is Displayed as none
Safety Instructions on Using Kafka
Obtaining Kafka Consumer Offset Information
Adding or Deleting Configurations for a Topic
Reading the Content of the __consumer_offsets Internal Topic
Configuring Logs for Shell Commands on the Client
Obtaining Topic Distribution Information
Kafka HA Usage Description
Kafka Producer Writes Oversized Records
Kafka Consumer Reads Oversized Records
High Usage of Multiple Disks on a Kafka Cluster Node
Using Oozie
Oozie Jobs Do Not Run When a Large Number of Jobs Are Submitted Concurrently
Using Presto
During sql-standard-with-group Configuration, a Schema Fails to Be Created and the Error Message "Access Denied" Is Displayed
The Presto Coordinator Cannot Be Started Properly
An Error Is Reported When Presto Is Used to Query a Kudu Table
No Data Is Found in the Hive Table Using Presto
Using Spark
An Error Occurs When the Split Size Is Changed in a Spark Application
An Error Is Reported When Spark Is Used
A Spark Job Fails to Run Due to Incorrect JAR File Import
A Spark Job Is Pending Due to Insufficient Memory
An Error Is Reported During Spark Running
"Executor Memory Reaches the Threshold" Is Displayed in Driver
Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
ApplicationMaster Failed to Start Twice in Yarn-client Mode
Failed to Connect to ResourceManager When a Spark Task Is Submitted
DataArts Studio Failed to Schedule Spark Jobs
Submission Status of the Spark Job API Is Error
Alarm 43006 Is Repeatedly Generated in the Cluster
Failed to Create or Delete a Table in Spark Beeline
Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
Large Number of Shuffle Results Are Lost During Spark Task Execution
Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
Spark Task Submission Failure
Spark Task Execution Failure
JDBCServer Connection Failure
Failed to View Spark Task Logs
Authentication Fails When Spark Connects to Other Services
An Error Occurs When Spark Connects to Redis
An Error Is Reported When spark-beeline Is Used to Query a Hive View
Using Sqoop
Connecting Sqoop to MySQL
Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
Failed to Export HBase Data to HDFS Through Hue's Sqoop Task
A Format Error Is Reported When Sqoop Is Used to Export Data from Hive to MySQL 8.0
An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
Using Storm
Invalid Hyperlink of Events on the Storm UI
Failed to Submit a Topology
Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
The Worker Log Is Empty After a Topology Is Submitted
Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
"well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts, Data Cannot Be Written into the Bolts
The GC Parameter Configured for the Service Topology Does Not Take Effect
Internal Server Error Is Displayed When the User Queries Information on the UI
Using Ranger
After Ranger Authentication Is Enabled for Hive, Unauthorized Tables and Databases Can Be Viewed on the Hue Page
Using Yarn
Plenty of Jobs Are Found After Yarn Is Started
"GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
Temporary Files Are Not Deleted When an MR Job Is Abnormal
ResourceManager of Yarn (Port 8032) Throws Error "connection refused"
Failed to View Job Logs on the Yarn Web UI
An Error Is Reported When a Queue Name Is Clicked on the Yarn Page
Using ZooKeeper
Accessing ZooKeeper from an MRS Cluster
Accessing OBS
When Using the MRS Multi-user Access to OBS Function, a User Does Not Have the Permission to Access the /tmp Directory
When the Hadoop Client Is Used to Delete Data from OBS, It Does Not Have the Permission for the .Trash Directory
Appendix
Precautions for MRS 3.x
API Reference (Kuala Lumpur Region)
Before You Start
Overview
API Calling
Endpoints
Constraints
Concepts
Selecting an API Type
API Overview
Calling APIs
Making an API Request
Authentication
Response
Application Cases
Creating an MRS Cluster
Scaling Out a Cluster
Scaling In a Cluster
Creating a Job
Terminating a Job
Terminating a Cluster
API V2
Cluster Management APIs
Creating Clusters
Job Object APIs
Adding and Executing a Job
Querying Information About a Job
Querying a List of Jobs
Terminating a Job
Deleting Jobs in Batches
Obtaining the SQL Result
SQL APIs
Submitting an SQL Statement
Querying SQL Results
Canceling an SQL Execution Task
Cluster HDFS File API
Obtaining Files from a Specified Directory
Agency Management
Querying the Mapping Between a User (Group) and an IAM Agency
Updating the Mapping Between a User (Group) and an IAM Agency
API V1.1
Cluster Management APIs
Creating a Cluster and Running a Job
Resizing a Cluster
Querying a Cluster List
Deleting a Cluster
Querying Cluster Details
Querying a Host List
Job Object APIs
Adding a Job and Executing the Job
Querying the exe Object List of Jobs
Querying exe Object Details
Job Execution Object APIs
Deleting a Job Execution Object
Auto Scaling APIs
Configuring an Auto Scaling Rule
Tag Management APIs
Adding a Tag to a Specified Cluster
Deleting a Tag of a Specified Cluster
Querying Tags of a Specified Cluster
Adding or Deleting Cluster Tags in Batches
Querying All Tags
Querying a List of Clusters with Specified Tags
Permissions Policies and Supported Actions
Introduction
Appendix
Status Codes
Obtaining a Project ID
Obtaining Account ID
Obtaining Tenant ID
Obtaining the MRS Cluster Information
Roles and Components Supported by MRS
User Guide (Ankara Region)
Overview
What Is MRS?
Application Scenarios
Components
CarbonData
ClickHouse
Containers
ALB Basic Principles
Containers Basic Principles
Containers Enhanced Features
CDL
CDL Basic Principles
Relationship Between CDL and Other Components
DBService
DBService Basic Principles
Relationship Between DBService and Other Components
Doris
Basic Principles
Relationship with Other Components
Elasticsearch
Elasticsearch Basic Principles
Relationship with Other Components
Elasticsearch Enhanced Open Source Features
Flink
Flink Basic Principles
Flink HA Solution
Relationship Between Flink and Other Components
Flink Enhanced Open Source Features
Window
Job Pipeline
Stream SQL Join
Flink CEP in SQL
Flume
Flume Basic Principles
Relationship Between Flume and Other Components
Flume Enhanced Open Source Features
FTP-Server
FTP-Server Basic Principles
Relationship with Other Components
FTP-Server Enhanced Open Source Features
GraphBase
GraphBase Basic Principles
GraphBase Key Features
Relationship Between GraphBase and Other Components
Guardian
HBase
HBase Basic Principles
HBase HA Solution
Relationship with Other Components
HBase Enhanced Open Source Features
HDFS
HDFS Basic Principles
HDFS HA Solution
Relationship Between HDFS and Other Components
HDFS Enhanced Open Source Features
HetuEngine
HetuEngine Product Overview
Relationship Between HetuEngine and Other Components
Hive
Hive Basic Principles
Hive CBO Principles
Relationship Between Hive and Other Components
Enhanced Open Source Feature
Hudi
Hue
Hue Basic Principles
Relationship Between Hue and Other Components
Hue Enhanced Open Source Features
IoTDB
IoTDB Basic Principles
Relationship Between IoTDB and Other Components
IoTDB Enhanced Open Source Features
Kafka
Kafka Basic Principles
Relationship Between Kafka and Other Components
Kafka Enhanced Open Source Features
KafkaManager
KMS
KMS Basic Principles
Relationship Between KMS and Other Components
KrbServer and LdapServer
KrbServer and LdapServer Principles
KrbServer and LdapServer Enhanced Open Source Features
LakeSearch
LakeSearch Basic Principles
Relationship with Other Components
Loader
Loader Basic Principles
Relationship Between Loader and Other Components
Loader Enhanced Open Source Features
Manager
Manager Basic Principles
Manager Key Features
MapReduce
MapReduce Basic Principles
Relationship Between MapReduce and Other Components
MapReduce Enhanced Open Source Features
MemArtsCC
MemArtsCC Basic Principles
Relationships Between MemArtsCC and Other Components
Metadata
Metadata Basic Principles
Relationship Between Metadata and Other Components
Metadata Enhanced Open Source Features
MOTService
MOTService Basic Principles
MOTService Enhanced Features
Oozie
Oozie Basic Principles
Oozie Enhanced Open Source Features
Ranger
Ranger Basic Principles
Relationship Between Ranger and Other Components
Redis
Redis Basic Principles
Redis Enhanced Open Source Features
RTDService
RTDService Basic Principles
RTDService Enhanced Features
Solr
Solr Basic Principles
Relationship Between Solr and Other Components
Solr Enhanced Open Source Features
Spark
Spark Basic Principles
Spark HA Solution
Spark Multi-active Instance
Spark Multi-tenant
Relationship Between Spark and Other Components
Spark Open Source New Features
Spark Enhanced Open Source Features
CarbonData Overview
Optimizing SQL Query of Data of Multiple Sources
Tez
YARN
YARN Basic Principles
YARN HA Solution
Relationship Between YARN and Other Components
YARN Enhanced Open Source Features
ZooKeeper
ZooKeeper Basic Principles
Relationship Between ZooKeeper and Other Components
ZooKeeper Enhanced Open Source Features
Functions
Multi-tenant
Security Hardening
Easy Access to Web UIs of Components
Reliability Enhancement
Job Management
Bootstrap Actions
Metadata
Cluster Management
Cluster Lifecycle Management
Cluster Scaling
Auto Scaling
Task Node Creation
Isolating a Host
Managing Tags
Cluster O&M
Message Notification
Constraints
Permissions Management
Related Services
Preparing a User
Creating an MRS User
Creating a Custom Policy
Synchronizing IAM Users to MRS
Getting Started
How to Use MRS
Creating a Cluster
Uploading Data
Creating a Job
Terminating a Cluster
Configuring a Cluster
How to Create an MRS Cluster
Quick Configuration
Quickly Creating a Hadoop Analysis Cluster
Quickly Creating an HBase Query Cluster
Quickly Creating a ClickHouse Cluster
Quickly Creating a Real-time Analysis Cluster
Creating a Custom Cluster
Configuring Custom Topology
Adding a Tag to a Cluster/Node
Communication Security Authorization
Configuring Auto Scaling Rules
Overview
Configuring Auto Scaling During Cluster Creation
Creating an Auto Scaling Policy for an Existing Cluster
Scenario 1: Using Auto Scaling Rules Alone
Scenario 2: Using Resource Plans Alone
Scenario 3: Using Both Auto Scaling Rules and Resource Plans
Modifying an Auto Scaling Policy
Deleting an Auto Scaling Policy
Enabling or Disabling an Auto Scaling Policy
Viewing an Auto Scaling Policy
Configuring Automation Scripts
Configuring Auto Scaling Metrics
Managing Data Connections
Configuring Data Connections
Configuring an RDS Data Connection
Configuring a Ranger Data Connection
Configuring a Hive Data Connection
Installing Third-Party Software Using Bootstrap Actions
Viewing Failed MRS Tasks
Viewing Information of a Historical Cluster
Managing Clusters
Logging In to a Cluster
MRS Cluster Node Overview
Logging In to an ECS
Determining Active and Standby Management Nodes
Cluster Overview
Cluster List
Checking the Cluster Status
Viewing Basic Cluster Information
Managing Components and Monitoring Hosts
Viewing and Customizing Cluster Monitoring Metrics
Cluster O&M
Importing and Exporting Data
Changing the Subnet of a Cluster
Configuring Message Notification
Remote O&M
Authorizing O&M
Sharing Logs
Viewing MRS Operation Logs
Deleting a Cluster
Managing Nodes
Scaling Out a Cluster
Scaling In a Cluster
Removing ClickHouseServer Instance Nodes
Constraints on ClickHouseServer Scale-in
Scaling In ClickHouseServer Nodes
Managing a Host (Node)
Isolating a Host
Canceling Host Isolation
Job Management
Introduction to MRS Jobs
Running a MapReduce Job
Running a SparkSubmit Job
Running a HiveSQL Job
Running a SparkSql Job
Running a Flink Job
Viewing Job Configuration and Logs
Stopping a Job
Deleting a Job
Configuring Job Notification Rules
Component Management
Object Management
Viewing Configuration
Managing Services
Configuring Service Parameters
Configuring Customized Service Parameters
Synchronizing Service Configuration
Managing Role Instances
Configuring Role Instance Parameters
Synchronizing Role Instance Configuration
Decommissioning and Recommissioning a Role Instance
Starting and Stopping a Cluster
Performing Rolling Restart
Alarm Management
Viewing the Alarm List
Viewing the Event List
Viewing and Manually Clearing an Alarm
Tenant Management
Overview
Creating a Tenant
Creating a Sub-tenant
Deleting a Tenant
Managing a Tenant Directory
Restoring Tenant Data
Creating a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Configuration of a Queue
Bootstrap Actions
Introduction to Bootstrap Actions
Preparing the Bootstrap Action Script
Viewing Execution Records
Adding a Bootstrap Action
Modifying a Bootstrap Action
Deleting a Bootstrap Action
Using an MRS Client
Installing a Client
Updating a Client
Using the Client of Each Component
Using a ClickHouse Client
Using a Flink Client
Using a Flume Client
Using an HBase Client
Using an HDFS Client
Using a Hive Client
Using a Kafka Client
Using the Oozie Client
Using a Storm Client
Using a Yarn Client
Configuring a Cluster with Decoupled Storage and Compute
MRS Storage-Compute Decoupling
Interconnecting with OBS Using the Cluster Agency Mechanism
Configuring a Storage-Compute Decoupled Cluster (Agency)
Configuring a Storage-Compute Decoupled Cluster (AK/SK)
Configuring the Policy for Clearing Component Data in the Recycle Bin
Interconnecting MRS with OBS Using an Agency
Interconnecting Flink with OBS
Interconnecting Flume with OBS
Interconnecting HDFS with OBS
Interconnecting Hive with OBS
Interconnecting MapReduce with OBS
Interconnecting Spark2x with OBS
Interconnecting Sqoop with External Storage Systems
Interconnecting Hudi with OBS
Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
Interconnecting with OBS Using the Guardian Service
Scenarios
Interconnecting the Guardian Service with OBS
Interconnecting Components with OBS Using Guardian
Interconnecting Hive with OBS
Interconnecting Flink with OBS
Interconnecting Spark with OBS
Interconnecting Hudi with OBS
Interconnecting HetuEngine with OBS
Interconnecting HDFS with OBS
Interconnecting Yarn with OBS
Interconnecting MapReduce with OBS
Accessing Web Pages of Open Source Components Managed in MRS Clusters
Web UIs of Open Source Components
Common Ports of Components
Access Through Direct Connect
EIP-based Access
Access Using a Windows ECS
Creating an SSH Channel for Connecting to an MRS Cluster and Configuring the Browser
Accessing FusionInsight Manager
FusionInsight Manager Operation Guide
Getting Started
FusionInsight Manager Introduction
Querying the FusionInsight Manager Version
Logging In to FusionInsight Manager
Logging In to the Management Node
Home Page
Overview
Managing Monitoring Metric Reports
Cluster
Cluster Management
Performing a Rolling Restart of a Cluster
Managing Expired Configurations
Downloading the Client
Modifying Cluster Attributes
Managing Cluster Configurations
Managing Static Service Pools
Static Service Resources
Configuring Cluster Static Resources
Viewing Cluster Static Resources
Managing Clients
Managing a Client
Batch Upgrading Clients
Updating the hosts File in Batches
Managing a Service
Overview
Other Service Management Operations
Service Details Page
Performing Active/Standby Switchover of a Role Instance
Resource Monitoring
Collecting Stack Information
Switching Ranger Authentication
Service Configuration
Modifying Service Configuration Parameters
Modifying Custom Configuration Parameters of a Service
Instance Management
Overview
Decommissioning and Recommissioning an Instance
Managing Instance Configurations
Viewing the Instance Configuration File
Instance Group
Managing Instance Groups
Viewing Information About an Instance Group
Configuring Instance Group Parameters
Hosts
Host Management Page
Viewing the Host List
Viewing the Host Dashboard
Checking Host Processes and Resources
Host Maintenance Operations
Starting and Stopping All Instances on a Host
Performing a Host Health Check
Configuring Racks for Hosts
Isolating a Host
Exporting Host Information
Resource Overview
Distribution
Trend
Cluster
Host
O&M
Alarms
Overview of Alarms and Events
Configuring Alarm Threshold
Configuring the Alarm Masking Status
Log
Log Online Search
Log Download
Performing a Health Check
Viewing a Health Check Task
Managing Health Check Reports
Modifying Health Check Configuration
Configuring Backup and Backup Restoration
Creating a Backup Task
Creating a Backup Restoration Task
Managing Backup and Backup Restoration Tasks
Audit
Overview
Configuring Audit Log Dumping
Tenant Resources
Multi-Tenancy
Overview
Technical Principles
Multi-Tenant Management
Multi-Tenant Model
Resource Overview
Dynamic Resources
Storage Resources
Multi-Tenancy Usage
Overview
Process Overview
Using the Superior Scheduler
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing Tenant Directories
Restoring Tenant Data
Deleting a Tenant
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Managing Global User Policies
Using the Capacity Scheduler
Creating Tenants
Adding a Tenant
Adding a Sub-Tenant
Adding a User and Binding the User to a Tenant Role
Managing Tenants
Managing Tenant Directories
Restoring Tenant Data
Deleting a Tenant
Clearing Non-associated Queues of a Tenant
Managing Resources
Adding a Resource Pool
Modifying a Resource Pool
Deleting a Resource Pool
Configuring a Queue
Configuring the Queue Capacity Policy of a Resource Pool
Clearing Queue Configurations
Switching the Scheduler
System Configuration
Configuring Permissions
Managing Users
Creating a User
Modifying User Information
Exporting User Information
Locking a User
Unlocking a User
Deleting a User
Changing a User Password
Initializing a Password
Exporting an Authentication Credential File
Managing User Groups
Managing Roles
Security Policies
Configuring Password Policies
Configuring the Independent Attribute
Configuring Interconnections
Configuring SNMP Northbound Parameters
Configuring Syslog Northbound Parameters
Configuring Monitoring Metric Dumping
Importing a Certificate
OMS Management
Overview of the OMS Page
Modifying OMS Service Configuration Parameters
Component Management
Viewing Component Packages
Cluster Management
Configuring Client
Installing a Client
Using a Client
Updating the Configuration of an Installed Client
Cluster Mutual Trust Management
Overview of Mutual Trust Between Clusters
Changing Manager's Domain Name
Configuring Cross-Manager Mutual Trust Between Clusters
Assigning User Permissions After Cross-Cluster Mutual Trust Is Configured
Configuring Scheduled Backup of Alarm and Audit Information
Modifying the FusionInsight Manager Routing Table
Switching to the Maintenance Mode
Routine Maintenance
Log Management
About Logs
Manager Log List
Configuring the Log Level and Log File Size
Configuring the Number of Local Audit Log Backups
Viewing Role Instance Logs
Backup and Recovery Management
Introduction
Backing Up Data
Backing Up Manager Data
Backing Up CDL Data
Backing Up Containers Metadata
Backing Up ClickHouse Metadata
Backing Up ClickHouse Service Data
Backing Up DBService Data
Backing Up Flink Metadata
Backing Up HBase Metadata
Backing Up HBase Service Data
Backing Up Elasticsearch Service Data
Backing Up MOTService Service Data
Backing Up NameNode Data
Backing Up HDFS Service Data
Backing Up Hive Service Data
Backing Up IoTDB Metadata
Backing Up IoTDB Service Data
Backing Up Kafka Metadata
Backing Up Redis Data
Backing Up RTDService Metadata
Backing Up Solr Metadata
Backing Up Solr Service Data
Recovering Data
Restoring Manager Data
Restoring CDL Data
Restoring Containers Metadata
Restoring ClickHouse Metadata
Restoring ClickHouse Service Data
Restoring DBService Data
Restoring Flink Metadata
Restoring HBase Metadata
Restoring HBase Service Data
Restoring Elasticsearch Service Data
Restoring MOTService Service Data
Restoring NameNode Data
Restoring HDFS Service Data
Restoring Hive Service Data
Restoring IoTDB Metadata
Restoring IoTDB Service Data
Restoring Kafka Metadata
Restoring Redis Data
Restoring RTDService Metadata
Restoring Solr Metadata
Restoring Solr Service Data
Enabling Cross-Cluster Replication
Managing Local Quick Restoration Tasks
Modifying a Backup Task
Viewing Backup and Restoration Tasks
SQL Inspector
Overview
Adding an SQL Inspection
Configuring Hive SQL Inspection
Configuring ClickHouse SQL Inspection
Configuring HetuEngine SQL Inspection
Configuring Spark SQL Inspection
Security Management
Security Overview
Permission Model
Permission Mechanism
Authentication Policies
Permission Verification Policies
User Account List
Default Permission Information
FusionInsight Manager Security Functions
Account Management
Account Security Settings
Unlocking LDAP Users and Management Accounts
Unlocking an Internal System User
Enabling and Disabling Permission Verification on Cluster Components
Logging In to a Non-Cluster Node Using a Cluster User in Normal Mode
Changing the Password for a System User
Changing the Password for User admin
Changing the Password for an OS User
Changing the OS User Password Validity Period
Changing the Password for a System Internal User
Changing the Password for the Kerberos Administrator
Changing the Password for the OMS Kerberos Administrator
Changing the Password for a Component Running User
Changing the Password for a Database User
Changing the Password of the OMS Database Administrator
Changing the Password for the Data Access User of the OMS Database
Resetting the Component Database User Password
Resetting the Password for User omm in DBService
Changing the Password for User compdbuser of the DBService Database
Security Hardening
Hardening Policies
Configuring a Trusted IP Address to Access LDAP
HFile and WAL Encryption
Configuring Hadoop Security Parameters
Configuring an IP Address Whitelist for Modification Allowed by HBase
Updating a Key for a Cluster
Changing the Cluster Encryption Mode
Hardening the LDAP
Configuring Kafka Data Encryption During Transmission
Configuring HDFS Data Encryption During Transmission
Configuring HetuEngine Data Encryption During Transmission
Configuring RTD Data Encryption During Transmission
Configuring IoTDB Data Encryption During Transmission
ClickHouse Security Hardening
Hive Metastore Security Hardening
Configuring ZooKeeper SSL
Encrypting the Communication Between the Controller and the Agent
Updating SSH Keys for User omm
Changing the Timeout Duration of the Manager Page
Resetting Sessions During Secondary Authentication Configuration
Security Maintenance
Account Maintenance Suggestions
Password Maintenance Suggestions
Log Maintenance Suggestions
Security Statement
Alarm Reference
ALM-12001 Audit Log Dumping Failure
ALM-12004 Manager OLdap Resource Abnormal
ALM-12005 Manager OKerberos Resource Abnormal
ALM-12006 NodeAgent Process Is Abnormal
ALM-12007 Process Fault
ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes
ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes
ALM-12014 Device Partition Lost
ALM-12015 Partition Filesystem Readonly
ALM-12016 CPU Usage Exceeds the Threshold
ALM-12017 Insufficient Disk Capacity
ALM-12018 Memory Usage Exceeds the Threshold
ALM-12027 Host PID Usage Exceeds the Threshold
ALM-12028 Number of Processes in the D State on a Host Exceeds the Threshold
ALM-12033 Slow Disk Fault
ALM-12034 Periodical Backup Failure
ALM-12035 Unknown Data Status After Recovery Task Failure
ALM-12038 Monitoring Indicator Dumping Failure
ALM-12039 Active/Standby OMS Databases Not Synchronized
ALM-12040 Insufficient OS Entropy
ALM-12041 Incorrect Permission on Key Files
ALM-12042 Incorrect Configuration of Key Files
ALM-12045 Read Packet Dropped Rate Exceeds the Threshold
ALM-12046 Write Packet Dropped Rate Exceeds the Threshold
ALM-12047 Read Packet Error Rate Exceeds the Threshold
ALM-12048 Write Packet Error Rate Exceeds the Threshold
ALM-12049 Network Read Throughput Rate Exceeds the Threshold
ALM-12050 Network Write Throughput Rate Exceeds the Threshold
ALM-12051 Disk Inode Usage Exceeds the Threshold
ALM-12052 TCP Temporary Port Usage Exceeds the Threshold
ALM-12053 Host File Handle Usage Exceeds the Threshold
ALM-12054 Invalid Certificate File
ALM-12055 The Certificate File Is About to Expire
ALM-12057 Metadata Not Configured with the Task to Periodically Back Up Data to a Third-Party Server
ALM-12061 Process Usage Exceeds the Threshold
ALM-12062 OMS Parameter Configurations Mismatch with the Cluster Scale
ALM-12063 Unavailable Disk
ALM-12064 Host Random Port Range Conflicts with Cluster Used Port
ALM-12066 Trust Relationships Between Nodes Become Invalid
ALM-12067 Abnormal Tomcat Resources of Manager
ALM-12068 Abnormal ACS Resources of Manager
ALM-12069 Abnormal AOS Resources of Manager
ALM-12070 Controller Resource Is Abnormal
ALM-12071 Httpd Resource Is Abnormal
ALM-12072 FloatIP Resource Is Abnormal
ALM-12073 CEP Resource Is Abnormal
ALM-12074 FMS Resource Is Abnormal
ALM-12075 PMS Resource Is Abnormal
ALM-12076 GaussDB Resource Is Abnormal
ALM-12077 User omm Expired
ALM-12078 Password of User omm Expired
ALM-12079 User omm Is About to Expire
ALM-12080 Password of User omm Is About to Expire
ALM-12081 User ommdba Expired
ALM-12082 User ommdba Is About to Expire
ALM-12083 Password of User ommdba Is About to Expire
ALM-12084 Password of User ommdba Expired
ALM-12085 Service Audit Log Dump Failure
ALM-12087 System Is in the Upgrade Observation Period
ALM-12089 Network Connections Between Nodes Are Abnormal
ALM-12099 Core Dump for Cluster Processes
ALM-12101 AZ Unhealthy
ALM-12102 AZ HA Component Is Not Deployed Based on DR Requirements
ALM-12110 Failed to Get ECS Temporary AK/SK
ALM-12180 Suspended Disk I/O
ALM-12190 Number of Knox Connections Exceeds the Threshold
ALM-12191 Disk I/O Usage Exceeds the Threshold
ALM-12192 Host Load Exceeds the Threshold
ALM-12200 Password Is About to Expire
ALM-12201 Process CPU Usage Exceeds the Threshold
ALM-12202 Process Memory Usage Exceeds the Threshold
ALM-12203 Process Full GC Duration Exceeds the Threshold
ALM-12204 Wait Duration of a Disk Read Exceeds the Threshold
ALM-12205 Wait Duration of a Disk Write Exceeds the Threshold
ALM-12206 Password Has Expired
ALM-13000 ZooKeeper Service Unavailable
ALM-13001 Available ZooKeeper Connections Are Insufficient
ALM-13002 ZooKeeper Direct Memory Usage Exceeds the Threshold
ALM-13003 GC Duration of the ZooKeeper Process Exceeds the Threshold
ALM-13004 ZooKeeper Heap Memory Usage Exceeds the Threshold
ALM-13005 Failed to Set the Quota of Top Directories of ZooKeeper Components
ALM-13006 Znode Number or Capacity Exceeds the Threshold
ALM-13007 Available ZooKeeper Client Connections Are Insufficient
ALM-13008 ZooKeeper Znode Usage Exceeds the Threshold
ALM-13009 ZooKeeper Znode Capacity Usage Exceeds the Threshold
ALM-13010 Znode Usage of a Directory with Quota Configured Exceeds the Threshold
ALM-14000 HDFS Service Unavailable
ALM-14001 HDFS Disk Usage Exceeds the Threshold
ALM-14002 DataNode Disk Usage Exceeds the Threshold
ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold
ALM-14006 Number of HDFS Files Exceeds the Threshold
ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold
ALM-14008 DataNode Heap Memory Usage Exceeds the Threshold
ALM-14009 Number of Dead DataNodes Exceeds the Threshold
ALM-14010 NameService Service Is Abnormal
ALM-14011 DataNode Data Directory Is Not Configured Properly
ALM-14012 JournalNode Is Out of Synchronization
ALM-14013 Failed to Update the NameNode FsImage File
ALM-14014 NameNode GC Time Exceeds the Threshold
ALM-14015 DataNode GC Time Exceeds the Threshold
ALM-14016 DataNode Direct Memory Usage Exceeds the Threshold
ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold
ALM-14018 NameNode Non-Heap Memory Usage Exceeds the Threshold
ALM-14019 DataNode Non-Heap Memory Usage Exceeds the Threshold
ALM-14020 Number of Entries in the HDFS Directory Exceeds the Threshold
ALM-14021 NameNode Average RPC Processing Time Exceeds the Threshold
ALM-14022 NameNode Average RPC Queuing Time Exceeds the Threshold
ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
ALM-14024 Tenant Space Usage Exceeds the Threshold
ALM-14025 Tenant File Object Usage Exceeds the Threshold
ALM-14026 Blocks on DataNode Exceed the Threshold
ALM-14027 DataNode Disk Fault
ALM-14028 Number of Blocks to Be Supplemented Exceeds the Threshold
ALM-14029 Number of Blocks in a Replica Exceeds the Threshold
ALM-14030 HDFS Allows Write of Single-Replica Data
ALM-14031 DataNode Process Is Abnormal
ALM-14032 JournalNode Process Is Abnormal
ALM-14033 ZKFC Process Is Abnormal
ALM-14034 Router Process Is Abnormal
ALM-14035 HttpFS Process Is Abnormal
ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold
ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold
ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold
ALM-16003 Background Thread Usage Exceeds the Threshold
ALM-16004 Hive Service Unavailable
ALM-16005 The Heap Memory Usage of the Hive Process Exceeds the Threshold
ALM-16006 The Direct Memory Usage of the Hive Process Exceeds the Threshold
ALM-16007 Hive GC Time Exceeds the Threshold
ALM-16008 Non-Heap Memory Usage of the Hive Process Exceeds the Threshold
ALM-16009 Map Number Exceeds the Threshold
ALM-16045 Hive Data Warehouse Is Deleted
ALM-16046 Hive Data Warehouse Permission Is Modified
ALM-16047 HiveServer Has Been Deregistered from ZooKeeper
ALM-16048 Tez or Spark Library Path Does Not Exist
ALM-16051 Percentage of Sessions Connected to MetaStore Exceeds the Threshold
ALM-17003 Oozie Service Unavailable
ALM-17004 Oozie Heap Memory Usage Exceeds the Threshold
ALM-17005 Oozie Non-Heap Memory Usage Exceeds the Threshold
ALM-17006 Oozie Direct Memory Usage Exceeds the Threshold
ALM-17007 Garbage Collection (GC) Time of the Oozie Process Exceeds the Threshold
ALM-17008 Abnormal Connection Between Oozie and ZooKeeper
ALM-17009 Abnormal Connection Between Oozie and DBService
ALM-17010 Abnormal Connection Between Oozie and HDFS
ALM-17011 Abnormal Connection Between Oozie and Yarn
ALM-18000 Yarn Service Unavailable
ALM-18002 NodeManager Heartbeat Lost
ALM-18003 NodeManager Unhealthy
ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold
ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold
ALM-18010 ResourceManager GC Time Exceeds the Threshold
ALM-18011 NodeManager GC Time Exceeds the Threshold
ALM-18012 JobHistoryServer GC Time Exceeds the Threshold
ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold
ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold
ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold
ALM-18016 Non-Heap Memory Usage of ResourceManager Exceeds the Threshold
ALM-18017 Non-Heap Memory Usage of NodeManager Exceeds the Threshold
ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold
ALM-18019 Non-Heap Memory Usage of JobHistoryServer Exceeds the Threshold
ALM-18020 Yarn Task Execution Timeout
ALM-18021 MapReduce Service Unavailable
ALM-18022 Insufficient YARN Queue Resources
ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold
ALM-18024 Pending Yarn Memory Usage Exceeds the Threshold
ALM-18025 Number of Terminated Yarn Tasks Exceeds the Threshold
ALM-18026 Number of Failed Yarn Tasks Exceeds the Threshold
ALM-19000 HBase Service Unavailable
ALM-19006 HBase Replication Sync Failed
ALM-19007 HBase GC Time Exceeds the Threshold
ALM-19008 Heap Memory Usage of the HBase Process Exceeds the Threshold
ALM-19009 Direct Memory Usage of the HBase Process Exceeds the Threshold
ALM-19011 RegionServer Region Number Exceeds the Threshold
ALM-19012 HBase System Table Directory or File Lost
ALM-19013 Duration of Regions in Transition State Exceeds the Threshold
ALM-19014 Capacity Quota Usage on ZooKeeper Exceeds the Threshold Severely
ALM-19015 Quantity Quota Usage on ZooKeeper Exceeds the Threshold
ALM-19016 Quantity Quota Usage on ZooKeeper Exceeds the Threshold Severely
ALM-19017 Capacity Quota Usage on ZooKeeper Exceeds the Threshold
ALM-19018 HBase Compaction Queue Size Exceeds the Threshold
ALM-19019 Number of HBase HFiles to Be Synchronized Exceeds the Threshold
ALM-19020 Number of HBase WAL Files to Be Synchronized Exceeds the Threshold
ALM-19022 HBase Hotspot Detection Is Unavailable
ALM-19023 Region Traffic Restriction for HBase
ALM-19024 RPC Requests P99 Latency on RegionServer Exceeds the Threshold
ALM-19025 Damaged StoreFile in HBase
ALM-19026 Damaged WAL Files in HBase
ALM-19030 P99 Latency of RegionServer RPC Request Exceeds the Threshold
ALM-19031 Number of RegionServer RPC Connections Exceeds the Threshold
ALM-19032 Number of Tasks in the RegionServer RPC Write Queue Exceeds the Threshold
ALM-19033 Number of Tasks in the RegionServer RPC Read Queue Exceeds the Threshold
ALM-19034 Number of RegionServer WAL Write Timeouts Exceeds the Threshold
ALM-19035 Size of the RegionServer Call Queue Exceeds the Threshold
ALM-20002 Hue Service Unavailable
ALM-23001 Loader Service Unavailable
ALM-23003 Loader Task Execution Failed
ALM-23004 Loader Heap Memory Usage Exceeds the Threshold
ALM-23005 Loader Non-Heap Memory Usage Exceeds the Threshold
ALM-23006 Loader Direct Memory Usage Exceeds the Threshold
ALM-23007 GC Duration of the Loader Process Exceeds the Threshold
ALM-24000 Flume Service Unavailable
ALM-24001 Flume Agent Exception
ALM-24003 Flume Client Connection Interrupted
ALM-24004 Exception Occurs When Flume Reads Data
ALM-24005 Exception Occurs When Flume Transmits Data
ALM-24006 Heap Memory Usage of Flume Server Exceeds the Threshold
ALM-24007 Flume Server Direct Memory Usage Exceeds the Threshold
ALM-24008 Flume Server Non-Heap Memory Usage Exceeds the Threshold
ALM-24009 Flume Server Garbage Collection (GC) Duration Exceeds the Threshold
ALM-24010 Flume Certificate File Is Invalid or Damaged
ALM-24011 Flume Certificate File Is About to Expire
ALM-24012 Flume Certificate File Has Expired
ALM-24013 Flume MonitorServer Certificate File Is Invalid or Damaged
ALM-24014 Flume MonitorServer Certificate Is About to Expire
ALM-24015 Flume MonitorServer Certificate File Has Expired
ALM-25000 LdapServer Service Unavailable
ALM-25004 Abnormal LdapServer Data Synchronization
ALM-25005 nscd Service Exception
ALM-25006 Sssd Service Exception
ALM-25500 KrbServer Service Unavailable
ALM-25501 Too Many KerberosServer Requests
ALM-27001 DBService Is Unavailable
ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes
ALM-27004 Data Inconsistency Between Active and Standby DBServices
ALM-27005 Database Connection Usage Exceeds the Threshold
ALM-27006 Data Directory Disk Usage Exceeds the Threshold
ALM-27007 Database Enters the Read-Only Mode
ALM-33004 BLU Instance Health Status of Containers Is Abnormal
ALM-33005 Maximum Number of Concurrent Containers Requests Exceeds the Threshold
ALM-33006 Failure Rate of Containers Calls Exceeds the Threshold
ALM-33007 ALB TPS of Containers Exceeds the Threshold
ALM-33008 Average Latency of Containers Exceeds the Threshold
ALM-33009 Containers Heap Memory Usage Exceeds the Threshold
ALM-33010 Containers Non-Heap Memory Usage Exceeds the Threshold
ALM-33011 Containers Metaspace Usage Exceeds the Threshold
ALM-33012 Containers' ZooKeeper Client Is Disconnected
ALM-38000 Kafka Service Unavailable
ALM-38001 Insufficient Kafka Disk Space
ALM-38002 Kafka Heap Memory Usage Exceeds the Threshold
ALM-38004 Kafka Direct Memory Usage Exceeds the Threshold
ALM-38005 GC Duration of the Broker Process Exceeds the Threshold
ALM-38006 Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold
ALM-38007 Status of Kafka Default User Is Abnormal
ALM-38008 Abnormal Kafka Data Directory Status
ALM-38009 Busy Broker Disk I/Os
ALM-38010 Topics with Single Replica
ALM-38011 User Connection Usage on Broker Exceeds the Threshold
ALM-41007 RTDService Unavailable
ALM-43001 Spark Service Unavailable
ALM-43006 Heap Memory Usage of the JobHistory Process Exceeds the Threshold
ALM-43007 Non-Heap Memory Usage of the JobHistory Process Exceeds the Threshold
ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold
ALM-43009 JobHistory Process GC Duration Exceeds the Threshold
ALM-43010 Heap Memory Usage of the JDBCServer Process Exceeds the Threshold
ALM-43011 Non-Heap Memory Usage of the JDBCServer Process Exceeds the Threshold
ALM-43012 Direct Memory Usage of the JDBCServer Process Exceeds the Threshold
ALM-43013 JDBCServer Process GC Duration Exceeds the Threshold
ALM-43017 JDBCServer Process Full GC Times Exceeds the Threshold
ALM-43018 JobHistory Process Full GC Times Exceeds the Threshold
ALM-43019 Heap Memory Usage of the IndexServer Process Exceeds the Threshold
ALM-43020 Non-Heap Memory Usage of the IndexServer Process Exceeds the Threshold
ALM-43021 Direct Memory Usage of the IndexServer Process Exceeds the Threshold
ALM-43022 IndexServer Process GC Time Exceeds the Threshold
ALM-43023 IndexServer Process Full GC Number Exceeds the Threshold
ALM-43200 Elasticsearch Service Unavailable
ALM-43201 Heap Memory Usage of Elasticsearch Exceeds the Threshold
ALM-43202 Indices in the Yellow State Exist in Elasticsearch
ALM-43203 Indices in the Red State Exist in Elasticsearch
ALM-43204 GC Duration of the Elasticsearch Process Exceeds the Threshold
ALM-43205 Elasticsearch Stored Shard Data Volume Exceeds the Threshold
ALM-43206 Elasticsearch Shard Document Number Exceeds the Threshold
ALM-43207 Elasticsearch Has Indexes Without Replicas
ALM-43208 Elasticsearch Data Directory Usage Exceeds the Threshold
ALM-43209 Total Number of Elasticsearch Instance Shards Exceeds the Threshold
ALM-43210 Total Number of Elasticsearch Shards Exceeds the Threshold
ALM-43600 GraphBase Service Unavailable
ALM-43605 Number of Real-Time Requests on a GraphBase Node Exceeds the Threshold
ALM-43607 Nginx Fault in GraphBase
ALM-43608 Floating IP Address of GraphBase Is Faulty
ALM-43609 TaskManager of GraphBase Is Faulty
ALM-43610 GC Time of the Old-Generation GraphServer Process Exceeds the Threshold
ALM-43611 Number of GC Times of the Old-Generation GraphServer Process Exceeds the Threshold
ALM-43612 GC Duration of the Young-Generation GraphServer Process Exceeds the Threshold
ALM-43613 Number of GC Times of the Young-Generation GraphServer Process Exceeds the Threshold
ALM-43614 Time Spent on a GraphBase Path Query Request Exceeds the Threshold
ALM-43615 Time Spent on a Line Expansion Query Request in GraphBase Exceeds the Threshold
ALM-43616 GraphBase-related Yarn Jobs Are Abnormal
ALM-43617 Number of Waiting Queues for Real-Time Data Import to GraphBase Exceeds the Threshold
ALM-43618 GraphServer Heap Memory Usage Exceeds the Threshold
ALM-43619 Invalid GraphBase HA Certificate Files
ALM-43620 GraphBase HA Certificates Are About to Expire
ALM-43621 GraphBase HA Certificate Files Have Expired
ALM-43850 KMS Service Unavailable
ALM-45000 HetuEngine Service Unavailable
ALM-45001 Faulty HetuEngine Compute Instances
ALM-45003 HetuEngine QAS Disk Capacity Is Insufficient
ALM-45004 Tasks Stacked on HetuEngine Compute Instance
ALM-45005 CPU Usage of HetuEngine Compute Instance Exceeded the Threshold
ALM-45006 Memory Usage of a HetuEngine Compute Instance Exceeded the Threshold
ALM-45007 Number of Workers of a HetuEngine Compute Instance Is Less Than the Threshold
ALM-45191 Failed to Obtain ECS Metadata
ALM-45192 Failed to Obtain the IAM Security Token
ALM-45275 Ranger Service Unavailable
ALM-45276 Abnormal RangerAdmin Status
ALM-45277 RangerAdmin Heap Memory Usage Exceeds the Threshold
ALM-45278 RangerAdmin Direct Memory Usage Exceeds the Threshold
ALM-45279 RangerAdmin Non-Heap Memory Usage Exceeds the Threshold
ALM-45280 RangerAdmin GC Duration Exceeds the Threshold
ALM-45281 UserSync Heap Memory Usage Exceeds the Threshold
ALM-45282 UserSync Direct Memory Usage Exceeds the Threshold
ALM-45283 UserSync Non-Heap Memory Usage Exceeds the Threshold
ALM-45284 UserSync Garbage Collection (GC) Time Exceeds the Threshold
ALM-45285 TagSync Heap Memory Usage Exceeds the Threshold
ALM-45286 TagSync Direct Memory Usage Exceeds the Threshold
ALM-45287 TagSync Non-Heap Memory Usage Exceeds the Threshold
ALM-45288 TagSync Garbage Collection (GC) Time Exceeds the Threshold
ALM-45289 PolicySync Heap Memory Usage Exceeds the Threshold
ALM-45290 PolicySync Direct Memory Usage Exceeds the Threshold
ALM-45291 PolicySync Non-Heap Memory Usage Exceeds the Threshold
ALM-45292 PolicySync GC Duration Exceeds the Threshold
ALM-45293 Ranger User Synchronization Exception
ALM-45425 ClickHouse Service Unavailable
ALM-45426 ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold
ALM-45427 ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold
ALM-45428 ClickHouse Disk I/O Exception
ALM-45429 Table Metadata Synchronization Failed on the Added ClickHouse Node
ALM-45430 Permission Metadata Synchronization Failed on the Added ClickHouse Node
ALM-45434 A Single Replica Exists in the ClickHouse Data Table
ALM-45440 Inconsistency Between ClickHouse Replicas
ALM-45441 ZooKeeper Disconnected
ALM-45442 Too Many Concurrent SQL Statements
ALM-45443 Slow SQL Queries in the Cluster
ALM-45444 Abnormal ClickHouse Process
ALM-45445 Failed to Send Data Files to Remote Shards When ClickHouse Writes Data to a Distributed Table
ALM-45446 Mutation Task of ClickHouse Is Not Complete for a Long Time
ALM-45585 IoTDB Service Unavailable
ALM-45586 IoTDBServer Heap Memory Usage Exceeds the Threshold
ALM-45587 IoTDBServer GC Duration Exceeds the Threshold
ALM-45588 IoTDBServer Direct Memory Usage Exceeds the Threshold
ALM-45589 ConfigNode Heap Memory Usage Exceeds the Threshold
ALM-45590 ConfigNode GC Duration Exceeds the Threshold
ALM-45591 ConfigNode Direct Memory Usage Exceeds the Threshold
ALM-45592 IoTDBServer RPC Execution Duration Exceeds the Threshold
ALM-45593 IoTDBServer Flush Execution Duration Exceeds the Threshold
ALM-45594 IoTDBServer Intra-Space Merge Duration Exceeds the Threshold
ALM-45595 IoTDBServer Cross-Space Merge Duration Exceeds the Threshold
ALM-45596 Procedure Execution Failed
ALM-45615 CDL Service Unavailable
ALM-45616 CDL Job Execution Exception
ALM-45617 Data Queued in the CDL Replication Slot Exceeds the Threshold
ALM-45635 FlinkServer Job Execution Failure
ALM-45636 Number of Consecutive Checkpoint Failures of a Flink Job Exceeds the Threshold
ALM-45637 Continuous Back Pressure Time of a Flink Job Exceeds the Threshold
ALM-45638 Number of Restarts After Flink Job Failures Exceeds the Threshold
ALM-45639 Checkpointing of a Flink Job Times Out
ALM-45640 FlinkServer Heartbeat Interruption Between the Active and Standby Nodes
ALM-45641 Data Synchronization Exception Between the Active and Standby FlinkServer Nodes
ALM-45642 RocksDB Continuously Triggers Write Traffic Limiting
ALM-45643 MemTable Size of RocksDB Continuously Exceeds the Threshold
ALM-45644 Number of SST Files at Level 0 of RocksDB Continuously Exceeds the Threshold
ALM-45645 Pending Flush Size of RocksDB Continuously Exceeds the Threshold
ALM-45646 Pending Compaction Size of RocksDB Continuously Exceeds the Threshold
ALM-45647 Estimated Pending Compaction Size of RocksDB Continuously Exceeds the Threshold
ALM-45648 RocksDB Frequently Encounters Write-Stopped
ALM-45649 P95 Latency of RocksDB Get Requests Continuously Exceeds the Threshold
ALM-45650 P95 Latency of RocksDB Write Requests Continuously Exceeds the Threshold
ALM-45652 Flink Service Unavailable
ALM-45653 Invalid Flink HA Certificate File
ALM-45654 Flink HA Certificate Is About to Expire
ALM-45655 Flink HA Certificate File Has Expired
ALM-45736 Guardian Service Unavailable
ALM-45737 Guardian TokenServer Heap Memory Usage Exceeds the Threshold
ALM-45738 Guardian TokenServer Direct Memory Usage Exceeds the Threshold
ALM-45739 Guardian TokenServer Non-Heap Memory Usage Exceeds the Threshold
ALM-45740 Guardian TokenServer GC Duration Exceeds the Threshold
ALM-45741 Guardian Failed to Call the ECS securitykey API
ALM-45742 Guardian Failed to Call the ECS Metadata API
ALM-45743 Guardian Failed to Call the IAM API
ALM-46001 MOTService Unavailable
ALM-46003 MOTService Heartbeat Interruption Between the Active and Standby Nodes
ALM-46004 Data Inconsistency Between Active and Standby MOTService Nodes
ALM-46005 MOTService Database Connection Usage Exceeds the Threshold
ALM-46006 Disk Space Usage of the MOTService Data Directory Exceeds the Threshold
ALM-46007 MOTService Database Enters the Read-Only Mode
ALM-46008 MOTService Memory Usage Exceeds the Threshold
ALM-46009 MOTService CPU Usage Exceeds the Threshold
ALM-46010 MOTService Certificate File Is About to Expire
ALM-46011 MOTService Certificate File Has Expired
ALM-46012 Abnormal Nginx of MOTService
ALM-47000 MemArtsCC Instance Unavailable
ALM-47002 MemArtsCC Disk Fault
ALM-50201 Doris Service Unavailable
ALM-50202 FE CPU Usage Exceeds the Threshold
ALM-50203 FE Memory Usage Exceeds the Threshold
ALM-50205 BE CPU Usage Exceeds the Threshold
ALM-50206 BE Memory Usage Exceeds the Threshold
ALM-50207 Ratio of Connections to the FE MySQL Port to the Maximum Connections Allowed Exceeds the Threshold
ALM-50208 Failures to Clear Historical Metadata Image Files Exceed the Threshold
ALM-50209 Failures to Generate Metadata Image Files Exceed the Threshold
ALM-50210 Maximum Compaction Score of All BE Nodes Exceeds the Threshold
ALM-50211 FE Queue Length of BE Periodic Report Tasks Exceeds the Threshold
ALM-50212 Accumulated Old-Generation GC Duration of the FE Process Exceeds the Threshold
ALM-50213 Number of Tasks Queuing in the FE Thread Pool for Interacting with BE Exceeds the Threshold
ALM-50214 Number of Tasks Queuing in the FE Thread Pool for Task Processing Exceeds the Threshold
ALM-50215 Longest Duration of RPC Requests Received by Each FE Thrift Method Exceeds the Threshold
ALM-50216 Memory Usage of the FE Node Exceeds the Threshold
ALM-50217 Heap Memory Usage of the FE Node Exceeds the Threshold
ALM-50219 Length of the Queue in the Thread Pool for Query Execution Exceeds the Threshold
ALM-50220 Error Rate of TCP Packet Receiving Exceeds the Threshold
ALM-50221 BE Data Disk Usage Exceeds the Threshold
ALM-50222 Disk Status of a Specified Data Directory on BE Is Abnormal
ALM-50223 Maximum Memory Required by BE Is Greater Than the Remaining Memory of the Machine
ALM-50224 Failures of a Certain Task Type on BE Are Increasing
ALM-50225 Unavailable FE Instances
ALM-50226 Unavailable BE Instances
ALM-50227 Number of Concurrent Doris Tenant Queries Exceeds the Threshold
ALM-50228 Memory Usage of a Doris Tenant Exceeds the Threshold
ALM-50229 Doris FE Failed to Connect to OBS
ALM-50230 Doris BE Cannot Connect to OBS
ALM-50401 Number of JobServer Waiting Tasks Exceeds the Threshold
ALM-50402 JobGateway Service Unavailable
ALM-51201 LakeSearch Unavailable
ALM-51202 LakeSearch Heap Memory Usage Exceeds the Threshold
ALM-51203 GC Duration of the LakeSearch Instance Exceeds the Threshold
Security Description
Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled
Security Authentication Principles and Mechanisms
High-Risk Operations
Interconnecting Jupyter Notebook with MRS Using Custom Python
Overview
Installing a Client on a Node Outside the Cluster
Installing Python 3
Configuring the MRS Client
Installing Jupyter Notebook
Verifying that Jupyter Notebook Can Access MRS
FAQs
FAQs
Client Usage
How Do I Configure Environment Variables and Run Commands on a Component Client?
How Do I Disable ZooKeeper SASL Authentication?
Web Page Access
How Do I Change the Session Timeout Duration for an Open Source Component Web UI?
Why Can't I Refresh the Dynamic Resource Plan Page on the MRS Tenant Tab?
What Do I Do If the Kafka Topic Monitoring Tab Is Unavailable on Manager?
Alarm Monitoring
In an MRS Streaming Cluster, Can the Kafka Topic Monitoring Function Send Alarm Notifications?
Performance Tuning
Does an MRS Cluster Support System Reinstallation?
Can I Change the OS of an MRS Cluster?
How Do I Improve the Resource Utilization of Core Nodes in a Cluster?
How Do I Stop the Firewall Service?
Job Development
How Do I Get My Data into OBS or HDFS?
What Types of Spark Jobs Can Be Submitted in a Cluster?
Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Are Changed to 0?
What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
How Do I View MRS Job Logs?
What Do I Do If the Message "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page." Is Displayed?
LauncherJob Job Execution Fails and the Error Message "jobPropertiesMap is null." Is Displayed
What Do I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
What Do I Do If a SparkStreaming Job Fails After Running for Dozens of Hours and the OBS Access 403 Error Is Reported?
What Do I Do If an Alarm Is Reported Indicating That the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
Why Can't a Submitted Yarn Job Be Viewed on the Web UI?
How Do I Modify the HDFS NameSpace (fs.defaultFS) of an Existing Cluster?
What Do I Do If the launcher-job Queue Is Stopped by YARN Due to Insufficient Heap Size When I Submit a Flink Job on the Management Plane?
Cluster Upgrade/Patching
Can I Upgrade an MRS Cluster?
Can I Change the MRS Cluster Version?
Cluster Access
Can I Switch Between the Two Login Modes of MRS?
How Can I Obtain the IP Address and Port Number of a ZooKeeper Instance?
Big Data Service Development
Can MRS Run Multiple Flume Tasks at a Time?
How Do I Change FlumeClient Logs to Standard Logs?
Where Are the .jar Files and Environment Variables of Hadoop Located?
What Compression Algorithms Does HBase Support?
Can MRS Write Data to HBase Through the HBase External Table of Hive?
How Do I View HBase Logs?
How Do I Set the TTL for an HBase Table?
How Do I Balance HDFS Data?
How Do I Change the Number of HDFS Replicas?
How Do I Modify the HDFS Active/Standby Switchover Class?
What Is the Recommended Number Type of DynamoDB in Hive Tables?
Can the Hive Driver Be Interconnected with DBCP2?
Can I Export the Query Result of Hive Data?
What Do I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
What Do I Do If a "hivesql/hivescript" Job Fails to Submit After Hive Is Added?
How Do I Reset Kafka Data?
How Do I Obtain the Client Version of MRS Kafka?
What Access Protocols Are Supported by Kafka?
What Do I Do If the Error Message "Not Authorized to access group xxx" Is Displayed When a Kafka Topic Is Consumed?
What Are the Differences Between Sample Project Building and Application Development? Is Python Code Supported?
How Do I Connect to Spark Shell from MRS?
How Do I Connect to Spark Beeline from MRS?
Where Are the Execution Logs of Spark Jobs Stored?
How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
How Do I Check Whether the ResourceManager Configuration of Yarn Is Correct?
How Do I Modify the allow_drop_detached Parameter of ClickHouse?
API
How Do I Configure the node_id Parameter When Using the API for Adjusting Cluster Nodes?
Cluster Management
How Do I View All Clusters?
How Do I View Log Information?
How Do I View Cluster Configuration Information?
How Do I Install Kafka and Flume in an MRS Cluster?
How Do I Stop an MRS Cluster?
Can I Change MRS Cluster Nodes on the MRS Console?
How Do I Shield Cluster Alarm/Event Notifications?
Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
How Do I Configure the knox Memory?
What Is the Python Version Installed for an MRS Cluster?
How Do I View the Configuration File Directory of Each Component?
What Do I Do If the Time on MRS Nodes Is Incorrect?
What Do I Do If Trust Relationships Between Nodes Are Abnormal?
How Do I Adjust the Memory Size of the manager-executor Process?
Kerberos Usage
How Do I Change the Kerberos Authentication Status of a Created MRS Cluster?
What Are the Ports of the Kerberos Authentication Service?
How Do I Deploy the Kerberos Service in a Running Cluster?
How Do I Access Hive in a Cluster with Kerberos Authentication Enabled?
How Do I Access Spark in a Cluster with Kerberos Authentication Enabled?
How Do I Prevent Kerberos Authentication Expiration?
Metadata Management
Where Can I View Hive Metadata?
Troubleshooting
Accessing the Web Pages
Failed to Log In to MRS Manager After the Python Upgrade
Failed to Log In to MRS Manager After Changing the Domain Name
A Blank Page Is Displayed Upon Login to Manager
Cluster Management
Replacing a Disk in an MRS Cluster
MRS Backup Failure
Inconsistency Between df and du Command Output on the Core Node
Disassociating a Subnet from the ACL Network
MRS Becomes Abnormal After hostname Modification
DataNode Restarts Unexpectedly
Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
Failed to Download the MRS Cluster Client
Scale-Out Failure
Error Occurs When MRS Executes the Insert Command Using Beeline
Using CDM to Migrate Data to HDFS
Alarms Are Frequently Generated in the MRS Cluster
Memory Usage of the PMS Process Is High
High Memory Usage of the Knox Process
It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
How Do I Locate a Job Submission Failure?
OS Disk Space Is Insufficient Due to Oversized HBase Log Files
Using ClickHouse
ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
Using DBService
DBServer Instance Is in Abnormal Status
DBServer Instance Remains in the Restoring State
Default Port 20050 or 20051 Is Occupied
DBServer Instance Is Always in the Restoring State Because of Incorrect /tmp Directory Permissions
DBService Backup Failure
Components Failed to Connect to DBService in Normal State
DBServer Failed to Start
DBService Backup Failed Because the Floating IP Address Is Unreachable
DBService Failed to Start Due to the Loss of the DBService Configuration File
Using Flink
"IllegalConfigurationException: Error while parsing YAML configuration file: "security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
"IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
Flink Service Program Fails to Read Files on the NFS Disk
Using Flume
Class Cannot Be Found After Flume Submits Jobs to Spark Streaming
Failed to Install a Flume Client
A Flume Client Cannot Connect to the Server
Flume Data Fails to Be Written to the Component
Flume Server Process Fault
Flume Data Collection Is Slow
Failed to Start Flume
Using HBase
Slow Response to HBase Connection
RegionServer Failed to Start Because the Port Is Occupied
HBase Failed to Start Due to Insufficient Node Memory
HBase Failed to Start Due to Inappropriate Parameter Settings
RegionServer Failed to Start Due to Residual Processes
HBase Failed to Start Due to a Quota Set on HDFS
HBase Failed to Start Due to Corrupted Version Files
High CPU Usage Caused by Zero-Loaded RegionServer
HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
RegionServer Instance Is in the Restoring State
HBase Failed to Start in a Newly Installed Cluster
HBase Failed to Start Due to the Loss of the ACL Table Directory
HBase Failed to Start After the Cluster Is Powered Off and On
Failed to Import HBase Data Due to Oversized File Blocks
Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
Using HDFS
All NameNodes Enter the Standby State After the NameNode RPC Port of HDFS Is Changed
An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
Failed to Use Python to Remotely Connect to the Port of HDFS
An Error Is Reported During HDFS and Yarn Startup
HDFS Permission Setting Error
A DataNode of HDFS Is Always in the Decommissioning State
HDFS Failed to Start Due to Insufficient Memory
A Large Number of Blocks Are Lost in HDFS due to the Time Change Using ntpdate
CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
Common File Read/Write Faults
Maximum Number of File Handles Is Set Too Low, Causing File Read and Write Exceptions
File Fails to Be Uploaded to HDFS Due to File Errors
After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
Failed to Read Files, and "FileNotFoundException" Is Displayed
Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
Adjusting the Log Level of the Shell Client
File Read Fails, and "No common protection layer" Is Displayed
Failed to Write Files Because the HDFS Directory Quota Is Insufficient
Balancing Fails, and "Source and target differ in block-size" Is Displayed
A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
Uneven Data Distribution Due to Non-HDFS Data Residuals
Uneven Data Distribution Due to the Client Installation on the DataNode
Handling Unbalanced DataNode Disk Usage on Nodes
Locating Common Balance Problems
An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
Using Hive
Content Recorded in Hive Logs
Causes of Hive Startup Failure
How to Specify a Queue When Hive Submits a Job
How to Set Map and Reduce Memory on the Client
Specifying the Output File Compression Format When Importing a Table
desc Table Cannot Be Completely Displayed
NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
A Newly Created User Has No Query Permissions
An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
An Error Is Reported When the "load data inpath" Command Is Executed
An Error Is Reported When the "load data local inpath" Command Is Executed
An Error Is Reported When the "create external table" Command Is Executed
An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
Insufficient Permissions to Execute the set role admin Command
An Error Is Reported When UDF Is Created Using Beeline
Difference Between Hive Service Health Status and Hive Instance Health Status
Hive Alarms and Triggering Conditions
"authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
Failed to Access ZooKeeper from the Client
"Invalid function" Is Displayed When a UDF Is Used
Hive Service Status Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Unknown
Health Status of a HiveServer or MetaStore Instance Is Concerning
Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
Hive Task Failed to Run on the Client But Succeeded on Yarn
An Error Is Reported When the select Statement Is Executed
Failed to Drop a Large Number of Partitions
Failed to Start a Local Task
Failed to Start WebHCat
Sample Code Error for Hive Secondary Development After Domain Switching
MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
"Failed to execute session hooks: over max connections" Reported by Beeline
Beeline Reports "OutOfMemoryError"
Task Execution Fails Because the Input File Number Exceeds the Threshold
Task Execution Fails Because of Stack Memory Overflow
Task Failed Due to Concurrent Writes to One Table or Partition
Failed to Load Data to Hive Tables
HiveServer and HiveHCat Process Faults
An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
Timeout Reported When Adding the Hive Table Field
Failed to Restart the Hive Service
Hive Failed to Delete a Table
An Error Is Reported When msck repair table table_name Is Run on Hive
Using Hue
A Job Is Running on Hue
HQL Fails to Be Executed on Hue Using Internet Explorer
Hue (Active) Cannot Open Web Pages
Failed to Access the Hue Web UI
HBase Tables Cannot Be Loaded on the Hue Web UI
Using Kafka
An Error Is Reported When Kafka Is Run to Obtain a Topic
Flume Normally Connects to Kafka But Fails to Send Messages
Producer Failed to Send Data and Threw "NullPointerException"
Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
Consumer Fails to Consume Data and Remains in the Waiting State
Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
Checking Whether Data Consumed by a Consumer Is Lost
Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
Consumer Repeatedly Consumes Data
Leader for the Created Kafka Topic Partition Is Displayed as none
Safety Instructions on Using Kafka
Obtaining Kafka Consumer Offset Information
Adding or Deleting Configurations for a Topic
Reading the Content of the __consumer_offsets Internal Topic
Configuring Logs for Shell Commands on the Client
Obtaining Topic Distribution Information
Kafka HA Usage Description
High Usage of Multiple Disks on a Kafka Cluster Node
Using Oozie
Oozie Jobs Do Not Run When a Large Number of Jobs Are Submitted Concurrently
Using Spark
An Error Occurs When the Split Size Is Changed in a Spark Application
An Error Is Reported When Spark Is Used
A Spark Job Fails to Run Due to Incorrect JAR File Import
An Error Is Reported During Spark Running
"Executor Memory Reaches the Threshold" Is Displayed in Driver
Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
ApplicationMaster Failed to Start Twice in Yarn-client Mode
Submission Status of the Spark Job API Is Error
Alarm 43006 Is Repeatedly Generated in the Cluster
Failed to Create or Delete a Table in Spark Beeline
Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
Large Number of Shuffle Results Are Lost During Spark Task Execution
Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
Spark Task Submission Failure
Spark Task Execution Failure
JDBCServer Connection Failure
Failed to View Spark Task Logs
Authentication Fails When Spark Connects to Other Services
Using Sqoop
Connecting Sqoop to MySQL
An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
Using Storm
Invalid Hyperlink of Events on the Storm UI
Failed to Submit a Topology
Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
"well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
Data Cannot Be Written to Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts
The GC Parameter Configured for the Service Topology Does Not Take Effect
Internal Server Error Is Displayed When the User Queries Information on the UI
Using Ranger
After Ranger Authentication Is Enabled for Hive, Unauthorized Tables and Databases Can Be Viewed on the Hue Page
Using Yarn
Plenty of Jobs Are Found After Yarn Is Started
"GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
Temporary Files Are Not Deleted When a MapReduce Job Is Abnormal
Failed to View Job Logs on the Yarn Web UI
Using ZooKeeper
Accessing ZooKeeper from an MRS Cluster
Accessing OBS
When Using the MRS Multi-user Access to OBS Function, a User Does Not Have the Permission to Access the /tmp Directory
When the Hadoop Client Is Used to Delete Data from OBS, It Does Not Have the Permission for the .Trash Directory
Appendix
Precautions
Installing the Flume Client
Change History
Component Operation Guide (LTS) (Ankara Region)
Using CarbonData
Overview
CarbonData Overview
Main Specifications of CarbonData
Common CarbonData Parameters
CarbonData Operation Guide
CarbonData Quick Start
CarbonData Table Management
About CarbonData Table
Creating a CarbonData Table
Deleting a CarbonData Table
Modifying a CarbonData Table
CarbonData Table Data Management
Loading Data
Deleting Segments
Combining Segments
CarbonData Data Migration
CarbonData Performance Tuning
Tuning Guide
Suggestions for Creating CarbonData Tables
Configurations for Performance Tuning
CarbonData Access Control
CarbonData Syntax Reference
DDL
CREATE TABLE
CREATE TABLE AS SELECT
DROP TABLE
SHOW TABLES
ALTER TABLE COMPACTION
TABLE RENAME
ADD COLUMNS
DROP COLUMNS
CHANGE DATA TYPE
REFRESH TABLE
REGISTER INDEX TABLE
DML
LOAD DATA
UPDATE CARBON TABLE
DELETE RECORDS from CARBON TABLE
INSERT INTO CARBON TABLE
DELETE SEGMENT by ID
DELETE SEGMENT by DATE
SHOW SEGMENTS
CREATE SECONDARY INDEX
SHOW SECONDARY INDEXES
DROP SECONDARY INDEX
CLEAN FILES
SET/RESET
Operation Concurrent Execution
API
Spatial Indexes
CarbonData Troubleshooting
Filter Result Is Not Consistent with Hive When a Big Double Type Value Is Used in Filter
Query Performance Deterioration
CarbonData FAQ
Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
How to Avoid Minor Compaction for Historical Data?
How to Change the Default Group Name for CarbonData Data Loading?
Why Does INSERT INTO CARBON TABLE Command Fail?
Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and the Number of Opened Tasks Fewer Than the Available Executors When the Number of Initial Executors Is Zero?
Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
Why Do I Fail to Create a Hive Table?
How Do I Logically Split Data Across Different Namespaces?
Why Can't the UPDATE Command Be Executed in Spark Shell?
How Do I Configure Unsafe Memory in CarbonData?
Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
How Do I Restore the Latest tablestatus File That Has Been Lost or Damaged When TableStatus Versioning Is Enabled?
Using CDL
Instructions for Using CDL
Supported Data Formats
CDL JSON and Open-Source Debezium JSON
Synchronizing Open-source Debezium JSON Data
Using CDL from Scratch
Creating a CDL User
Encrypting Data
Preparing for Creating a CDL Job
Enabling Kafka High Reliability
Logging In to the CDLService Web UI
Uploading a Driver File
Creating a Database Link
Managing ENV
Configuring Heartbeat and Data Consistency Check for a Synchronization Task
Creating a CDL Job
Creating a CDL Data Synchronization Job
Creating a CDL Data Comparison Job
Common CDL Jobs
Importing Data from MySQL to HDFS
Importing Data from Oracle to HDFS
Synchronizing Data from PgSQL to Kafka
Synchronizing Data from Oracle to Hudi
Synchronizing Data from MySQL to Hudi
Synchronizing Data from PgSQL to Hudi
Synchronizing Data from OpenGauss to Hudi
Synchronizing drs-opengauss-json Database from ThirdKafka to Hudi
Synchronizing drs-oracle-json Database from ThirdKafka to Hudi
Synchronizing drs-oracle-avro Database from ThirdKafka to Hudi
Synchronizing Open-Source Debezium JSON Data from ThirdKafka to Hudi
Synchronizing Data from Hudi to GaussDB(DWS)
Synchronizing Data from Hudi to ClickHouse
DDL Operations
Creating a CDL Job
Common CDL Service APIs
CDL Log Overview
CDL FAQs
Error ORA-01284 Is Reported When an Oracle Job Is Started
Hudi Does Not Receive Data After a CDL Job Is Executed
Error 104 or 143 Is Reported After a CDL Job Runs for a Period of Time
Error Is Reported When the Job of Capturing Data from PgSQL to Hudi Is Started
Error 403 Is Reported When a CDL Job Is Stopped
When Ranger Authentication Is Enabled, Why Can a User Still Perform Operations on the Tasks Created by Itself After All Permissions of the User Are Deleted?
How Do I Capture Data from a Specified Location When a MySQL Link Task Is Started?
Why Is the Value of Task Configured for the OGG Source Different from the Actual Number of Running Tasks When Data Is Synchronized from OGG to Hudi?
Why Are There Too Many Topic Partitions Corresponding to the CDL Synchronization Task Names?
What Should I Do If an Error Message Indicating That the Current User Does Not Have the Permission to Create Tables in a Database Created by Another User Is Displayed When a CDL Task Synchronizes Data to Hudi?
What Should I Do If a CDL Task Fails When I Perform DDL Operations?
What Should I Do If a CDL Data Synchronization Task Fails and the YARN Task Waits for More Than 10 Minutes Before Running Again?
Using ClickHouse
Using ClickHouse from Scratch
ClickHouse Permission Management
ClickHouse User and Permission Management
Changing the Passwords of Default and ClickHouse Users
Clearing the Passwords of the Default and ClickHouse Users
ClickHouse Table Engine Overview
Creating a ClickHouse Table
Common ClickHouse SQL Syntax
CREATE DATABASE: Creating a Database
CREATE TABLE: Creating a Table
INSERT INTO: Inserting Data into a Table
DELETE: Lightweight Deletion of Table Data
SELECT: Querying Table Data
ALTER TABLE: Modifying a Table Schema
ALTER TABLE: Modifying Table Data
DESC: Querying a Table Structure
DROP: Deleting a Table
SHOW: Displaying Information About Databases and Tables
UPSERT: Writing Data
Migrating ClickHouse Data
Using ClickHouse to Import and Export Data
Using the ClickHouse Data Migration Tool
ClickHouse Batch Data Import
Adaptive MV Usage in ClickHouse
Configuring Interconnection Between ClickHouse and HDFS
Configuring Interconnection Between ClickHouse and Kafka
Interconnecting with Kafka Using a Username and Password
Interconnecting with Kafka Through Kerberos Authentication
Interconnecting with Kafka in Normal Mode
Configuring the Connection Between ClickHouse and Open-Source ClickHouse
Configuring Strong Data Consistency Between ClickHouse Replicas
Configuring the Support for Transactions on ClickHouse
Pre-Caching ClickHouse Metadata to the Memory
Collecting Dumping Logs of the ClickHouse System Tables
ClickHouse Log Overview
ClickHouse FAQ
What Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
How Do I Quickly Restore the Status of a Logical Cluster in a Scale-in Fault Scenario?
What Should I Do If a File System Error Is Reported and Core Dump Occurs During Process Startup and part Loading After a ClickHouseServer Instance Node Is Power Cycled?
What Should I Do If an Exception Occurred in the replication_queue and Data Is Inconsistent Between Replicas After a ClickHouse Cluster Is Powered On from a Sudden Poweroff?
Using Containers
Introduction to Containers
Adding or Deleting an Application
Overview
Creating a Group
Adding a BLU
Deleting a BLU
Monitoring Applications
Viewing the Group Status
Viewing the BLU Status
Viewing the BLU Instance Status
Viewing the WebContainer Status
Starting and Stopping Applications
Starting and Stopping a BLU
Starting and Stopping a BLU Instance
Adjusting Application Resources
Overview
Adding a BLU Instance
Deleting a BLU Instance
Adding a Container
Releasing a Container
Modifying Application Configurations
Downloading a Configuration Set
Updating the Configuration Set of a Group
Updating the BLU Configuration
Updating the BLU Log Level
Service Governance
Viewing the Service List
Adjusting Service Governance Parameters
Upgrade Example for a Fixed Version Number on the Consumer Side
Creating a Containers Role
Deploying an ALB
Introduction to Containers Logs
Using DBService
DBService Log Overview
Using Doris
Installing a MySQL Client
Using Doris from Scratch
Permissions Management
Doris Permissions Management
Column Permission Management
Multi-Tenancy
Overview
Managing Doris Tenants
Multi-Tenancy Alarms
Native Web UI
Doris Data Model
Doris Cold and Hot Data Separation
Introduction
Configuring Cold and Hot Data Separation
Data Operations
Data Import
Broker Load
Stream Load
Exporting Data
Exporting Data from HDFS to OBS
Exporting the Query Result Set
Typical SQL Syntax
Creating a Database
Creating a Table
Inserting Data
Modifying a Table Structure
Deleting Tables
Backing Up and Restoring Data
Backing Up Doris Data
Restoring Doris Data
Hive Data Analysis
Multi-Catalog
Hive
Ecosystem
Spark Doris Connector
Flink Doris Connector
Doris FAQs
What Should I Do If "Failed to find enough host with storage medium and tag" Occasionally Occurs During Table Creation Due to the Configuration of the SSD and HDD Data Directories?
What Should I Do If a Query Is Performed on the BE Node Where Some Copies Are Lost or Damaged and an Error Is Reported?
What Should I Do If an RPC Timeout Error Is Reported When Stream Load Is Used?
How Do I Restore the FE Service from a Fault?
What Do I Do If the Error Message "plugin not enabled" Is Displayed When the MySQL Client Is Used to Connect to the Doris Database?
How Do I Handle the FE Startup Failure?
How Do I Handle the Startup Failure Due to Incorrect IP Address Matching for the BE Instance?
What Should I Do If the Error Message "Read timed out" Is Displayed When the MySQL Client Connects to Doris?
What Should I Do If an Error Is Reported When the BE Runs a Data Import or Query Task?
What Should I Do If a Timeout Error Is Reported When Broker Load Imports Data?
What Should I Do If the Data Volume of a Broker Load Import Task Exceeds the Threshold?
What Should I Do If an Error Message Is Displayed When Broker Load Is Used to Import Data?
How Do I Rectify the Serialization Exception Reported When Data Is Imported Using Spark Load?
What Should I Do If an App ID Cannot Be Obtained When Spark Load Imports Data?
Doris Logs
Using Elasticsearch
Using Elasticsearch from Scratch
Elasticsearch Usage Suggestions
Service Planning Suggestions
Suggestion for Creating an Index
Reasonable Mapping Setting Suggestions
Shard Planning Suggestions
Data Life Cycle
Usage Description of ingest-geoip
Elasticsearch Authentication Mode
Authentication Based on Users and Roles
Authentication Based on Ranger
Switching the Authentication Mode
Switching Authentication Based on Users and Roles
Switching to Ranger-based Authentication
Using the Elasticsearch Client
Running curl Commands in Linux
In-House Plug-Ins
Index Template
Vector Search
Overview
Creating an Index
Importing a Vector
Querying a Vector
Usage of IVF_GRAPH and IVF_GRAPH_PQ Algorithms
Configuration of Other Parameters
Vector Database Management
API Authentication Whitelist Configuration
SSL Encrypted Transmission Configuration
Custom Data Directory
Traffic Control
Index Lifecycle Management
Using UDFs in SQL Queries
Connecting Elasticsearch to Other Components
Overview
Using Basic Authentication to Connect to Other Components
Connecting Elasticsearch to Logstash
Interconnecting Elasticsearch with Beats (Filebeat)
Interconnecting Elasticsearch with Beats (Metricbeat)
Connecting Elasticsearch to Kibana
Using Kerberos Authentication to Interconnect with Other Components
Connecting Elasticsearch to Flume ESSink
Switching the Elasticsearch Security Mode
Synchronizing Index Owner Group
Migrating Data
Migration Tool Overview
Migrating HBase Data Using HBase2ES
Migrating Elasticsearch Data Using ES2ES Tool
Using Scroll to Migrate Data
Using Reindex to Migrate Data
Migrating HDFS Data Using HDFS2ES
Migrating Solr Data Using Solr2ES
Using HBase2ES Tool to Synchronize Solr Data to Elasticsearch
Scenario Description
Migration Solution
Data Migration
Configuration Optimization
Migrating Elasticsearch Data Using Snapshots
Elasticsearch Log Overview
Elasticsearch Performance Tuning
Disabling Swapping
Performance Optimization for Large Clusters
Optimizing Write Performance
Distributing Shards Evenly
Changing Index Refresh Time and Number of Copies
Modifying the Merge Parameter and the Number of Threads
Modifying Transaction Log Parameter translog
Disabling Doc Values
Disabling the _source Field
Data Query and Optimization
Optimizing Mappings
Optimizing Query Statements
Forcibly Merging Segments
Query Based on Filter Conditions
Routing
Configuring the EsClient Role (Coordinator Node)
Scroll Query
Not Using Wildcard Fuzzy Query
Optimizing Aggregation
Timeout Parameters
Common Issues About Elasticsearch
Common Problems About the Reindex Tool
How Can I Delete the Index Data That Has Been Imported?
What Can I Do If Indexes of the Source Cluster Fail to Be Automatically Created in the Target Cluster?
What Can I Do If the Query Speed Is Slow in Full-Text Retrieval Scenarios?
What Can I Do If High Read I/O Occurs When Document IDs Are Specified and the Data Written into the Database Reaches a Certain Volume?
Custom Elasticsearch Plug-in Installation Guide
What Can I Do If the Status of Elasticsearch Shards (Unassigned Shards) Becomes Down?
Elasticsearch Fails to Be Started Because of Inconsistent Xms and Xmx Memory Configurations
What Can I Do If "vm.max_map_count is too low" Is Reported When Elasticsearch Fails to Be Started?
What Can I Do If Instance Startup Failure Is Caused by the Configuration File During the Elasticsearch Startup?
What Can I Do If an Elasticsearch Instance Fault Occurs Due to Insufficient Directory Permissions?
What Can I Do If the Speed of Writing Data into Elasticsearch Is Slow Due to a Fault on an Elasticsearch Node?
What Can I Do If Two Different Values Are Returned for hits.total When the Same Statement Is Used to Query Data in Elasticsearch Twice Under the Same Conditions?
What Can I Do If the Heap Memory of an EsNode Instance Overflows During the Running of Elasticsearch?
What Can I Do If Data Fails to Be Written Because the Type of the Data to Be Written Is Different from That of the Existing Data?
What Can I Do If Authentication Fails When I Access Index Data?
EsMaster Memory Overflows During Elasticsearch Cluster Restart
Using Flink
Using Flink from Scratch
Viewing Flink Job Information
Configuring Flink Service Parameters
Configuring Flink Security Features
Security Features
Authentication and Encryption
Configuring Kafka
Configuring Pipeline
Configuring the JOIN between Tables and Streams
Configuring and Developing a Flink Visualization Job
Introduction to Flink Web UI
Flink Web UI Permission Management
Creating a FlinkServer Role
Accessing the Flink Web UI
Creating an Application
Creating a Cluster Connection
Creating a Data Connection
Creating a Stream Table
Creating a Job
Restoring a Job
Configuring Dependency Management
Configuring and Managing UDFs
Configuring the FlinkServer UDF Sandbox
Reusing Flink UDFs
Importing and Exporting Jobs
Verifying Flink's Job Inspection
Configuring Interconnection Between FlinkServer and Other Components
Interconnecting FlinkServer with ClickHouse
Interconnecting FlinkServer with Elasticsearch
Interconnecting FlinkServer with GaussDB(DWS)
Interconnecting FlinkServer with JDBC
Interconnecting FlinkServer with HBase
Interconnecting FlinkServer with HDFS
Interconnecting FlinkServer with Hive
Interconnecting FlinkServer with Hudi
Interconnecting FlinkServer with Kafka
Interconnecting FlinkServer with Redis
Flink Log Overview
Flink Performance Tuning
Memory Configuration Optimization
Configuring DOP
Configuring Process Parameters
Optimizing the Design of Partitioning Method
Configuring the Netty Network Communication
State Backend Optimization
RocksDB State Backend Optimization
Enabling Hot-Cold Separation for State Backends
Experience Summary
Common Flink Shell Commands
Reference
Example of Issuing a Certificate
Flink Restart Policy
Enhancements to Flink SQL
Using the DISTRIBUTEBY Feature
Supporting Late Data in Flink SQL Window Functions
Configuring Table-Level Time To Live (TTL) for Joining Multiple Flink Streams
Verifying SQL Statements with the FlinkSQL Client
Submitting a Job on the FlinkSQL Client
Joining Big and Small Tables
Deduplicating Data When Joining Big and Small Tables
Setting Source Parallelism
Limiting Read Rate for Flink SQL Kafka and Upsert-Kafka Connector
Consuming Data in drs-json Format with FlinkSQL Kafka Connector
Using ignoreDelete in JDBC Data Writes
Join-To-Live
Flink on Hudi Development Specifications
Hudi Table Streaming Reads
Hudi Table Streaming Writes
Submitting Flink on Hudi Jobs
Using Flume
Using Flume from Scratch
Overview
Installing the Flume Client
Viewing Flume Client Logs
Stopping or Uninstalling the Flume Client
Using the Encryption Tool of the Flume Client
Flume Service Configuration Guide
Flume Configuration Parameter Description
Using Environment Variables in the properties.properties File
Non-Encrypted Transmission
Configuring Non-encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
Encrypted Transmission
Configuring the Encrypted Transmission
Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
Viewing Flume Client Monitoring Information
Connecting Flume to Kafka in Security Mode
Connecting Flume with Hive in Security Mode
Configuring the Flume Service Model
Overview
Service Model Configuration Guide
Introduction to Flume Logs
Flume Client Cgroup Usage Guide
Secondary Development Guide for Flume Third-Party Plug-ins
Common Issues About Flume
Using HBase
Using HBase from Scratch
Using an HBase Client
Creating HBase Roles
Configuring HBase Replication
Configuring HBase Parameters
Enabling Cross-Cluster Copy
Using the ReplicationSyncUp Tool
GeoMesa Command Line
Using HIndex
Introduction to HIndex
Loading Index Data in Batches
Using an Index Generation Tool
Migrating Index Data
Using Global Secondary Indexes
Introduction
Restrictions
Using the GSI Tool
Creating Indexes
Querying Index Information
Deleting an Index
Changing Index Status
Creating Indexes in Batches
Checking Consistency and Rebuilding Index Data
Loading Index Data in Batches
GSI APIs
Querying Data with Indexes
Configuring HBase DR
Configuring HBase Data Compression and Encoding
Performing an HBase DR Service Switchover
Performing an HBase DR Active/Standby Cluster Switchover
Community BulkLoad Tool
Configuring Secure HBase Replication
Configuring Region In Transition Recovery Chore Service
Enabling the HBase Compaction
Using a Secondary Index
Hot-Cold Data Separation
Overview
Enabling Hot-Cold Data Separation
Cold-Hot Separation Commands
Configuring HBase Table-Level Overload Control
HBase Log Overview
HBase Performance Tuning
Improving the BulkLoad Efficiency
Improving Put Performance
Optimizing Put and Scan Performance
Improving Real-time Data Write Performance
Improving Real-time Data Read Performance
Optimizing JVM Parameters
Optimization for HBase Overload
Enabling CCSMap Functions
Enabling Succinct Trie
Common Issues About HBase
Why Does a Client Keep Failing to Connect to a Server for a Long Time?
Operation Failures Occur in Stopping BulkLoad on the Client
Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
How Do I Restore a Region in the RIT State for a Long Time?
Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
Why Does SocketTimeoutException Occur When a Client Queries HBase?
Why Can Modified and Deleted Data Still Be Queried by Using the Scan Command?
Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
When Do the RegionServers Listed Under "Dead Region Servers" on HMaster WebUI Get Cleared?
Why Are Different Query Results Returned After I Use the Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Meta Is Rebuilt Using the OfflineMetaRepair Tool, Causing the Startup to Fail?
Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
Insufficient Rights When a Tenant Accesses Phoenix
What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
How Do I Fix Region Overlapping?
Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
Why Is a Message Displayed Indicating That the Permission Is Insufficient When CTBase Connects to the Ranger Plug-ins?
How Do I View Regions in the CLOSED State in an ENABLED Table?
How Can I Quickly Recover the Service When HBase Files Are Damaged Due to a Cluster Power-Off?
How Do I Disable HDFS Hedged Read on HBase?
Using HetuEngine
Using HetuEngine from Scratch
HetuEngine Permission Management
Overview
HetuEngine Ranger-based Permission Control
HetuEngine MetaStore-based Permission Control
Proxy User Authentication
Creating a HetuEngine User
Creating a HetuEngine Compute Instance
Managing HetuEngine Compute Instances
Configuring Resource Groups
Configuring the Number of Worker Nodes
Configuring a HetuEngine Maintenance Instance
Configuring the Nodes on Which Coordinator Is Running
Importing and Exporting Compute Instance Configurations
Viewing the Instance Monitoring Page
Viewing Coordinator and Worker Logs
Configuring Query Fault Tolerance Execution
Using the HetuEngine Client
Using the HetuEngine Cross-Source Function
Using the HetuEngine Cross-Domain Function
Configuring Data Sources
Before You Start
Configuring a Hive Data Source
Configuring a Co-deployed Hive Data Source
Configuring an Independently Deployed Hive Data Source
Configuring a Hudi Data Source
Configuring a ClickHouse Data Source
Configuring an Elasticsearch Data Source
Configuring a GaussDB Data Source
Configuring an HBase Data Source
Configuring a HetuEngine Data Source
Configuring an IoTDB Data Source
Configuring a MySQL Data Source
Managing Configured Data Sources
Using HetuEngine Materialized Views
Overview of Materialized Views
SQL Statement Example of Materialized Views
Configuring Rewriting of Materialized Views
Configuring Recommendation of Materialized Views
Configuring Caching of Materialized Views
Configuring the Validity Period and Data Update of Materialized Views
Configuring Intelligent Materialized Views
Viewing Automatic Tasks of Materialized Views
Using HetuEngine SQL Diagnosis
Using a Third-Party Visualization Tool to Access HetuEngine
Using DBeaver to Access HetuEngine
Using Tableau to Access HetuEngine
Using Power BI to Access HetuEngine
Using Yonghong BI to Access HetuEngine
Developing and Applying Functions and UDFs
HetuEngine Function Plugin Development and Application
Hive UDF Development and Application
HetuEngine UDF Development and Application
HetuEngine Logs
HetuEngine Performance Tuning
Adjusting the YARN Service Configuration
Adjusting Cluster Node Resource Configurations
Optimizing INSERT Statements
Adjusting Metadata Cache
Enabling Dynamic Filtering
Adjusting the Execution of Adaptive Queries
Adjusting Timeout for Hive Metadata Loading
Tuning Hudi Data Source Performance
HetuEngine FAQ
How Do I Perform Operations After the Domain Name Is Changed?
What Do I Do If Starting a Cluster on the Client Times Out?
How Do I Handle Data Source Loss?
How Do I Handle HetuEngine Alarms?
What Do I Do If an Error Is Reported Indicating that Python Does Not Exist When a Compute Instance Fails to Start?
What Do I Do If a Compute Instance Fails 30 Seconds After It Is Started?
What Do I Do If Data Fails to Be Written to a Table Because the Namespace of the Table Is Different from That of the /tmp Directory in the Federation Scenario?
How Do I Configure HetuEngine SQL Inspection?
Using HDFS
Using Hadoop from Scratch
Configuring Memory Management
Creating an HDFS Role
Using the HDFS Client
Running the DistCp Command
Overview of HDFS File System Directories
Changing the DataNode Storage Directory
Configuring HDFS Directory Permission
Configuring NFS
Planning HDFS Capacity
Configuring ulimit for HBase and HDFS
Configuring HDFS DataNode Data Balancing
Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
Configuring the Number of Files in a Single HDFS Directory
Configuring the Recycle Bin Mechanism
Setting Permissions on Files and Directories
Setting the Maximum Lifetime and Renewal Interval of a Token
Configuring the Damaged Disk Volume
Configuring Encrypted Channels
Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
Configuring the NameNode Blacklist
Optimizing HDFS NameNode RPC QoS
Optimizing HDFS DataNode RPC QoS
Configuring Reserved Percentage of Disk Usage on DataNodes
Configuring HDFS NodeLabel
Configuring HDFS Mover
Using HDFS AZ Mover
Configuring HDFS DiskBalancer
Configuring the Observer NameNode to Process Read Requests
Performing Concurrent Operations on HDFS Files
Introduction to HDFS Logs
HDFS Performance Tuning
Improving Write Performance
Improving Read Performance Using Client Metadata Cache
Improving the Connection Between the Client and NameNode Using Current Active Cache
FAQ
NameNode Startup Is Slow
DataNode Is Normal but Cannot Report Data Blocks
HDFS WebUI Cannot Properly Update Information About Damaged Data
Why Do DistCp Commands Fail to Run in a Security Cluster and Exceptions Are Thrown?
How Do I Rectify the Fault If DataNode Fails to Be Started When the Number of Disks Defined in dfs.datanode.data.dir Equals the Value of dfs.datanode.failed.volumes.tolerated?
Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
What Should I Do If Data in the Cache Is Lost When the System Is Powered Off During Small File Storage?
Why Does Array Border-crossing Occur During FileInputFormat Split?
Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
How Do I Handle the Problem that HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time?
Can I Delete or Modify the Data Storage Directory in DataNode?
Blocks Are Missing on the NameNode UI After a Successful Rollback
Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
NameNode Fails to Be Restarted Due to EditLog Discontinuity
Using Hive
Using Hive from Scratch
Configuring Hive Parameters
Hive SQL
Permission Management
Hive Permission
Creating a Hive Role
Configuring Permissions for Hive Tables, Columns, or Databases
Configuring Permissions to Use Other Components for Hive
Using a Hive Client
Using HDFS Colocation to Store Hive Tables
Using the Hive Column Encryption Function
Customizing Row Separators
Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
Deleting Single-Row Records from Hive on HBase
Configuring HTTPS/HTTP-based REST APIs
Enabling or Disabling the Transform Function
Access Control of a Dynamic Table View on Hive
Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
Using Hive to Read Data in a Relational Database
Supporting Traditional Relational Database Syntax in Hive
Creating User-Defined Hive Functions
Enhancing beeline Reliability
Viewing Table Structures Using the show create Statement as Users with the select Permission
Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
Inserting Data to a Directory That Does Not Exist
Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
Disabling Specification of the location Keyword When Creating an Internal Hive Table
Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
Authorizing Over 32 Roles in Hive
Restricting the Maximum Number of Maps for Hive Tasks
HiveServer Lease Isolation
Hive Supports Isolation of Metastore Instances Based on Components
Switching the Hive Execution Engine to Tez
Hive Supporting Reading Hudi Tables
Hive Supporting Cold and Hot Storage of Partitioned Metadata
Hive Supporting ZSTD Compression Formats
Locating Abnormal Hive Files
Using the ZSTD_JNI Compression Algorithm to Compress Hive ORC Tables
Load Balancing for Hive MetaStore Client Connection
Data Import and Export in Hive
Importing and Exporting Table/Partition Data in Hive
Importing and Exporting Hive Databases
Hive Log Overview
Hive Performance Tuning
Creating Table Partitions
Optimizing Join
Optimizing Group By
Optimizing Data Storage
Optimizing SQL Statements
Optimizing the Query Function Using Hive CBO
Common Issues About Hive
How Do I Delete UDFs on Multiple HiveServers at the Same Time?
Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
How to Perform Operations on Local Files with Hive User-Defined Functions
How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
How Do I Monitor the Hive Table Size?
How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
Why Does a Hive on Spark Task Freeze When HBase Is Not Installed?
Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
Description of Hive Table Location (Either an OBS or HDFS Path)
Why Cannot Data Be Queried After the Engine Is Switched to MapReduce Following the Execution of Union-related Statements on the Tez Engine?
Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
Why Does Hive Not Support Vectorized Query?
Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
How Do I Disable the Logging Function of Hive?
Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
Hive Configuration Problems
How Do I Handle the Error Reported When Setting hive.exec.stagingdir on the Hive Client?
Using Hudi
Getting Started
Common Hudi Parameters
Basic Operations
Hudi Table Schema
Write
Before You Start
Batch Write
Stream Write
Synchronizing Hudi Table Data to Hive
Read
Overview
Reading COW Table Views
Reading MOR Table Views
Data Management and Maintenance
Clustering
Cleaning
Compaction
Savepoint
Single-Table Concurrency Control
Partition Concurrency Control
Deleting Historical Data
Using Hudi Payload
Using the Hudi Client
Operating a Hudi Table Using hudi-cli.sh
Hudi SQL Syntax Reference
Constraints
DDL
CREATE TABLE
CREATE TABLE AS SELECT
DROP TABLE
SHOW TABLE
ALTER RENAME TABLE
ALTER ADD COLUMNS
ALTER ALTER COLUMN
TRUNCATE TABLE
DML
INSERT INTO
MERGE INTO
UPDATE
DELETE
COMPACTION
SET/RESET
ARCHIVELOG
CLEAN
CLEANARCHIVE
CALL COMMAND
CHANGE_TABLE
CLEAN_FILE
SHOW_TIME_LINE
SHOW_HOODIE_PROPERTIES
SAVE_POINT
ROLL_BACK
CLUSTERING
Cleaning
Compaction
SHOW_COMMIT_FILES
SHOW_FS_PATH_DETAIL
SHOW_LOG_FILE
SHOW_INVALID_PARQUET
Setting Default Values for Hudi Columns
Hudi Performance Tuning
Common Issues About Hudi
Data Write
Parquet/Avro schema Is Reported When Updated Data Is Written
UnsupportedOperationException Is Reported When Updated Data Is Written
SchemaCompatabilityException Is Reported When Updated Data Is Written
What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
Hudi Fails to Write Decimal Data with Lower Precision
Data in ro and rt Tables Cannot Be Synchronized to a MOR Table Recreated After Being Deleted Using Spark SQL
Data Collection
IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
HoodieException Is Reported When Data Is Collected
HoodieKeyException Is Reported When Data Is Collected
Hive Synchronization
SQLException Is Reported During Hive Data Synchronization
HoodieHiveSyncException Is Reported During Hive Data Synchronization
SemanticException Is Reported During Hive Data Synchronization
Using Hue
Using Hue from Scratch
Accessing the Hue Web UI
Hue Common Parameters
Using HiveQL Editor on the Hue Web UI
Using the SparkSql Editor on the Hue Web UI
Using the Metadata Browser on the Hue Web UI
Using File Browser on the Hue Web UI
Using Job Browser on the Hue Web UI
Using HBase on the Hue Web UI
Typical Scenarios
HDFS on Hue
Hive on Hue
Oozie on Hue
Hue Log Overview
Common Issues About Hue
Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
Why Does the use database Statement Become Invalid in Hive?
Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
Why Do Large Files Fail to Upload on the Hue Page?
Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
What Should I Do If It Takes a Long Time to Access the Native Hue UI and the File Browser Reports "Read timed out"?
Using IoTDB
Using IoTDB from Scratch
Using the IoTDB Client
Configuring IoTDB Parameters
Data Types and Encodings Supported by IoTDB
IoTDB Permission Management
IoTDB Permissions
Creating an IoTDB Role
IoTDB Log Overview
UDFs
UDF Overview
UDF Sample Code and Operations
IoTDB Data Import and Export
Importing IoTDB Data
Exporting IoTDB Data
Planning IoTDB Capacity
IoTDB Performance Tuning
IoTDB Error Logs
Using JobGateway
Using JobGateway from Scratch
Configuring JobGateway Parameters
JobGateway Logs
Using Kafka
Using Kafka from Scratch
Managing Kafka Topics
Querying Kafka Topics
Managing Kafka User Permissions
Managing Messages in Kafka Topics
Synchronizing Binlog-based MySQL Data to the MRS Cluster
Creating a Kafka Role
Kafka Common Parameters
Safety Instructions on Using Kafka
Kafka Specifications
Using the Kafka Client
Configuring Kafka HA and High Reliability Parameters
Changing the Broker Storage Directory
Checking the Consumption Status of Consumer Group
Kafka Balancing Tool Instructions
Kafka Token Authentication Mechanism Tool Usage
Kafka Encryption and Decryption
Using Kafka UI
Accessing Kafka UI
Kafka UI Overview
Creating a Topic on Kafka UI
Migrating a Partition on Kafka UI
Managing Topics on Kafka UI
Viewing Brokers on Kafka UI
Viewing a Consumer Group on Kafka UI
Kafka Logs
Performance Tuning
Kafka Performance Tuning
Kafka Feature Description
Migrating Data Between Kafka Nodes
Common Issues About Kafka
How Do I Solve the Problem that Kafka Topics Cannot Be Deleted?
Using KMS
Interconnecting HDFS with KMS
Permission Control
Creating a KMS Role
Creating a KMS User
Key Management
Updating a Key
Transparent Encryption of Upper-layer Components
Configuring HDFS Partition Encryption
Configuring Transparent Encryption for HBase
Configuring Transparent Encryption for Hive
Precautions for Transparent Encryption
KMS Log Overview
Using LakeSearch
Permission Management
Overview
Creating a LakeSearch Role
Accessing the LakeSearch Web UI
Managing Knowledge Bases
Using the Experience Platform
Managing Dialogs
LakeSearch Logs
Using Loader
Common Loader Parameters
Creating a Loader Role
Managing Loader Links
Preparing a Driver for MySQL Database Link
Importing Data
Overview
Importing Data Using Loader
Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
Typical Scenario: Importing Data from an SFTP Server to HBase
Typical Scenario: Importing Data from an SFTP Server to Hive
Typical Scenario: Importing Data from an FTP Server to HBase
Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
Typical Scenario: Importing Data from a Relational Database to HBase
Typical Scenario: Importing Data from a Relational Database to Hive
Typical Scenario: Importing Data from HDFS or OBS to HBase
Typical Scenario: Importing Data from a Relational Database to ClickHouse
Exporting Data
Overview
Using Loader to Export Data
Typical Scenario: Exporting Data from HDFS or OBS to an SFTP Server
Typical Scenario: Exporting Data from HBase to an SFTP Server
Typical Scenario: Exporting Data from Hive to an SFTP Server
Typical Scenario: Exporting Data from HDFS or OBS to a Relational Database
Typical Scenario: Exporting Data from HDFS to MOTService
Typical Scenario: Exporting Data from HBase to a Relational Database
Typical Scenario: Exporting Data from Hive to a Relational Database
Typical Scenario: Exporting Data from HBase to HDFS or OBS
Typical Scenario: Exporting Data from HDFS to ClickHouse
Managing Jobs
Migrating Loader Jobs in Batches
Deleting Loader Jobs in Batches
Importing Loader Jobs in Batches
Exporting Loader Jobs in Batches
Viewing Historical Job Information
Operator Help
Overview
Input Operators
CSV File Input
Fixed File Input
Table Input
HBase Input
HTML Input
Hive Input
Spark Input
Conversion Operators
Long Date Conversion
Null Value Conversion
Constant Field Addition
Random Value Conversion
Concat Fields
Extract Fields
Modulo Integer
String Cut
EL Operation
String Operations
String Reverse
String Trim
Filter Rows
Update Fields Operator
Output Operators
Hive Output
Spark Output
Table Output
File Output
HBase Output
ClickHouse Output
Associating, Editing, Importing, or Exporting the Field Configuration of an Operator
Using Macro Definitions in Configuration Items
Operator Data Processing Rules
Client Tools
Running a Loader Job Through CLI
loader-tool Usage Guide
loader-tool Usage Example
schedule-tool Usage Guide
schedule-tool Usage Example
Using loader-backup to Back Up Job Data
Open Source sqoop-shell Tool Usage Guide
Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
Loader Log Overview
Common Issues About Loader
Why Can't I Save Data on Internet Explorer 10 or 11?
Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
Why Is Data Not Imported to HDFS After All Data Types of SQL Server Are Selected?
Using MapReduce
Configuring the Log Archiving and Clearing Mechanism
Reducing Client Application Failure Rate
Transmitting MapReduce Tasks from Windows to Linux
Configuring the Distributed Cache
Configuring the MapReduce Shuffle Address
Configuring the Cluster Administrator List
Introduction to MapReduce Logs
MapReduce Performance Tuning
Optimization Configuration for Multiple CPU Cores
Determining the Job Baseline
Streamlining Shuffle
AM Optimization for Big Tasks
Speculative Execution
Using Slow Start
Optimizing Performance for Committing MR Jobs
Common Issues About MapReduce
How Do I Handle the Problem that MapReduce Task Has No Progress for a Long Time?
Why Does the Client Hang During Job Running?
Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
How Do I Set the Task Priority When Submitting a MapReduce Task?
Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
MapReduce Job Failed in Multiple NameService Environment
Why Is a Faulty MapReduce Node Not Blacklisted?
Using MemArtsCC
Setting Typical MemArtsCC Parameters
Configuring the Connection Between Hive and MemArtsCC
Integrating MemArtsCC into Spark Tasks
MemArtsCC Logs
Using Metadata
Creating a Metadata Role
Viewing Metadata Details
Configuring Automatic Metadata Extraction
Metadata Log Overview
Using MOTService
Using MOTService from Scratch
MOTService Permissions Management
Overview
User Permission Planning Process
Users and Roles
Creating an MOTService User
Creating a User and Allocating Permissions
Viewing a User's or Role's Permissions
Viewing the Default User
Modifying a User's System Permissions
Locking and Unlocking a User
Deleting a User
Using the MOTService Client
Introduction to the MOTService Maintenance Tool gs_om
MOTService Data Backup and Restoration
Introduction to gs_dump
Backing Up MOTService Data
Restoring MOTService Data
Importing and Exporting MOTService Metadata and Service Data
Reinstalling a MOTService Host
MOTService SQL Coverage and Limitations
MOTService Data Aging Configuration
Introduction to MOTService Logs
Using Oozie
Using Oozie from Scratch
Using the Oozie Client
Checking ShareLib
Using Oozie Client to Submit an Oozie Job
Submitting a Hive Job
Submitting a Spark Job
Submitting a Loader Job
Submitting a Sqoop Job
Submitting a DistCp Job
Submitting Other Jobs
Using Hue to Submit an Oozie Job
Creating a Workflow
Submitting a Workflow Job
Submitting a Hive2 Job
Submitting a Spark Job
Submitting a Java Job
Submitting a Loader Job
Submitting a MapReduce Job
Submitting a Sub-workflow Job
Submitting a Shell Job
Submitting an HDFS Job
Submitting a Streaming Job
Submitting a DistCp Job
Example of Mutual Trust Operations
Submitting an SSH Job
Submitting a Hive Script
Submitting a Coordinator Periodic Scheduling Job
Submitting a Bundle Batch Processing Job
Querying the Operation Results
Oozie Log Overview
Common Issues About Oozie
What Should I Do If Oozie Scheduled Tasks Are Not Executed on Time?
Why Does an Update of the share lib Directory of Oozie on HDFS Not Take Effect?
Common Oozie Troubleshooting Methods
What Should I Do If the User Who Submits Jobs on the Oozie Client in a Normal Cluster Is Inconsistent with the User Displayed on the Yarn Web UI?
Using Ranger
Logging In to the Ranger Web UI
Enabling Ranger Authentication
Configuring Component Permission Policies
Viewing Ranger Audit Information
Configuring a Security Zone
Changing the Ranger Data Source to LDAP for a Normal Cluster
Viewing Ranger Permission Information
Adding a Ranger Access Permission Policy for CDL
Adding a Ranger Access Permission Policy for HDFS
Adding a Ranger Access Permission Policy for HBase
Adding a Ranger Access Permission Policy for Hive
Adding a Ranger Access Permission Policy for Yarn
Adding a Ranger Access Permission Policy for Spark
Adding a Ranger Access Permission Policy for Kafka
Adding a Ranger Access Permission Policy for HetuEngine
Adding a Ranger Access Permission Policy for Storm
Adding a Ranger Access Permission Policy for Elasticsearch
Adding a Ranger Access Permission Policy for OBS
Hive Tables Supporting Cascading Authorization
Configuring Multi-Instance for RangerKMS
Using the RangerKMS Native UI to Manage Permissions and Keys
Ranger Log Overview
Common Issues About Ranger
Why Does Ranger Startup Fail During Cluster Installation?
How Do I Determine Whether the Ranger Authentication Is Used for a Service?
Why Cannot a New User Log In to Ranger After Changing the Password?
When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
How Do I Rectify the Problem that RangerKMS Authentication Fails and the KMS Tab Is Not Displayed on the Ranger Management Page?
Using Redis
Using Redis from Scratch
Common Redis Parameters
Creating a Redis Role
Redis Cluster Management
Creating a Redis Cluster
Checking the Redis Cluster Status
Adjusting the Redis Cluster Capacity
Balancing Redis Cluster Data
Deleting a Redis Cluster
Backing Up Redis Data
Redis Data Restoration Tool
Redis Log Overview
Redis Shortcut Keys
Redis Usage Specifications
Using RTDService
Overview
RTDService Permission Management
Permission Management Overview
Creating an RTDService Role
Creating an RTDService Service User
Accessing the RTDService Web UI
Tenant Management
Creating an RTDService Tenant
Importing and Exporting Tenant Information
Updating an RTDService Tenant
Service Management
Configuring Analysis Dimensions
Adding an Event Source
Configuring Event Variables
Adding a Batch Variable
Adding a Real-Time Query Variable
Window Variable Management
Adding a Window Variable
Managing Window Variables
Adding a Scoring Model
Adding an Inference Variable
Adding a Stored Procedure Rule
Adding a Blacklist/Whitelist Rule
Adding a Decision Engine
Database Tools
Database Operations
Managing Stored Procedures
Adding a Scheduled Data Cleaning Task
Managing a Template
Importing and Exporting RTDService Metadata
Modifying the Event Source Executor Configuration on the Web UI
RTDService Logs
Using Solr
Using Solr from Scratch
Creating a Solr Role
Using the Solr Client
Common Service Operations About Solr
Solr Overview
Configuration File managed-schema in the Solr Config Set
Configuration File solrconfig.xml in the Solr Config Set
Shell Client Operation Commands
Operations on the Solr Admin UI
Solr over HDFS
Solr over HBase
curl Commands in Linux
REST Messages Sent in URLs Through Browsers
Solr User Permission Configuration and Management
Word Filter Customization
HBase Full-Text Index
Sensitive Word Filtering
Including Collection Names in Query Results
Solr Multi-System Mutual Trust
Solr Rich Text Indexing
(Recommended) Changing the Collection Data Storage Mode from HDFS to Local Disk
Changing the Index Data Storage Mode from Local Disk to HDFS
Restoring Data Using Solr
Solr Log Overview
Solr Performance Tuning
Suggestions on Sharding of Collections
Solr Public Read/Write Optimization Suggestions
Optimization Suggestions on Solr over HBase
Optimization Suggestions on Solr over HDFS
Common Issues About Solr
Why Cannot I Query Index Data Using Internet Explorer 11?
What Can I Do If the CPU Usage Exceeds the Threshold Due to a Large Amount of Data Written to Solr?
How Do Applications Access HBase and Solr at the Same Time?
Using Spark
Basic Operation
Getting Started
Configuring Parameters Rapidly
Common Parameters
Spark on HBase Overview and Basic Applications
Spark on HBase V2 Overview and Basic Applications
SparkSQL Permission Management (Security Mode)
Spark SQL Permissions
Creating a Spark SQL Role
Configuring Permissions for SparkSQL Tables, Columns, and Databases
Configuring Permissions for SparkSQL to Use Other Components
Configuring the Client and Server
Scenario-Specific Configuration
Configuring Multi-active Instance Mode
Configuring the Multi-Tenant Mode
Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
Configuring the Size of the Event Queue
Configuring Executor Off-Heap Memory
Enhancing Stability in a Limited Memory Condition
Viewing Aggregated Container Logs on the Web UI
Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
Configuring the Default Number of Data Blocks Divided by SparkSQL
Configuring the Compression Format of a Parquet Table
Configuring the Number of Lost Executors Displayed in WebUI
Setting the Log Level Dynamically
Configuring Whether Spark Obtains HBase Tokens
Configuring LIFO for Kafka
Configuring Reliability for Connected Kafka
Configuring Streaming Reading of Driver Execution Results
Filtering Partitions without Paths in Partitioned Tables
Configuring Spark Web UI ACLs
Configuring Vector-based ORC Data Reading
Broaden Support for Hive Partition Pruning Predicate Pushdown
Hive Dynamic Partition Overwriting Syntax
Configuring the Column Statistics Histogram for Higher CBO Accuracy
Configuring Local Disk Cache for JobHistory
Configuring Spark SQL to Enable the Adaptive Execution Feature
Configuring Event Log Rollover
Configuring the Spark Native Engine
Configuring Automatic Merging of Small Files
Adapting to the Third-party JDK When Ranger Is Used
Spark Log Overview
Obtaining Container Logs of a Running Spark Application
Small File Combination Tools
Using CarbonData for First Query
Spark Performance Tuning
Spark Core Tuning
Data Serialization
Optimizing Memory Configuration
Setting the DOP
Using Broadcast Variables
Using the External Shuffle Service to Improve Performance
Configuring Dynamic Resource Scheduling in Yarn Mode
Configuring Process Parameters
Designing the Directed Acyclic Graph (DAG)
Experience
Spark SQL and DataFrame Tuning
Optimizing the Spark SQL Join Operation
Improving Spark SQL Calculation Performance Under Data Skew
Optimizing Spark SQL Performance in the Small File Scenario
Optimizing the INSERT...SELECT Operation
Multiple JDBC Clients Concurrently Connecting to JDBCServer
Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
Optimizing Small Files
Optimizing the Aggregate Algorithms
Optimizing Datasource Tables
Merging CBO
Optimizing SQL Query of Data of Multiple Sources
SQL Optimization for Multi-level Nesting and Hybrid Join
Spark Streaming Tuning
Spark on OBS Tuning
Spark FAQ
Spark Core
How Do I View Aggregated Spark Application Logs?
Why Cannot the Driver Process Exit?
Why Does FetchFailedException Occur When the Network Connection Times Out?
How Do I Configure the Event Queue Size If the Event Queue Overflows?
What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
Why Do Tasks Fail When Hash Shuffle Is Used?
What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
Why Does the Stage Retry due to the Crash of the Executor?
Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
NodeManager OOM Occurs During Spark Application Execution
Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
Spark SQL and DataFrame
What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
How to Assign a Parameter Value in a Spark Command?
What Directory Permissions Do I Need to Create a Table Using SparkSQL?
Why Do I Fail to Delete the UDF Using Another Service?
Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
How to Use Cache Table?
Why Are Some Partitions Empty During Repartition?
Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
How Do I Rectify the Exception That Occurs When I Perform an Operation on the Table Named table?
Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
Why Do I Fail to Modify MetaData by Running the Hive Command?
Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
What Do I Do If I Accidentally Kill the JDBCServer Process During Health Check?
Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
Why Are Some Functions Not Available when ThriftJDBCServers Are Connected?
Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
Spark Streaming
What Can I Do If Spark Streaming Tasks Are Blocked?
What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
Why Does the Spark Streaming Application Fail to Be Started from the Checkpoint When the Input Stream Has No Output Logic?
Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
Spark Ranger FAQ
Why Do Ranger Authentication and ACL Authentication Fail?
Why Do spark-sql and spark-submit Fail to Execute When Ranger Authentication Is Used and the Client Is Mounted in Read-Only Mode?
Why Is a Permission Exception Reported When Ranger Authentication and UDFs Are Used?
Why Is the RESTful Interface Information Obtained by Accessing Spark Incorrect?
Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
Why Is an Application Not Displayed When I Run the Application with an Empty Part File?
Why Does Spark Fail to Export a Table with Duplicate Field Names?
Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
Why Is "This page can't be displayed" Displayed or an Error Reported When I Use Internet Explorer to Access the Native Web UI of Spark?
How Does Spark Access External Cluster Components?
Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
Why Is an Error Reported When I Access the Native Page of an Application in Spark JobHistory?
Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
Spark Shuffle Exception Handling
Why Cannot Common Users Log In to the Spark Client When There Are Multiple Service Scenarios in Spark?
Why Does the Cluster Port Fail to Connect When a Client Outside the Cluster Is Installed or Used?
How Do I Handle the Exception That Occurs When I Query Datasource Avro Formats?
What Should I Do If Statistics of Hudi or Hive Tables Created Using Spark SQLs Are Empty Before Data Is Inserted?
Failed to Query Table Statistics by Partition Using Non-Standard Time Format When the Partition Column in the Table Creation Statement is timestamp
How Do I Use Special Characters with TIMESTAMP and DATE?
What Should I Do If the Recycle Bin Version I Set on the Spark Client Does Not Take Effect?
How Do I Change the Log Level to INFO When Using Spark yarn-client?
Using Tez
Precautions
Common Tez Parameters
Accessing TezUI
Log Overview
Common Issues
TezUI Cannot Display Tez Task Execution Details
Error Occurs When a User Switches to the Tez Web UI
Yarn Logs Cannot Be Viewed on the TezUI Page
Table Data Is Empty on the TezUI HiveQueries Page
Using YARN
Common YARN Parameters
Creating Yarn Roles
Using the YARN Client
Configuring Resources for a NodeManager Role Instance
Changing NodeManager Storage Directories
Configuring Strict Permission Control for Yarn
Configuring Container Log Aggregation
Using CGroups with YARN
Configuring the Number of ApplicationMaster Retries
Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
Configuring the Access Channel Protocol
Configuring Memory Usage Detection
Configuring the Additional Scheduler WebUI
Configuring Yarn Restart
Configuring ApplicationMaster Work Preserving
Configuring the Localized Log Levels
Configuring Users That Run Tasks
Yarn Log Overview
Yarn Performance Tuning
Preempting a Task
Setting the Task Priority
Optimizing Node Configuration
Common Issues About Yarn
Why Is the Mounted Directory for a Container Not Cleared After the Job Completes While Using CGroups?
Why Does the Job Fail with the HDFS_DELEGATION_TOKEN Expired Exception?
Why Are Local Logs Not Deleted After YARN Is Restarted?
Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
Why Is an Application Moved Back to the Original Queue After the ResourceManager Is Restarted?
Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
Why Does the Switchover of ResourceManager Occur Continuously?
Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
Using ZooKeeper
Using ZooKeeper from Scratch
Common ZooKeeper Parameters
Using a ZooKeeper Client
Configuring the ZooKeeper Permissions
ZooKeeper Log Overview
Common Issues About ZooKeeper
Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
How Do I Check Which ZooKeeper Instance Is a Leader?
Why Cannot the Client Connect to ZooKeeper using the IBM JDK?
What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
Appendix
Modifying Cluster Service Configuration Parameters
Change History
API Reference (Ankara Region)
Before You Start
API Overview
Selecting an API Type
Calling APIs
Making an API Request
Authentication
Response
Application Cases
Creating an MRS Cluster
Scaling Out a Cluster
Scaling in a Cluster
Creating a Job
Terminating a Job
Terminating a Cluster
API V2
Cluster Management APIs
Creating a Cluster
Changing a Cluster Name
Creating a Cluster and Submitting a Job
Scaling Out a Cluster
Scaling In a Cluster
Adding Components to a Cluster
Querying the Cluster Node List
Job Management APIs
Adding and Executing a Job
Querying Information About a Job
Querying a List of Jobs
Terminating a Job
Obtaining SQL Results
Deleting Jobs in Batches
Auto Scaling APIs
Viewing Auto Scaling Policies
Updating an Auto Scaling Policy
Deleting an AS Policy
Creating an AS Policy
Cluster HDFS File API
Obtaining the List of Files from a Specified Directory
SQL APIs
Submitting a SQL Statement
Querying SQL Results
Canceling a SQL Execution Task
Agency Management
Querying the Mapping Between a User (Group) and an IAM Agency
Updating the Mapping Between a User (Group) and an IAM Agency
Data Connection Management
Creating a Data Connection
Querying the Data Connection List
Updating a Data Connection
Deleting a Data Connection
Querying Version Metadata
Obtaining the MRS Version List
Querying Available Specifications of an MRS Cluster Version
IAM Synchronization
Obtaining Synchronized IAM Users and User Groups
Synchronizing an IAM User and User Group
Cancelling Synchronization of Specified Users and User Groups
Tag Management APIs
Enabling or Disabling the Default Tag of a Cluster
Querying the Status of Default Cluster Tags
Querying Tag Quotas
API V1.1
Cluster Management APIs
Creating a Cluster and Executing a Job
Resizing a Cluster
Querying a Cluster List
Querying Cluster Details
Querying a Host List
Terminating a Cluster
Auto Scaling APIs
Configuring an Auto Scaling Rule
Tag Management APIs
Adding Tags to a Specified Cluster
Querying Tags of a Specified Cluster
Deleting Tags from a Specified Cluster
Adding Tags to a Cluster in Batches
Deleting Tags from a Cluster in Batches
Querying All Tags
Querying a List of Clusters with Specified Tags
Availability Zones
Querying AZ Information
Version Metadata
Querying the Metadata of a Cluster Version
Out-of-Date APIs
Job API Management (Deprecated)
Adding and Executing a Job (Deprecated)
Querying the exe Object List of Jobs (Deprecated)
Querying exe Object Details (Deprecated)
Deleting a Job Execution Object (Deprecated)
Permissions Policies and Supported Actions
Introduction
Appendix
Status Codes
Error Codes
Obtaining a Project ID
Obtaining a Tenant ID
Obtaining the MRS Cluster Information
Roles and Components Supported by MRS
Change History
General Reference
Glossary
Service Level Agreement
White Papers
Endpoints
Permissions