-
Service Overview
- Infographics
- What Is MRS?
- Advantages of MRS over Self-Built Hadoop
- Application Scenarios
- Choosing an Appropriate Version When Buying an MRS Cluster
-
Components
- List of MRS Component Versions
- Alluxio
- CarbonData
- ClickHouse
- DBService
- Flink
- Flume
- HBase
- HDFS
- HetuEngine
- Hive
- Hudi
- Hue
- Impala
- IoTDB
- Kafka
- KafkaManager
- KrbServer and LdapServer
- Kudu
- Loader
- Manager
- MapReduce
- Oozie
- OpenTSDB
- Presto
- Ranger
- Spark
- Spark2x
- Storm
- Tez
- YARN
- ZooKeeper
- Functions
- Security
- Constraints
- Billing
- Permissions Management
- Related Services
- Quota Description
- Common Concepts
-
User Guide
- Preparing a User
-
Configuring a Cluster
- How Do I Buy an MRS Cluster?
- Quickly Buying an MRS Cluster
- Buying a Custom Cluster
- Buying a Custom Topology Cluster
- Adding a Tag to a Cluster
- Communication Security Authorization
-
Configuring Auto Scaling Rules
- Overview
- Configuring Auto Scaling During Cluster Creation
- Creating an Auto Scaling Policy for an Existing Cluster
- Scenario 1: Using Auto Scaling Rules Only
- Scenario 2: Using Resource Plans Only
- Scenario 3: Using Both Auto Scaling Rules and Resource Plans
- Modifying an Auto Scaling Policy
- Deleting an Auto Scaling Policy
- Enabling or Disabling an Auto Scaling Policy
- Viewing an Auto Scaling Policy
- Configuring Automation Scripts
- Configuring Auto Scaling Metrics
- Managing Data Connections
- Installing Third-Party Software Using Bootstrap Actions
- Viewing Failed MRS Tasks
- Viewing Information About a Historical Cluster
-
Managing Clusters
- Logging In to a Cluster
- Cluster Overview
- Viewing and Customizing Cluster Monitoring Metrics
-
Cluster O&M
- Importing and Exporting Data
- Changing the Subnet of a Cluster
- Configuring Message Notification
- Performing a Health Check
- Remote O&M
- Viewing MRS Operation Logs
- Changing Billing Mode to Yearly/Monthly
- Unsubscribing from a Cluster
- Unsubscribing from a Specified Node in a Yearly/Monthly Cluster
- Terminating a Cluster
- Managing Nodes
-
Job Management
- Introduction to MRS Jobs
- Running a MapReduce Job
- Running a SparkSubmit or Spark Job
- Running a HiveSQL Job
- Running a SparkSql Job
- Running a Flink Job
- Viewing Job Configuration and Logs
- Stopping a Job
- Deleting a Job
- Using Encrypted OBS Data for Running Jobs
- Configuring Job Notification Rules
-
Component Management
- Object Management
- Viewing Configurations
- Managing Services
- Configuring Service Parameters
- Configuring Custom Service Parameters
- Synchronizing Service Configurations
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configurations
- Decommissioning and Recommissioning a Role Instance
- Starting and Stopping a Cluster
- Synchronizing Cluster Configurations
- Exporting Cluster Configurations
- Performing a Rolling Restart
- Alarm Management
-
Patch Management
- Patch Operation Guide for MRS 3.1.5
- Rolling Patches
- Restoring Patches for Isolated Hosts
-
MRS Patch Description
- Fixing the omm User Privilege Escalation Vulnerability
- MRS 2.1.0.11 Patch Description
- MRS 3.0.5.1 Patch Description
- MRS 2.1.0.10 Patch Description
- MRS 2.1.0.9 Patch Description
- MRS 2.1.0.8 Patch Description
- MRS 2.1.0.7 Patch Description
- MRS 2.1.0.6 Patch Description
- MRS 2.1.0.3 Patch Description
- MRS 2.1.0.2 Patch Description
- MRS 2.1.0.1 Patch Description
- MRS 2.0.6.1 Patch Description
- MRS 2.0.1.3 Patch Description
- MRS 2.0.1.2 Patch Description
- MRS 2.0.1.1 Patch Description
- MRS 1.9.3.3 Patch Description
- MRS 1.9.3.1 Patch Description
- MRS 1.9.2.2 Patch Description
- MRS 1.9.0.8, 1.9.0.9, and 1.9.0.10 Patch Description
- MRS 1.9.0.7 Patch Description
- MRS 1.9.0.6 Patch Description
- MRS 1.9.0.5 Patch Description
- MRS 1.8.10.1 Patch Description
-
Tenant Management
- Before You Start
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing the Configuration of a Queue
- Bootstrap Actions
- Using an MRS Client
- Accessing Web Pages of Open-Source Components Managed in MRS Clusters
- Accessing Manager
-
FusionInsight Manager Operation Guide (Applicable to 3.x)
- Home Page
-
Cluster
- Cluster Management
- Managing a Service
- Instance Management
- Hosts
- O&M
- Audit
- Tenant Resources
- System
- Cluster Management
- Log Management
-
Backup and Recovery Management
- Introduction
-
Backing Up Data
- Backing Up Manager Data
- Backing Up ClickHouse Metadata
- Backing Up ClickHouse Service Data
- Backing Up DBService Data
- Backing Up Flink Metadata
- Backing Up HBase Metadata
- Backing Up HBase Service Data
- Backing Up NameNode Data
- Backing Up HDFS Service Data
- Backing Up Hive Service Data
- Backing Up Kafka Metadata
-
Recovering Data
- Restoring Manager Data
- Restoring ClickHouse Metadata
- Restoring ClickHouse Service Data
- Restoring DBService Data
- Restoring Flink Metadata
- Restoring HBase Metadata
- Restoring HBase Service Data
- Restoring NameNode Data
- Restoring HDFS Service Data
- Restoring Hive Service Data
- Restoring Kafka Metadata
- Enabling Cross-Cluster Replication
- Managing Local Quick Restoration Tasks
- Modifying a Backup Task
- Viewing Backup and Restoration Tasks
- How Do I Configure the Environment When Creating a ClickHouse Backup Task on FusionInsight Manager with the Path Type Set to RemoteHDFS?
-
Security Management
- Security Overview
-
Account Management
- Account Security Settings
- Changing the Password of a System User
-
Changing the Password of an Internal System User
- Changing the Password of the Kerberos Administrator
- Changing the Password of the OMS Kerberos Administrator
- Changing the Passwords of the LDAP Administrator and LDAP User (Including OMS LDAP)
- Changing the Password of the LDAP Administrator
- Changing the Password of a Component Running User
-
Changing the Password of a Database User
- Changing the Password of the OMS Database Administrator
- Changing the Password of the Data Access User of the OMS Database
- Changing the Password of a Component Database User
- Resetting the Password of a Component Database User
- Changing the Password of User compdbuser of the DBService Database
- Changing or Resetting the Password of Manager User admin
- Certificate Management
-
Security Hardening
- Hardening Policies
- Configuring a Trusted IP Address to Access LDAP
- HFile and WAL Encryption
- Configuring Hadoop Security Parameters
- Configuring an IP Address Whitelist for Modification Allowed by HBase
- Updating a Key for a Cluster
- Hardening the LDAP
- Configuring Kafka Data Encryption During Transmission
- Configuring HDFS Data Encryption During Transmission
- Configuring Spark2x Data Encryption During Transmission
- Configuring ZooKeeper SSL
- Encrypting the Communication Between the Controller and the Agent
- Updating SSH Keys for User omm
- Security Maintenance
- Security Statement
-
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
- Introduction to MRS Manager
- Checking Running Tasks
- Monitoring Management
- Alarm Management
-
Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)
- ALM-12001 Audit Log Dump Failure (For MRS 2.x or Earlier)
- ALM-12002 HA Resource Abnormal (For MRS 2.x or Earlier)
- ALM-12004 OLdap Resource Abnormal (For MRS 2.x or Earlier)
- ALM-12005 OKerberos Resource Abnormal (For MRS 2.x or Earlier)
- ALM-12006 Node Fault (For MRS 2.x or Earlier)
- ALM-12007 Process Fault (For MRS 2.x or Earlier)
- ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes (For MRS 2.x or Earlier)
- ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes (For MRS 2.x or Earlier)
- ALM-12012 NTP Service Abnormal (For MRS 2.x or Earlier)
- ALM-12014 Device Partition Lost (For MRS 2.x or Earlier)
- ALM-12015 Device Partition File System Read-Only (For MRS 2.x or Earlier)
- ALM-12016 CPU Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12017 Insufficient Disk Capacity (For MRS 2.x or Earlier)
- ALM-12018 Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12027 Host PID Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12031 User omm or Password Is About to Expire (For MRS 2.x or Earlier)
- ALM-12032 User ommdba or Password Is About to Expire (For MRS 2.x or Earlier)
- ALM-12033 Slow Disk Fault (For MRS 2.x or Earlier)
- ALM-12034 Periodic Backup Failure (For MRS 2.x or Earlier)
- ALM-12035 Unknown Data Status After Recovery Task Failure (For MRS 2.x or Earlier)
- ALM-12037 NTP Server Abnormal (For MRS 2.x or Earlier)
- ALM-12038 Monitoring Indicator Dump Failure (For MRS 2.x or Earlier)
- ALM-12039 GaussDB Data Is Not Synchronized (For MRS 2.x or Earlier)
- ALM-12040 Insufficient System Entropy (For MRS 2.x or Earlier)
- ALM-12041 Key File Permission Is Abnormal (For MRS 2.x or Earlier)
- ALM-12042 Key File Configurations Are Abnormal (For MRS 2.x or Earlier)
- ALM-12043 DNS Resolution Duration Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12045 Read Packet Dropped Rate Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12046 Write Packet Dropped Rate Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12047 Read Packet Error Rate Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12048 Write Packet Error Rate Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12049 Read Throughput Rate Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12050 Write Throughput Rate Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12051 Disk Inode Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12052 Usage of Temporary TCP Ports Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12053 File Handle Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-12054 Invalid Certificate File (For MRS 2.x or Earlier)
- ALM-12055 Certificate File Is About to Expire (For MRS 2.x or Earlier)
- ALM-12180 Disk Card I/O (For MRS 2.x or Earlier)
- ALM-12357 Failed to Export Audit Logs to OBS (For MRS 2.x or Earlier)
- ALM-13000 ZooKeeper Service Unavailable (For MRS 2.x or Earlier)
- ALM-13001 Available ZooKeeper Connections Are Insufficient (For MRS 2.x or Earlier)
- ALM-13002 ZooKeeper Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14000 HDFS Service Unavailable (For MRS 2.x or Earlier)
- ALM-14001 HDFS Disk Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14002 DataNode Disk Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14004 Number of Damaged HDFS Blocks Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14006 Number of HDFS Files Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14009 Number of Faulty DataNodes Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-14010 NameService Is Abnormal (For MRS 2.x or Earlier)
- ALM-14011 HDFS DataNode Data Directory Is Not Configured Properly (For MRS 2.x or Earlier)
- ALM-14012 HDFS JournalNode Data Is Not Synchronized (For MRS 2.x or Earlier)
- ALM-16000 Percentage of Sessions Connected to the HiveServer to the Maximum Number Allowed Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold (For MRS 2.x or Earlier)
- ALM-16004 Hive Service Unavailable (For MRS 2.x or Earlier)
- ALM-16005 Number of Failed Hive SQL Executions in the Last Period Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-18000 Yarn Service Unavailable (For MRS 2.x or Earlier)
- ALM-18002 NodeManager Heartbeat Lost (For MRS 2.x or Earlier)
- ALM-18003 NodeManager Unhealthy (For MRS 2.x or Earlier)
- ALM-18004 NodeManager Disk Usability Ratio Is Lower Than the Threshold (For MRS 2.x or Earlier)
- ALM-18006 MapReduce Job Execution Timeout (For MRS 2.x or Earlier)
- ALM-18008 Heap Memory Usage of Yarn ResourceManager Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-18009 Heap Memory Usage of MapReduce JobHistoryServer Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-18010 Number of Pending Yarn Tasks Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-18011 Memory of Pending Yarn Tasks Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-18012 Number of Terminated Yarn Tasks in the Last Period Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-18013 Number of Failed Yarn Tasks in the Last Period Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-19000 HBase Service Unavailable (For MRS 2.x or Earlier)
- ALM-19006 HBase Replication Synchronization Failure (For MRS 2.x or Earlier)
- ALM-19007 HBase Merge Queue Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-20002 Hue Service Unavailable (For MRS 2.x or Earlier)
- ALM-23001 Loader Service Unavailable (For MRS 2.x or Earlier)
- ALM-24000 Flume Service Unavailable (For MRS 2.x or Earlier)
- ALM-24001 Flume Agent Is Abnormal (For MRS 2.x or Earlier)
- ALM-24003 Flume Client Connection Interrupted (For MRS 2.x or Earlier)
- ALM-24004 Flume Fails to Read Data (For MRS 2.x or Earlier)
- ALM-24005 Data Transmission by Flume Is Abnormal (For MRS 2.x or Earlier)
- ALM-25000 LdapServer Service Unavailable (For MRS 2.x or Earlier)
- ALM-25004 Abnormal LdapServer Data Synchronization (For MRS 2.x or Earlier)
- ALM-25500 KrbServer Service Unavailable (For MRS 2.x or Earlier)
- ALM-26051 Storm Service Unavailable (For MRS 2.x or Earlier)
- ALM-26052 Number of Available Supervisors in Storm Is Lower Than the Threshold (For MRS 2.x or Earlier)
- ALM-26053 Storm Slot Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-26054 Heap Memory Usage of Storm Nimbus Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-27001 DBService Unavailable (For MRS 2.x or Earlier)
- ALM-27003 Heartbeat Interruption Between the Active and Standby DBService Nodes (For MRS 2.x or Earlier)
- ALM-27004 Data Inconsistency Between Active and Standby DBServices (For MRS 2.x or Earlier)
- ALM-28001 Spark Service Unavailable (For MRS 2.x or Earlier)
- ALM-38000 Kafka Service Unavailable (For MRS 2.x or Earlier)
- ALM-38001 Insufficient Kafka Disk Capacity (For MRS 2.x or Earlier)
- ALM-38002 Kafka Heap Memory Usage Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43001 Spark Service Unavailable (For MRS 2.x or Earlier)
- ALM-43006 Heap Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43007 Non-Heap Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43009 JobHistory GC Time Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43010 Heap Memory Usage of the JDBCServer Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43011 Non-Heap Memory Usage of the JDBCServer Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43012 Direct Memory Usage of the JDBCServer Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-43013 JDBCServer GC Time Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-44004 Presto Coordinator Resource Group Queuing Tasks Exceed the Threshold (For MRS 2.x or Earlier)
- ALM-44005 GC Time of the Presto Coordinator Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-44006 GC Time of the Presto Worker Process Exceeds the Threshold (For MRS 2.x or Earlier)
- ALM-45325 Presto Service Unavailable (For MRS 2.x or Earlier)
-
Object Management
- Managing Objects
- Viewing Configurations
- Managing Services
- Configuring Service Parameters
- Configuring Custom Service Parameters
- Synchronizing Service Configurations
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configurations
- Decommissioning and Recommissioning a Role Instance
- Managing a Host
- Isolating a Host
- Canceling Host Isolation
- Starting or Stopping a Cluster
- Synchronizing Cluster Configurations
- Exporting Configuration Data of a Cluster
- Log Management
-
Health Check Management
- Performing a Health Check
- Viewing and Exporting a Health Check Report
- Configuring the Number of Health Check Reports to Be Reserved
- Managing Health Check Reports
- DBService Health Check Indicators
- Flume Health Check Indicators
- HBase Health Check Indicators
- Host Health Check Indicators
- HDFS Health Check Indicators
- Hive Health Check Indicators
- Kafka Health Check Indicators
- KrbServer Health Check Indicators
- LdapServer Health Check Indicators
- Loader Health Check Indicators
- MapReduce Health Check Indicators
- OMS Health Check Indicators
- Spark Health Check Indicators
- Storm Health Check Indicators
- Yarn Health Check Indicators
- ZooKeeper Health Check Indicators
- Static Service Pool Management
-
Tenant Management
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing the Configuration of a Queue
- Backup and Restoration
-
Security Management
- Default Users of Clusters with Kerberos Authentication Disabled
- Default Users of Clusters with Kerberos Authentication Enabled
- Changing the Password of an OS User
- Changing the Password of User admin
- Changing the Password of the Kerberos Administrator
- Changing the Passwords of the LDAP Administrator and the LDAP User
- Changing the Password of a Component Running User
- Changing the Password of the OMS Database Administrator
- Changing the Password of the Data Access User of the OMS Database
- Changing the Password of a Component Database User
- Replacing the HA Certificate
- Updating Cluster Keys
-
Permissions Management
- Creating a Role
- Creating a User Group
- Creating a User
- Modifying User Information
- Locking a User
- Unlocking a User
- Deleting a User
- Changing the Password of an Operation User
- Initializing the Password of a System User
- Downloading a User Authentication File
- Modifying a Password Policy
-
MRS Multi-User Permission Management
- Users and Permissions of MRS Clusters
- Default Users of Clusters with Kerberos Authentication Enabled
- Creating a Role
- Creating a User Group
- Creating a User
- Modifying User Information
- Locking a User
- Unlocking a User
- Deleting Users
- Changing the Password of an Operation User
- Initializing the Password of a System User
- Downloading a User Authentication File
- Modifying a Password Policy
- Configuring Cross-Cluster Mutual Trust Relationships
- Configuring Users to Access Resources of a Trusted Cluster
- Patch Operation Guide
- Restoring Patches for Isolated Hosts
- Rolling Restart
-
Referencia de alarma (aplicable a MRS 3.x)
- ALM-12001 Error de volcado del registro de auditoría
- ALM-12004 Recurso OLdap anormal
- ALM-12005 OKerberos Resource Anormal
- ALM-12006 Falla de nodo
- ALM-12007 Falla de proceso
- ALM-12010 Interrupción del latido del corazón de Manager entre los nodos activo y en espera
- ALM-12011 Excepción de sincronización de datos de Manager entre los nodos activo y en espera
- ALM-12012 El servicio NTP es anormal
- ALM-12014 Partición perdida
- ALM-12015 Sistema de archivos de partición de sólo lectura
- ALM-12016 El uso de la CPU supera el umbral
- ALM-12017 Capacidad de disco insuficiente
- ALM-12018 El uso de memoria supera el umbral
- ALM-12027 El uso de PID de host supera el umbral
- ALM-12028 Número de procesos en el Estado D en un host supera el umbral
- ALM-12033 Falla de disco lento
- ALM-12034 Error de copia de respaldo periódica
- ALM-12035 Estado de datos desconocido después de un error de tarea de recuperación
- ALM-12037 Servidor NTP anormal
- ALM-12038 Error de volcado de indicador de monitoreo
- ALM-12039 Bases de datos de OMS activas/en espera no sincronizadas
- ALM-12040 Entropía del sistema insuficiente
- ALM-12041 Permiso incorrecto en archivos clave
- ALM-12042 Configuración incorrecta de archivos clave
- ALM-12045 La tasa de pérdida de paquetes de lectura supera el umbral
- ALM-12046 La tasa de pérdidas de paquetes de escritura supera el umbral
- ALM-12047 La tasa de error de paquete de lectura supera el umbral
- ALM-12048 La tasa de errores de escritura de paquetes supera el umbral
- ALM-12049 La tasa de rendimiento de lectura de red supera el umbral
- ALM-12050 La tasa de rendimiento de escritura en red supera el umbral
- ALM-12051 El uso de Inode de disco supera el umbral
- ALM-12052 El uso de puerto temporal de TCP supera el umbral
- ALM-12053 El uso del handle de archivos del host supera el umbral
- ALM-12054 Archivo de certificado no válido
- ALM-12055 El archivo de certificado está a punto de caducar
- ALM-12057 Metadatos no configurados con la tarea de realizar una copia de respaldo periódica de datos en un servidor de terceros
- ALM-12061 El uso del proceso supera el umbral
- ALM-12062 Las configuraciones del parámetro OMS no coinciden con la escala del clúster
- ALM-12063 Disco no disponible
- ALM-12064 Conflictos de rango de puertos aleatorios del host con el puerto utilizado del clúster
- ALM-12066 Las relaciones de confianza entre nodos se vuelven inválidas
- ALM-12067 Tomcat Resource es anormal
- ALM-12068 Excepción de recursos de ACS
- ALM-12069 Excepción de recursos de AOS
- ALM-12070 El recurso del controller es anormal
- ALM-12071 El recurso Httpd es anormal
- ALM-12072 El recurso FloatIP es anormal
- ALM-12073 El recurso de CEP es anormal
- ALM-12074 El recurso de FMS es anormal
- ALM-12075 El recurso de PMS es anormal
- ALM-12076 El recurso GaussDB es anormal
- ALM-12077 Usuario omm caducado
- ALM-12078 Contraseña del usuario omm caducado
- ALM-12079 El usuario omm está a punto de caducar
- ALM-12080 La contraseña del usuario omm está a punto de caducar
- ALM-12081Usuario ommdba caducado
- ALM-12082 El usuario ommdba está a punto de caducar
- ALM-12083 La contraseña del usuario ommdba está a punto de caducar
- ALM-12084 Contraseña del usuario ommdba caducada
- ALM-12085 Error de volcado del registro de auditoría de servicio
- ALM-12087 El sistema está en el período de observación de actualización
- ALM-12089 La red entre nodos es anormal
- ALM-12101 AZ de mal funcionamiento
- ALM-12102 El componente AZ HA no se despliega según los requisitos de DR.
- ALM-12103 Excepción de recursos del executor
- ALM-12104 Recursos Knox anormales
- ALM-12110 Error al obtener ECS AK/SK temporal
- ALM-12172 Error al notificar métricas a Cloud Eye
- ALM-12180 E/S de disco suspendido
- ALM-12190 Número de conexiones Knox supera el umbral
- ALM-13000 Servicio ZooKeeper no disponible
- ALM-13001 Las conexiones de ZooKeeper disponibles son insuficientes
- ALM-13002 El uso de memoria directa de ZooKeeper supera el umbral
- ALM-13003 GC La duración del proceso ZooKeeper supera el umbral
- ALM-13004 El uso de memoria heap de ZooKeeper supera el umbral
- ALM-13005 No se pudo establecer la cuota de los principales directorios de los componentes de ZooKeeper
- ALM-13006 El número o la capacidad de Znode supera el umbral
- ALM-13007 Las conexiones de cliente de ZooKeeper disponibles son insuficientes
- ALM-13008 El uso de ZooKeeper Znode supera el umbral
- ALM-13009 El uso de la capacidad de Znode de ZooKeeper supera el umbral
- ALM-13010 El uso de Znode de un directorio con cuota configurada supera el umbral
- ALM-14000 Servicio HDFS no disponible
- ALM-14001 El uso del disco HDFS supera el umbral
- ALM-14002 El uso del disco de DataNode supera el umbral
- ALM-14003 El número de bloques HDFS perdidos supera el umbral
- ALM-14006 Número de archivos HDFS supera el umbral
- ALM-14007 El uso de memoria heap de NameNode supera el umbral
- ALM-14008 El uso de memoria heap de DataNode supera el umbral
- ALM-14009 Número de Dead DataNodes supera el umbral
- ALM-14010 El servicio NameService es anormal
- ALM-14011 El directorio de datos de DataNode no está configurado correctamente
- ALM-14012 El JournalNode no está sincronizado
- ALM-14013 Error al actualizar el archivo NameNode FsImage
- ALM-14014 El tiempo de GC de NameNode supera el umbral
- ALM-14015 El tiempo de GC de DataNode supera el umbral
- ALM-14016 El uso de memoria directa de DataNode supera el umbral
- ALM-14017 El uso de memoria directa NameNode supera el umbral
- ALM-14018 El uso de memoria no heap de NameNode supera el umbral
- ALM-14019 El uso de memoria no heap de DataNode supera el umbral
- ALM-14020 Número de entradas en el directorio de HDFS supera el umbral
- ALM-14021 El tiempo promedio de procesamiento de RPC de NameNode supera el umbral
- ALM-14022 El tiempo medio de cola de RPC de NameNode supera el umbral
- ALM-14023 El porcentaje del espacio total en disco reservado para réplicas supera el umbral
- ALM-14024 El uso del espacio del tenant supera el umbral
- ALM-14025 El uso de objetos de archivo de tenant supera el umbral
- ALM-14026 Bloques en el DataNode superan el umbral
- ALM-14027 Falla de disco de DataNode
- ALM-14028 El número de bloques a complementar supera el umbral
- ALM-14029 Número de bloques en una réplica supera el umbral
- ALM-14030 HDFS permite la escritura de datos de una sola réplica
- ALM-16000 Porcentaje de sesiones conectadas al HiveServer al número máximo permitido supera el umbral
- ALM-16001 El uso del espacio en el almacén de Hive supera el umbral
- ALM-16002 Hive La Tasa de éxito de ejecución SQL es inferior al umbral
- ALM-16003 El uso de subprocesos en segundo plano supera el umbral
- ALM-16004 Servicio Hive no disponible
- ALM-16005 El uso de memoria heap del proceso Hive supera el umbral
- ALM-16006 El uso de la memoria directa del proceso Hive supera el umbral
- ALM-16007 El tiempo de Hive GC supera el umbral
- ALM-16008 El uso de memoria no heap del proceso Hive supera el umbral
- ALM-16009 El número de Map supera el umbral
- ALM-16045 Se elimina el almacén de datos de Hive
- ALM-16046 Se modifica el permiso de almacén de datos de Hive
- ALM-16047 HiveServer se ha dado de baja de ZooKeeper
- ALM-16048 Ruta de biblioteca de Tez o Spark no existe
- ALM-17003 Servicio Oozie no disponible
- ALM-17004 El uso de memoria heap de Oozie supera el umbral
- ALM-17005 El uso de memoria no heap de Oozie supera el umbral
- ALM-17006 El uso de memoria directa de Oozie supera el umbral
- ALM-17007 El tiempo de recolección de basura (GC) del proceso Oozie supera el umbral
- ALM-18000 Servicio de Yarn no disponible
- ALM-18002 Latidos del corazón de NodeManager perdidos
- ALM-18003 NodeManager en mal estado
- ALM-18008 El uso de memoria heap de ResourceManager supera el umbral
- ALM-18009 El uso de memoria heap de JobHistoryServer supera el umbral
- ALM-18010 El tiempo de GC de ResourceManager supera el umbral
- ALM-18011 El tiempo de GC de NodeManager supera el umbral
- ALM-18012 El tiempo de GC de JobHistoryServer supera el umbral
- ALM-18013 El uso de memoria directa de ResourceManager supera el umbral
- ALM-18014 El uso de memoria directa de NodeManager supera el umbral
- ALM-18015 El uso de memoria directa de JobHistoryServer supera el umbral
- ALM-18016 El uso de memoria no heap de ResourceManager supera el umbral
- ALM-18017 El uso de memoria no heap de NodeManager supera el umbral
- ALM-18018 El uso de memoria heap de NodeManager supera el umbral
- ALM-18019 El uso de memoria no heap de JobHistoryServer supera el umbral
- ALM-18020 Tiempo de espera de ejecución de tareas de Yarn
- ALM-18021 El servicio Mapreduce no está disponible
- ALM-18022 Recursos de cola de Yarn insuficientes
- ALM-18023 El número de tareas pendientes de Yarn supera el umbral
- ALM-18024 El uso de memoria de Yarn pendiente supera el umbral
- ALM-18025 El número de tareas de Yarn terminadas supera el umbral
- ALM-18026 El número de tareas de Yarn fallidas supera el umbral
- ALM-19000 Servicio HBase no disponible
- ALM-19006 Error de sincronización de replicación de HBase
- ALM-19007 El tiempo de HBase GC supera el umbral
- ALM-19008 El uso de memoria heap del proceso HBase supera el umbral
- ALM-19009 El uso de memoria directa del proceso HBase supera el umbral
- ALM-19011 El número de región de RegionServer supera el umbral
- ALM-19012 Directorio de tabla de sistema de HBase o archivo perdido
- ALM-19013 La duración de las regiones en estado de transición supera el umbral
- ALM-19014 El uso de la cuota de capacidad en el ZooKeeper supera severamente el umbral
- ALM-19015 El uso de cuotas de cantidad en el ZooKeeper supera el umbral
- ALM-19016 El uso de cuotas de cantidad en ZooKeeper supera severamente el umbral
- ALM-19017 El uso de la cuota de capacidad en el ZooKeeper supera el umbral
- ALM-19018 El tamaño de la cola de compactación de HBase supera el umbral
- ALM-19019 El número de HBase HFiles que se van a sincronizar supera el umbral
- ALM-19020 El número de archivos HBase WAL a sincronizar supera el umbral
- ALM-19021 El uso de RegionServer handler supera el umbral
- ALM-20002 Servicio de Hue no disponible
- ALM-23001 Servicio de Loader no disponible
- ALM-23003 Error de ejecución de tareas del Loader
- ALM-23004 El uso de memoria heap del Loader supera el umbral
- ALM-23005 El uso de memoria de no heap de Loader supera el umbral
- ALM-23006 El uso de memoria directa del Loader supera el umbral
- ALM-23007 El tiempo de recolección de basura (GC) del proceso del Loader supera el umbral
- ALM-24000 Servicio de Flume no disponible
- ALM-24001 Excepción de Flume Agent
- ALM-24003 Conexión de Flume client interrumpida
- ALM-24004 Se produce una excepción cuando Flume lee datos
- ALM-24005 Se produce una excepción cuando Flume transmite datos
- ALM-24006 El uso de memoria heap de Flume Server supera el umbral
- ALM-24007 El uso de memoria directa del servidor Flume supera el umbral
- ALM-24008 El uso de memoria no heap del Flume Server supera el umbral
- ALM-24009 El tiempo de recolección de basura (GC) del Flume Server supera el umbral
- ALM-25000 Servicio LdapServer no disponible
- ALM-25004 Sincronización anormal de datos de LdapServer
- ALM-25005 Excepción de servicio nscd
- ALM-25006 Excepción de servicio sssd
- ALM-25500 Servicio KrbServer no disponible
- ALM-26051 Servicio de Storm no disponible
- ALM-26052 El número de Supervisor disponible del servicio de Storm es menor que el umbral
- ALM-26053 El uso de Storm Slot supera el umbral
- ALM-26054 El uso de memoria heap de Nimbus supera el umbral
- ALM-27001 DBService no disponible
- ALM-27003 Interrupción de latidos entre los nodos activo y en espera de DBService
- ALM-27004 Incoherencia de datos entre DBServices activos y en espera
- ALM-27005 El uso de conexiones de base de datos supera el umbral
- ALM-27006 El uso de espacio en disco del directorio de datos supera el umbral
- ALM-27007 La base de datos entra en el modo de solo lectura
- ALM-29000 Servicio Impala no disponible
- ALM-29004 El uso de memoria de proceso Impalad supera el umbral
- ALM-29005 Número de conexiones de Impalad JDBC supera el umbral
- ALM-29006 Número de conexiones de Impalad ODBC supera el umbral
- ALM-29100 Servicio Kudu no disponible
- ALM-29104 El uso de la memoria de proceso Tserver supera el umbral
- ALM-29106 El uso de la CPU del proceso Tserver supera el umbral
- ALM-29107 El uso de la memoria de proceso de Tserver supera el umbral
- ALM-38000 Servicio Kafka no disponible
- ALM-38001 Capacidad de disco Kafka insuficiente
- ALM-38002 El uso de memoria heap de Kafka supera el umbral
- ALM-38004 El uso de memoria directa de Kafka supera el umbral
- ALM-38005 La duración de GC del proceso de Broker supera el umbral
- ALM-38006 El porcentaje de particiones de Kafka que no están completamente sincronizadas supera el umbral
- ALM-38007 El estado del usuario predeterminado de Kafka es anormal
- ALM-38008 Estado anormal del directorio de datos de Kafka
- ALM-38009 E/S de disco de Broker ocupada (aplicable a versiones posteriores a MRS 3.1.0)
- ALM-38009 Sobrecarga de Kafka Topic (aplicable a MRS 3.1.0 y versiones anteriores)
- ALM-38010 Topics con réplica única
- ALM-38011 El uso de conexión de usuario en el Broker supera el umbral
- ALM-43001 Servicio Spark2x no disponible
- ALM-43006 El uso de memoria heap del proceso JobHistory2x supera el umbral
- ALM-43007 El uso de memoria no heap del proceso JobHistory2x supera el umbral
- ALM-43008 El uso de memoria directa del proceso de JobHistory2x supera el umbral
- ALM-43009 Tiempo de GC de proceso de JobHistory2x excede el umbral
- ALM-43010 El uso de memoria heap del proceso JDBCServer2x supera el umbral
- ALM-43011 El uso de memoria no heap del proceso de JDBCServer2x supera el umbral
- ALM-43012 El uso de memoria directa del proceso de JDBCServer2x supera el umbral
- ALM-43013 El tiempo de GC de proceso de JDBCServer2x supera el umbral
- ALM-43017 El número de Full GC del proceso JDBCServer2x supera el umbral
- ALM-43018 Número de Full GC de proceso de JobHistory2x supera el umbral
- ALM-43019 El uso de memoria heap del proceso de IndexServer2x supera el umbral
- ALM-43020 El uso de memoria no heap del proceso IndexServer2x supera el umbral
- ALM-43021 El uso de memoria directa del proceso IndexServer2x supera el umbral
- ALM-43022 El tiempo de GC de proceso de IndexServer2x supera el umbral
- ALM-43023 El número de Full GC del proceso IndexServer2x supera el umbral
- ALM-44000 Servicio Presto no disponible
- ALM-44004 Las tareas en cola del grupo de recursos de Presto Coordinator superan el umbral
- ALM-44005 El tiempo de GC de proceso Presto Coordinator excede el umbral
- ALM-44006 El tiempo de GC de proceso Presto Worker supera el umbral
- ALM-45000 Servicio HetuEngine no disponible
- ALM-45001 Instancias de cómputo de HetuEngine defectuosas
- ALM-45175 El tiempo promedio para invocar a las API de metadatos de OBS es mayor que el umbral
- ALM-45176 La tasa de éxito de las invocaciones a las API de metadatos de OBS es inferior al umbral
- ALM-45177 La tasa de éxito de las invocaciones a las API de lectura de datos de OBS es inferior al umbral
- ALM-45178 La tasa de éxito de las invocaciones a las API de escritura de datos de OBS es menor que el umbral
- ALM-45179 Número de invocaciones a la API de OBS readFully supera el umbral
- ALM-45180 Número de invocaciones a la API de OBS read fallidas supera el umbral
- ALM-45181 El número de invocaciones a la API de OBS write fallidas supera el umbral
- ALM-45182 El número de operaciones de OBS limitadas supera el umbral
- ALM-45275 Servicio Ranger no disponible
- ALM-45276 Estado anormal de RangerAdmin
- ALM-45277 El uso de memoria heap de RangerAdmin supera el umbral
- ALM-45278 El uso de memoria directa de RangerAdmin supera el umbral
- ALM-45279 El uso de memoria no heap de RangerAdmin supera el umbral
- ALM-45280 La duración de GC de RangerAdmin supera el umbral
- ALM-45281 El uso de memoria heap de UserSync supera el umbral
- ALM-45282 El uso de memoria directa de UserSync supera el umbral
- ALM-45283 El uso de memoria no heap de UserSync supera el umbral
- ALM-45284 El tiempo de recolección de basura (GC) de UserSync supera el umbral
- ALM-45285 El uso de memoria heap de TagSync supera el umbral
- ALM-45286 El uso de memoria directa de TagSync supera el umbral
- ALM-45287 El uso de memoria no heap de TagSync supera el umbral
- ALM-45288 El tiempo de recolección de basura (GC) de TagSync supera el umbral
- ALM-45425 Servicio ClickHouse no disponible
- ALM-45426 El uso de la cuota de cantidad del servicio ClickHouse en ZooKeeper supera el umbral
- ALM-45427 El uso de la cuota de capacidad del servicio ClickHouse en ZooKeeper supera el umbral
- ALM-45428 Excepción de E/S de disco de ClickHouse
- ALM-45429 Error de sincronización de metadatos de tabla en el nodo ClickHouse añadido
- ALM-45430 Error de sincronización de metadatos de permisos en el nodo ClickHouse agregado
- ALM-45431 Distribución inadecuada de instancias ClickHouse para la asignación de topologías
- ALM-45432 Falla el proceso de sincronización de usuario de ClickHouse
- ALM-45433 Excepción de topología de ClickHouse AZ
- ALM-45434 Existe una única réplica en la tabla de datos de ClickHouse
- ALM-45585 Servicio IoTDB no disponible
- ALM-45586 El uso de memoria de heap de IoTDBServer supera el umbral
- ALM-45587 La duración de GC de IoTDBServer supera el umbral
- ALM-45588 El uso de memoria directa de IoTDBServer supera el umbral
- ALM-45589 El uso de memoria heap de ConfigNode supera el umbral
- ALM-45590 La duración de GC de ConfigNode supera el umbral
- ALM-45591 El uso de memoria directa de ConfigNode supera el umbral
- ALM-45592 La duración de ejecución de IoTDBServer RPC supera el umbral
- ALM-45593 La duración de ejecución de descarga de IoTDBServer supera el umbral
- ALM-45594 La duración de la fusión intraespacial de IoTDBServer supera el umbral
- ALM-45595 La duración de la fusión entre espacios de IoTDBServer supera el umbral
- ALM-45615 Servicio CDL no disponible
- ALM-45616 Excepción de ejecución de trabajo de CDL
- ALM-45617 Los datos en cola en la ranura de replicación CDL superan el umbral
- ALM-45635 Error de ejecución de trabajos de FlinkServer
- ALM-45636 Checkpoints de trabajo de FlinkServer siguen fallando
- ALM-45637 La tarea de FlinkServer está continuamente bajo contrapresión
- ALM-45638 El número de reinicios tras fallas de trabajo de FlinkServer supera el umbral
- ALM-45639 Tiempo de espera agotado para el checkpointing de un trabajo de Flink
- ALM-45640 Interrupción de latidos de FlinkServer entre los nodos activos y en espera
- ALM-45641 Excepción de sincronización de datos entre los nodos FlinkServer activo y en espera
- ALM-45736 Servicio Guardian no disponible
- Descripción de seguridad
- Interconectar Jupyter Notebook con MRS usando Python personalizado
- Apéndice
-
Referencia de la API
- Antes de comenzar
- Descripción de la API
- Invocaciones a las API
- Casos de aplicación
- API V2
-
API V1.1
- Las API de gestión de clústeres
- Las API de escalado automático
-
Las API de gestión de etiquetas
- Adición de etiquetas a un clúster especificado
- Consulta de etiquetas de un clúster especificado
- Eliminación de etiquetas de un clúster especificado
- Adición de etiquetas a un clúster en lotes
- Eliminación de etiquetas de un clúster en lotes
- Consulta de todas las etiquetas
- Consulta de una lista de clústeres con etiquetas especificadas
- Las API obsoletas
- Políticas de permisos y acciones admitidas
- Apéndice
-
Pasos iniciales
- Compra y uso de un clúster MRS
- Instalación y uso del cliente de clúster
- Uso de clústeres con autenticación Kerberos habilitada
- Uso de Hadoop desde el principio
- Uso de Kafka desde el principio
- Uso de HBase desde el principio
- Modificación de configuraciones de MRS
- Configuración del escalado automático para un clúster MRS
- Configuración de Hive con almacenamiento y cómputo desacoplado
- Envío de tareas de Spark a nuevos nodos de Task
- Configuración de umbrales para alarmas
-
Desarrollo de aplicaciones de componentes MRS
- Desarrollo de aplicaciones de HBase
- Desarrollo de aplicaciones de HDFS
- Desarrollo de aplicaciones de Hive JDBC
- Desarrollo de aplicaciones de Hive HCatalog
- Desarrollo de aplicaciones de Kafka
- Desarrollo de aplicaciones de Flink
- Desarrollo de aplicaciones de ClickHouse
- Desarrollo de aplicaciones de Spark
- Prácticas
-
Preguntas frecuentes
-
Descripción de MRS
- ¿Para qué se utiliza el MRS?
- ¿Qué tipos de almacenamiento distribuido admite MRS?
- ¿Cómo creo un clúster MRS mediante un grupo de seguridad personalizado?
- ¿Cómo uso MRS?
- Región y AZ
- ¿Puedo configurar un grupo de conexiones Phoenix?
- ¿Admite MRS el cambio del segmento de red?
- ¿Puedo degradar las especificaciones de un nodo de clúster MRS?
- ¿Cuál es la relación entre Hive y otros componentes?
- ¿Un clúster de MRS soporta Hive en Spark?
- ¿Cuáles son las diferencias entre las versiones de Hive?
- ¿Qué versión de clúster de MRS admite la conexión Hive y la sincronización de usuarios?
- ¿Cuáles son las diferencias entre OBS y HDFS en el almacenamiento de datos?
- ¿Cómo obtengo la herramienta de prueba de presión de Hadoop?
- ¿Cuál es la relación entre Impala y otros componentes?
- Declaración sobre las direcciones IP públicas en el SDK de terceros de código abierto integrado por MRS
- ¿Cuál es la relación entre Kudu y HBase?
- ¿Admite MRS la ejecución de Hive en Kudu?
- ¿Cuáles son las soluciones para procesar mil millones de registros de datos?
- ¿Puedo cambiar la dirección IP de DBService?
- ¿Puedo borrar los registros sudo de MRS?
- ¿Cuáles son las restricciones en el tamaño del registro de Storm en un clúster MRS 2.1.0?
- ¿Qué es Spark ThriftServer?
- ¿Qué protocolos de acceso admite Kafka?
- Se notifica el error 408 cuando un nodo MRS accede a OBS
- ¿Cuáles son las ventajas de la relación de compresión de zstd?
- ¿Por qué los componentes HDFS, YARN y MapReduce no están disponibles cuando se compra un clúster de MRS?
- ¿Por qué no está disponible el componente ZooKeeper cuando se compra un clúster MRS?
- ¿Qué versiones de Python son compatibles con las tareas de Spark en clústeres MRS 3.1.0?
- ¿Cómo puedo habilitar diferentes programas de servicio para usar diferentes colas de YARN?
- Diferencias y relaciones entre la consola de gestión de MRS y el Manager de clústeres
- ¿Cómo desvinculo una EIP del FusionInsight Manager de un clúster MRS?
- ¿Cuáles son los sistemas operativos de hosts en clústeres MRS de diferentes versiones?
-
Facturación
- ¿Cómo se factura MRS?
- ¿Por qué no se muestra el precio durante la creación del clúster MRS?
- ¿Cómo se factura el escalado automático de un clúster MRS?
- ¿Cómo se renueva MRS?
- ¿Cómo se factura el nodo de tareas en un clúster MRS?
- ¿Por qué falla mi cancelación de la suscripción a ECS después de cancelar mi suscripción a MRS?
- Cuenta y contraseña
-
Cuenta y permiso
- ¿Es compatible un clúster MRS con el control de permisos de acceso si la autenticación Kerberos no está activada?
- ¿Cómo asigno permiso de gestión de tenant a una cuenta nueva?
- ¿Cómo personalizo una política de MRS?
- ¿Por qué no puedo encontrar la gestión de usuarios en la configuración del sistema en MRS Manager?
- ¿Proporciona Hue la función de configurar los permisos de la cuenta?
- ¿Por qué no puedo enviar trabajos en la consola después de que mi cuenta de IAM esté asignada con permisos relacionados?
- ¿Qué debo hacer si se notifica un error que indica una autenticación no válida cuando envío una orden de compra de clúster de MRS?
- Uso del cliente
-
Acceso a páginas Web
- ¿Cómo cambio la duración del tiempo de espera de la sesión para una interfaz de usuario web de componente de código abierto?
- ¿Por qué no puedo actualizar la página Dynamic Resource Plan en la pestaña de tenant de MRS?
- ¿Qué hago si la pestaña Kafka Topic Monitoring no está disponible en Manager?
- ¿Cómo lo hago si se informa de un error o algunas funciones no están disponibles cuando accedo a las interfaces de usuario web de HDFS, Hue, YARN, HetuEngine y Flink?
- ¿Cómo cambio el modo de acceso a MRS Manager (no para la nube operada conjuntamente)?
-
Monitoreo de alarmas
- En un clúster de streaming de MRS, ¿puede la función de monitoreo de Kafka topic enviar notificaciones de alarma?
- ¿Dónde puedo ver las colas de recursos en ejecución cuando se generan ALM-18022 Recursos de cola de Yarn insuficientes?
- ¿Cómo entiendo las estadísticas de gráficos de varios niveles en la métrica de solicitudes de operación de HBase?
- Ajuste de rendimiento
-
Desarrollo del trabajo
- ¿Cómo obtengo mis datos en OBS o HDFS?
- ¿Qué tipos de trabajos de Spark se pueden enviar en un clúster?
- ¿Puedo ejecutar varias tareas de Spark al mismo tiempo después de que los recursos mínimos del tenant de un clúster MRS se cambien a 0?
- ¿Qué hago si no se pueden identificar los parámetros de trabajo separados por espacios?
- ¿Cuáles son las diferencias entre el modo de client y el modo de cluster de los trabajos de Spark?
- ¿Cómo puedo ver los registros de trabajos de MRS?
- ¿Cómo lo hago si se muestra el mensaje "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page."?
- La ejecución del trabajo de LauncherJob falla y se muestra el mensaje de error "jobPropertiesMap is null."
- ¿Cómo lo hago si el estado del trabajo de Flink en la consola MRS es incompatible con el del Yarn?
- ¿Cómo lo hago si un trabajo de SparkStreaming falla después de haber sido ejecutado docenas de horas y se reporta el error OBS Access 403?
- ¿Cómo hago si se informa de una alarma que indica que la memoria es insuficiente cuando ejecuto una sentencia SQL en el cliente ClickHouse?
- ¿Cómo lo hago si se muestra un mensaje de error "java.io.IOException: Connection reset by peer" durante la ejecución de un trabajo de Spark?
- ¿Cómo hago si se muestra un mensaje de error "requestId=4971883851071737250" cuando un trabajo de Spark accede a OBS?
- ¿Cómo lo hago si se reporta el error "UnknownScannerExeception" del trabajo de Spark?
- ¿Por qué DataArtsStudio ocasionalmente no puede programar trabajos de Spark y la reprogramación también falla?
- ¿Cómo hago si un trabajo de Flink no se ejecuta y se muestra el mensaje de error "java.lang.NoSuchFieldError: SECURITY_SSL_ENCRYPT_ENABLED"?
- ¿Por qué no se puede ver el trabajo de Yarn enviado en la interfaz de usuario web?
- ¿Cómo modifico el HDFS NameSpace (fs.defaultFS) de un clúster existente?
- ¿Cómo lo hago si YARN detiene la cola de launcher-job debido a un heap size insuficiente cuando envío un trabajo de Flink en el plano de gestión?
- ¿Cómo lo hago si se muestra el mensaje de error "slot request timeout" cuando envío un trabajo de Flink?
- Importación y exportación de datos de trabajos de DistCP
- ¿Cómo puedo ver las sentencias SQL para los trabajos de Hive en la YARN Web UI?
- ¿Cómo puedo ver los registros de una tarea de Yarn especificada?
- Actualización/Instalación de parches de clústeres
-
Interconexión del ecosistema periférico
- ¿Se puede utilizar MRS para realizar operaciones de lectura y escritura en tablas DLI?
- ¿OBS soporta el protocolo ListObjectsV2?
- ¿Se pueden almacenar datos MRS en un sistema de archivos paralelo proporcionado por OBS?
- ¿Se puede desplegar el servicio Crawler en MRS?
- ¿DWS y MRS soportan la eliminación segura (evitar la recuperación después de la eliminación)?
- ¿Por qué no se encuentra el clúster MRS autenticado por Kerberos cuando se configura una conexión desde DLF?
- ¿Cómo uso PySpark en un ECS para conectarme a un clúster MRS Spark con autenticación de Kerberos habilitada en la intranet?
- ¿Por qué los campos asignados no existen en la base de datos después de que HBase sincronice los datos con CSS?
- ¿Puede Flume leer datos de OBS?
- ¿Se puede conectar MRS a un KDC externo?
- ¿Cómo soluciono el problema de compatibilidad de la versión Jetty en la interconexión de código abierto Kylin 3.x y MRS 1.9.3?
- ¿Qué sucede si los datos no se exportan desde MRS a un bucket cifrado de OBS?
- ¿Cómo interconecto MRS con LTS?
- ¿Cómo instalo HSS en nodos de clúster MRS?
- Acceso al clúster
-
Desarrollo de servicios de big data
- ¿Puede MRS ejecutar múltiples tareas de Flume a la vez?
- ¿Cómo cambio los registros de FlumeClient a los registros estándar?
- ¿Dónde se almacenan los archivos JAR y las variables de entorno de Hadoop?
- ¿Qué algoritmos de compresión admite HBase?
- ¿Puede MRS escribir datos en HBase a través de la tabla externa de HBase de Hive?
- ¿Cómo veo los registros de HBase?
- ¿Cómo configuro el TTL para una tabla HBase?
- ¿Cómo me conecto a HBase de MRS a través de HappyBase?
- ¿Cómo cambio el número de réplicas HDFS?
- ¿Cómo modifico la clase de conmutación HDFS activo y en espera?
- ¿Cuál es el tipo de número recomendado de DynamoDB en las tablas Hive?
- ¿Se puede interconectar el controlador Hive con DBCP2?
- ¿Cómo puedo ver la tabla Hive creada por otro usuario?
- ¿Dónde puedo descargar el paquete de dependencias (com.huawei.gaussc10) en el proyecto de ejemplo de Hive?
- ¿Puedo exportar el resultado de la consulta de datos de Hive?
- ¿Cómo lo hago si ocurre un error cuando Hive ejecuta el comando beeline -e para ejecutar varias sentencias?
- ¿Cómo lo hago si un trabajo "hivesql/hivescript" no se envía después de agregar Hive?
- ¿Qué pasa si un archivo de Excel descargado en Hue no se puede abrir?
- ¿Cómo lo hago si las sesiones no se liberan después de que Hue se conecta a HiveServer y se muestra el mensaje de error "ver max user connections"?
- ¿Cómo se restablecen los datos de Kafka?
- ¿Cómo obtengo la versión de cliente de MRS Kafka?
- ¿Qué protocolos de acceso son compatibles con Kafka?
- ¿Cómo lo hago si se muestra un mensaje de error "Not Authorized to access group xxx" cuando se consume un Kafka topic?
- ¿Qué algoritmos de compresión soporta Kudu?
- ¿Cómo puedo ver los registros de Kudu?
- ¿Cómo manejo las excepciones del servicio Kudu generadas durante la creación de clústeres?
- ¿Cuáles son las diferencias entre la construcción de proyectos de ejemplo y el desarrollo de aplicaciones? ¿Es compatible el código Python?
- ¿OpenTSDB soporta las API de Python?
- ¿Cómo configuro otras fuentes de datos en Presto?
- ¿Cómo actualizo el certificado Ranger?
- ¿Cómo me conecto a Spark Shell desde MRS?
- ¿Cómo me conecto a Spark Beeline desde MRS?
- ¿Dónde se almacenan los registros de ejecución de los trabajos de Spark?
- ¿Cómo especifico una ruta de acceso de registro al enviar una tarea en un clúster MRS Storm?
- ¿Cómo puedo comprobar si la configuración ResourceManager de Yarn es correcta?
- ¿Cómo modifico el parámetro allow_drop_detached de ClickHouse?
- ¿Cómo lo hago si se informa de una alarma que indica memoria insuficiente durante la ejecución de la tarea de Spark?
- ¿Cómo agrego una política de eliminación periódica para evitar registros de tabla de sistema ClickHouse de gran tamaño?
- ¿Cómo obtengo un archivo Spark JAR?
- ¿Por qué se genera una alarma cuando el proceso NameNode no se reinicia después de modificar el archivo hdfs-site.xml?
- Se necesita mucho tiempo para que Spark SQL acceda a las tablas particionadas de Hive antes del inicio de Job
- ¿Qué debo hacer si spark.yarn.executor.memoryOverhead no tiene efecto?
- ¿Cómo cambio la zona horaria del servicio ClickHouse?
- ¿Qué debo hacer si falla la conexión con el servidor ClickHouse? Código de error: 516
- API
-
Gestión de clúster
- ¿Cómo puedo ver todos los clústeres?
- ¿Cómo puedo ver la información de registro?
- ¿Cómo puedo ver la información de configuración del clúster?
- ¿Cómo agrego servicios a un clúster MRS?
- ¿Cómo instalo Kafka y Flume en un clúster de MRS?
- ¿Cómo detengo un clúster MRS?
- ¿Es necesario apagar un nodo Master antes de actualizar sus especificaciones?
- ¿Puedo agregar componentes a un clúster existente?
- ¿Puedo eliminar componentes instalados en un clúster MRS?
- ¿Puedo cambiar los nodos de clúster MRS en la consola MRS?
- ¿Cómo puedo proteger las notificaciones de eventos/alarmas de clúster?
- ¿Por qué la memoria del grupo de recursos mostrada en el clúster MRS es menor que la memoria real del clúster?
- ¿Cómo configuro la memoria de Knox?
- ¿Cuál es la versión de Python instalada en un clúster MRS?
- ¿Cómo puedo ver el directorio de archivos de configuración de cada componente?
- ¿Cómo subo un archivo local a un nodo dentro de un clúster?
- ¿Cómo lo hago si la hora en los nodos MRS es incorrecta?
- ¿Cómo puedo consultar la hora de inicio de un nodo MRS?
- ¿Cómo lo hago si las relaciones de confianza entre nodos son anormales?
- ¿Cómo ajusto el tamaño de memoria del proceso manager-executor?
- ¿Puedo modificar un nodo master en un clúster MRS existente?
-
Uso de Kerberos
- ¿Cómo cambio el estado de autenticación de Kerberos de un clúster MRS creado?
- ¿Cuáles son los puertos del servicio de autenticación de Kerberos?
- ¿Cómo despliego el servicio Kerberos en un clúster en ejecución?
- ¿Cómo accedo a Hive en un clúster con autenticación de Kerberos habilitada?
- ¿Cómo puedo acceder a Presto en un clúster con autenticación de Kerberos habilitada?
- ¿Cómo accedo a Spark en un clúster con autenticación Kerberos habilitada?
- ¿Cómo puedo evitar que la autenticación Kerberos caduque?
- Gestión de metadatos
-
Descripción de MRS
- Actualmente, el contenido no está disponible en el idioma seleccionado. Sugerimos consultar la versión en inglés.
- What's New
- Function Overview
- Product Bulletin
- Billing
-
Component Operation Guide (LTS)
-
Using CarbonData
- CarbonData Data Types
- CarbonData Table User Permissions
- Creating a CarbonData Table Using the Spark Client
- CarbonData Data Analytics
- CarbonData Performance Tuning
- Typical CarbonData Configuration Parameters
-
CarbonData Syntax Reference
- CREATE TABLE
- CREATE TABLE AS SELECT
- DROP TABLE
- SHOW TABLES
- ALTER TABLE COMPACTION
- TABLE RENAME
- ADD COLUMNS
- DROP COLUMNS
- CHANGE DATA TYPE
- REFRESH TABLE
- REGISTER INDEX TABLE
- LOAD DATA
- UPDATE CARBON TABLE
- DELETE RECORDS from CARBON TABLE
- INSERT INTO CARBON TABLE
- DELETE SEGMENT by ID
- DELETE SEGMENT by DATE
- SHOW SEGMENTS
- CREATE SECONDARY INDEX
- SHOW SECONDARY INDEXES
- DROP SECONDARY INDEX
- CLEAN FILES
- SET/RESET
- Concurrent CarbonData Table Operations
- CarbonData Segment API
- CarbonData Tablespace Index
-
Common Issues About CarbonData
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How to Avoid Minor Compaction for Historical Data?
- How to Change the Default Group Name for CarbonData Data Loading?
- Why Does INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Does Data Load Performance Decrease Due to Bad Records?
- Why Does Data Loading Fail During Off Heap?
- Why Do I Fail to Create a Hive Table?
- How Do I Logically Split Data Across Different Namespaces?
- Why Can't the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does CarbonData Become Abnormal After the Disk Space Quota of the HDFS Storage Directory Is Set?
- Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
- How Do I Restore the Latest tablestatus File That Has Been Lost or Damaged When TableStatus Versioning Is Enabled?
-
CarbonData Troubleshooting
- Filter Result Is not Consistent with Hive when a Big Double Type Value Is Used in Filter
- Query Performance Deteriorated Due to Insufficient Executor Memory
- Data Query or Loading Failed, and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Was Reported
- Why INSERT INTO/LOAD DATA Task Distribution Is Incorrect and the Opened Tasks Are Less Than the Available Executors When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
-
Using CDL
- Integrating CDL Data
- CDL User Permission Management
- Creating a Data Synchronization Job with CDL
- Preparing for Creating a CDL Job
-
Creating a CDL Job
- Creating a CDL Data Synchronization Job
- Creating a CDL Data Comparison Job
- Synchronizing Data from PgSQL to Kafka Using CDL
- Synchronizing Data from PgSQL to Hudi Using CDL
- Synchronizing Data from openGauss to Hudi Using CDL
- Synchronizing Data from Hudi to DWS Using CDL
- Synchronizing Data from Hudi to ClickHouse Using CDL
- Synchronizing openGauss Data to Hudi Using CDL (ThirdKafka)
- Synchronizing drs-oracle-json Database to Hudi Using CDL (ThirdKafka)
- Synchronizing drs-oracle-avro Database to Hudi Using CDL (ThirdKafka)
- CDL Job DDL Changes
- CDL Log Overview
- Common Issues About CDL
-
CDL Troubleshooting
- Error 403 Is Reported When a CDL Job Is Stopped
- Error 104 or 143 Is Reported After a CDL Job Runs for a Period of Time
- Why Is the Value of Task configured for the OGG Source Different from the Actual Number of Running Tasks When Data Is Synchronized from OGG to Hudi?
- Why Are There Too Many Topic Partitions Corresponding to the CDL Synchronization Task Names?
- What Should I Do When a CDL Task Synchronizing Data to Hudi Reports an Error Indicating That the Current User Does Not Have the Permission to Create Tables?
- Error Is Reported When the Job of Capturing Data From PgSQL to Hudi Is Started
-
Using ClickHouse
- ClickHouse Overview
- ClickHouse User Permission Management
- ClickHouse Client Practices
-
ClickHouse Data Import
- Interconnecting ClickHouse with RDS for MySQL
- Interconnecting ClickHouse with OBS
- Interconnecting ClickHouse with HDFS (MRS 3.2.0-LTS)
- Interconnecting ClickHouse with HDFS (MRS 3.3.0-LTS or later)
- Configuring Interconnection Between ClickHouse and Kafka
- Synchronizing Kafka Data to ClickHouse
- Importing DWS Table Data to ClickHouse
- Importing ClickHouse Data in Batches
- Using ClickHouse to Import and Export Data
-
Enterprise-Class Enhancements of ClickHouse
- ClickHouse Multi-Tenancy
- Checking Slow SQL Statements in ClickHouse
- Checking Monitoring Metrics of ClickHouse Replication Table Data Synchronization
- Configuring Strong Data Consistency Between ClickHouse Replicas
- Configuring the Support for Transactions on ClickHouse
- Accessing ClickHouse Through ELB
- Storing ClickHouse Cold and Hot Data Separately
- Configuring the Connection Between ClickHouse and Open-Source ClickHouse
- Pre-Caching ClickHouse Metadata to the Memory
- ClickHouse Performance Tuning
-
ClickHouse O&M Management
- ClickHouse Log Overview
- Collecting Dumping Logs of the ClickHouse System Tables
- Enabling the Read-Only Mode for ClickHouse Tables
- Migrating Data Between ClickHouseServer Nodes in a Cluster
- Migrating ClickHouse Data from One MRS Cluster to Another
- Expanding the Disk Capacity of the ClickHouse Node
- Backing Up and Restoring ClickHouse Data Using a Data File
- Configuring the Default ClickHouse User Password (MRS 3.1.2-LTS)
- Configuring the Default ClickHouse User Password (MRS 3.3.0-LTS or later)
- Clearing the Passwords of Default ClickHouse Users
-
Common ClickHouse SQL Syntax
- CREATE DATABASE: Creating a Database
- CREATE TABLE: Creating a Table
- INSERT INTO: Inserting Data into a Table
- DELETE: Lightweight Deleting Table Data
- SELECT: Querying Table Data
- ALTER TABLE: Modifying a Table Structure
- ALTER TABLE: Modifying Table Data
- DESC: Querying a Table Structure
- DROP: Deleting a Table
- SHOW: Displaying Information About Databases and Tables
- UPSERT: Writing Data
-
Common Issues About ClickHouse
- What Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
- How Do I Migrate Data from Hive/HDFS to ClickHouse?
- How Do I Migrate Data from OBS/S3 to ClickHouse?
- An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data
- How Do I Grant the Select Permission at the Database Level to ClickHouse Users?
- How Do I Quickly Restore ClickHouse When Concurrent Requests Are Stacked for a Long Time?
- Using DBService
-
Using Doris
- Overview of the Doris Data Model
- Managing Doris User Permissions
- Using the MySQL Client to Connect to Doris
- Getting Started with Doris
- Importing Doris Data
- Analyzing Doris Data
- Enterprise-Class Enhancements of Doris
- Doris O&M Management
- Typical SQL Syntax of Doris
-
Common Issues About Doris
- What Should I Do If an Error Occasionally Occurs During Table Creation Due to the Configuration of the SSD and HDD Data Directories?
- What Should I Do If an RPC Timeout Error Is Reported When Stream Load Is Used?
- What Do I Do If the Error Message "plugin not enabled" Is Displayed When the MySQL Client Is Used to Connect to the Doris Database?
- How Do I Handle the FE Startup Failure?
- How Do I Handle the Startup Failure Due to Incorrect IP Address Matching for the BE Instance?
- What Should I Do If the Error Message "Read timed out" Is Displayed When the MySQL Client Connects to Doris?
- What Should I Do If an Error Is Reported When the BE Runs a Data Import or Query Task?
- What Should I Do If a Timeout Error Is Reported When Broker Load Imports Data?
- What Should I Do If an Error Message Is Displayed When Broker Load Is Used to Import Data?
- Doris Troubleshooting
-
Using Flink
- Flink Job Engine
- Flink User Permission Management
- Using the Flink Client
- Preparing for Creating a FlinkServer Job
-
Creating a FlinkServer Job
- Creating a FlinkServer Job and Writing Data to a ClickHouse Table
- Creating a FlinkServer Job to Interconnect with a Doris Table
- Creating a FlinkServer Job to Interconnect with a GaussDB(DWS) Table
- Creating a FlinkServer Job to Interconnect with JDBC
- Creating a FlinkServer Job to Write Data to DWS
- Creating a FlinkServer Job to Write Data to an HBase Table
- Creating a FlinkServer Job to Write Data to HDFS
- Creating a FlinkServer Job to Write Data to a Hive Table
- Creating a FlinkServer Job to Write Data to a Hudi Table
- Creating a FlinkServer Job to Write Data to a Kafka Message Queue
-
Managing FlinkServer Jobs
- Viewing the Health Status of FlinkServer Jobs
- Importing and Exporting FlinkServer Job Information
- Configuring Automatic Clearing of FlinkServer Job Residuals
- Configuring the FlinkServer Job Restart Policy
- Adding Third-Party Dependency JAR Packages to a FlinkServer Job
- Using UDFs in FlinkServer Jobs
- Configuring the FlinkServer UDF Sandbox
-
Enterprise-Class Enhancements of Flink
- Flink SQL Syntax Enhancement
- Table-Level TTL for Stream Joins
- Configuring Flink SQL Client to Support SQL Verification
- Enhancing the Joins of Large and Small Tables in Flink Jobs
- Exiting FlinkSQL OVER Window Upon Expiration
- Limiting Read Rate for Flink SQL Kafka and Upsert-Kafka Connector
- Consuming Data in drs-json Format with FlinkSQL Kafka Connector
- Using ignoreDelete in JDBC Data Writes
- Join-To-Live
- Row Filter
- FlinkSQL Operator Parallelism
- Optimizing FlinkSQL JSON_VALUE Performance
- Reusing FlinkSQL Lookup Operator
- FlinkSQL Function Enhancements
- Using the MultiJoin Operator in Flink SQL
- Flink O&M Management
- Flink Performance Tuning
- Typical Commands of the Flink Client
- Common Flink SQL Syntax
- Common Issues About Flink
- Flink Troubleshooting
-
Using Flume
- Flume Log Collection Overview
- Flume Service Model Configuration
- Installing the Flume Client
- Quickly Using Flume to Collect Node Logs
-
Configuring a Non-Encrypted Flume Data Collection Task
- Generating Configuration Files for the Flume Server and Client
- Using Flume Server to Collect Static Logs from Local Host to Kafka
- Using Flume Server to Collect Static Logs from Local Host to HDFS
- Using Flume Server to Collect Dynamic Logs from Local Host to HDFS
- Using Flume Server to Collect Logs from Kafka to HDFS
- Using Flume Client to Collect Logs from Kafka to HDFS
- Using Cascaded Agents to Collect Static Logs from Local Host to HBase
- Configuring an Encrypted Flume Data Collection Task
- Enterprise-Class Enhancements of Flume
- Flume O&M Management
- Common Issues About Flume
- Using Guardian
-
Using HBase
- Creating an HBase Permission Role
- Using the HBase Client
- Using HBase for Offline Data Analysis
- Migrating Data to HBase Using BulkLoad
- HBase Data Operations
-
Enterprise-Class Enhancements of HBase
-
Configuring HBase Global Secondary Indexes for Faster Queries
- Introduction to HBase Global Secondary Indexes
- Creating an HBase Global Secondary Index
- Querying an HBase Global Secondary Index
- Changing Status of HBase Global Secondary Indexes
- Creating HBase Global Secondary Indexes in Batches
- Checking HBase Global Secondary Index Data Consistency
- Querying HBase Table Data with Global Secondary Indexes
- Configuring HBase Local Secondary Indexes for Faster Queries
- Improving HBase BulkLoad Data Migration
- Using the Spark BulkLoad Tool to Synchronize Data to HBase Tables
- Configuring Hot-Cold Data Separation in HBase
- Configuring RSGroup to Manage RegionServer Resources
- Checking Slow and Oversized HBase Requests
- Configuring HBase Table-Level Overload Control
- Enabling the HBase Multicast Function
-
HBase Performance Tuning
- Improving the Batch Loading Efficiency of HBase BulkLoad
- Improving HBase Continuous Put Performance
- Improving HBase Put and Scan Performance
- Improving HBase Real-Time Write Efficiency
- Improving HBase Real-Time Read Efficiency
- Accelerating HBase Compaction During Off-Peak Hours
- Tuning HBase JVM Parameters
- Optimization for HBase Overload
- Enabling CCSMap Functions
- Enabling Succinct Trie
- HBase O&M Management
-
Common Issues About HBase
- Operation Failures Occur When Stopping BulkLoad on the Client
- How Do I Restore a Region in the RIT State for a Long Time?
- Why Does HMaster Exit Due to Timeout When Waiting for the NameSpace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown When Starting the HBase Shell?
- When Are the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Cleared?
- Insufficient Rights When Accessing Phoenix
- Insufficient Rights When Using the HBase BulkLoad Function
- How Do I Fix Region Overlapping?
- Restrictions on using the Phoenix BulkLoad Tool
- Why Is a Message Displayed Indicating Insufficient Permission When CTBase Connects to the Ranger Plug-in?
- Introduction to HBase Global Secondary Index APIs
- How Do I Disable HDFS Hedged Read on HBase?
-
HBase Troubleshooting
- Why Does a Client Keep Failing to Connect to a Server for a Long Time?
- Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
- Why Do Other Services Become Unstable When HBase Sets Up a Large Number of Connections over the Network Port?
- Why Does the HBase BulkLoad Task Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
- Why Can Modified and Deleted Data Still Be Queried Using the Scan Command?
- What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
- How Do I Delete Residual Table Names in the table-lock Directory of ZooKeeper?
- Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
- HMaster Fails to Be Started After the OfflineMetaRepair Tool Is Used to Rebuild Metadata
- Why Are Messages Containing FileNotFoundException Frequently Displayed in the HMaster Logs?
- Why Does the ImportTsv Tool Display "Permission denied"?
- Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
- HBase Fails to Recover a Task
- Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
- Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed?
- Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
- How Do I View Regions in the CLOSED State in an ENABLED Table?
- How Can I Quickly Recover the Service When HBase Files Are Damaged Due to a Cluster Power-Off?
- How Do I Quickly Restore HBase After HDFS Enters the Safe Mode and the HBase Service Is Abnormal?
-
Using HDFS
- Overview of HDFS File System Directories
- HDFS User Permission Management
- Using the HDFS Client
- Using Hadoop from Scratch
- Configuring the Recycle Bin Mechanism
- Configuring HDFS DataNode Data Balancing
- Configuring HDFS DiskBalancer
- Configuring HDFS Mover
- Configuring HDFS NodeLabel
- Configuring Memory Management
- Configuring ulimit for HBase and HDFS
- Configuring the Number of Files in a Single HDFS Directory
-
Enterprise-Class Enhancements of HDFS
- Configuring the HDFS Quick File Close Function
- Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
- Configuring Reserved Percentage of Disk Usage on DataNodes
- Configuring the NameNode Blacklist
- Configuring Encrypted Channels
- Configuring HDFS Hedged Read
- Configuring Fine-Grained Locks of HDFS
- HDFS Auto Recovery from Cluster Power-off
-
HDFS Performance Tuning
- Improving Write Performance
- Improving Read Performance Using Client Metadata Cache
- Improving the Connection Between the Client and NameNode Using Current Active Cache
- Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Performing Concurrent Operations on HDFS Files
- Configuring LZC Compression
- Asynchronously Deleting HDFS Data
- HDFS O&M Management
- Common Commands of the HDFS Client
-
Common Issues About HDFS
- Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- HDFS WebUI Cannot Properly Update Information About Damaged Data
- The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- DataNode Is Normal but Cannot Report Data Blocks
- Can I Delete or Modify the Data Storage Directory in DataNode?
- Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
- Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- Blocks Miss on the NameNode UI After the Successful Rollback
-
HDFS Troubleshooting
- Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
- NameNode Startup Is Slow
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
- Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
- Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
- Why Does Array Border-crossing Occur During FileInputFormat Split?
- The Standby NameNode Fails to Start Because It Has Not Been Started for a Long Time
-
Using HetuEngine
- Overview of HetuEngine Interactive Query
- HetuEngine User Permission Management
- Quickly Using HetuEngine to Access Hive Data Source
- Creating a HetuEngine Compute Instance
-
Adding a HetuEngine Data Source
- Using HetuEngine to Access Data Sources Across Sources and Domains
- Adding a Hive Data Source
- Adding a Hudi Data Source
- Adding a ClickHouse Data Source
- Adding a GaussDB Data Source
- Adding an HBase Data Source
- Adding a Cross-Cluster HetuEngine Data Source
- Adding an IoTDB Data Source
- Adding a MySQL Data Source
- Adding an Oracle Data Source
- Adding a GBase Data Source
-
Configuring HetuEngine Materialized Views
- Overview of HetuEngine Materialized Views
- SQL Examples of HetuEngine Materialized Views
- Rewriting of HetuEngine Materialized Views
- HetuEngine Materialized View Recommendation
- HetuEngine Materialized View Caching
- Validity Period and Data Update of HetuEngine Materialized Views
- HetuEngine Intelligent Materialized Views
- Automatic Tasks of HetuEngine Materialized Views
- HetuEngine SQL Diagnosis
- Developing and Deploying HetuEngine UDFs
- Managing a HetuEngine Data Source
-
Managing HetuEngine Compute Instances
- Configuring HetuEngine Resource Groups
- Configuring the Number of HetuEngine Worker Nodes
- Configuring a HetuEngine Maintenance Instance
- Configuring the Nodes on Which HetuEngine Coordinator Is Running
- Importing and Exporting HetuEngine Compute Instance Configurations
- Viewing the HetuEngine Instance Monitoring Page
- Viewing HetuEngine Coordinator and Worker Logs
- Configuring HetuEngine Query Fault Tolerance
-
HetuEngine Performance Tuning
- Adjusting YARN Resource Allocation
- Adjusting HetuEngine Cluster Node Resource Configurations
- Optimizing HetuEngine INSERT Statements
- Adjusting HetuEngine Metadata Caching
- Enabling Dynamic Filtering in HetuEngine
- Adjusting the Execution of Adaptive Queries in HetuEngine
- Adjusting Timeout for Hive Metadata Loading
- Tuning Hudi Data Source Performance
- HetuEngine Log Overview
-
Common HetuEngine SQL Syntax
- HetuEngine Data Type
-
HetuEngine DDL SQL Syntax
- CREATE SCHEMA
- CREATE VIRTUAL SCHEMA
- CREATE TABLE
- CREATE TABLE AS
- CREATE TABLE LIKE
- CREATE VIEW
- CREATE FUNCTION
- CREATE MATERIALIZED VIEW
- ALTER MATERIALIZED VIEW STATUS
- ALTER MATERIALIZED VIEW
- ALTER TABLE
- ALTER VIEW
- ALTER SCHEMA
- DROP SCHEMA
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- DROP MATERIALIZED VIEW
- REFRESH MATERIALIZED VIEW
- TRUNCATE TABLE
- COMMENT
- VALUES
- SHOW Syntax Overview
- SHOW CATALOGS
- SHOW SCHEMAS (DATABASES)
- SHOW TABLES
- SHOW TBLPROPERTIES TABLE|VIEW
- SHOW TABLE/PARTITION EXTENDED
- SHOW STATS
- SHOW FUNCTIONS
- SHOW SESSION
- SHOW PARTITIONS
- SHOW COLUMNS
- SHOW CREATE TABLE
- SHOW VIEWS
- SHOW CREATE VIEW
- SHOW MATERIALIZED VIEWS
- SHOW CREATE MATERIALIZED VIEW
- HetuEngine DML SQL Syntax
- HetuEngine TCL SQL Syntax
- HetuEngine DQL SQL Syntax
-
HetuEngine SQL Functions and Operators
- Logical Operators
- Comparison Functions and Operators
- Condition Expression
- Lambda Expression
- Conversion Functions
- Mathematical Functions and Operators
- Bitwise Functions
- Decimal Functions and Operators
- String Functions and Operators
- Regular Expressions
- Binary Functions and Operators
- JSON Functions and Operators
- Date and Time Functions and Operators
- Aggregate Functions
- Window Functions
- Array Functions and Operators
- Map Functions and Operators
- URL Function
- Geospatial Function
- HyperLogLog Functions
- UUID Function
- Color Function
- Session Information
- Teradata Function
- Data Masking Functions
- IP Address Functions
- Quantile Digest Functions
- T-Digest Functions
- Set Digest Functions
- HetuEngine Auxiliary Command Syntax
- HetuEngine Reserved Keywords
- HetuEngine Implicit Data Type Conversion
- Data Preparation for the Sample Table
- HetuEngine Syntax Compatibility with Common Data Sources
-
Common Issues About HetuEngine
- What Should I Do After the HetuEngine Domain Name Is Changed?
- What Can I Do If Starting the HetuEngine Cluster on the Client Times Out?
- How Do I Handle Data Loss in a HetuEngine Data Source?
- What Do I Do If the View Owner Does Not Have the Permission on Functions?
- What Do I Do If Error "Encountered too many errors" Is Reported During HetuEngine SQL Execution?
- HetuEngine Troubleshooting
-
Using Hive
- Hive User Permission Management
- Using the Hive Client
- Using Hive for Data Analysis
- Configuring Hive Data Storage and Encryption
- Hive on HBase
- Using Hive to Read Data in a Relational Database
- Hive Support for Reading Hudi Tables
-
Enterprise-Class Enhancements of Hive
- Storing Hive Table Partitions to OBS and HDFS
- Configuring Automatic Removal of Old Data in the Hive Directory to the Recycle Bin
- Configuring Hive to Insert Data to a Directory That Does Not Exist
- Forbidding Location Specification When Hive Internal Tables Are Created
- Creating a Foreign Table in a Directory (Read and Execute Permission Granted)
- Configuring HTTPS/HTTP-based REST APIs
- Configuring Hive Transform
- Switching the Hive Execution Engine to Tez
- Hive Load Balancing
- Configuring Access Control Permission for the Dynamic View of a Hive Single Table
- Allowing Users without ADMIN Permission to Create Temporary Functions
- Allowing Users with Select Permission to View the Table Structure
- Allowing Only the Hive Administrator to Create Databases and Tables in the Default Database
- Configuring Hive to Support More Than 32 Roles
- Creating User-Defined Hive Functions
- Configuring High Reliability for Hive Beeline
- Detecting Statements That Overwrite a Table with Its Own Data
- Configuring Hive Dynamic Data Masking
- Hive Performance Tuning
- Hive O&M Management
- Common Hive SQL Syntax
-
Common Issues About Hive
- How Do I Delete UDFs on Multiple HiveServers?
- Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
- How Do I Perform Operations on Local Files with Hive User-Defined Functions?
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- Which Special Characters Are Not Supported by Hive in Complex Field Names?
- How Do I Monitor the Hive Table Size?
- How Do I Prevent Data Loss Caused by Misoperations of the insert overwrite Statement?
- Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
- Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Description of Hive Table Location (Either an OBS or HDFS Path)
- Why Cannot Data Be Queried After the Execution Engine Is Switched from Tez to MapReduce for Union-related Statements?
- Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
- Does Hive Support Vectorized Query?
- Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
- How Do I Disable the Logging Function of Hive?
- Why Hive Tables in the OBS Directory Fail to Be Deleted?
- Why Does an OBS Quickly Deleted Directory Not Take Effect After Being Added to the Customized Hive Configuration?
- Hive Troubleshooting
-
Using Hudi
- Hudi Table Overview
- Creating a Hudi Table Using Spark Shell
- Operating a Hudi Table Using spark-sql
- Operating a Hudi Table Using hudi-cli.sh
- Hudi Write Operation
- Hudi Read Operation
- Hudi Data Management and Maintenance
- Hudi SQL Syntax Reference
- Hudi Schema Evolution
- Configuring Default Values for Hudi Data Columns
- Partial Update
- Aggregate Functions in Hudi
- Typical Hudi Configuration Parameters
- Hudi Performance Tuning
-
Common Issues About Hudi
- "Parquet/Avro schema" Error Is Reported When Updated Data Is Written
- UnsupportedOperationException Is Reported When Updated Data Is Written
- SchemaCompatibilityException Is Reported When Updated Data Is Written
- What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
- Hudi Fails to Write Decimal Data with Lower Precision
- Data in ro and rt Tables Cannot Be Synchronized to a MOR Table Recreated After Being Deleted Using Spark SQL
- IllegalArgumentException Is Reported When Kafka Is Used to Collect Data
- SQLException Is Reported During Hive Data Synchronization
- HoodieHiveSyncException Is Reported During Hive Data Synchronization
- SemanticException Is Reported During Hive Data Synchronization
-
Using Hue
- Accessing the Hue Web UI
- Creating a Hue Job
- Configuring HDFS Cold and Hot Data Migration
- Typical Hue Parameters
- Hue Log Overview
- Common Issues About Hue
-
Hue Troubleshooting
- Why Does the use database Statement Become Invalid in Hive?
- Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
- Why Do Large Files Fail to Be Uploaded on the Hue Page?
- Why Is the Hue Native Page Not Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- What Should I Do If It Takes a Long Time to Access the Native Hue UI and the File Browser Reports "Read timed out"?
- Using Impala
- Using IoTDB
- Using JobGateway
- Using Kafka
- Using Kudu
-
Using Loader
- Overview of Importing and Exporting Loader Data
- Loader User Permission Management
- Uploading the MySQL Database Connection Driver
-
Creating a Loader Data Import Job
- Using Loader to Import Data to an MRS Cluster
- Using Loader to Import Data from an SFTP Server to HDFS or OBS
- Using Loader to Import Data from an SFTP Server to HBase
- Using Loader to Import Data from an SFTP Server to Hive
- Using Loader to Import Data from an FTP Server to HBase
- Using Loader to Import Data from a Relational Database to HDFS or OBS
- Using Loader to Import Data from a Relational Database to HBase
- Using Loader to Import Data from a Relational Database to Hive
- Using Loader to Import Data from HDFS or OBS to HBase
- Using Loader to Import Data from a Relational Database to ClickHouse
- Using Loader to Import Data from HDFS to ClickHouse
-
Creating a Loader Data Export Job
- Using Loader to Export Data from an MRS Cluster
- Using Loader to Export Data from HDFS or OBS to an SFTP Server
- Using Loader to Export Data from HBase to an SFTP Server
- Using Loader to Export Data from Hive to an SFTP Server
- Using Loader to Export Data from HDFS or OBS to a Relational Database
- Using Loader to Export Data from HDFS to MOTService
- Using Loader to Export Data from HBase to a Relational Database
- Using Loader to Export Data from Hive to a Relational Database
- Using Loader to Export Data from HBase to HDFS or OBS
- Using Loader to Export Data from HDFS to ClickHouse
- Managing Loader Jobs
- Loader O&M Management
- Loader Operator Help
-
Loader Client Tools
- Running a Loader Job by Using Commands
- loader-tool Usage Guide
- loader-tool Usage Example
- schedule-tool Usage Guide
- schedule-tool Usage Example
- Using loader-backup to Back Up Job Data
- Open Source sqoop-shell Tool Usage Guide
- Importing Data to HDFS Using sqoop-shell
-
Common Issues About Loader
- Data Cannot Be Saved When Loader Jobs Are Configured
- Differences Among Connectors Used During the Process of Importing Data from the Oracle Database to HDFS
- Why Is Data Not Imported to HDFS After All SQL Server Data Types Are Selected?
- An Error Is Reported When a Large Amount of Data Is Written to HDFS
- Failed to Run Jobs Related to the sftp-connector Connector
-
Using MapReduce
- Configuring the Distributed Cache
- Configuring the MapReduce Shuffle Address
- Configuring the MapReduce Cluster Administrator List
- Transmitting MapReduce Tasks from Windows to Linux
- Configuring the Archiving and Clearing Mechanism for MapReduce Task Logs
- MapReduce Performance Tuning
- MapReduce Log Overview
-
Common Issues About MapReduce
- After an Active/Standby Switchover of ResourceManager Occurs, a Task Is Interrupted and Runs for a Long Time
- Why Does a MapReduce Task Stay Unchanged for a Long Time?
- Why Does the Client Hang During Job Running?
- Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
- MapReduce Job Failed in Multiple NameService Environment
- Why Is a Faulty MapReduce Node Not Blacklisted?
-
Using Oozie
- Submitting a Job Using the Oozie Client
-
Using Hue to Submit an Oozie Job
- Creating a Workflow Using Hue
- Submitting an Oozie Hive2 Job Using Hue
- Submitting an Oozie HQL Script Using Hue
- Submitting an Oozie Spark2x Job Using Hue
- Submitting an Oozie Java Job Using Hue
- Submitting an Oozie Loader Job Using Hue
- Submitting an Oozie MapReduce Job Using Hue
- Submitting an Oozie Sub-workflow Job Using Hue
- Submitting an Oozie Shell Job Using Hue
- Submitting an Oozie HDFS Job Using Hue
- Submitting an Oozie Streaming Job Using Hue
- Submitting an Oozie DistCp Job Using Hue
- Submitting an Oozie SSH Job Using Hue
- Submitting a Coordinator Periodic Scheduling Job Using Hue
- Submitting a Bundle Batch Processing Job Using Hue
- Querying Oozie Job Results on the Hue Page
- Configuring Mutual Trust Between Oozie Nodes
- Enterprise-Class Enhancements of Oozie
- Oozie Log Overview
- Common Issues About Oozie
-
Using Ranger
- Enabling Ranger Authentication for MRS Cluster Services
- Logging In to the Ranger Web UI
- Adding a Ranger Permission Policy
-
Configuration Examples for Ranger Permission Policies
- Adding a Ranger Access Permission Policy for CDL
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark2x
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for HetuEngine
- Adding a Ranger Access Permission Policy for OBS
- Hive Tables Supporting Cascading Authorization
- Viewing Ranger Audit Information
- Configuring Ranger Security Zone
- Viewing Ranger User Permission Synchronization Information
- Ranger Performance Tuning
- Ranger Log Overview
-
Common Issues About Ranger
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Cannot a New User Log In to Ranger After Changing the Password?
- What Should I Do If I Cannot View the Created MRS User on the Ranger Management Page?
- What Should I Do If MRS Users Fail to Be Synchronized to the Ranger Web UI?
- Ranger Troubleshooting
-
Using Spark/Spark2x
- Spark Usage Instruction
- Spark User Permission Management
- Using the Spark Client
- Accessing the Spark Web UI
- Submitting a Spark Job as a Proxy User
- Configuring Spark to Read HBase Data
- Configuring Spark Tasks Not to Obtain HBase Token Information
-
Spark Core Enterprise-Class Enhancements
- Configuring Spark HA to Enhance Availability
- Configuring the Spark Native Engine
- Configuring the Size of the Spark Event Queue
- Configuring the Compression Format of a Parquet Table
- Adapting to the Third-party JDK When Ranger Is Used
- Using the Spark Small File Combination Tool
- Configuring Streaming Reading of Spark Driver Execution Results
- Enabling a Spark Executor to Execute Custom Code When Exiting
- Configuring Spark Dynamic Masking
- Configuring Distinct Aggregation Optimization
- Configuring the Clearing of Residual Files When a Spark Job Fails
- Configuring Spark to Load Third-Party JAR Packages for UDF Registration or SparkSQL Extension
-
Spark SQL Enterprise-Class Enhancements
- Configuring Vector-based ORC Data Reading
- Filtering Partitions Without Paths in a Partitioned Table
- Configuring the Drop Partition Command to Support Batch Deletion
- Configuring Dynamic Overwriting for Hive Table Partitions
- Configuring Spark SQL to Enable the Adaptive Execution Feature
- Using Spark SQL Statements Without Aggregate Functions for Correlated Subqueries
- Spark Streaming Enterprise-Class Enhancements
-
Spark Core Performance Tuning
- Spark Core Data Serialization
- Spark Core Memory Tuning
- Setting the Spark Core Degree of Parallelism (DOP)
- Configuring Spark Core Broadcasting Variables
- Configuring Heap Memory Parameters for Spark Executor
- Using the External Shuffle Service to Improve Spark Core Performance
- Configuring Spark Dynamic Resource Scheduling in YARN Mode
- Adjusting Spark Core Process Parameters
- Spark DAG Design Specifications
- Experience Summary
-
Spark SQL Performance Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the Spark INSERT SELECT Statement
- Configuring Multiple Concurrent Clients to Connect to JDBCServer
- Configuring the Default Number of Data Blocks Divided by SparkSQL
- Optimizing Memory When Data Is Inserted into Spark Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Performance Tuning
- Spark on OBS Performance Tuning
-
Spark O&M Management
- Configuring Spark Parameters Rapidly
- Spark Common Configuration Parameters
- Spark Log Overview
- Obtaining Container Logs of a Running Spark Application
- Changing Spark Log Levels
- Viewing Container Logs on the Web UI
- Configuring the Number of Lost Executors Displayed on the Web UI
- Configuring Local Disk Cache for JobHistory
- Configuring Spark Event Log Rollback
- Enhancing Stability in a Limited Memory Condition
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Broaden Support for Hive Partition Pruning Predicate Pushdown
- Configuring the Column Statistics Histogram for Higher CBO Accuracy
- Using CarbonData for First Query
-
Common Issues About Spark
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
- Why Cannot the Driver Process Exit?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- "Failed to CREATE_FILE" Is Displayed When Data Is Inserted into the Dynamic Partitioned Table Again
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does the Stage Retry Due to the Crash of the Executor?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- NodeManager OOM Occurs During Spark Application Execution
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How Do I Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How Do I Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Does 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- How Do I Rectify the Exception Occurred When I Perform an Operation on the Table Named table?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Do I Do If I Accidentally Kill the JDBCServer Process During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set as the Filter Condition in the Date Field?
- Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Cannot Functions Be Used When Different JDBCServers Are Connected?
- Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
- Why Does Spark2x Have No Access to DataSource Tables Created by Spark 1.5?
- Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
-
Spark Streaming
- Same DAG Log Is Recorded Twice for a Streaming Task
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does the Spark Streaming Application Fail to Be Started from the Checkpoint When the Input Stream Has No Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
- What Should I Do If the Recycle Bin Version I Set on the Spark Client Does Not Take Effect?
- How Do I Change the Log Level to INFO When Using Spark yarn-client?
-
Spark Core
-
Spark Troubleshooting
- Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Apps Cannot Be Displayed on the JobHistory Page When an Empty Part File Is Loaded
- Why Does Spark Fail to Export a Table with the Same Field Name?
- Why Does a JRE Fatal Error Occur After a Spark Application Runs Multiple Times?
- Native Spark2x UI Fails to Be Accessed or Is Incorrectly Displayed when Internet Explorer Is Used for Access
- How Does Spark2x Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
- Why Do I Fail to Create a Table in the Specified Location on OBS After Logging to spark-beeline?
- Spark Shuffle Exception Handling
- Why Cannot Common Users Log In to the Spark Client When There Are Multiple Service Scenarios in Spark?
- Why Does the Cluster Port Fail to Connect When a Client Outside the Cluster Is Installed or Used?
- How Do I Handle the Exception Occurred When I Query Datasource Avro Formats?
- What Should I Do If Statistics of Hudi or Hive Tables Created Using Spark SQLs Are Empty Before Data Is Inserted?
- Failed to Query Table Statistics by Partition Using a Non-Standard Time Format When the Partition Column in the Table Creation Statement Is timestamp
- How Do I Use Special Characters with TIMESTAMP and DATE?
- Using Sqoop
- Using Tez
-
Using YARN
- Yarn User Permission Management
- Submitting a Task Using the Yarn Client
- Configuring Container Log Aggregation
- Enabling Yarn CGroups to Limit the Container CPU Usage
- Configuring HA for TimelineServer
-
Enterprise-Class Enhancements of Yarn
- Configuring the Yarn Permission Control
- Specifying the User Who Runs Yarn Tasks
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
- Configuring ApplicationMaster Work Preserving
- Configuring the Access Channel Protocol
- Configuring the Additional Scheduler WebUI
- Configuring Resources for a NodeManager Role Instance
- Configuring Yarn Restart
- Yarn Performance Tuning
- Yarn O&M Management
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Is Complete When CGroups Is Used?
- Why Does the Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Twice?
- Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
- Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Configuring the ZooKeeper Permissions
- ZooKeeper Common Configuration Parameters
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Can't the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
-
Using CarbonData
-
Component Operation Guide (Normal)
- Using Alluxio
- Using CarbonData (for Versions Earlier Than MRS 3.x)
-
Using CarbonData (for MRS 3.x or Later)
- CarbonData Data Types
- CarbonData Table User Permissions
- Creating a CarbonData Table Using the Spark Client
- CarbonData Data Analytics
- CarbonData Performance Tuning
- Typical CarbonData Configuration Parameters
- CarbonData Syntax Reference
- CarbonData Troubleshooting
-
CarbonData FAQs
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How Do I Avoid Minor Compaction for Historical Data?
- How Do I Change the Default Group Name for CarbonData Data Loading?
- Why Does INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Does Data Load Performance Decrease Due to Bad Records?
- Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are Fewer Tasks Opened Than Available Executors When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
- Why Does Data Loading Fail During an Off-Heap Operation?
- Why Do I Fail to Create a Hive Table?
- How Do I Logically Split Data Across Different Namespaces?
- Why Does the Missing Privileges Exception Occur When the Database Is Dropped?
- Why Can't the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does CarbonData Become Abnormal After the Disk Space Quota of the HDFS Storage Directory Is Set?
- Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
- Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
-
Using ClickHouse
- ClickHouse Overview
- ClickHouse User Permission Management
- Using the ClickHouse Client
- Creating a ClickHouse Table
- ClickHouse Data Import
- Enterprise-Class Enhancements of ClickHouse
- ClickHouse Performance Tuning
- ClickHouse O&M Management
-
Common ClickHouse SQL Syntax
- CREATE DATABASE: Creating a Database
- CREATE TABLE: Creating a Table
- INSERT INTO: Inserting Data into a Table
- SELECT: Querying Table Data
- ALTER TABLE: Modifying a Table Structure
- ALTER TABLE: Modifying Table Data
- DESC: Querying a Table Structure
- DROP: Deleting a Table
- SHOW: Displaying Information About Databases and Tables
-
ClickHouse FAQ
- What Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
- How Do I Migrate Data from Hive/HDFS to ClickHouse?
- An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data
- How Do I Grant the Select Permission at the Database Level to ClickHouse Users?
- Using DBService
-
Using Flink
- Flink Job Engine
- Flink User Permission Management
- Using the Flink Client
- Preparing for Creating a FlinkServer Job
- Creating a FlinkServer Job
- Managing FlinkServer Jobs
- Flink O&M Management
- Flink Performance Tuning
- Typical Commands of the Flink Client
- Common Issues About Flink
- Example of Issuing a Certificate
-
Using Flume
- Flume Log Collection Overview
- Flume Service Model Configuration
- Installing the Flume Client
- Quickly Using Flume to Collect Node Logs
-
Configuring a Non-Encrypted Flume Data Collection Task
- Generating Configuration Files for the Flume Server and Client
- Using Flume Server to Collect Static Logs from Local Host to Kafka
- Using Flume Server to Collect Static Logs from Local Host to HDFS
- Using Flume Server to Collect Dynamic Logs from Local Host to HDFS
- Using Flume Server to Collect Logs from Kafka to HDFS
- Using Flume Client to Collect Logs from Kafka to HDFS
- Using Cascaded Agents to Collect Static Logs from Local Host to HBase
- Configuring an Encrypted Flume Data Collection Task
- Enterprise-Class Enhancements of Flume
- Flume O&M Management
- Common Issues About Flume
-
Using HBase
- Creating HBase Roles
- Using the HBase Client
- Quickly Using HBase for Offline Data Analysis
- Migrating Data to HBase Using BulkLoad
- HBase Data Operations
- Enterprise-Class Enhancements of HBase
- HBase Performance Tuning
- HBase O&M Management
-
Common Issues About HBase
- Operation Failures Occur When BulkLoad Is Stopped on the Client
- How Do I Restore a Region in the RIT State for a Long Time?
- What Should I Do If HMaster Exits Due to Timeout When Waiting for the Namespace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- What Should I Do If Error Message "java.lang.UnsatisfiedLinkError: Permission denied" Is Displayed When I Start the HBase Shell?
- When Will the" Dead Region Servers" Information Displayed on the HMaster Web UI Be Cleared After a RegionServer Is Stopped?
- What Can I Do If a Message Indicating Insufficient Permission Is Displayed When I Access HBase Phoenix?
- What Can I Do If a Message Indicating Insufficient Permission Is Displayed When a Tenant Uses HBase BulkLoad?
- How Do I Restore an HBase Region in Overlap State?
- Phoenix BulkLoad Use Restrictions
- Why Is a Message Indicating Insufficient Permission Displayed When CTBase Connects to the Ranger Plug-in?
-
HBase Troubleshooting
- The HBase Client Failed to Connect to the Server for a Long Time
- An Exception Occurred When HBase Deletes and Creates a Table Consecutively
- Other Services Are Unstable When Too Many HBase Connections Occupy the Network Ports
- HBase BulkLoad Tasks with 210,000 Map Tasks and 10,000 Reduce Tasks Failed to Be Executed
- Modified and Deleted Data Can Still Be Queried by the Scan Command
- Failed to Create Tables When the Region Is in the FAILED_OPEN State
- How Do I Delete the Residual Table Name on the ZooKeeper table-lock Node After a Table Creation Failure?
- HBase Becomes Faulty When I Set a Quota for the Directory Used by HBase in HDFS
- HMaster Failed to Be Started After the OfflineMetaRepair Tool Is Used to Rebuild Metadata
- FileNotFoundException Is Frequently Printed in HMaster Logs
- "Permission denied" Was Displayed When the ImportTsv Tool Failed to Run
- Data Is Successfully Imported Using HBase BulkLoad, but Different Results May Be Returned to the Same Query
- HBase Data Restoration Task Failed to Be Rolled Back
- RegionServer Failed to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB
- When LoadIncrementalHFiles Is Used to Import Data in Batches on Cluster Nodes, the Insufficient Permission Error Is Reported
- "import argparse" Is Reported When the Phoenix Sqlline Script Is Used
-
Using HDFS
- Overview of HDFS File System Directories
- HDFS User Permission Management
- Using the HDFS Client
- Using Hadoop
- Configuring the Recycle Bin Mechanism
- Configuring HDFS DataNode Data Balancing
- Configuring HDFS Disk Balancing
- Using HDFS Mover to Migrate Data
- Configuring the Label Policy (NodeLabel) for HDFS File Directories
- Configuring NameNode Memory Parameters
- Setting the Number Limit of HBase and HDFS Handles
- Configuring the Number of Files in a Single HDFS Directory
- Enterprise-Class Enhancements of HDFS
-
HDFS Performance Tuning
- Improving HDFS Write Performance
- Improving Read Performance by HDFS Client Metadata Caching
- Improving the HDFS Client Connection Performance with Active NameNode Caching
- Optimization for Unstable HDFS Network
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Performing Concurrent Operations on HDFS Files
- Using the LZC Compression Algorithm to Store HDFS Files
-
HDFS O&M Management
- HDFS Common Configuration Parameters
- HDFS Log Overview
- Viewing the HDFS Capacity
- Changing the DataNode Storage Directory
- Adjusting Parameters Related to Damaged DataNode Disk Volumes
- Configuring the Maximum Lifetime of an HDFS Token
- Using DistCp to Copy HDFS Data Across Clusters
- Configuring the NFS Server to Store NameNode Metadata
-
Common Issues About HDFS
- What Should I Do If an Error Is Reported When I Run DistCp Commands?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- What Should I Do If the HDFS Web UI Cannot Update the Information About the Damaged Data?
- What Should I Do If the HDFS Client Is Irresponsive When the NameNode Is Overloaded for a Long Time?
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- Why Does DataNode Fail to Report Data Blocks?
- Can I Modify the DataNode Data Storage Directory?
- What Can I Do If the DataNode Capacity Is Incorrectly Calculated?
- Why Is Data in the Cache Lost When Small Files Are Stored?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- Why Are Some Blocks Missing on the NameNode UI?
-
HDFS Troubleshooting
- Why Is "java.net.SocketException" Reported When Data Is Written to HDFS
- It Takes a Long Time to Restart NameNode After a Large Number of Files Are Deleted
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
- The Standby NameNode Fails to Be Started After It Is Powered Off During Metadata Storage
- DataNode Fails to Be Started When the Number of Disks Defined in dfs.datanode.data.dir Equals the Value of dfs.datanode.failed.volumes.tolerated
- "ArrayIndexOutOfBoundsException: 0" Occurs When HDFS Invokes getsplit of FileInputFormat
-
Using Hive
- Hive User Permission Management
- Using the Hive Client
- Using Hive for Data Analysis
- Configuring Hive Data Storage and Encryption
- Hive on HBase
- Using Hive to Read Data in a Relational Database
-
Enterprise-Class Enhancement of Hive
- Configuring Automatic Removal of Old Data in the Hive Directory to the Recycle Bin
- Configuring Hive to Insert Data to a Directory That Does Not Exist
- Forbidding Location Specification When Hive Internal Tables Are Created
- Creating a Foreign Table in a Directory (Read and Execute Permission Granted)
- Configuring HTTPS/HTTP-based REST APIs
- Configuring Hive Transform
- Switching the Hive Execution Engine to Tez
- Hive Load Balancing
- Configuring Access Control Permission for the Dynamic View of a Hive Single Table
- Allowing Users without ADMIN Permission to Create Temporary Functions
- Allowing Users with Select Permission to View the Table Structure
- Allowing Only the Hive Administrator to Create Databases and Tables in the Default Database
- Configuring Hive to Support More Than 32 Roles
- Creating User-Defined Hive Functions
- Configuring High Reliability for Hive Beeline
- Hive Performance Tuning
- Hive O&M Management
- Common Hive SQL Syntax
-
Common Issues About Hive
- How Do I Delete All Permanent Functions from HiveServer?
- Why Cannot the DROP Operation Be Performed on a Backed Up Hive Table?
- How Do I Perform Operations on Local Files with Hive User-Defined Functions?
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- What Are the Special Characters Not Supported by Hive in Complex Field Names?
- How Do I Monitor the Hive Table Size?
- How Do I Prevent Data Loss Caused by Misoperations of the insert overwrite Statement?
- How Do I Handle a Slow Hive on Spark Task When HBase Is Not Installed?
- What Should I Do If an Error Is Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in Hive?
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Does the Location of a Hive Table Support Cross-OBS and Cross-HDFS Paths?
- What Should I Do If the MapReduce Engine Cannot Query the Data Written by the Union Statement Running on Tez?
- Does Hive Support Concurrent Data Writing to the Same Table or Partition?
- Does Hive Support Vectorized Query?
- What Should I Do If the Task Fails When the HDFS Data Directory of the Hive Table Is Deleted by Mistake but the Metadata Still Exists?
- How Do I Disable the Logging Function of Hive?
- Why Is the OBS Quick Deletion Directory Not Applied After Being Added to the Custom Hive Configuration?
- Hive Configuration Problems
- Hive Troubleshooting
-
Using Hudi
- Hudi Table Overview
- Creating a Hudi Table Using Spark Shell
- Operating a Hudi Table Using hudi-cli.sh
- Hudi Write Operation
- Hudi Read Operation
- Data Management and Maintenance
- Typical Hudi Configuration Parameters
- Hudi Performance Tuning
-
Common Issues About Hudi
-
Data Write
- A Parquet/Avro Schema Error Is Reported When Updated Data Is Written
- UnsupportedOperationException Is Reported When Updated Data Is Written
- SchemaCompatabilityException Is Reported When Updated Data Is Written
- What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
- Hudi Fails to Write Decimal Data with Lower Precision
- Data Collection
- Hive Synchronization
- Using Hue (Versions Earlier Than MRS 3.x)
-
Using Hue (MRS 3.x or Later)
- Accessing the Hue Web UI
- Using Hue WebUI to Operate Hive Tables
- Creating a Hue Job
- Typical Application Scenarios of the Hue Web UI
- Typical Hue Configurations
- Hue Log Overview
-
Common Issues About Hue
- Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
- Why Does the use database Statement Become Invalid in Hive?
- Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
- Why Do Large Files Fail to Be Uploaded on the Hue Page?
- Why Can't the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- How Do I Solve the Problem of Setting the Time Zone of the Oozie Editor on the Hue Web UI?
- What Should I Do If It Takes a Long Time to Access the Native Hue UI and the File Browser Reports "Read timed out"?
- Using Impala
-
Using Kafka
- Kafka Data Consumption
- Kafka User Permission Management
- Using the Kafka Client
- Quickly Using Kafka to Produce and Consume Data
- Creating a Kafka Topic
- Checking the Consumption Information of Kafka Topics
- Managing Kafka Topics
- Enterprise-Class Enhancements of Kafka
- Kafka Performance Tuning
- Kafka O&M Management
- Common Issues About Kafka
- Using KafkaManager
-
Using Loader
- Using Loader from Scratch
- How to Use Loader
- Common Loader Parameters
- Creating a Loader Role
- Loader Link Configuration
- Managing Loader Links (Versions Earlier Than MRS 3.x)
- Managing Loader Links (MRS 3.x and Later Versions)
- Source Link Configurations of Loader Jobs
- Destination Link Configurations of Loader Jobs
- Managing Loader Jobs
- Preparing a Driver for MySQL Database Link
-
Importing Data
- Overview
- Importing Data Using Loader
- Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
- Typical Scenario: Importing Data from an SFTP Server to HBase
- Typical Scenario: Importing Data from an SFTP Server to Hive
- Typical Scenario: Importing Data from an FTP Server to HBase
- Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
- Typical Scenario: Importing Data from a Relational Database to HBase
- Typical Scenario: Importing Data from a Relational Database to Hive
- Typical Scenario: Importing Data from HDFS or OBS to HBase
- Typical Scenario: Importing Data from a Relational Database to ClickHouse
- Typical Scenario: Importing Data from HDFS to ClickHouse
-
Exporting Data
- Overview
- Using Loader to Export Data
- Typical Scenario: Exporting Data from HDFS or OBS to an SFTP Server
- Typical Scenario: Exporting Data from HBase to an SFTP Server
- Typical Scenario: Exporting Data from Hive to an SFTP Server
- Typical Scenario: Exporting Data from HDFS or OBS to a Relational Database
- Typical Scenario: Exporting Data from HBase to a Relational Database
- Typical Scenario: Exporting Data from Hive to a Relational Database
- Typical Scenario: Importing Data from HBase to HDFS or OBS
- Managing Jobs
- Operator Help
-
Client Tools
- Running a Loader Job by Using Commands
- loader-tool Usage Guide
- loader-tool Usage Example
- schedule-tool Usage Guide
- schedule-tool Usage Example
- Using loader-backup to Back Up Job Data
- Open Source sqoop-shell Tool Usage Guide
- Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
- Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
- Loader Log Overview
- Example: Using Loader to Import Data from OBS to HDFS
- Common Issues About Loader
- Using Kudu
-
Using MapReduce
- Configuring the Distributed Cache to Execute MapReduce Jobs
- Configuring the MapReduce Shuffle Address
- Configuring the MapReduce Cluster Administrator List
- Submitting a MapReduce Task on Windows
- Configuring the Archiving and Clearing Mechanism for MapReduce Task Logs
-
MapReduce Performance Tuning
- MapReduce Optimization Configuration for Multiple CPU Cores
- Configuring the Baseline Parameters for MapReduce Jobs
- MapReduce Shuffle Tuning
- AM Optimization for Big MapReduce Tasks
- Configuring Speculative Execution for MapReduce Tasks
- Tuning MapReduce Tasks Using Slow Start
- Optimizing the Commit Phase of MapReduce Tasks
- Improving MapReduce Client Task Reliability
- MapReduce Log Overview
-
Common Issues About MapReduce
- After an Active/Standby Switchover of ResourceManager Occurs, a Task Is Interrupted and Runs for a Long Time
- How Do I Handle the Problem that MapReduce Task Has No Progress for a Long Time?
- Why Is the Client Unavailable When a Task Is Running?
- What Should I Do If HDFS_DELEGATION_TOKEN Cannot Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- What Should I Do If MapReduce Job Information Cannot Be Opened Through Tracking URL on the ResourceManager Web UI?
- Why Do MapReduce Tasks Fail in an Environment with Multiple NameServices?
- What Should I Do If the Partition-based Task Blacklist Is Abnormal?
- Using OpenTSDB
-
Using Oozie
- Using Oozie Client to Submit an Oozie Job
-
Using Hue to Submit an Oozie Job
- Creating a Workflow Using Hue
- Submitting an Oozie Hive2 Job Using Hue
- Submitting an Oozie HQL Script Using Hue
- Submitting an Oozie Spark2x Job Using Hue
- Submitting an Oozie Java Job Using Hue
- Submitting an Oozie Loader Job Using Hue
- Submitting an Oozie MapReduce Job Using Hue
- Submitting an Oozie Sub Workflow Job Using Hue
- Submitting an Oozie Shell Job Using Hue
- Submitting an Oozie HDFS Job Using Hue
- Submitting an Oozie Streaming Job Using Hue
- Submitting an Oozie Distcp Job Using Hue
- Submitting an Oozie SSH Job Using Hue
- Submitting a Coordinator Periodic Scheduling Job Using Hue
- Submitting a Bundle Batch Processing Job Using Hue
- Querying Oozie Job Results on the Hue Web UI
- Configuring Mutual Trust Between Oozie Nodes
- Enabling Oozie High Availability (HA)
- Oozie Log Overview
- Common Issues About Oozie
- Using Presto
- Using Ranger (MRS 1.9.2)
-
Using Ranger (MRS 3.x)
- Logging In to the Ranger Web UI
- Enabling Ranger Authentication for MRS Cluster Services
- Adding a Ranger Permission Policy
-
Configuration Examples for Ranger Permission Policy
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Impala
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark2x
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for Storm
- Viewing Ranger Audit Information
- Configuring Ranger Security Zone
- Changing the Ranger Data Source to LDAP for a Normal Cluster
- Viewing Ranger User Permission Synchronization Information
- Ranger Log Overview
-
Common Issues About Ranger
- Why Does Ranger Fail to Start During Cluster Installation?
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Can't a New User Log In to Ranger After the Password Is Changed?
- When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
- Why Can't I View the Created MRS User on the Ranger Management Page?
- What Should I Do If MRS Users Failed to Be Synchronized to the Ranger Web UI?
- Using Spark (for Versions Earlier Than MRS 3.x)
-
Using Spark2x (for MRS 3.x or Later)
- Spark User Permission Management
- Using the Spark Client
- Configuring Spark to Read HBase Data
- Configuring Spark Tasks Not to Obtain HBase Token Information
- Spark Core Enterprise-Class Enhancements
- Spark SQL Enterprise-Class Enhancements
- Spark Streaming Enterprise-Class Enhancements
-
Spark Core Performance Tuning
- Spark Core Data Serialization
- Spark Core Memory Tuning
- Configuring Spark Core Broadcasting Variables
- Configuring Heap Memory Parameters for Spark Executor
- Using the External Shuffle Service to Improve Spark Core Performance
- Configuring Spark Dynamic Resource Scheduling in YARN Mode
- Adjusting Spark Core Process Parameters
- Spark DAG Design Specifications
- Experience
-
Spark SQL Performance Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the Spark INSERT SELECT Statement
- Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Performance Tuning
-
Spark O&M Management
- Configuring Parameters Rapidly
- Common Parameters
- Spark2x Logs
- Changing Spark Log Levels
- Viewing Container Logs on the Web UI
- Obtaining Container Logs of a Running Spark Application
- Configuring Spark Event Log Rollback
- Configuring the Number of Lost Executors Displayed in WebUI
- Configuring Local Disk Cache for JobHistory
- Enhancing Stability in a Limited Memory Condition
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Broadening Support for Hive Partition Pruning Predicate Pushdown
- Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
- Using CarbonData for First Query
-
Common Issues About Spark2x
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
- Why Can't the Driver Process Exit?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- "Failed to CREATE_FILE" Is Displayed When Data Is Inserted into the Dynamic Partitioned Table Again
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does the Stage Retry Due to the Crash of the Executor?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- NodeManager OOM Occurs During Spark Application Execution
- Why Does the Realm Information Fail to Be Obtained When SparkBench Is Run on HiBench for a Cluster in Security Mode?
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How Do I Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How Do I Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Does 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- How Do I Rectify the Exception Occurred When I Perform an Operation on the Table Named table?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Do I Fail to Modify Metadata by Running the Hive Command?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Do I Do If I Accidentally Kill the JDBCServer Process During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set as the Filter Condition in the Date Field?
- Why Does the --hivevar Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
- Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Can't Functions Be Used When Different JDBCServers Are Connected?
- Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
- Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
- Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
-
Spark Streaming
- Same DAG Log Is Recorded Twice for a Streaming Task
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does the Spark Streaming Application Fail to Be Started from the Checkpoint When the Input Stream Has No Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
-
Spark Core
-
Spark Troubleshooting
- Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Why Isn't an Application Displayed When I Run an Application with an Empty Part File?
- Why Does Spark2x Fail to Export a Table with the Same Field Name?
- Why Does a JRE Fatal Error Occur After a Spark Application Runs Multiple Times?
- Native Spark2x UI Fails to Be Accessed or Is Incorrectly Displayed when Internet Explorer Is Used for Access
- How Does Spark2x Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
- Why Do I Fail to Create a Table in the Specified Location on OBS After Logging to spark-beeline?
- Spark Shuffle Exception Handling
-
Using Sqoop
- Using Sqoop from Scratch
- Adapting Sqoop 1.4.7 to MRS 3.x Clusters
- Common Sqoop Commands and Parameters
-
Common Issues About Sqoop
- What Should I Do If Class QueryProvider Is Unavailable?
- What Should I Do If Method getHiveClient Does Not Exist?
- What Should I Do If PostgreSQL or GaussDB Fails to Connect?
- What Should I Do If Data Failed to Be Synchronized to a Hive Table on the OBS Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
- What Should I Do If Data Failed to Be Synchronized Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
- What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
-
Using Storm
- Using Storm from Scratch
- Using the Storm Client
- Submitting Storm Topologies on the Client
- Accessing the Storm Web UI
- Managing Storm Topologies
- Querying Storm Topology Logs
- Storm Common Parameters
- Configuring a Storm Service User Password Policy
- Migrating Storm Services to Flink
- Storm Log Introduction
- Performance Tuning
- Using Tez
-
Using YARN
- YARN User Permission Management
- Submitting a Task Using the Yarn Client
- Configuring Container Log Aggregation
- Enabling Yarn CGroups to Limit the Container CPU Usage
-
Enterprise-Class Enhancement of YARN
- Configuring the Yarn Permission Control
- Specifying the User Who Runs Yarn Tasks
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
- Configuring ApplicationMaster Work Preserving
- Configuring the Access Channel Protocol
- Configuring the Additional Scheduler WebUI
- Configuring Resources for a NodeManager Role Instance
- Configuring Yarn Restart
- Yarn Performance Tuning
- YARN O&M Management
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Is Complete When CGroups Are Used?
- Why Does a Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Twice?
- Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
- Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why May A Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Configuring the ZooKeeper Permissions
- Common ZooKeeper Parameters
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Do Four-Letter Commands Not Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Cannot the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
-
Best Practices
-
Data Analytics
- Using Spark2x to Analyze IoV Drivers' Driving Behavior
- Using Hive to Load HDFS Data and Analyze Book Scores
- Using Hive to Load OBS Data and Analyze Enterprise Employee Information
- Using Flink Jobs to Process OBS Data
- Consuming Kafka Data Using Spark Streaming Jobs
- Using Flume to Collect Log Files from a Specified Directory to HDFS
- Kafka-based WordCount Data Flow Statistics Case
-
Data Migration
- Data Migration Solution
- Information Collection Before Data Migration to MRS
- Preparing the Network Before Data Migration to MRS
- Migrating Data from Hadoop to MRS
- Migrating Data from HBase to MRS
- Migrating Data from Hive to MRS
- Using BulkLoad to Import Data to HBase in Batches
- Migrating MySQL Data to MRS Hive with CDM
- Migrating Data from MRS HDFS to OBS with CDM
- Interconnection with Other Cloud Services
-
Interconnection with Ecosystem Components
- Using DBeaver to Access Phoenix
- Using DBeaver to Access HetuEngine
- Using Tableau to Access HetuEngine
- Using Yonghong BI to Access HetuEngine
- Interconnecting Hive with External Self-Built Relational Databases
- Interconnecting Hive with External LDAP
- Interconnecting MRS Kafka with Kafka Eagle
- Using Jupyter Notebook to Connect to MRS Spark
- MRS Cluster Management
-
Data Analytics
-
Developer Guide
-
Developer Guide (LTS)
- Introduction to MRS Application Development
- Obtaining the MRS Application Development Sample Project
- MRS Application Security Authentication Description
- Preparing an MRS Application Development User
- Rapidly Developing MRS Component Applications
- ClickHouse Development Guide (Security Mode)
- ClickHouse Development Guide (Normal Mode)
-
Flink Development Guide (Security Mode)
- Flink Application Development Overview
- Flink Application Development Process
- Environment Preparation
-
Developing a Flink Application
- Flink DataStream Sample Program
- Flink Kafka Sample Program
- Sample Program for Starting Checkpoint on Flink
- Flink Job Pipeline Sample Program
- Flink Join Sample Program
- Flink Jar Job Submission SQL Sample Program
- FlinkServer REST API Sample Program
- Flink Sample Program for Reading HBase Tables
- Sample Program for Reading Hudi Tables on Flink
- PyFlink Sample Program
- Commissioning the Flink Application
-
FAQs in Flink Application Development
- Common Flink APIs
- What If the Chrome Browser Cannot Display the Title
- What If the Page Is Displayed Abnormally on Internet Explorer 10/11
- What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large
- What If yarn-session Start Fails When blob.storage.directory Is Set to /home
- Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
- When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
- Why Cannot I Access the Apache Flink Dashboard?
- How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
- Incorrect GLIBC Version
-
Flink Development Guide (Normal Mode)
- Overview
- Environment Preparation
-
Developing an Application
- DataStream Application
- Interconnecting with Kafka
- Asynchronous Checkpoint Mechanism
- Job Pipeline Program
- Stream SQL Join Program
- Flink Jar Job Submission SQL Sample Program
- FlinkServer REST API Sample Program
- Flink Reading Data from and Writing Data to HBase
- Sample Program for Reading Hudi Tables on Flink
- PyFlink Sample Program
- Debugging the Application
-
More Information
- Introduction to Common APIs
- Overview of RESTful APIs
- Overview of Savepoints CLI
- Introduction to Flink Client CLI
-
FAQ
- Savepoints-related Problems
- What If the Chrome Browser Cannot Display the Title
- What If the Page Is Displayed Abnormally on Internet Explorer 10/11
- What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large
- What If yarn-session Start Fails When blob.storage.directory Is Set to /home
- Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
- When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
- Why Cannot I Access the Apache Flink Dashboard?
- How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
- Incorrect GLIBC Version
-
HBase Development Guide (Security Mode)
- Overview
- Environment Preparation
-
Developing an Application
-
HBase Data Read/Write Sample Program
- Typical Scenario Description
- Development Idea
- Creating Configuration
- Creating Connection
- Creating a Table
- Deleting a Table
- Inserting Data
- Deleting Data
- Modifying a Table
- Reading Data Using Get
- Reading Data Using Scan
- Filtering Data
- Creating a Secondary Index
- Deleting an Index
- Secondary Index-based Query
- Multi-Point Region Division
- Creating a Phoenix Table
- Writing Data to the PhoenixTable
- Reading the PhoenixTable
- Using HBase Dual-Read
- Configuring Log4j Log Output
- HBase Rest API Invoking Sample Program
- Accessing the HBase ThriftServer Sample Program
- Sample Program for HBase to Access Multiple ZooKeepers
-
HBase Data Read/Write Sample Program
- Application Commissioning
-
More Information
- SQL Query
- HBase Dual-Read Configuration Items
- External Interfaces
- Phoenix Command Line
-
FAQs
- How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
- What Are the Application Scenarios of the Bulkload and put Data-loading Modes?
- An Error Occurred When Building a JAR Package
-
HBase Development Guide (Normal Mode)
- Overview
- Environment Preparation
-
Developing an Application
-
HBase Data Read/Write Sample Program
- Typical Scenario Description
- Development Idea
- Creating Configuration
- Creating Connection
- Creating a Table
- Deleting a Table
- Modifying a Table
- Inserting Data
- Deleting Data
- Reading Data Using Get
- Reading Data Using Scan
- Filtering Data
- Creating a Secondary Index
- Deleting an Index
- Secondary Index-based Query
- Multi-Point Region Division
- Creating a Phoenix Table
- Writing Data to the PhoenixTable
- Reading the PhoenixTable
- Using HBase Dual-Read
- Configuring Log4j Log Output
- HBase Rest API Invoking Sample Program
- Accessing the HBase ThriftServer Sample Program
- Sample Program for HBase to Access Multiple ZooKeepers
-
HBase Data Read/Write Sample Program
- Application Commissioning
-
More Information
- SQL Query
- HBase Dual-Read Configuration Items
- External Interfaces
- Phoenix Command Line
-
FAQs
- How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
- What Are the Application Scenarios of the bulkload and put Data-loading Modes?
- An Error Occurred When Building a JAR Package
- HDFS Development Guide (Security Mode)
- HDFS Development Guide (Normal Mode)
-
HetuEngine Development Guide (Security Mode)
- Overview
- Environment Preparation
- Application Development
- Application Commissioning
- HetuEngine Development Guide (Normal Mode)
- Hive Development Guide (Security Mode)
- Hive Development Guide (Normal Mode)
-
IoTDB Development Guide (Security Mode)
- Overview
- Environment Preparations
- Application Development
- Application Commissioning
- More Information
-
IoTDB Development Guide (Normal Mode)
- Overview
- Environment Preparations
- Application Development
- Application Commissioning
- More Information
- Kafka Development Guide (Security Mode)
- Kafka Development Guide (Normal Mode)
- MapReduce Development Guide (Security Mode)
- MapReduce Development Guide (Normal Mode)
- Oozie Development Guide (Security Mode)
- Oozie Development Guide (Normal Mode)
-
Spark2x Development Guide (Security Mode)
- Spark Application Development Overview
- Spark Application Development Process
- Preparing a Spark Application Development Environment
-
Developing Spark Applications
- Spark Core Sample Projects
- Spark SQL Sample Projects
- Sample Projects for Accessing Spark SQL Through JDBC
-
Sample Projects for Spark to Read HBase Tables
- Operating Data in Avro Format
- Performing Operations on the HBase Data Source
- Using the BulkPut API
- Using the BulkGet API
- Using the BulkDelete API
- Using the BulkLoad API
- Using the foreachPartition API
- Distributedly Scanning HBase Tables
- Using the mapPartition API
- Writing Data to HBase Tables In Batches Using Spark Streaming
- Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
- Sample Projects for Spark to Implement Data Transition Between Hive and HBase
- Sample Projects for Connecting Spark Streaming to Kafka0-10
- Spark Structured Streaming Sample Projects
- Sample Project for Interconnecting Spark Structured Streaming with Kafka
- Sample Project for Spark Structured Streaming Status Operations
- Sample Project for Spark Concurrent Access to Two HBase Sample Projects
- Sample Project for Spark to Synchronize HBase Data to CarbonData
- Using Spark to Execute the Hudi Sample Project
- Sample Project for Customizing Configuration Items in Hudi
- Commissioning a Spark Application
-
FAQs About Spark Application Development
- Common Spark APIs
- Structured Streaming Functions and Reliability
- How to Add a User-Defined Library
- How Do I Automatically Load JAR Packages?
- Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
- Privilege Control Mechanism of SparkSQL UDF Feature
- Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
- Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
- Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
- How Do I Perform Remote Debugging Using IDEA?
- How Do I Submit the Spark Application Using Java Commands?
- A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
- Why Does the ApplicationManager Fail to Be Terminated When Data Is Being Processed in the Structured Streaming Cluster Mode?
- Restrictions on Restoring the Spark Application from the Checkpoint
- Support for Third-party JAR Packages on x86 and TaiShan Platforms
- What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
- Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
- What Should I Do If the Method of Submitting Structured Streaming Tasks Is Changed?
- Common JAR File Conflicts
-
Spark2x Development Guide (Common Mode)
- Spark Application Development Overview
- Spark Application Development Process
- Preparing a Spark Application Development Environment
-
Developing Spark Applications
- Spark Core Sample Projects
- Spark SQL Sample Projects
- Sample Projects for Accessing Spark SQL Through JDBC
-
Sample Projects for Spark to Read HBase Tables
- Operating Data in Avro Format
- Performing Operations on the HBase Data Source
- Using the BulkPut API
- Using the BulkGet API
- Using the BulkDelete API
- Using the BulkLoad API
- Using the foreachPartition API
- Distributedly Scanning HBase Tables
- Using the mapPartition API
- Writing Data to HBase Tables In Batches Using Spark Streaming
- Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
- Sample Projects for Spark to Implement Data Transition Between Hive and HBase
- Sample Projects for Connecting Spark Streaming to Kafka0-10
- Spark Structured Streaming Sample Projects
- Sample Project for Interconnecting Spark Structured Streaming with Kafka
- Sample Project for Spark Structured Streaming Status Operations
- Sample Project for Spark to Synchronize HBase Data to CarbonData
- Using Spark to Execute the Hudi Sample Project
- Sample Project for Customizing Configuration Items in Hudi
- Commissioning a Spark Application
-
FAQs About Spark Application Development
- Common Spark APIs
- Structured Streaming Functions and Reliability
- How to Add a User-Defined Library
- How Do I Automatically Load JAR Packages?
- Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
- Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
- Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
- Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
- How Do I Perform Remote Debugging Using IDEA?
- How Do I Submit the Spark Application Using Java Commands?
- A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
- Why Does the ApplicationManager Fail to Be Terminated When Data Is Being Processed in the Structured Streaming Cluster Mode?
- Restrictions on Restoring the Spark Application from the Checkpoint
- Support for Third-party JAR Packages on x86 and TaiShan Platforms
- What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
- Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
- What Should I Do If the Method of Submitting Structured Streaming Tasks Is Changed?
- Common JAR File Conflicts
- YARN Development Guide (Security Mode)
- YARN Development Guide (Normal Mode)
-
Manager Management Development Guide
- Overview
- Environment Preparation
- Developing an Application
- Application Commissioning
-
More Information
- External Interfaces
-
FAQ
- JDK1.6 Fails to Connect to the FusionInsight System Using JDK1.8
- An Operation Fails and "authorize failed" Is Displayed in Logs
- An Operation Fails and "log4j:WARN No appenders could be found for logger(basicAuth.Main)" Is Displayed in Logs
- An Operation Fails and "illegal character in path at index 57" Is Displayed in Logs
- Run the curl Command to Access REST APIs
- Using Open-source JAR File Conflict Lists
- Mapping Between Maven Repository JAR Versions and MRS Component Versions
-
Developer Guide (Normal_3.x)
- Introduction to MRS Application Development
- Obtaining the MRS Application Development Sample Project
- Sample Projects of MRS Components
- Using Open-source JAR File Conflict Lists
- Mapping Between Maven Repository JAR Versions and MRS Component Versions
- Security Authentication
- ClickHouse Development Guide (Security Mode)
- ClickHouse Development Guide (Normal Mode)
-
Flink Development Guide (Security Mode)
- Overview
- Environment Preparation
- Developing an Application
- Debugging the Application
-
More Information
- Introduction to Common APIs
- Overview of RESTful APIs
- Overview of Savepoints CLI
- Introduction to Flink Client CLI
-
FAQ
- Savepoints-related Problems
- What If the Chrome Browser Cannot Display the Title
- What If the Page Is Displayed Abnormally on Internet Explorer 10/11
- What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large
- What If yarn-session Start Fails When blob.storage.directory Is Set to /home
- Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
- When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
- Why Cannot I Access the Apache Flink Dashboard?
- How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
- Incorrect GLIBC Version
-
Flink Development Guide (Normal Mode)
- Overview
- Environment Preparation
- Developing an Application
- Debugging the Application
-
More Information
- Introduction to Common APIs
- Overview of RESTful APIs
- Overview of Savepoints CLI
- Introduction to Flink Client CLI
-
FAQ
- Savepoints-related Problems
- What If the Chrome Browser Cannot Display the Title
- What If the Page Is Displayed Abnormally on Internet Explorer 10/11
- What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large
- What If yarn-session Start Fails When blob.storage.directory Is Set to /home
- Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
- When I Use a Newly Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
- Why Cannot I Access the Apache Flink Dashboard?
- How Do I View the Debugging Information Printed Using System.out.println or Export the Debugging Information to a Specified File?
- Incorrect GLIBC Version
-
HBase Development Guide (Security Mode)
- Overview
-
Environment Preparation
- Preparing the Development Environment
- Preparing the Configuration Files for Connecting to the Cluster
- Configuring and Importing a Sample Project
-
Preparing for Security Authentication
- Security Authentication for HBase Data Read and Write (Single-Cluster Scenario)
- Security Authentication for HBase Data Read and Write (Multi-Cluster Mutual Trust Scenario)
- Security Authentication for Accessing the HBase REST Service
- Authentication for Accessing the ThriftServer Service
- Authentication for Accessing Multiple ZooKeepers
-
Developing an Application
-
Reading/Writing Data
- Typical Scenario Description
- Development Idea
- Creating Configuration
- Creating Connection
- Creating a Table
- Deleting a Table
- Modifying a Table
- Inserting Data
- Deleting Data
- Reading Data Using Get
- Reading Data Using Scan
- Filtering Data
- Creating a Secondary Index
- Deleting an Index
- Secondary Index-based Query
- Multi-Point Region Division
- Creating a Phoenix Table
- Writing Data to the PhoenixTable
- Reading the PhoenixTable
- Using HBase Dual-Read Capability
- Configuring Log4j Log Output
- Calling REST Interfaces
- Accessing HBase ThriftServer
- Accessing Multiple ZooKeepers with HBase
-
Reading/Writing Data
- Application Commissioning
-
More Information
- SQL Query
- HBase Dual-Read Configuration Items
- External Interfaces
- HBase Access Configuration on Windows Using EIPs
- Phoenix Command Line
-
FAQs
- How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
- What Are the Application Scenarios of the Bulkload and put Data-loading Modes?
- An Error Occurred When Building a JAR Package
-
HBase Development Guide (Normal Mode)
- Overview
- Environment Preparation
-
Developing an Application
-
Reading/Writing Data
- Service Scenario Description
- Application Development Approach
- Creating Configuration
- Creating Connection
- Creating a Table
- Deleting a Table
- Modifying a Table
- Inserting Data
- Deleting Data
- Reading Data Using Get
- Reading Data Using Scan
- Filtering Data
- Creating a Secondary Index
- Deleting an Index
- Secondary Index-based Query
- Multi-Point Region Division
- Creating a Phoenix Table
- Writing Data to the PhoenixTable
- Reading the PhoenixTable
- Using HBase Dual-Read Capability
- Configuring Log4j Log Output
- Calling REST Interfaces
- Accessing HBase ThriftServer
- Accessing Multiple ZooKeepers with HBase
-
Reading/Writing Data
- Application Commissioning
-
More Information
- SQL Query
- HBase Dual-Read Configuration Items
- External Interfaces
- HBase Access Configuration on Windows Using EIPs (Cluster in Normal Mode)
- Phoenix Command Line
-
FAQs
- How to Rectify the Fault When an Exception Occurs During the Running of an HBase-developed Application and "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory" Is Displayed in the Error Information?
- What Are the Application Scenarios of the bulkload and put Data-loading Modes?
- An Error Occurred When Building a JAR Package
-
HDFS Development Guide (Security Mode)
- HDFS Application Development Overview
- HDFS Application Development Process
- HDFS Sample Project
- Preparing an HDFS Application Development Environment
-
Developing an HDFS Application
- HDFS Application Development Approach
- Initializing HDFS
- Creating HDFS Directories
- Writing Data into an HDFS File
- Appending Data to an HDFS File
- Reading Data from an HDFS File
- Deleting a File
- Deleting Directories
- Multi-Thread Tasks
- Setting Storage Policies
- Configuring the HDFS Colocation Policy
- Commissioning an HDFS Application
- FAQs in HDFS Application Development
-
HDFS Development Guide (Normal Mode)
- HDFS Application Development Overview
- HDFS Application Development Process
- HDFS Sample Project
- Preparing an HDFS Application Development Environment
-
Developing an HDFS Application
- HDFS Application Development Approach
- Initializing HDFS
- Creating HDFS Directories
- Writing Data into an HDFS File
- Appending Data to an HDFS File
- Reading Data from an HDFS File
- Deleting a File
- Deleting Directories
- Multi-Thread Tasks
- Setting Storage Policies
- Configuring the HDFS Colocation Policy
- Commissioning an HDFS Application
- FAQs in HDFS Application Development
-
Hive Development Guide (Security Mode)
- Overview
- Preparing the Environment
- Developing an Application
- Debugging the Application
- More Information
- Hive Development Guide (Normal Mode)
- Impala Development Guide (Security Mode)
- Impala Development Guide (Normal Mode)
- Kafka Development Guide (Security Mode)
- Kafka Development Guide (Normal Mode)
- Kudu Development Guide (Security Mode)
- Kudu Development Guide (Normal Mode)
- MapReduce Development Guide (Security Mode)
- MapReduce Development Guide (Normal Mode)
- Oozie Development Guide (Security Mode)
- Oozie Development Guide (Normal Mode)
-
Spark2x Development Guide (Security Mode)
- Introduction to Spark Application Development
- Spark Application Development Process
- Spark2x Sample Project
-
Preparing a Spark Application Development Environment
- Preparing a Local Application Development Environment
- Preparing the Configuration File for Connecting Spark to the Cluster
- Importing and Configuring Spark Sample Projects
- (Optional) Creating Spark Sample Projects
- Configuring Security Authentication for Spark Applications
- Configuring the Spark Python3 Sample Project
-
Developing a Spark Application
- Spark Core Sample Projects
- Spark SQL Sample Projects
- Sample Projects for Accessing Spark SQL Through JDBC
-
Sample Projects for Spark to Read HBase Tables
- Performing Operations on Data in Avro Format
- Performing Operations on the HBase Data Source
- Using the BulkPut Interface
- Using the BulkGet Interface
- Using the BulkDelete Interface
- Using the BulkLoad Interface
- Using the foreachPartition Interface
- Distributedly Scanning HBase Tables
- Using the mapPartition Interface
- Writing Data to HBase Tables In Batches Using SparkStreaming
- Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
- Sample Projects for Spark to Implement Data Transition Between Hive and HBase
- Sample Projects for Connecting Spark Streaming to Kafka0-10
- Spark Structured Streaming Sample Projects
- Sample Project for Interconnecting Spark Structured Streaming with Kafka
- Sample Project for Spark Structured Streaming Status Operations
- Sample Project for Spark Concurrent Access to Two HBase Sample Projects
- Sample Project for Spark to Synchronize HBase Data to CarbonData
- Using Spark to Execute the Hudi Sample Project
- Sample Project for Customizing Configuration Items in Hudi
- Commissioning a Spark Application
-
FAQs About Spark Application Development
- Common Spark APIs
- Structured Streaming Functions and Reliability
- How to Add a User-Defined Library
- How Do I Automatically Load JAR Packages?
- Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
- Privilege Control Mechanism of SparkSQL UDF Feature
- Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
- Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
- Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
- How to Perform Remote Debugging Using IDEA?
- How to Submit the Spark Application Using Java Commands?
- A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
- Application Fails When ApplicationManager Is Terminated During Data Processing in the Cluster Mode of Structured Streaming
- Restrictions on Restoring the Spark Application from the checkpoint
- Support for Third-party JAR Packages on x86 and TaiShan Platforms
- What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
- Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
- What Should I Do If the Method of Submitting Structured Streaming Tasks Is Changed?
- Common JAR File Conflicts
-
Spark2x Development Guide (Normal Mode)
- Introduction to Spark Application Development
- Spark Application Development Process
- Spark2x Sample Project
- Preparing a Spark Application Development Environment
-
Developing Spark Applications
- Spark Core Sample Projects
- Spark SQL Sample Projects
- Sample Projects for Accessing Spark SQL Through JDBC
-
Sample Projects for Spark to Read HBase Tables
- Performing Operation on Data in Avro Format
- Performing Operations on the HBase Data Source
- Using the BulkPut Interface
- Using the BulkGet Interface
- Using the BulkDelete Interface
- Using the BulkLoad Interface
- Using the foreachPartition Interface
- Distributedly Scanning HBase Tables
- Using the mapPartition Interface
- Writing Data to HBase Tables In Batches Using SparkStreaming
- Sample Projects for Spark to Implement Bidirectional Data Exchange with HBase
- Sample Projects for Spark to Implement Data Transition Between Hive and HBase
- Sample Projects for Connecting Spark Streaming to Kafka0-10
- Spark Structured Streaming Sample Projects
- Sample Project for Interconnecting Spark Structured Streaming with Kafka
- Sample Project for Spark Structured Streaming Status Operations
- Sample Project for Spark to Synchronize HBase Data to CarbonData
- Using Spark to Execute the Hudi Sample Project
- Sample Project for Customizing Configuration Items in Hudi
- Commissioning a Spark Application
-
FAQs About Spark Application Development
- Common Spark APIs
- Structured Streaming Functions and Reliability
- How to Add a User-Defined Library
- How Do I Automatically Load JAR Packages?
- Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
- Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
- Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
- Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
- How to Perform Remote Debugging Using IDEA?
- How to Submit the Spark Application Using Java Commands?
- A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
- Application Fails When ApplicationManager Is Terminated During Data Processing in the Cluster Mode of Structured Streaming
- Restrictions on Restoring the Spark Application from the checkpoint
- Support for Third-party JAR Packages on x86 and TaiShan Platforms
- What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?
- Error Code 139 Reported When Python Pipeline Runs in the ARM Environment
- What Should I Do If the Structured Streaming Task Submission Method Is Changed?
- Common JAR File Conflicts
-
Storm Development Guide (Security Mode)
- Overview
- Environment Preparation
- Developing an Application
- Running an Application
- More Information
-
Storm Development Guide (Normal Mode)
- Overview
- Environment Preparation
- Developing an Application
- Running an Application
- More Information
- YARN Development Guide (Security Mode)
- YARN Development Guide (Normal Mode)
-
Developer Guide (Normal_Earlier Than 3.x)
- Introduction to MRS Application Development
- Obtaining the MRS Application Development Sample Project
- Sample Projects of MRS Components
- Alluxio Development Guide
-
Flink Development Guide
- Flink Application Development Overview
- Preparing a Flink Application Development Environment
- Developing Flink Applications
- Commissioning a Flink Application
-
FAQs About Flink Application Development
- Flink Savepoints CLI
- Flink Client CLI
- Flink Performance Tuning Suggestions
- Savepoints FAQs
- What Should I Do If Running a Checkpoint Is Slow When RocksDBStateBackend Is Set for the Checkpoint and a Large Amount of Data Exists?
- What Should I Do If yarn-session Failed to Be Started When blob.storage.directory Is Set to /home?
- Why Does Non-static KafkaPartitioner Class Object Fail to Construct FlinkKafkaProducer010?
- When I Use the Newly-Created Flink User to Submit Tasks, Why Does the Task Submission Fail and a Message Indicating Insufficient Permission on ZooKeeper Directory Is Displayed?
- Why Can't I Access the Flink Web Page?
-
HBase Development Guide
- HBase Application Development Overview
- Preparing an HBase Application Development Environment
-
Developing an HBase Application
- HBase Development Plan
- Creating the Configuration Object
- Creating a Connection Object
- Creating an HBase Table
- Deleting an HBase Table
- Modifying an HBase Table
- Inserting HBase Data
- Deleting HBase Data
- Reading HBase Data Using the GET Command
- Reading HBase Data Using the Scan Command
- Using an HBase Filter
- Adding an HBase Secondary Index
- Enabling or Disabling an HBase Secondary Index
- Querying the HBase Secondary Index List
- Using an HBase Secondary Index to Read Data
- Deleting an HBase Secondary Index
- HBase Multi-Point Region Splitting
- Configuring HBase ACL Security Policies
- Commissioning an HBase Application
- FAQs About HBase Application Development
- HDFS Development Guide
- Hive Development Guide
- Impala Development Guide
-
Kafka Development Guide
- Kafka Application Development Overview
- Preparing a Kafka Application Development Environment
-
Developing a Kafka Application
- Kafka Development Plan
- Kafka Old Producer API Usage Sample
- Kafka Old Consumer API Usage Sample
- Kafka Producer API Usage Sample
- Kafka Consumer API Usage Sample
- Kafka Multi-Thread Producer API Usage Sample
- Kafka Multi-Thread Consumer API Usage Sample
- Kafka SimpleConsumer API Usage Sample
- Kafka Configuration File
- Commissioning a Kafka Application
- FAQs About Kafka Application Development
- MapReduce Development Guide
- OpenTSDB Development Guide
- Presto Development Guide
-
Spark Development Guide
- Spark Application Development Overview
-
Preparing a Spark Application Development Environment
- Spark Application Development Environment
- Preparing a Spark Application Development User
- Preparing a Java Development Environment for Spark
- Preparing a Scala Development Environment for Spark
- Preparing a Python Development Environment for Spark
- Preparing a Spark Application Running Environment
- Importing and Configuring Spark Sample Projects
- (Optional) Creating a Spark Application Development Project
- Configuring Security Authentication for Spark Applications
-
Developing a Spark Application
- Spark Core Application
- Spark SQL Application
- Spark Streaming Application
- Application for Accessing Spark SQL Through JDBC
- Spark on HBase Application
- Reading Data from HBase and Writing Data Back to HBase
- Reading Data from Hive and Writing Data to HBase
- Using Streaming to Read Data from Kafka and Write Data to HBase
- Application for Connecting Spark Streaming to Kafka0-10
- Structured Streaming Application
- Commissioning a Spark Application
-
FAQs About Spark Application Development
- Spark APIs
-
Spark Application Tuning
-
Spark Core Tuning
- Data Serialization
- Memory Configuration Optimization
- Setting a Degree of Parallelism
- Using Broadcast Variables
- Using the External Shuffle Service to Improve Performance
- Configuring Dynamic Resource Scheduling in Yarn Mode
- Configuring Process Parameters
- Designing a Directed Acyclic Graph (DAG)
- Experience Summary
- SQL and DataFrame Tuning
- Spark Streaming Tuning
- Spark CBO Tuning
- How Do I Add a Dependency Package with Customized Codes?
- How Do I Handle the Dependency Package That Is Automatically Loaded?
- Why the "Class Does not Exist" Error Is Reported While the SparkStreamingKafka Project Is Running?
- Why a Spark Core Application Is Suspended Instead of Being Exited When Driver Memory Is Insufficient to Store Collected Intensive Data?
- Why the Name of the Spark Application Submitted in Yarn-Cluster Mode Does not Take Effect?
- How Do I Submit the Spark Application Using Java Commands?
- How Does the Permission Control Mechanism Work for the UDF Function in SparkSQL?
- Why Does Kafka Fail to Receive the Data Written Back by Spark Streaming?
- How Do I Perform Remote Debugging Using IDEA?
- A Message Stating "Problem performing GSS wrap" Is Displayed When IBM JDK Is Used
- What Should I Do If FileNotFoundException Occurs When spark-submit Is Used to Submit a Job in Spark on Yarn Client Mode?
- What Should I Do If the "had a not serializable result" Error Is Reported When a Spark Task Reads HBase Data?
- How Do I Connect to Hive and HDFS of an MRS Cluster when the Spark Program Is Running on a Local Host?
- Storm Development Guide
-
Component Development Specifications
- ClickHouse
- Doris
- Flink
- HBase
- HDFS
- Hive
-
Hudi
- Hudi Development Specifications Overview
- Hudi Data Table Design Specifications
- Hudi Data Table Management Operation Specifications
-
Spark on Hudi Development Specifications
-
Spark Read/Write Hudi Development Specifications
- SparkSQL Table Creation Parameter Specifications
- Specifications for Spark to Read Hudi Parameters in Incremental Mode
- Specifications for Setting the Compaction Parameter in the Spark Asynchronous Task Execution Table
- Spark Table Data Maintenance Specifications
- Suggestions for Spark Concurrently Writing Hudi Data
- Suggestions on Configuring Resources for Spark to Read and Write Hudi Data
- Spark On Hudi Performance Optimization
- Bucket Tuning Example
- IoTDB
- Kafka
- MapReduce
- Spark
-
Developer Guide (LTS)
- SDK Reference
-
Troubleshooting
- Account Passwords
- Account Permissions
-
Common Exceptions in Logging In to the Cluster Manager
- Failed to Access Manager of an MRS Cluster
-
Accessing the Web Pages
- Error "502 Bad Gateway" Is Reported During the Access to MRS Manager
- An Error Message Is Displayed Indicating That the VPC Request Is Incorrect During Access
- Error 503 Is Reported When Manager Is Accessed Through Direct Connect
- Error Message "You have no right to access this page." Is Displayed When Users log in to the Cluster Page
- Error Message "Invalid credentials" Is Displayed When a User Logs In to Manager
- Failed to Log In to the Manager After Timeout
- Failed to Log In to MRS Manager After the Python Upgrade
- Failed to Log In to MRS Manager After Changing the Domain Name
- Manager Page Is Blank After a Successful Login
- Cluster Login Fails Because Native Kerberos Is Installed on Cluster Nodes
- Using Google Chrome to Access MRS Manager on macOS
- How Do I Unlock a User Who Logs in to Manager?
- Why Does the Manager Page Freeze?
-
Common Exceptions in Accessing the MRS Web UI
- What Should I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of HDFS, Hue, YARN, HetuEngine, and Flink?
- Error 500 Is Reported When a User Accesses the Component Web UI
- [HBase WebUI] Users Cannot Switch from the HBase WebUI to the RegionServer WebUI
- [HDFS WebUI] When Users Access the HDFS WebUI, an Error Message Is Displayed Indicating That the Number of Redirections Is Too Large
- [HDFS WebUI] Failed to Access the HDFS WebUI Using Internet Explorer
- [Hue Web UI] A "No Permission" Error Is Displayed When a User Logs In to the Hue Web UI
- [Hue Web UI] Failed to Access the Hue Web UI
- [Hue WebUI] The Error "Proxy Error" Is Reported When a User Accesses the Hue WebUI
- [Hue WebUI] Why Can't the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- Hue (Active) Cannot Open Web Pages
- [Ranger WebUI] Why Can't a New User Log In to Ranger After Changing the Password?
- [Tez WebUI] Error 404 Is Reported When Users Access the Tez WebUI
- [Spark WebUI] Why Can't I Switch from the Yarn Web UI to the Spark Web UI?
- [Spark WebUI] What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- [Spark WebUI] Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
- [Spark WebUI] The Spark2x WebUI Fails to Be Accessed Using Internet Explorer
- [Yarn Web UI] Failed to Access the Yarn Web UI
- APIs
-
Cluster Management
- Failed to Reduce Task Nodes
- OBS Certificate in a Cluster Expired
- Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
- Replacing a Disk in an MRS Cluster (Applicable to 3.x)
- Failed to Execute an MRS Backup Task
- Inconsistency Between df and du Command Output on the Core Node
- Disassociating a Subnet from a Network ACL
- MRS Cluster Becomes Abnormal After the Hostname of a Node Is Changed
- Processes Are Terminated Unexpectedly
- Failed to Configure Cross-Cluster Mutual Trust for MRS
- Network Is Unreachable When Python Is Installed on an MRS Cluster Node Using Pip3
- Connecting the Open-Source confluent-kafka-go to an MRS Security Cluster
- Failed to Execute the Periodic Backup Task of an MRS Cluster
- Failed to Download the MRS Cluster Client
- An Error Is Reported When a Flink Job Is Submitted in an MRS Cluster with Kerberos Authentication Enabled
- An Error Is Reported When the Insert Command Is Executed on the Hive Beeline CLI
- Upgrading the OS to Fix Vulnerabilities for an MRS Cluster Node
- Failed to Migrate Data to MRS HDFS Using CDM
- Alarms Indicating Heartbeat Interruptions Between Nodes Are Frequently Generated in the MRS Cluster
- High Memory Usage of the PMS Process
- High Memory Usage of the Knox Process
- It Takes a Long Time to Access HBase from a Client Outside a Security Cluster
- Failed to Submit Jobs
- OS Disk Space Is Insufficient Due to Oversized HBase Log Files
- OS Disk Space Is Insufficient Due to Oversized HDFS Log Files
- An Exception Occurs During Specifications Upgrade of Nodes in an MRS Cluster
- Failed to Delete a New Tenant on FusionInsight Manager
- MRS Cluster Becomes Unavailable After the VPC Is Changed
- Failed to Submit Jobs on the MRS Console
- Error "symbol xxx not defined in file libcrypto.so.1.1" Is Displayed During HA Certificate Generation
- Some Instances Fail to Be Started After Core Nodes Are Added to the MRS Cluster
- Using Alluxio
- Using ClickHouse
-
Using DBService
- DBServer Instance Is in Abnormal Status
- DBServer Instance Remains in the Restoring State
- Default Port 20050 or 20051 of DBService Is Occupied
- DBServer Instance Is Always in the Restoring State Because of Incorrect /tmp Directory Permissions
- Failed to Execute a DBService Backup Task
- Components Failed to Connect to DBService in Normal State
- DBServer Failed to Start
- DBService Backup Failed Because the Floating IP Address Is Unreachable
- DBService Failed to Start Due to the Loss of the DBService Configuration File
-
Using Flink
- Error Message "Error While Parsing YAML Configuration File: Security.kerberos.login.keytab" Is Displayed When a Command Is Executed on the Flink Client
- Error Message "Error while parsing YAML configuration file : security.kerberos.login.principal:pippo" Is Displayed When a Command Is Executed on the Flink Client
- Error Message "Could Not Connect to the Leading JobManager" Is Displayed When a Command Is Executed on the Flink Client
- Failed to Create a Flink Cluster by Running yarn-session As Different Users
- Flink Service Program Fails to Read Files on the NFS Disk
- Failed to Customize the Flink Log4j Log Level
- Using Flume
-
Using HBase
- Slow Response to HBase Connection
- Failed to Authenticate the HBase User
- RegionServer Failed to Start Because the Port Is Occupied
- HBase Failed to Start Due to Insufficient Node Memory
- HBase Service Unavailable Due to Poor HDFS Performance
- HBase Failed to Start Due to Inappropriate Parameter Settings
- RegionServer Failed to Start Due to Residual Processes
- HBase Failed to Start Due to a Quota Set on HDFS
- HBase Failed to Start Due to Corrupted Version Files
- High CPU Usage Caused by Zero-Loaded RegionServer
- HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
- The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
- RegionServer Instance Is in the Restoring State
- HBase Failed to Start in a Newly Installed Cluster
- HBase Failed to Start Due to the Loss of the ACL Table Directory
- HBase Failed to Start After the Cluster Is Powered Off and On
- Failed to Import HBase Data Due to Oversized File Blocks
- Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
- Failed to Run the hbase shell Command on the MRS Cluster Client
- Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
- HBase Failed to Start Due to Insufficient RegionServer Memory
- Failed to Start HRegionServer on the Node Newly Added to the Cluster
- Region in the RIT State for a Long Time Due to HBase File Loss
-
Using HDFS
- HDFS NameNode Instances Become Standby After the RPC Port Is Changed
- An Error Is Reported When the HDFS Client Is Connected Through a Public IP Address
- Failed to Use Python to Remotely Connect to the Port of HDFS
- HDFS Capacity Reaches 100%, Causing Unavailable Upper-Layer Services Such as HBase and Spark
- Error Message "Permission denied" Is Displayed When HDFS and Yarn Are Started
- HDFS Users Can Create or Delete Files in Directories of Other Users
- A DataNode of HDFS Is Always in the Decommissioning State
- HDFS NameNode Failed to Start Due to Insufficient Memory
- A Large Number of Blocks Are Lost in HDFS Due to the Time Change Using ntpdate
- CPU Usage of DataNodes Is Close to 100% Occasionally, Causing Node Loss
- Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
- Error "Failed to place enough replicas" Is Reported When HDFS Reads or Writes Files
- Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
- HDFS Client File Fails to Be Closed After Data Writing
- File Fails to Be Uploaded to HDFS Due to File Errors
- After dfs.blocksize Is Configured on the UI and Data Is Uploaded, the Block Size Does Not Change
- HDFS File Fails to Be Read, and Error Message "FileNotFoundException" Is Displayed
- Failed to Write Files to HDFS, and Error Message "item limit of xxx is exceeded" Is Displayed
- Adjusting the Log Level of the HDFS Shell Client
- HDFS File Read Fails, and Error Message "No common protection layer" Is Displayed
- Failed to Write Files Because the HDFS Directory Quota Is Insufficient
- Balancing Fails, and Error Message "Source and target differ in block-size" Is Displayed
- Failed to Query or Delete HDFS Files
- Uneven Data Distribution Due to Non-HDFS Data Residuals
- Uneven Data Distribution Due to HDFS Client Installation on the DataNode
- Unbalanced DataNode Disk Usages of a Node
- Locating Common Balance Problems
- HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
- Error Message "error creating DomainSocket" Is Displayed When the HDFS Client Installed on the Core Node in a Normal Cluster Is Used
- HDFS Files Fail to Be Uploaded When the Client Is Installed on a Node Outside the Cluster
- Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
- HDFS Client Failed to Delete Overlong Directories
- An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
- "ALM-12027 Host PID Usage Exceeds the Threshold" Is Generated for a NameNode
- ALM-14012 JournalNode Is Out of Synchronization Is Generated in the Cluster
- Failed to Decommission a DataNode Due to HDFS Block Loss
- An Error Is Reported When DistCP Is Used to Copy an Empty Folder
-
Using Hive
- Common Hive Logs
- Failed to Start Hive
- Error Message "Cannot modify xxx at runtime" Is Displayed When the set Command Is Executed in a Security Cluster
- Specifying a Queue When Submitting a Hive Task
- Setting the Map/Reduce Memory on the Client
- Specifying the Output File Compression Format When Importing a Hive Table
- Description of the Hive Table Is Too Long to Be Completely Displayed
- NULL Is Displayed When Data Is Inserted After the Partition Column Is Added to a Hive Table
- New User Created in the Cluster Does Not Have the Permission to Query Hive Data
- An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
- An Error Is Reported When the "load data inpath" Command Is Executed
- An Error Is Reported When the "load data local inpath" Command Is Executed
- An Error Is Reported When the create external table Command Is Executed
- An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
- Insufficient Permissions to Execute the set role admin Command
- An Error Is Reported When a UDF Is Created on the Beeline Client
- Hive Is Faulty
- Difference Between Hive Service Health Status and Hive Instance Health Status
- "authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
- Failed to Access ZooKeeper from the Client
- "Invalid function" Is Displayed When a UDF Is Used
- Hive Service Status Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Concerning
- Garbled Characters Returned Upon a Query If Text Files Are Compressed Using ARC4
- Hive Task Failed to Run on the Client but Successful on Yarn
- Error Message "Execution Error return code 2" Is Displayed When the SELECT Statement Is Executed
- Failed to Perform drop partition When There Are a Large Number of Partitions
- Failed to Start the Local Task When the Join Operation Is Performed
- WebHCat Fails to Be Started After the Hostname Is Changed
- An Error Is Reported When the Hive Sample Program Is Running After the Domain Name of a Cluster Is Changed
- Hive MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
- Error Message "Failed to execute session hooks: over max connections" Is Displayed on the Beeline Client
- Error Message "OutOfMemoryError" Is Displayed on the Beeline Client
- Task Execution Fails Because the Input File Number Exceeds the Threshold
- Hive Task Execution Fails Because of Stack Memory Overflow
- Task Failed Due to Concurrent Writes to One Table or Partition
- Hive Task Failed Due to a Lack of HDFS Directory Permission
- Failed to Load Data to Hive Tables
- Failed to Run the Application Developed Based on the Hive JDBC Code Case
- HiveServer and HiveHCat Process Faults
- Error Message "ConnectionLoss for hiveserver2" Is Displayed When MRS Hive Connects to ZooKeeper
- An Error Is Reported When Hive Executes the insert into Statement
- Timeout Reported When Adding a Hive Table Field
- Failed to Restart Hive
- Failed to Delete a Table Due to Excessive Hive Partitions
- An Error Is Reported When msck repair table Is Executed on Hive
- Insufficient User Permission for Running the insert into Command on Hive
- Releasing Disk Space After Dropping a Table in Hive
- Abnormal Hive Query Due to Damaged Data in the JSON Table
- Connection Timed Out During SQL Statement Execution on the Hive Client
- WebHCat Failed to Start Due to Abnormal Health Status
- WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
- An SQL Error Is Reported When the Number of MetaStore Dynamic Partitions Exceeds the Threshold
-
Using Hue
- An Unknown Job Is Running on the Hue Page
- HQL Fails to Be Executed on Hue Using Internet Explorer
- Failed to Access the Hue Web UI
- HBase Tables Cannot Be Loaded on the Hue Web UI
- Chinese Characters Entered in the Hue Text Box Are Displayed Incorrectly
- An Error Is Reported If the Query Result of an Impala SQL Statement Executed on Hue Contains Chinese Characters
- Using Impala
-
Using Kafka
- An Error Is Reported When the Kafka Client Is Run to Obtain Topics
- Using Python3.x to Connect to Kafka in a Security Cluster
- Flume Normally Connects to Kafka but Fails to Send Messages
- Producer Fails to Send Data and Error Message "NullPointerException" Is Displayed
- Producer Fails to Send Data and Error Message "TOPIC_AUTHORIZATION_FAILED" Is Displayed
- Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
- Consumer Is Initialized Successfully, but the Specified Topic Message Cannot Be Obtained from Kafka
- Consumer Fails to Consume Data and Remains in the Waiting State
- SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
- Consumer Fails to Consume Data in a Newly Created Cluster, and Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
- SparkStreaming Fails to Consume Kafka Messages, and Message "Couldn't find leader offsets" Is Displayed
- Consumer Fails to Consume Data and Message "SchemaException: Error reading field" Is Displayed
- Kafka Consumer Loses Consumed Data
- Failed to Start Kafka Due to Account Lockout
- Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
- Kafka Topics Cannot Be Deleted
- Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
- When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
- Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
- When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
- When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
- Consumer Repeatedly Consumes Data
- Leader for the Created Kafka Topic Partition Is Displayed as none
- Safety Instructions on Using Kafka
- Obtaining Kafka Consumer Offset Information
- Adding or Deleting Configurations for a Topic
- Reading the Content of the __consumer_offsets Internal Topic
- Configuring Logs for Shell Commands on the Kafka Client
- Obtaining Topic Distribution Information
- Kafka HA Usage Description
- Failed to Manage a Kafka Cluster Using the Kafka Shell Command
- Kafka Producer Writes Oversized Records
- Kafka Consumer Reads Oversized Records
- High Usage of Multiple Disks on a Kafka Cluster Node
- Kafka Is Disconnected from the ZooKeeper Client
- Using Oozie
-
Using Presto
- During sql-standard-with-group Configuration, a Schema Fails to Be Created and the Error Message "Access Denied" Is Displayed
- Coordinator Process of Presto Cannot Be Started
- When Presto Queries a Kudu Table, an Error Is Reported Indicating That the Table Cannot Be Found
- No Data Is Found in the Hive Table Using Presto
- Error Message "The node may have crashed or be under too much load" Is Displayed During MRS Presto Query
- Accessing Presto from an MRS Cluster Through a Public Network
-
Using Spark
- An Error Is Reported When the Split Size Is Changed for a Running Spark Application
- Incorrect Parameter Format Is Displayed When a Spark Task Is Submitted
- Spark, Hive, and Yarn Are Unavailable Due to Insufficient Disk Capacity
- A Spark Job Fails to Run Due to Incorrect JAR File Import
- Spark Job Suspended Due to Insufficient Memory or Lack of JAR Packages
- Error "ClassNotFoundException" Is Reported When a Spark Task Is Submitted
- Driver Displays a Message Indicating That the Running Memory Exceeds the Threshold When a Spark Task Is Submitted
- Error "Can't get the Kerberos realm" Is Reported When a Spark Task Is Submitted in Yarn-Cluster Mode
- Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
- ApplicationMaster Fails to Start Twice When a Spark Task Is Submitted in Yarn-client Mode
- Failed to Connect to ResourceManager When a Spark Task Is Submitted
- DataArts Studio Failed to Schedule Spark Jobs
- Job Status Is error After a Spark Job Is Submitted Through an API
- ALM-43006 Is Repeatedly Reported for the MRS Cluster
- Failed to Create or Delete a Table in Spark Beeline
- Failed to Connect to the Driver When a Spark Job Is Submitted on a Node Outside the Cluster
- Large Number of Shuffle Results Are Lost During Spark Task Execution
- Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
- Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
- Spark Task Submission Failure
- Spark Task Execution Failure
- JDBCServer Connection Failure
- Failed to View Spark Task Logs
- Spark Streaming Task Submission Issues
- Authentication Fails When Spark Connects to Other Services
- Authentication Fails When Spark Connects to Kafka
- An Error Occurs When SparkSQL Reads the ORC Table
- Failed to Switch to the Log Page from stderr and stdout on the Native Spark Web UI
- An Error Is Reported When spark-beeline Is Used to Query a Hive View
-
Using Sqoop
- Connecting Sqoop to MySQL
- Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
- An Error Is Reported When a Sqoop Task Is Created Using Hue to Import Data from HBase to HDFS
- A Data Format Error Is Reported When Data Is Exported from Hive to MySQL 8.0 Using Sqoop
- An Error Is Reported When the sqoop import Command Is Executed to Extract Data from PgSQL to Hive
- Failed to Use Sqoop to Read MySQL Data and Write Parquet Files to OBS
- An Error Is Reported When Database Data Is Migrated Using Sqoop
-
Using Storm
- Invalid Hyperlink of Events on the Storm Web UI
- Failed to Submit the Storm Topology
- Failed to Submit the Storm Topology and Message "Failed to check principle for keytab" Is Displayed
- Worker Logs Are Empty After the Storm Topology Is Submitted
- Worker Runs Abnormally After the Storm Topology Is Submitted and Error "Failed to bind to XXX" Is Displayed
- "well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
- Data Cannot Be Written to Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Databases
- Internal Server Error Is Displayed When the User Queries Information on the Storm UI
- Using Ranger
-
Using Yarn
- A Large Number of Jobs Occupying Resources After Yarn Is Started in a Cluster
- Error "GC overhead" Is Reported When Tasks Are Submitted Using the hadoop jar Command on the Client
- Disk Space of a Node Is Used Up Due to Oversized Aggregated Logs of Yarn
- Temporary Files Are Not Deleted When a MapReduce Job Is Abnormal
- Incorrect Port Information of the Yarn Client Causes Error "connection refused" After a Task Is Submitted
- "Could not access logs page!" Is Displayed When Job Logs Are Queried on the Yarn Web UI
- Error "ERROR 500" Is Displayed When Queue Information Is Queried on the Yarn Web UI
- Error "ERROR 500" Is Displayed When Job Logs Are Queried on the Yarn Web UI
- An Error Is Reported When a Yarn Client Command Is Used to Query Historical Jobs
- Number of Files in the TimelineServer Directory Reaches the Upper Limit
- Using ZooKeeper
-
Storage-Compute Decoupling
- A User Without the Permission on the /tmp Directory Failed to Execute a Job for Accessing OBS
- When the Hadoop Client Is Used to Delete Data from OBS, It Does Not Have the Permission for the .Trash Directory
- An MRS Cluster Fails Authentication When Accessing OBS Because the NTP Time of Cluster Nodes Is Not Synchronized
- Videos
- Glossary
-
More Documents
-
User Guide (ME-Abu Dhabi Region)
-
Overview
- What Is MRS?
- Application Scenarios
- Components
- Functions
- Constraints
- Permissions Management
- Related Services
- IAM Permissions Management
- MRS Quick Start
-
Configuring a Cluster
- Overview
- Cluster List
- Methods of Creating MRS Clusters
- Quick Creation of a Hadoop Analysis Cluster
- Quick Creation of an HBase Analysis Cluster
- Quick Creation of a Kafka Streaming Cluster
- Quick Creation of a ClickHouse Cluster
- Quick Creation of a Real-time Analysis Cluster
- Creating a Custom Cluster
- Customizing a Topology Cluster
- Adding a Tag to a Cluster
- Communication Security Authorization
- Installing the Third-Party Software Using Bootstrap Actions
-
Managing an Existing Cluster
- Managing and Monitoring a Cluster
- Manually Scaling Out a Cluster
- Manually Scaling In a Cluster
- Configuring an Auto Scaling Rule
- Configuring Auto Scaling Rules When Creating a Cluster
- Changing the Subnet of a Cluster
- Configuring Message Notification
- O&M
- Terminating a Cluster
- Deleting a Failed Task
-
Job Management
- Introduction to MRS Jobs
- Running a MapReduce Job
- Running a SparkSubmit Job
- Running a HiveSQL Job
- Running a SparkSql Job
- Running a Flink Job
- Running a Kafka Job
- Viewing Job Configuration and Logs
- Stopping a Job
- Deleting a Job
- Using Encrypted OBS Data for Job Running
- Configuring Job Notification Rules
- Importing and Exporting Data
-
Component Management
- Object Management
- Viewing Configuration
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configuration
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Managing a Host (Node)
- Isolating a Host
- Canceling Host Isolation
- Starting and Stopping a Cluster
- Synchronizing Cluster Configuration
- Exporting Cluster Configuration
- Performing Rolling Restart
- Alarm Management
- Patch Management
-
Health Check Management
- Before You Start
- Performing a Health Check
- Viewing and Exporting a Health Check Report
- DBService Health Check Indicators
- Flume Health Check Indicators
- HBase Health Check Indicators
- Host Health Check Indicators
- HDFS Health Check Indicators
- Hive Health Check Indicators
- Kafka Health Check Indicators
- KrbServer Health Check Indicators
- LdapServer Health Check Indicators
- Loader Health Check Indicators
- MapReduce Health Check Indicators
- OMS Health Check Indicators
- Spark Health Check Indicators
- Storm Health Check Indicators
- Yarn Health Check Indicators
- ZooKeeper Health Check Indicators
-
Tenant Management
- Before You Start
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Backup and Restoration
-
MRS Multi-User Permission Management
- Users and Permissions of MRS Clusters
- Default Users of Clusters with Kerberos Authentication Enabled
- Creating a Role
- Creating a User Group
- Creating a User
- Modifying User Information
- Locking a User
- Unlocking a User
- Deleting a User
- Changing the Password of an Operation User
- Initializing the Password of a System User
- Downloading a User Authentication File
- Modifying a Password Policy
- Configuring Cross-Cluster Mutual Trust Relationships
- Configuring Users to Access Resources of a Trusted Cluster
- Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
- Managing Historical Clusters
- Viewing Operation Logs
- Metadata
- Connecting to Clusters
- Using an MRS Client
-
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
- Introduction to MRS Manager
- Checking Running Tasks
- Monitoring Management
- Alarm Management
-
Object Management
- Managing Objects
- Viewing Configurations
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configurations
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Managing a Host
- Isolating a Host
- Canceling Host Isolation
- Starting or Stopping a Cluster
- Synchronizing Cluster Configurations
- Exporting Configuration Data of a Cluster
- Log Management
-
Health Check Management
- Performing a Health Check
- Viewing and Exporting a Health Check Report
- Configuring the Number of Health Check Reports to Be Reserved
- Managing Health Check Reports
- DBService Health Check Indicators
- Flume Health Check Indicators
- HBase Health Check Indicators
- Host Health Check Indicators
- HDFS Health Check Indicators
- Hive Health Check Indicators
- Kafka Health Check Indicators
- KrbServer Health Check Indicators
- LdapServer Health Check Indicators
- Loader Health Check Indicators
- MapReduce Health Check Indicators
- OMS Health Check Indicators
- Spark Health Check Indicators
- Storm Health Check Indicators
- Yarn Health Check Indicators
- ZooKeeper Health Check Indicators
- Static Service Pool Management
-
Tenant Management
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Backup and Restoration
-
Security Management
- Default Users of Clusters with Kerberos Authentication Disabled
- Default Users of Clusters with Kerberos Authentication Enabled
- Changing the Password of an OS User
- Changing the Password of User admin
- Changing the Password of the Kerberos Administrator
- Changing the Passwords of the LDAP Administrator and the LDAP User
- Changing the Password of a Component Running User
- Changing the Password of the OMS Database Administrator
- Changing the Password of the Data Access User of the OMS Database
- Changing the Password of a Component Database User
- Updating Cluster Keys
- Permissions Management
- Patch Operation Guide
- Restoring Patches for the Isolated Hosts
- Rolling Restart
-
FusionInsight Manager Operation Guide (Applicable to 3.x)
- Getting Started
- Homepage
-
Cluster
- Cluster Management
- Managing a Service
- Instance Management
- Hosts
- O&M
- Audit
- Tenant Resources
- System Configuration
- Cluster Management
- Log Management
- Backup and Recovery Management
-
Security Management
- Security Overview
- Account Management
-
Security Hardening
- Hardening Policy
- Configuring a Trusted IP Address to Access LDAP
- HFile and WAL Encryption
- Security Configuration
- Configuring an IP Address Whitelist for Modifications Allowed by HBase
- Updating a Key for a Cluster
- Hardening the LDAP
- Configuring Kafka Data Encryption During Transmission
- Configuring HDFS Data Encryption During Transmission
- Configuring Communication Authentication for Storm Processes
- Encrypting the Communication Between Controller and Agent
- Updating SSH Keys for User omm
- Security Maintenance
- Security Statement
- Data Backup and Restoration
- Storage-Compute Decoupling Operation Guide
- Security
- High-Risk Operations Overview
-
FAQs
-
MRS Overview
- What Is MRS Used For?
- What Types of Distributed Storage Does MRS Support?
- How Do I Create an MRS Cluster Using a Custom Security Group?
- How Do I Use MRS?
- How Does MRS Ensure Security of Data and Services?
- Can I Configure a Phoenix Connection Pool?
- Does MRS Support Change of the Network Segment?
- Can I Downgrade the Specifications of an MRS Cluster Node?
- What Is the Relationship Between Hive and Other Components?
- Does an MRS Cluster Support Hive on Spark?
- What Are the Differences Between Hive Versions?
- Which MRS Cluster Version Supports Hive Connection and User Synchronization?
- What Are the Differences Between OBS and HDFS in Data Storage?
- How Do I Obtain the Hadoop Pressure Test Tool?
- What Is the Relationship Between Impala and Other Components?
- Statement About the Public IP Addresses in the Open-Source Third-Party SDK Integrated by MRS
- What Is the Relationship Between Kudu and HBase?
- Does MRS Support Running Hive on Kudu?
- What Are the Solutions for Processing 1 Billion Data Records?
- Can I Change the IP Address of DBService?
- Can I Clear MRS sudo Logs?
- Is the Storm Log Also Limited to 20 GB in MRS Cluster 2.1.0?
- What Is Spark ThriftServer?
- What Access Protocols Are Supported by Kafka?
- What If Error 408 Is Reported When an MRS Node Accesses OBS?
- What Is the Compression Ratio of zstd?
- Why Are the HDFS, YARN, and MapReduce Components Unavailable When an MRS Cluster Is Created?
- Why Is the ZooKeeper Component Unavailable When an MRS Cluster Is Created?
- Which Python Versions Are Supported by Spark Tasks in an MRS 3.1.0 Cluster?
- How Do I Enable Different Service Programs to Use Different YARN Queues?
- Differences and Relationships Between the MRS Management Console and Cluster Manager
- How Do I Unbind an EIP from an MRS Cluster Node?
- Account and Password
-
Accounts and Permissions
- Does an MRS Cluster Support Access Permission Control If Kerberos Authentication Is Not Enabled?
- How Do I Assign Tenant Management Permission to a New Account?
- How Do I Customize an MRS Policy?
- Why Is the Manage User Function Unavailable on the System Page on MRS Manager?
- Does Hue Support Account Permission Configuration?
- Client Usage
-
Web Page Access
- How Do I Change the Session Timeout Duration for an Open Source Component Web UI?
- Why Cannot I Refresh the Dynamic Resource Plan Page on MRS Tenant Tab?
- What Do I Do If the Kafka Topic Monitoring Tab Is Unavailable on Manager?
- How Do I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of HDFS, Hue, YARN, and Flink?
-
Alarm Monitoring
- In an MRS Streaming Cluster, Can the Kafka Topic Monitoring Function Send Alarm Notifications?
- Where Can I View the Running Resource Queues When the Alarm "ALM-18022 Insufficient Yarn Queue Resources" Is Reported?
- How Do I Understand the Multi-Level Chart Statistics in the HBase Operation Requests Metric?
- Performance Tuning
-
Job Development
- How Do I Get My Data into OBS or HDFS?
- What Types of Spark Jobs Can Be Submitted in a Cluster?
- Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Is Changed to 0?
- What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
- How Do I View MRS Job Logs?
- How Do I Do If the Message "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page." Is Displayed?
- LauncherJob Execution Fails and the Error Message "jobPropertiesMap is null." Is Displayed
- How Do I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
- How Do I Do If a SparkStreaming Job Fails After Being Executed for Dozens of Hours and the OBS Access 403 Error Is Reported?
- How Do I Do If an Alarm Is Reported Indicating that the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
- How Do I Do If Error Message "java.io.IOException: Connection reset by peer" Is Displayed During the Execution of a Spark Job?
- How Do I Do If Error Message "requestId=4971883851071737250" Is Displayed When a Spark Job Accesses OBS?
- Why Does DataArtsStudio Occasionally Fail to Schedule Spark Jobs and Why Does Rescheduling Also Fail?
- How Do I Do If a Flink Job Fails to Execute and the Error Message "java.lang.NoSuchFieldError: SECURITY_SSL_ENCRYPT_ENABLED" Is Displayed?
- Why Can't a Submitted Yarn Job Be Viewed on the Web UI?
- How Do I Modify the HDFS NameSpace (fs.defaultFS) of an Existing Cluster?
- How Do I Do If the launcher-job Queue Is Stopped by YARN due to Insufficient Heap Size When I Submit a Flink Job on the Management Plane?
- How Do I Do If the Error Message "slot request timeout" Is Displayed When I Submit a Flink Job?
- Data Import and Export of DistCP Jobs
- Cluster Upgrade/Patching
- Cluster Access
-
Big Data Service Development
- Can MRS Run Multiple Flume Tasks at a Time?
- How Do I Change FlumeClient Logs to Standard Logs?
- Where Are the .jar Files and Environment Variables of Hadoop Located?
- What Compression Algorithms Does HBase Support?
- Can MRS Write Data to HBase Through the HBase External Table of Hive?
- How Do I View HBase Logs?
- How Do I Set the TTL for an HBase Table?
- How Do I Balance HDFS Data?
- How Do I Change the Number of HDFS Replicas?
- What Is the Port for Accessing HDFS Using Python?
- How Do I Modify the HDFS Active/Standby Switchover Class?
- What Is the Recommended Number Type of DynamoDB in Hive Tables?
- Can the Hive Driver Be Interconnected with DBCP2?
- How Do I View the Hive Table Created by Another User?
- Can I Export the Query Result of Hive Data?
- How Do I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
- How Do I Do If a "hivesql/hivescript" Job Fails to Submit After Hive Is Added?
- What If an Excel File Downloaded on Hue Fails to Open?
- How Do I Do If Sessions Are Not Released After Hue Connects to HiveServer and the Error Message "over max user connections" Is Displayed?
- How Do I Reset Kafka Data?
- How Do I Obtain the Client Version of MRS Kafka?
- What Access Protocols Are Supported by Kafka?
- How Do I Do If Error Message "Not Authorized to access group xxx" Is Displayed When a Kafka Topic Is Consumed?
- What Compression Algorithms Does Kudu Support?
- How Do I View Kudu Logs?
- How Do I Handle the Kudu Service Exceptions Generated During Cluster Creation?
- Does OpenTSDB Support Python APIs?
- How Do I Configure Other Data Sources on Presto?
- How Do I Connect to Spark Shell from MRS?
- How Do I Connect to Spark Beeline from MRS?
- Where Are the Execution Logs of Spark Jobs Stored?
- How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
- How Do I Check Whether the ResourceManager Configuration of Yarn Is Correct?
- How Do I Modify the allow_drop_detached Parameter of ClickHouse?
- How Do I Do If an Alarm Indicating Insufficient Memory Is Reported During Spark Task Execution?
- How Do I Do If ClickHouse Consumes Excessive CPU Resources?
- How Do I Enable the Map Type on ClickHouse?
- A Large Number of OBS APIs Are Called When Spark SQL Accesses Hive Partitioned Tables
- API
-
Cluster Management
- How Do I View All Clusters?
- How Do I View Log Information?
- How Do I View Cluster Configuration Information?
- How Do I Install Kafka and Flume in an MRS Cluster?
- How Do I Stop an MRS Cluster?
- Can I Expand Data Disk Capacity for MRS?
- Can I Add Components to an Existing Cluster?
- Can I Delete Components Installed in an MRS Cluster?
- Can I Change MRS Cluster Nodes on the MRS Console?
- How Do I Shield Cluster Alarm/Event Notifications?
- Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
- How Do I Configure the knox Memory?
- What Is the Python Version Installed for an MRS Cluster?
- How Do I View the Configuration File Directory of Each Component?
- How Do I Do If the Time on MRS Nodes Is Incorrect?
- How Do I Query the Startup Time of an MRS Node?
- How Do I Do If Trust Relationships Between Nodes Are Abnormal?
- How Do I Adjust the Memory Size of the manager-executor Process?
-
Kerberos Usage
- How Do I Change the Kerberos Authentication Status of a Created MRS Cluster?
- What Are the Ports of the Kerberos Authentication Service?
- How Do I Deploy the Kerberos Service in a Running Cluster?
- How Do I Access Hive in a Cluster with Kerberos Authentication Enabled?
- How Do I Access Presto in a Cluster with Kerberos Authentication Enabled?
- How Do I Access Spark in a Cluster with Kerberos Authentication Enabled?
- How Do I Prevent Kerberos Authentication Expiration?
- Metadata Management
-
MRS Overview
-
Troubleshooting
- Accessing the Web Pages
-
Cluster Management
- Failed to Reduce Task Nodes
- OBS Certificate in a Cluster Expired
- Adding a New Disk to an MRS Cluster
- Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
- Replacing a Disk in an MRS Cluster (Applicable to 3.x)
- MRS Backup Failure
- Inconsistency Between df and du Command Output on the Core Node
- Disassociating a Subnet from the ACL Network
- MRS Becomes Abnormal After hostname Modification
- DataNode Restarts Unexpectedly
- Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
- Failed to Download the MRS Cluster Client
- Failed to Scale Out an MRS Cluster
- Error Occurs When MRS Executes the Insert Command Using Beeline
- How Do I Upgrade EulerOS to Fix Vulnerabilities in an MRS Cluster?
- Using CDM to Migrate Data to HDFS
- Alarms Are Frequently Generated in the MRS Cluster
- Memory Usage of the PMS Process Is High
- High Memory Usage of the Knox Process
- It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
- How Do I Locate a Job Submission Failure?
- OS Disk Space Is Insufficient Due to Oversized HBase Log Files
- Failed to Delete a New Tenant on FusionInsight Manager
- Using Alluxio
- Using ClickHouse
-
Using DBService
- DBServer Instance Is in Abnormal Status
- DBServer Instance Remains in the Restoring State
- Default Port 20050 or 20051 Is Occupied
- DBServer Instance Is Always in the Restoring State Because of an Incorrect /tmp Directory Permission
- DBService Backup Failure
- Components Failed to Connect to DBService in Normal State
- DBServer Failed to Start
- DBService Backup Failed Because the Floating IP Address Is Unreachable
- DBService Failed to Start Due to the Loss of the DBService Configuration File
-
Using Flink
- "IllegalConfigurationException: Error while parsing YAML configuration file: "security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
- "IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
- The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
- Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
- Flink Service Program Fails to Read Files on the NFS Disk
- Failed to Customize the Flink Log4j Log Level
- Using Flume
-
Using HBase
- Slow Response to HBase Connection
- Failed to Authenticate the HBase User
- RegionServer Failed to Start Because the Port Is Occupied
- HBase Failed to Start Due to Insufficient Node Memory
- HBase Service Unavailable Due to Poor HDFS Performance
- HBase Failed to Start Due to Inappropriate Parameter Settings
- RegionServer Failed to Start Due to Residual Processes
- HBase Failed to Start Due to a Quota Set on HDFS
- HBase Failed to Start Due to Corrupted Version Files
- High CPU Usage Caused by Zero-Loaded RegionServer
- HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
- The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
- RegionServer Instance Is in the Restoring State
- HBase Failed to Start in a Newly Installed Cluster
- HBase Failed to Start Due to the Loss of the ACL Table Directory
- HBase Failed to Start After the Cluster Is Powered Off and On
- Failed to Import HBase Data Due to Oversized File Blocks
- Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
- Failed to Run the hbase shell Command on the MRS Cluster Client
- Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
- HBase Failed to Start Due to Insufficient RegionServer Memory
-
Using HDFS
- All NameNodes Become the Standby State After the NameNode RPC Port of HDFS Is Changed
- An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
- Failed to Use Python to Remotely Connect to the Port of HDFS
- HDFS Capacity Usage Reaches 100%, Causing Unavailable Upper-layer Services Such as HBase and Spark
- An Error Is Reported During HDFS and Yarn Startup
- HDFS Permission Setting Error
- A DataNode of HDFS Is Always in the Decommissioning State
- HDFS Failed to Start Due to Insufficient Memory
- A Large Number of Blocks Are Lost in HDFS Due to the Time Change Using ntpdate
- CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
- Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
- Common File Read/Write Faults
- Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
- A Client File Fails to Be Closed After Data Writing
- File Fails to Be Uploaded to HDFS Due to File Errors
- After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
- Failed to Read Files, and "FileNotFoundException" Is Displayed
- Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
- Adjusting the Log Level of the Shell Client
- File Read Fails, and "No common protection layer" Is Displayed
- Failed to Write Files Because the HDFS Directory Quota Is Insufficient
- Balancing Fails, and "Source and target differ in block-size" Is Displayed
- A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
- Uneven Data Distribution Due to Non-HDFS Data Residuals
- Uneven Data Distribution Due to the Client Installation on the DataNode
- Handling Unbalanced DataNode Disk Usage on Nodes
- Locating Common Balance Problems
- HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
- An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
- Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
- Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
- HDFS Client Failed to Delete Overlong Directories
- An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
-
Using Hive
- Content Recorded in Hive Logs
- Causes of Hive Startup Failure
- "Cannot modify xxx at runtime" Is Reported When the set Command Is Executed in a Security Cluster
- How to Specify a Queue When Hive Submits a Job
- How to Set Map and Reduce Memory on the Client
- Specifying the Output File Compression Format When Importing a Table
- desc Table Cannot Be Completely Displayed
- NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
- A Newly Created User Has No Query Permissions
- An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
- An Error Is Reported When the "load data inpath" Command Is Executed
- An Error Is Reported When the "load data local inpath" Command Is Executed
- An Error Is Reported When the "create external table" Command Is Executed
- An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
- Insufficient Permissions to Execute the set role admin Command
- An Error Is Reported When UDF Is Created Using Beeline
- Difference Between Hive Service Health Status and Hive Instance Health Status
- Hive Alarms and Triggering Conditions
- "authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
- Failed to Access ZooKeeper from the Client
- "Invalid function" Is Displayed When a UDF Is Used
- Hive Service Status Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Concerning
- Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
- Hive Task Failed to Run on the Client But Successful on Yarn
- An Error Is Reported When the select Statement Is Executed
- Failed to Drop a Large Number of Partitions
- Failed to Start a Local Task
- Failed to Start WebHCat
- Sample Code Error for Hive Secondary Development After Domain Switching
- MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
- "Failed to execute session hooks: over max connections" Reported by Beeline
- beeline Reports the "OutOfMemoryError" Error
- Task Execution Fails Because the Input File Number Exceeds the Threshold
- Task Execution Fails Because of Stack Memory Overflow
- Task Failed Due to Concurrent Writes to One Table or Partition
- Hive Task Failed Due to a Lack of HDFS Directory Permission
- Failed to Load Data to Hive Tables
- HiveServer and HiveHCat Process Faults
- An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
- Timeout Reported When Adding the Hive Table Field
- Failed to Restart the Hive Service
- Hive Failed to Delete a Table
- An Error Is Reported When msck repair table table_name Is Run on Hive
- How Do I Release Disk Space After Dropping a Table in Hive?
- Connection Timeout During SQL Statement Execution on the Client
- WebHCat Failed to Start Due to Abnormal Health Status
- WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
- Using Hue
- Using Impala
-
Using Kafka
- An Error Is Reported When Kafka Is Run to Obtain a Topic
- Flume Normally Connects to Kafka But Fails to Send Messages
- Producer Failed to Send Data and Threw "NullPointerException"
- Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
- Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
- Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
- Consumer Fails to Consume Data and Remains in the Waiting State
- SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
- Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
- SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
- Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
- Checking Whether Data Consumed by a Consumer Is Lost
- Failed to Start a Component Due to Account Lock
- Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
- Kafka Topics Cannot Be Deleted
- Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
- When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
- Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
- When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
- When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
- Consumer Repeatedly Consumes Data
- Leader for the Created Kafka Topic Partition Is Displayed as none
- Safety Instructions on Using Kafka
- Obtaining Kafka Consumer Offset Information
- Adding or Deleting Configurations for a Topic
- Reading the Content of the __consumer_offsets Internal Topic
- Configuring Logs for Shell Commands on the Client
- Obtaining Topic Distribution Information
- Kafka HA Usage Description
- Kafka Producer Writes Oversized Records
- Kafka Consumer Reads Oversized Records
- High Usage of Multiple Disks on a Kafka Cluster Node
- Using Oozie
- Using Presto
-
Using Spark
- An Error Occurs When the Split Size Is Changed in a Spark Application
- An Error Is Reported When Spark Is Used
- A Spark Job Fails to Run Due to Incorrect JAR File Import
- A Spark Job Is Pending Due to Insufficient Memory
- An Error Is Reported During Spark Running
- Executor Memory Reaches the Threshold Is Displayed in Driver
- Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
- Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
- ApplicationMaster Failed to Start Twice in Yarn-client Mode
- Failed to Connect to ResourceManager When a Spark Task Is Submitted
- DataArts Studio Failed to Schedule Spark Jobs
- Submission Status of the Spark Job API Is Error
- Alarm 43006 Is Repeatedly Generated in the Cluster
- Failed to Create or Delete a Table in Spark Beeline
- Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
- Large Number of Shuffle Results Are Lost During Spark Task Execution
- Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
- Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
- Spark Task Submission Failure
- Spark Task Execution Failure
- JDBCServer Connection Failure
- Failed to View Spark Task Logs
- Authentication Fails When Spark Connects to Other Services
- An Error Occurs When Spark Connects to Redis
- An Error Is Reported When spark-beeline Is Used to Query a Hive View
-
Using Sqoop
- Connecting Sqoop to MySQL
- Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
- Failed to Export HBase Data to HDFS Through Hue's Sqoop Task
- A Format Error Is Reported When Sqoop Is Used to Export Data from Hive to MySQL 8.0
- An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
- Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
-
Using Storm
- Invalid Hyperlink of Events on the Storm UI
- Failed to Submit a Topology
- Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
- The Worker Log Is Empty After a Topology Is Submitted
- Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
- "well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
- When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts, Data Cannot Be Written into the Bolts
- The GC Parameter Configured for the Service Topology Does Not Take Effect
- Internal Server Error Is Displayed When the User Queries Information on the UI
- Using Ranger
-
Using Yarn
- Plenty of Jobs Are Found After Yarn Is Started
- "GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
- Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
- Temporary Files Are Not Deleted When an MR Job Is Abnormal
- ResourceManager of Yarn (Port 8032) Throws Error "connection refused"
- Failed to View Job Logs on the Yarn Web UI
- An Error Is Reported When a Queue Name Is Clicked on the Yarn Page
- Using ZooKeeper
- Accessing OBS
- Appendix
-
Overview
-
Component Operation Guide (ME-Abu Dhabi Region)
- Using Alluxio
- Using CarbonData (for Versions Earlier Than MRS 3.x)
-
Using CarbonData (for MRS 3.x or Later)
- Overview
- Configuration Reference
- CarbonData Operation Guide
- CarbonData Performance Tuning
- CarbonData Access Control
- CarbonData Syntax Reference
- CarbonData Troubleshooting
-
CarbonData FAQ
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How to Avoid Minor Compaction for Historical Data?
- How to Change the Default Group Name for CarbonData Data Loading?
- Why Does the INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Does Data Load Performance Decrease Due to Bad Records?
- Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are Fewer Tasks Opened Than Available Executors When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
- Why Does Data Loading Fail During Off Heap?
- Why Do I Fail to Create a Hive Table?
- Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for a Non-Owner?
- How Do I Logically Split Data Across Different Namespaces?
- Why Is a Missing Privileges Exception Reported When I Perform a Drop Operation on Databases?
- Why Can't the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
- Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
- Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
-
Using ClickHouse
- Using ClickHouse from Scratch
- ClickHouse Table Engine Overview
- Creating a ClickHouse Table
- ClickHouse Data Type
- Common ClickHouse SQL Syntax
- Migrating ClickHouse Data
- User Management and Authentication
- Backing Up and Restoring ClickHouse Data Using a Data File
- ClickHouse Log Overview
- ClickHouse Performance Tuning
-
ClickHouse FAQ
- How Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
- How Do I Migrate Data from Hive/HDFS to ClickHouse?
- How Do I Migrate Data from OBS/S3 to ClickHouse?
- An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data
- How Do I Grant the Select Permission at the Database Level to ClickHouse Users?
- Using DBService
- Using Flink
-
Using Flume
- Using Flume from Scratch
- Overview
- Installing the Flume Client
- Viewing Flume Client Logs
- Stopping or Uninstalling the Flume Client
- Using the Encryption Tool of the Flume Client
- Flume Service Configuration Guide
- Flume Configuration Parameter Description
- Using Environment Variables in the properties.properties File
-
Non-Encrypted Transmission
- Configuring Non-encrypted Transmission
- Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
- Encrypted Transmission
- Viewing Flume Client Monitoring Information
- Connecting Flume to Kafka in Security Mode
- Connecting Flume to Hive in Security Mode
- Configuring the Flume Service Model
- Introduction to Flume Logs
- Flume Client Cgroup Usage Guide
- Secondary Development Guide for Flume Third-Party Plug-ins
- Common Issues About Flume
-
Using HBase
- Using HBase from Scratch
- Using an HBase Client
- Creating HBase Roles
- Configuring HBase Replication
- Configuring HBase Parameters
- Enabling Cross-Cluster Copy
- Using the ReplicationSyncUp Tool
- Using HIndex
- Configuring HBase DR
- Configuring HBase Data Compression and Encoding
- Performing an HBase DR Service Switchover
- Performing an HBase DR Active/Standby Cluster Switchover
- Community BulkLoad Tool
- Configuring the MOB
- Configuring Secure HBase Replication
- Configuring Region In Transition Recovery Chore Service
- Using a Secondary Index
- HBase Log Overview
- HBase Performance Tuning
-
Common Issues About HBase
- Why Does a Client Keep Failing to Connect to a Server for a Long Time?
- Operation Failures Occur in Stopping BulkLoad On the Client
- Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
- Why Do Other Services Become Unstable When HBase Sets Up a Large Number of Connections over the Network Port?
- Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
- How Do I Restore a Region in the RIT State for a Long Time?
- Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- Why Can Modified and Deleted Data Still Be Queried Using the Scan Command?
- Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting the HBase Shell?
- When Are the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Cleared?
- Why Are Different Query Results Returned After I Use the Same Query Criteria to Query Data Successfully Imported by HBase BulkLoad?
- What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
- How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
- Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
- Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Rebuilding Meta Using the OfflineMetaRepair Tool, and Why Does Startup Fail?
- Why Messages Containing FileNotFoundException and no lease Are Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
- Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as, but a Different Kerberos User from, the RegionServer Is Used?
- Insufficient Rights When a Tenant Accesses Phoenix
- What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
- How Do I Fix Region Overlapping?
- Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
- Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
- Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
- How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
- Why Is a Message Indicating Insufficient Permissions Displayed When CTBase Connects to the Ranger Plug-ins?
-
Using HDFS
- Using Hadoop from Scratch
- Configuring Memory Management
- Creating an HDFS Role
- Using the HDFS Client
- Running the DistCp Command
- Overview of HDFS File System Directories
- Changing the DataNode Storage Directory
- Configuring HDFS Directory Permission
- Configuring NFS
- Planning HDFS Capacity
- Configuring ulimit for HBase and HDFS
- Balancing DataNode Capacity
- Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
- Configuring the Number of Files in a Single HDFS Directory
- Configuring the Recycle Bin Mechanism
- Setting Permissions on Files and Directories
- Setting the Maximum Lifetime and Renewal Interval of a Token
- Configuring the Damaged Disk Volume
- Configuring Encrypted Channels
- Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
- Configuring the NameNode Blacklist
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Configuring Reserved Percentage of Disk Usage on DataNodes
- Configuring HDFS NodeLabel
- Configuring HDFS Mover
- Using HDFS AZ Mover
- Configuring HDFS DiskBalancer
- Configuring the Observer NameNode to Process Read Requests
- Performing Concurrent Operations on HDFS Files
- Introduction to HDFS Logs
- HDFS Performance Tuning
-
FAQ
- NameNode Startup Is Slow
- DataNode Is Normal but Cannot Report Data Blocks
- HDFS WebUI Cannot Properly Update Information About Damaged Data
- Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
- Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
- Failed to Calculate the Capacity of a DataNode When Multiple data.dir Directories Are Configured in a Disk Partition
- Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
- Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
- Why Does an Array Out-of-Bounds Error Occur During FileInputFormat Split?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
- Can I Delete or Modify the Data Storage Directory in DataNode?
- Blocks Miss on the NameNode UI After the Successful Rollback
- Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
-
Using Hive
- Using Hive from Scratch
- Configuring Hive Parameters
- Hive SQL
- Permission Management
- Using a Hive Client
- Using HDFS Colocation to Store Hive Tables
- Using the Hive Column Encryption Function
- Customizing Row Separators
- Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
- Deleting Single-Row Records from Hive on HBase
- Configuring HTTPS/HTTP-based REST APIs
- Enabling or Disabling the Transform Function
- Access Control of a Dynamic Table View on Hive
- Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
- Using Hive to Read Data in a Relational Database
- Supporting Traditional Relational Database Syntax in Hive
- Creating User-Defined Hive Functions
- Enhancing beeline Reliability
- Viewing Table Structures Using the show create Statement as Users with the select Permission
- Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
- Inserting Data to a Directory That Does Not Exist
- Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
- Disabling Specification of the location Keyword When Creating an Internal Hive Table
- Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
- Authorizing Over 32 Roles in Hive
- Restricting the Maximum Number of Maps for Hive Tasks
- HiveServer Lease Isolation
- Hive Supporting Transactions
- Switching the Hive Execution Engine to Tez
- Hive Materialized View
- Hive Log Overview
- Hive Performance Tuning
-
Common Issues About Hive
- How Do I Delete UDFs on Multiple HiveServers at the Same Time?
- Why Cannot the DROP Operation Be Performed on a Backed-Up Hive Table?
- How Do I Perform Operations on Local Files with Hive User-Defined Functions?
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- Table Creation Fails Because Hive Complex Fields' Names Contain Special Characters
- How Do I Monitor the Hive Table Size?
- How Do I Protect Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
- Why Does a Hive on Spark Task Freeze When HBase Is Not Installed?
- Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Description of Hive Table Location (Either an OBS or HDFS Path)
- Why Cannot Data Be Queried After the Engine Is Switched Back to MapReduce When the Tez Engine Was Used to Execute Union-Related Statements?
- Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
- Why Does Hive Not Support Vectorized Query?
- Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
- How Do I Disable the Logging Function of Hive?
- Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
- Hive Configuration Problems
-
Using Hudi
- Getting Started
- Basic Operations
- Hudi Performance Tuning
-
Common Issues About Hudi
-
Data Write
- A Parquet/Avro Schema Error Is Reported When Updated Data Is Written
- UnsupportedOperationException Is Reported When Updated Data Is Written
- SchemaCompatabilityException Is Reported When Updated Data Is Written
- What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
- Hudi Fails to Write Decimal Data with Lower Precision
- Data Collection
- Hive Synchronization
- Using Hue (Versions Earlier Than MRS 3.x)
-
Using Hue (MRS 3.x or Later)
- Using Hue from Scratch
- Accessing the Hue Web UI
- Hue Common Parameters
- Using HiveQL Editor on the Hue Web UI
- Using the SparkSql Editor on the Hue Web UI
- Using the Metadata Browser on the Hue Web UI
- Using File Browser on the Hue Web UI
- Using Job Browser on the Hue Web UI
- Using HBase on the Hue Web UI
- Typical Scenarios
- Hue Log Overview
-
Common Issues About Hue
- Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
- Why Does the use database Statement Become Invalid in Hive?
- Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
- Why Do Large Files Fail to Be Uploaded on the Hue Page?
- Why Is the Hue Native Page Not Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- Using Impala
-
Using Kafka
- Using Kafka from Scratch
- Managing Kafka Topics
- Querying Kafka Topics
- Managing Kafka User Permissions
- Managing Messages in Kafka Topics
- Synchronizing Binlog-based MySQL Data to the MRS Cluster
- Creating a Kafka Role
- Kafka Common Parameters
- Safety Instructions on Using Kafka
- Kafka Specifications
- Using the Kafka Client
- Configuring Kafka HA and High Reliability Parameters
- Changing the Broker Storage Directory
- Checking the Consumption Status of a Consumer Group
- Kafka Balancing Tool Instructions
- Balancing Data After Kafka Node Scale-Out
- Kafka Token Authentication Mechanism Tool Usage
- Introduction to Kafka Logs
- Performance Tuning
- Kafka Feature Description
- Migrating Data Between Kafka Nodes
- Common Issues About Kafka
- Using KafkaManager
-
Using Loader
- Using Loader from Scratch
- How to Use Loader
- Common Loader Parameters
- Creating a Loader Role
- Loader Link Configuration
- Managing Loader Links (Versions Earlier Than MRS 3.x)
- Managing Loader Links (MRS 3.x and Later Versions)
- Source Link Configurations of Loader Jobs
- Destination Link Configurations of Loader Jobs
- Managing Loader Jobs
- Preparing a Driver for MySQL Database Link
-
Importing Data
- Overview
- Importing Data Using Loader
- Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
- Typical Scenario: Importing Data from an SFTP Server to HBase
- Typical Scenario: Importing Data from an SFTP Server to Hive
- Typical Scenario: Importing Data from an FTP Server to HBase
- Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
- Typical Scenario: Importing Data from a Relational Database to HBase
- Typical Scenario: Importing Data from a Relational Database to Hive
- Typical Scenario: Importing Data from HDFS or OBS to HBase
- Typical Scenario: Importing Data from a Relational Database to ClickHouse
- Typical Scenario: Importing Data from HDFS to ClickHouse
-
Exporting Data
- Overview
- Using Loader to Export Data
- Typical Scenario: Exporting Data from HDFS or OBS to an SFTP Server
- Typical Scenario: Exporting Data from HBase to an SFTP Server
- Typical Scenario: Exporting Data from Hive to an SFTP Server
- Typical Scenario: Exporting Data from HDFS or OBS to a Relational Database
- Typical Scenario: Exporting Data from HBase to a Relational Database
- Typical Scenario: Exporting Data from Hive to a Relational Database
- Typical Scenario: Importing Data from HBase to HDFS or OBS
- Managing Jobs
- Operator Help
-
Client Tools
- Running a Loader Job by Using Commands
- loader-tool Usage Guide
- loader-tool Usage Example
- schedule-tool Usage Guide
- schedule-tool Usage Example
- Using loader-backup to Back Up Job Data
- Open Source sqoop-shell Tool Usage Guide
- Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
- Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
- Loader Log Overview
- Example: Using Loader to Import Data from OBS to HDFS
- Common Issues About Loader
- Using Kudu
-
Using MapReduce
- Configuring the Log Archiving and Clearing Mechanism
- Reducing Client Application Failure Rate
- Transmitting MapReduce Tasks from Windows to Linux
- Configuring the Distributed Cache
- Configuring the MapReduce Shuffle Address
- Configuring the Cluster Administrator List
- Introduction to MapReduce Logs
- MapReduce Performance Tuning
-
Common Issues About MapReduce
- Why Does a MapReduce Task Stay Unchanged for a Long Time?
- Why Does the Client Hang During Job Running?
- Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
- MapReduce Job Failed in Multiple NameService Environment
- Why Is a Faulty MapReduce Node Not Blacklisted?
- Using OpenTSDB
-
Using Oozie
- Using Oozie from Scratch
- Using the Oozie Client
- Using Oozie Client to Submit an Oozie Job
-
Using Hue to Submit an Oozie Job
- Creating a Workflow
-
Submitting a Workflow Job
- Submitting a Hive2 Job
- Submitting a Spark2x Job
- Submitting a Java Job
- Submitting a Loader Job
- Submitting a MapReduce Job
- Submitting a Sub-workflow Job
- Submitting a Shell Job
- Submitting an HDFS Job
- Submitting a Streaming Job
- Submitting a DistCp Job
- Example of Mutual Trust Operations
- Submitting an SSH Job
- Submitting a Hive Script
- Submitting a Coordinator Periodic Scheduling Job
- Submitting a Bundle Batch Processing Job
- Querying Job Execution Results
- Oozie Log Overview
- Common Issues About Oozie
- Using Presto
- Using Ranger (MRS 1.9.2)
-
Using Ranger (MRS 3.x)
- Logging In to the Ranger Web UI
- Enabling Ranger Authentication
- Configuring Component Permission Policies
- Viewing Ranger Audit Information
- Configuring a Security Zone
- Viewing Ranger Permission Information
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark2x
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for Storm
- Ranger Log Overview
-
Common Issues About Ranger
- Why Does Ranger Startup Fail During Cluster Installation?
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Cannot a New User Log In to Ranger After Changing the Password?
- When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
- Using Spark
-
Using Spark2x
- Precautions
-
Basic Operation
- Getting Started
- Configuring Parameters Rapidly
- Common Parameters
- Spark on HBase Overview and Basic Applications
- Spark on HBase V2 Overview and Basic Applications
- SparkSQL Permission Management (Security Mode)
-
Scenario-Specific Configuration
- Configuring Multi-active Instance Mode
- Configuring the Multi-tenant Mode
- Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
- Configuring the Size of the Event Queue
- Configuring Executor Off-Heap Memory
- Enhancing Stability in a Limited Memory Condition
- Viewing Aggregated Container Logs on the Web UI
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Configuring the Default Number of Data Blocks Divided by SparkSQL
- Configuring the Compression Format of a Parquet Table
- Configuring the Number of Lost Executors Displayed in WebUI
- Setting the Log Level Dynamically
- Configuring Whether Spark Obtains HBase Tokens
- Configuring LIFO for Kafka
- Configuring Reliability for Connected Kafka
- Configuring Streaming Reading of Driver Execution Results
- Filtering Partitions without Paths in Partitioned Tables
- Configuring Spark2x Web UI ACLs
- Configuring Vector-based ORC Data Reading
- Broadening Support for Hive Partition Pruning Predicate Pushdown
- Hive Dynamic Partition Overwriting Syntax
- Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
- Configuring Local Disk Cache for JobHistory
- Configuring Spark SQL to Enable the Adaptive Execution Feature
- Configuring Event Log Rollover
- Adapting to the Third-party JDK When Ranger Is Used
- Spark2x Logs
- Obtaining Container Logs of a Running Spark Application
- Small File Combination Tools
- Using CarbonData for First Query
-
Spark2x Performance Tuning
- Spark Core Tuning
-
Spark SQL and DataFrame Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the INSERT...SELECT Operation
- Multiple JDBC Clients Concurrently Connecting to JDBCServer
- Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- Optimizing SQL Query of Data of Multiple Sources
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Tuning
-
Common Issues About Spark2x
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Cannot I Exit the Driver Process?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does the Stage Retry due to the Crash of the Executor?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
- Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How Do I Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How Do I Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- Why Does the Operation Fail When the Table Name Is TABLE?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Do I Fail to Modify MetaData by Running the Hive Command?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Do I Do If I Accidentally Kill the JDBCServer Process During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set as the Filter Condition in the Date Field?
- Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
- Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
- Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Cannot Functions Be Used When Different JDBCServers Are Connected?
- Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
- Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
- Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
- Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
-
Spark Streaming
- Same DAG Log Is Recorded Twice for a Streaming Task
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does a Spark Streaming Application Fail to Restart from a Checkpoint When It Creates an Input Stream Without Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
- Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Why Is an Application Not Displayed When I Run It with an Empty Part File?
- Why Does Spark2x Fail to Export a Table with the Same Field Name?
- Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
- Native Spark2x UI Fails to Be Accessed or Is Incorrectly Displayed when Internet Explorer Is Used for Access
- How Does Spark2x Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- Why Is the Native Page of an Application in Spark2x JobHistory Displayed Incorrectly?
- Why Do I Fail to Create a Table in the Specified Location on OBS After Logging to spark-beeline?
- Spark Shuffle Exception Handling
-
Using Sqoop
- Using Sqoop from Scratch
- Adapting Sqoop 1.4.7 to MRS 3.x Clusters
- Common Sqoop Commands and Parameters
-
Common Issues About Sqoop
- What Should I Do If Class QueryProvider Is Unavailable?
- What Do I Do If PostgreSQL or GaussDB Fails to Connect?
- What Should I Do If Data Failed to Be Synchronized to a Hive Table on the OBS Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
- What Should I Do If Data Failed to Be Synchronized Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
- What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
-
Using Storm
- Using Storm from Scratch
- Using the Storm Client
- Submitting Storm Topologies on the Client
- Accessing the Storm Web UI
- Managing Storm Topologies
- Querying Storm Topology Logs
- Storm Common Parameters
- Configuring a Storm Service User Password Policy
- Migrating Storm Services to Flink
- Storm Log Introduction
- Performance Tuning
- Using Tez
-
Using YARN
- Common YARN Parameters
- Creating Yarn Roles
- Using the YARN Client
- Configuring Resources for a NodeManager Role Instance
- Changing NodeManager Storage Directories
- Configuring Strict Permission Control for Yarn
- Configuring Container Log Aggregation
- Using CGroups with YARN
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
- Configuring the Access Channel Protocol
- Configuring Memory Usage Detection
- Configuring the Additional Scheduler WebUI
- Configuring Yarn Restart
- Configuring ApplicationMaster Work Preserving
- Configuring the Localized Log Levels
- Configuring Users That Run Tasks
- Yarn Log Overview
- Yarn Performance Tuning
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Completes When CGroups Are Used?
- Why Does the Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
- Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
- Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Common ZooKeeper Parameters
- Using a ZooKeeper Client
- Configuring the ZooKeeper Permissions
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Cannot the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
-
API Reference (ME-Abu Dhabi Region)
- Before You Start
- API Overview
- Calling APIs
- Application Cases
- API V2
- API V1.1
- Out-of-Date APIs
- Permissions Policies and Supported Actions
- Appendix
-
User Guide (Paris Region)
-
Overview
- What Is MRS?
- Application Scenarios
- Components
- Functions
- Constraints
- Permissions Management
- Related Services
- Preparing a User
-
Configuring a Cluster
- Methods of Creating MRS Clusters
- Quick Creation of a Cluster
- Creating a Custom Cluster
- Creating a Custom Topology Cluster
- Adding a Tag to a Cluster
- Communication Security Authorization
-
Configuring Auto Scaling Rules
- Overview
- Configuring Auto Scaling During Cluster Creation
- Creating an Auto Scaling Policy for an Existing Cluster
- Scenario 1: Using Auto Scaling Rules Alone
- Scenario 2: Using Resource Plans Alone
- Scenario 3: Using Both Auto Scaling Rules and Resource Plans
- Modifying an Auto Scaling Policy
- Deleting an Auto Scaling Policy
- Enabling or Disabling an Auto Scaling Policy
- Viewing an Auto Scaling Policy
- Configuring Automation Scripts
- Configuring Auto Scaling Metrics
- Managing Data Connections
- Installing Third-Party Software Using Bootstrap Actions
- Viewing Failed MRS Tasks
- Viewing Information of a Historical Cluster
-
Managing Clusters
- Logging In to a Cluster
- Cluster Overview
- Cluster O&M
- Managing Nodes
-
Job Management
- Introduction to MRS Jobs
- Running a MapReduce Job
- Running a SparkSubmit Job
- Running a HiveSQL Job
- Running a SparkSql Job
- Running a Flink Job
- Running a Kafka Job
- Viewing Job Configuration and Logs
- Stopping a Job
- Deleting a Job
- Using Encrypted OBS Data for Job Running
- Configuring Job Notification Rules
-
Component Management
- Object Management
- Viewing Configuration
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configuration
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Starting and Stopping a Cluster
- Synchronizing Cluster Configuration
- Exporting Cluster Configuration
- Performing Rolling Restart
- Alarm Management
- Patch Management
-
Tenant Management
- Before You Start
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Bootstrap Actions
- Using an MRS Client
- Configuring a Cluster with Storage and Compute Decoupled
- Accessing Web Pages of Open Source Components Managed in MRS Clusters
- Interconnecting Jupyter Notebook with MRS Using Custom Python
- Accessing Manager
-
FusionInsight Manager Operation Guide (Applicable to 3.x)
- Getting Started
- Homepage
-
Cluster
- Cluster Management
- Managing a Service
- Instance Management
- Hosts
- O&M
- Audit
- Tenant Resources
- System
- Cluster Management
- Log Management
-
Backup and Recovery Management
- Introduction
-
Backing Up Data
- Backing Up Manager Data
- Backing Up CDL Data
- Backing Up ClickHouse Metadata
- Backing Up ClickHouse Service Data
- Backing Up DBService Data
- Backing Up HBase Metadata
- Backing Up HBase Service Data
- Backing Up NameNode Data
- Backing Up HDFS Service Data
- Backing Up Hive Service Data
- Backing Up IoTDB Metadata
- Backing Up IoTDB Service Data
- Backing Up Kafka Metadata
-
Recovering Data
- Restoring Manager Data
- Restoring CDL Data
- Restoring ClickHouse Metadata
- Restoring ClickHouse Service Data
- Restoring DBService Data
- Restoring HBase Metadata
- Restoring HBase Service Data
- Restoring NameNode Data
- Restoring HDFS Service Data
- Restoring Hive Service Data
- Restoring IoTDB Metadata
- Restoring IoTDB Service Data
- Restoring Kafka Metadata
- Enabling Cross-Cluster Replication
- Managing Local Quick Restoration Tasks
- Modifying a Backup Task
- Viewing Backup and Restoration Tasks
- How Do I Configure the Environment When I Create a ClickHouse Backup Task on FusionInsight Manager and Set the Path Type to RemoteHDFS?
-
Security Management
- Security Overview
- Account Management
-
Security Hardening
- Hardening Policies
- Configuring a Trusted IP Address to Access LDAP
- HFile and WAL Encryption
- Configuring Hadoop Security Parameters
- Configuring an IP Address Whitelist for Modification Allowed by HBase
- Updating a Key for a Cluster
- Hardening the LDAP
- Configuring Kafka Data Encryption During Transmission
- Configuring HDFS Data Encryption During Transmission
- Encrypting the Communication Between the Controller and the Agent
- Updating SSH Keys for User omm
- Security Maintenance
- Security Statement
-
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
- Introduction to MRS Manager
- Checking Running Tasks
- Monitoring Management
- Alarm Management
-
Object Management
- Managing Objects
- Viewing Configurations
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configurations
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Managing a Host
- Isolating a Host
- Canceling Host Isolation
- Starting or Stopping a Cluster
- Synchronizing Cluster Configurations
- Exporting Configuration Data of a Cluster
- Log Management
-
Health Check Management
- Performing a Health Check
- Viewing and Exporting a Health Check Report
- Configuring the Number of Health Check Reports to Be Reserved
- Managing Health Check Reports
- DBService Health Check Indicators
- Flume Health Check Indicators
- HBase Health Check Indicators
- Host Health Check Indicators
- HDFS Health Check Indicators
- Hive Health Check Indicators
- Kafka Health Check Indicators
- KrbServer Health Check Indicators
- LdapServer Health Check Indicators
- Loader Health Check Indicators
- MapReduce Health Check Indicators
- OMS Health Check Indicators
- Spark Health Check Indicators
- Storm Health Check Indicators
- Yarn Health Check Indicators
- ZooKeeper Health Check Indicators
- Static Service Pool Management
-
Tenant Management
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Backup and Restoration
-
Security Management
- Default Users of Clusters with Kerberos Authentication Disabled
- Default Users of Clusters with Kerberos Authentication Enabled
- Changing the Password of an OS User
- Changing the Password of User admin
- Changing the Password of the Kerberos Administrator
- Changing the Passwords of the LDAP Administrator and the LDAP User
- Changing the Password of a Component Running User
- Changing the Password of the OMS Database Administrator
- Changing the Password of the Data Access User of the OMS Database
- Changing the Password of a Component Database User
- Replacing the HA Certificate
- Updating Cluster Keys
- Permissions Management
-
MRS Multi-User Permission Management
- Users and Permissions of MRS Clusters
- Default Users of Clusters with Kerberos Authentication Enabled
- Creating a Role
- Creating a User Group
- Creating a User
- Modifying User Information
- Locking a User
- Unlocking a User
- Deleting a User
- Changing the Password of an Operation User
- Initializing the Password of a System User
- Downloading a User Authentication File
- Modifying a Password Policy
- Configuring Cross-Cluster Mutual Trust Relationships
- Configuring Users to Access Resources of a Trusted Cluster
- Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
- Patch Operation Guide
- Restoring Patches for the Isolated Hosts
- Rolling Restart
- Security Description
- High-Risk Operations
- MRS Quick Start
-
Troubleshooting
- Accessing the Web Pages
-
Cluster Management
- Failed to Reduce Task Nodes
- Adding a New Disk to an MRS Cluster
- Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
- Replacing a Disk in an MRS Cluster (Applicable to 3.x)
- MRS Backup Failure
- Inconsistency Between df and du Command Output on the Core Node
- Disassociating a Subnet from the ACL Network
- MRS Becomes Abnormal After hostname Modification
- DataNode Restarts Unexpectedly
- Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
- Failed to Download the MRS Cluster Client
- Failed to Scale Out an MRS Cluster
- Error Occurs When MRS Executes the Insert Command Using Beeline
- How Do I Upgrade EulerOS to Fix Vulnerabilities in an MRS Cluster?
- Using CDM to Migrate Data to HDFS
- Alarms Are Frequently Generated in the MRS Cluster
- Memory Usage of the PMS Process Is High
- High Memory Usage of the Knox Process
- It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
- How Do I Locate a Job Submission Failure?
- OS Disk Space Is Insufficient Due to Oversized HBase Log Files
- Failed to Delete a New Tenant on FusionInsight Manager
- Using Alluxio
- Using ClickHouse
-
Using DBService
- DBServer Instance Is in Abnormal Status
- DBServer Instance Remains in the Restoring State
- Default Port 20050 or 20051 Is Occupied
- DBServer Instance Is Always in the Restoring State Due to Incorrect /tmp Directory Permission
- DBService Backup Failure
- Components Failed to Connect to DBService in Normal State
- DBServer Failed to Start
- DBService Backup Failed Because the Floating IP Address Is Unreachable
- DBService Failed to Start Due to the Loss of the DBService Configuration File
-
Using Flink
- "IllegalConfigurationException: Error while parsing YAML configuration file: "security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
- "IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
- The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
- Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
- Flink Service Program Fails to Read Files on the NFS Disk
- Failed to Customize the Flink Log4j Log Level
- Using Flume
-
Using HBase
- Slow Response to HBase Connection
- Failed to Authenticate the HBase User
- RegionServer Failed to Start Because the Port Is Occupied
- HBase Failed to Start Due to Insufficient Node Memory
- HBase Service Unavailable Due to Poor HDFS Performance
- HBase Failed to Start Due to Inappropriate Parameter Settings
- RegionServer Failed to Start Due to Residual Processes
- HBase Failed to Start Due to a Quota Set on HDFS
- HBase Failed to Start Due to Corrupted Version Files
- High CPU Usage Caused by Zero-Loaded RegionServer
- HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
- The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
- RegionServer Instance Is in the Restoring State
- HBase Failed to Start in a Newly Installed Cluster
- HBase Failed to Start Due to the Loss of the ACL Table Directory
- HBase Failed to Start After the Cluster Is Powered Off and On
- Failed to Import HBase Data Due to Oversized File Blocks
- Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
- Failed to Run the hbase shell Command on the MRS Cluster Client
- Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
- HBase Failed to Start Due to Insufficient RegionServer Memory
-
Using HDFS
- All NameNodes Enter the Standby State After the NameNode RPC Port of HDFS Is Changed
- An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
- Failed to Use Python to Remotely Connect to the Port of HDFS
- HDFS Capacity Usage Reaches 100%, Causing Unavailable Upper-layer Services Such as HBase and Spark
- An Error Is Reported During HDFS and Yarn Startup
- HDFS Permission Setting Error
- A DataNode of HDFS Is Always in the Decommissioning State
- HDFS Failed to Start Due to Insufficient Memory
- A Large Number of Blocks Are Lost in HDFS due to the Time Change Using ntpdate
- CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
- Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
- Common File Read/Write Faults
- Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
- A Client File Fails to Be Closed After Data Writing
- File Fails to Be Uploaded to HDFS Due to File Errors
- After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
- Failed to Read Files, and "FileNotFoundException" Is Displayed
- Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
- Adjusting the Log Level of the Shell Client
- File Read Fails, and "No common protection layer" Is Displayed
- Failed to Write Files Because the HDFS Directory Quota Is Insufficient
- Balancing Fails, and "Source and target differ in block-size" Is Displayed
- A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
- Uneven Data Distribution Due to Non-HDFS Data Residuals
- Uneven Data Distribution Due to the Client Installation on the DataNode
- Handling Unbalanced DataNode Disk Usage on Nodes
- Locating Common Balance Problems
- HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
- An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
- Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
- Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
- HDFS Client Failed to Delete Overlong Directories
- An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
-
Using Hive
- Content Recorded in Hive Logs
- Causes of Hive Startup Failure
- "Cannot modify xxx at runtime" Is Reported When the set Command Is Executed in a Security Cluster
- How to Specify a Queue When Hive Submits a Job
- How to Set Map and Reduce Memory on the Client
- Specifying the Output File Compression Format When Importing a Table
- desc Table Cannot Be Completely Displayed
- NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
- A Newly Created User Has No Query Permissions
- An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
- An Error Is Reported When the "load data inpath" Command Is Executed
- An Error Is Reported When the "load data local inpath" Command Is Executed
- An Error Is Reported When the "create external table" Command Is Executed
- An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
- Insufficient Permissions to Execute the set role admin Command
- An Error Is Reported When UDF Is Created Using Beeline
- Difference Between Hive Service Health Status and Hive Instance Health Status
- Hive Alarms and Triggering Conditions
- "authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
- Failed to Access ZooKeeper from the Client
- "Invalid function" Is Displayed When a UDF Is Used
- Hive Service Status Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Concerning
- Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
- Hive Task Failed to Run on the Client But Successful on Yarn
- An Error Is Reported When the select Statement Is Executed
- Failed to Drop a Large Number of Partitions
- Failed to Start a Local Task
- Failed to Start WebHCat
- Sample Code Error for Hive Secondary Development After Domain Switching
- MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
- "Failed to execute session hooks: over max connections" Reported by Beeline
- beeline Reports the "OutOfMemoryError" Error
- Task Execution Fails Because the Input File Number Exceeds the Threshold
- Task Execution Fails Because of Stack Memory Overflow
- Task Failed Due to Concurrent Writes to One Table or Partition
- Hive Task Failed Due to a Lack of HDFS Directory Permission
- Failed to Load Data to Hive Tables
- HiveServer and HiveHCat Process Faults
- An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
- Timeout Reported When Adding the Hive Table Field
- Failed to Restart the Hive Service
- Hive Failed to Delete a Table
- An Error Is Reported When msck repair table table_name Is Run on Hive
- How Do I Release Disk Space After Dropping a Table in Hive?
- Connection Timeout During SQL Statement Execution on the Client
- WebHCat Failed to Start Due to Abnormal Health Status
- WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
- Using Hue
- Using Impala
-
Using Kafka
- An Error Is Reported When Kafka Is Run to Obtain a Topic
- Flume Normally Connects to Kafka But Fails to Send Messages
- Producer Failed to Send Data and Threw "NullPointerException"
- Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
- Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
- Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
- Consumer Fails to Consume Data and Remains in the Waiting State
- SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
- Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
- SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
- Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
- Checking Whether Data Consumed by a Consumer Is Lost
- Failed to Start a Component Due to Account Lock
- Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
- Kafka Topics Cannot Be Deleted
- Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
- When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
- Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
- When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
- When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
- Consumer Repeatedly Consumes Data
- Leader for the Created Kafka Topic Partition Is Displayed as none
- Safety Instructions on Using Kafka
- Obtaining Kafka Consumer Offset Information
- Adding or Deleting Configurations for a Topic
- Reading the Content of the __consumer_offsets Internal Topic
- Configuring Logs for Shell Commands on the Client
- Obtaining Topic Distribution Information
- Kafka HA Usage Description
- Kafka Producer Writes Oversized Records
- Kafka Consumer Reads Oversized Records
- High Usage of Multiple Disks on a Kafka Cluster Node
- Using Oozie
- Using Presto
-
Using Spark
- An Error Occurs When the Split Size Is Changed in a Spark Application
- An Error Is Reported When Spark Is Used
- A Spark Job Fails to Run Due to Incorrect JAR File Import
- A Spark Job Is Pending Due to Insufficient Memory
- An Error Is Reported During Spark Running
- Executor Memory Reaches the Threshold Is Displayed in Driver
- Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
- Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
- ApplicationMaster Failed to Start Twice in Yarn-client Mode
- Failed to Connect to ResourceManager When a Spark Task Is Submitted
- DataArts Studio Failed to Schedule Spark Jobs
- Submission Status of the Spark Job API Is Error
- Alarm 43006 Is Repeatedly Generated in the Cluster
- Failed to Create or Delete a Table in Spark Beeline
- Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
- Large Number of Shuffle Results Are Lost During Spark Task Execution
- Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
- Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
- Spark Task Submission Failure
- Spark Task Execution Failure
- JDBCServer Connection Failure
- Failed to View Spark Task Logs
- Authentication Fails When Spark Connects to Other Services
- An Error Occurs When Spark Connects to Redis
- An Error Is Reported When spark-beeline Is Used to Query a Hive View
-
Using Sqoop
- Connecting Sqoop to MySQL
- Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
- Failed to Export HBase Data to HDFS Through Hue's Sqoop Task
- A Format Error Is Reported When Sqoop Is Used to Export Data from Hive to MySQL 8.0
- An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
- Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
-
Using Storm
- Invalid Hyperlink of Events on the Storm UI
- Failed to Submit a Topology
- Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
- The Worker Log Is Empty After a Topology Is Submitted
- Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
- "well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
- Data Cannot Be Written to Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts
- The GC Parameter Configured for the Service Topology Does Not Take Effect
- Internal Server Error Is Displayed When the User Queries Information on the UI
- Using Ranger
-
Using Yarn
- A Large Number of Jobs Are Found After Yarn Is Started
- "GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
- Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
- Temporary Files Are Not Deleted When an MR Job Is Abnormal
- ResourceManager of Yarn (Port 8032) Throws Error "connection refused"
- Failed to View Job Logs on the Yarn Web UI
- An Error Is Reported When a Queue Name Is Clicked on the Yarn Page
- Using ZooKeeper
- Accessing OBS
- Appendix
- Change History
-
Overview
-
Component Operation Guide (Paris Region)
- Using CarbonData (for Versions Earlier Than MRS 3.x)
-
Using CarbonData (for MRS 3.x or Later)
- Overview
- Configuration Reference
- CarbonData Operation Guide
- CarbonData Performance Tuning
- CarbonData Access Control
- CarbonData Syntax Reference
- CarbonData Troubleshooting
-
CarbonData FAQ
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How to Avoid Minor Compaction for Historical Data?
- How to Change the Default Group Name for CarbonData Data Loading?
- Why Does the INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Does Data Load Performance Decrease Due to Bad Records?
- Why INSERT INTO/LOAD DATA Task Distribution Is Incorrect and the Opened Tasks Are Fewer Than the Available Executors When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
- Why Does Data Loading Fail During Off-Heap Usage?
- Why Do I Fail to Create a Hive Table?
- Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-Owners?
- How Do I Logically Split Data Across Different Namespaces?
- Why Is a Missing Privileges Exception Reported When I Perform a Drop Operation on Databases?
- Why Cannot the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
- Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
- Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
- Using ClickHouse
- Using DBService
-
Using Flink
- Using Flink from Scratch
- Viewing Flink Job Information
- Flink Configuration Management
- Security Configuration
- Security Hardening
- Security Statement
- Using the Flink Web UI
- Flink Log Overview
- Flink Performance Tuning
- Common Flink Shell Commands
-
Using Flume
- Using Flume from Scratch
- Overview
- Installing the Flume Client
- Viewing Flume Client Logs
- Stopping or Uninstalling the Flume Client
- Using the Encryption Tool of the Flume Client
- Flume Service Configuration Guide
- Flume Configuration Parameter Description
- Using Environment Variables in the properties.properties File
-
Non-Encrypted Transmission
- Configuring Non-encrypted Transmission
- Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
- Encrypted Transmission
- Viewing Flume Client Monitoring Information
- Connecting Flume to Kafka in Security Mode
- Connecting Flume with Hive in Security Mode
- Configuring the Flume Service Model
- Introduction to Flume Logs
- Flume Client Cgroup Usage Guide
- Secondary Development Guide for Flume Third-Party Plug-ins
- Common Issues About Flume
-
Using HBase
- Using HBase from Scratch
- Using an HBase Client
- Creating HBase Roles
- Configuring HBase Replication
- Configuring HBase Parameters
- Enabling Cross-Cluster Copy
- Using the ReplicationSyncUp Tool
- GeoMesa Command Line
- Using HIndex
- Configuring HBase DR
- Configuring HBase Data Compression and Encoding
- Performing an HBase DR Service Switchover
- Performing an HBase DR Active/Standby Cluster Switchover
- Community BulkLoad Tool
- Configuring the MOB
- Configuring Secure HBase Replication
- Configuring Region In Transition Recovery Chore Service
- Using a Secondary Index
- HBase Log Overview
- HBase Performance Tuning
-
Common Issues About HBase
- Why Does a Client Keep Failing to Connect to a Server for a Long Time?
- Operation Failures Occur in Stopping BulkLoad On the Client
- Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
- Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
- Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
- How Do I Restore a Region in the RIT State for a Long Time?
- Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- Why Can Modified and Deleted Data Still Be Queried Using the Scan Command?
- Why Is a "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting HBase Shell?
- When Do the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Get Cleared?
- Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
- What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
- How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
- Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
- Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Meta Is Rebuilt Using the OfflineMetaRepair Tool, and Startup Fails?
- Why Messages Containing FileNotFoundException and no lease Are Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
- Why Does the ImportTsv Tool Display "Permission denied" When the Linux User Is the Same as the RegionServer User but the Kerberos User Is Different?
- Insufficient Rights When a Tenant Accesses Phoenix
- What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
- How Do I Fix Region Overlapping?
- Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
- Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
- Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
- How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
- Why Is a Message Indicating Insufficient Permission Displayed When CTBase Connects to the Ranger Plug-ins?
-
Using HDFS
- Using Hadoop from Scratch
- Configuring Memory Management
- Creating an HDFS Role
- Using the HDFS Client
- Running the DistCp Command
- Overview of HDFS File System Directories
- Changing the DataNode Storage Directory
- Configuring HDFS Directory Permission
- Configuring NFS
- Planning HDFS Capacity
- Configuring ulimit for HBase and HDFS
- Balancing DataNode Capacity
- Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
- Configuring the Number of Files in a Single HDFS Directory
- Configuring the Recycle Bin Mechanism
- Setting Permissions on Files and Directories
- Setting the Maximum Lifetime and Renewal Interval of a Token
- Configuring the Damaged Disk Volume
- Configuring Encrypted Channels
- Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
- Configuring the NameNode Blacklist
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Configuring Reserved Percentage of Disk Usage on DataNodes
- Configuring HDFS NodeLabel
- Configuring HDFS Mover
- Using HDFS AZ Mover
- Configuring HDFS DiskBalancer
- Configuring the Observer NameNode to Process Read Requests
- Performing Concurrent Operations on HDFS Files
- Introduction to HDFS Logs
- HDFS Performance Tuning
-
FAQ
- NameNode Startup Is Slow
- DataNode Is Normal but Cannot Report Data Blocks
- HDFS WebUI Cannot Properly Update Information About Damaged Data
- Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
- Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
- Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
- Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
- Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
- Why Does Array Border-crossing Occur During FileInputFormat Split?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
- Can I Delete or Modify the Data Storage Directory in DataNode?
- Blocks Miss on the NameNode UI After the Successful Rollback
- Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
-
Using Hive
- Using Hive from Scratch
- Configuring Hive Parameters
- Hive SQL
- Permission Management
- Using a Hive Client
- Using HDFS Colocation to Store Hive Tables
- Using the Hive Column Encryption Function
- Customizing Row Separators
- Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
- Deleting Single-Row Records from Hive on HBase
- Configuring HTTPS/HTTP-based REST APIs
- Enabling or Disabling the Transform Function
- Access Control of a Dynamic Table View on Hive
- Specifying Whether ADMIN Permission Is Required for Creating Temporary Functions
- Using Hive to Read Data in a Relational Database
- Supporting Traditional Relational Database Syntax in Hive
- Creating User-Defined Hive Functions
- Enhancing beeline Reliability
- Viewing Table Structures Using the show create Statement as Users with the select Permission
- Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
- Inserting Data to a Directory That Does Not Exist
- Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
- Disabling Specification of the location Keyword When Creating an Internal Hive Table
- Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
- Authorizing Over 32 Roles in Hive
- Restricting the Maximum Number of Maps for Hive Tasks
- HiveServer Lease Isolation
- Hive Supporting Transactions
- Switching the Hive Execution Engine to Tez
- Hive Materialized View
- Hive Log Overview
- Hive Performance Tuning
-
Common Issues About Hive
- How Do I Delete UDFs on Multiple HiveServers at the Same Time?
- Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
- How to Perform Operations on Local Files with Hive User-Defined Functions
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- How Do I Monitor the Hive Table Size?
- How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
- Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
- Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Description of Hive Table Location (Either Be an OBS or HDFS Path)
- Why Cannot Data Be Queried After the Engine Is Switched Back to MapReduce Once the Tez Engine Has Executed Union-related Statements?
- Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
- Why Does Hive Not Support Vectorized Query?
- Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
- How Do I Disable the Logging Function of Hive?
- Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
- Hive Configuration Problems
-
Using Hudi
- Getting Started
- Basic Operations
- Hudi Performance Tuning
-
Common Issues About Hudi
-
Data Write
- A Parquet/Avro Schema Exception Is Reported When Updated Data Is Written
- UnsupportedOperationException Is Reported When Updated Data Is Written
- SchemaCompatabilityException Is Reported When Updated Data Is Written
- What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
- Hudi Fails to Write Decimal Data with Lower Precision
- Data Collection
- Hive Synchronization
- Using Hue (Versions Earlier Than MRS 3.x)
-
Using Hue (MRS 3.x or Later)
- Using Hue from Scratch
- Accessing the Hue Web UI
- Hue Common Parameters
- Using HiveQL Editor on the Hue Web UI
- Using the SparkSql Editor on the Hue Web UI
- Using the Metadata Browser on the Hue Web UI
- Using File Browser on the Hue Web UI
- Using Job Browser on the Hue Web UI
- Using HBase on the Hue Web UI
- Typical Scenarios
- Hue Log Overview
-
Common Issues About Hue
- How Do I Solve the Problem That HQL Fails to Be Executed in Hue Using Internet Explorer?
- Why Does the use database Statement Become Invalid When Hive Is Used?
- What Can I Do If HDFS Files Fail to Be Accessed Using Hue WebUI?
- What Do I Do If a Large File Fails to Upload on the Hue Page?
- Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- Using Impala
-
Using Kafka
- Using Kafka from Scratch
- Managing Kafka Topics
- Querying Kafka Topics
- Managing Kafka User Permissions
- Managing Messages in Kafka Topics
- Synchronizing Binlog-based MySQL Data to the MRS Cluster
- Creating a Kafka Role
- Kafka Common Parameters
- Safety Instructions on Using Kafka
- Kafka Specifications
- Using the Kafka Client
- Configuring Kafka HA and High Reliability Parameters
- Changing the Broker Storage Directory
- Checking the Consumption Status of a Consumer Group
- Kafka Balancing Tool Instructions
- Balancing Data After Kafka Node Scale-Out
- Kafka Token Authentication Mechanism Tool Usage
- Introduction to Kafka Logs
- Performance Tuning
- Kafka Feature Description
- Migrating Data Between Kafka Nodes
- Common Issues About Kafka
- Using KafkaManager
-
Using Loader
- Using Loader from Scratch
- How to Use Loader
- Loader Link Configuration
- Managing Loader Links (Versions Earlier Than MRS 3.x)
- Source Link Configurations of Loader Jobs
- Destination Link Configurations of Loader Jobs
- Managing Loader Jobs
- Preparing a Driver for MySQL Database Link
- Loader Log Overview
- Example: Using Loader to Import Data from OBS to HDFS
- Common Issues About Loader
- Using Kudu
-
Using MapReduce
- Configuring the Log Archiving and Clearing Mechanism
- Reducing Client Application Failure Rate
- Transmitting MapReduce Tasks from Windows to Linux
- Configuring the Distributed Cache
- Configuring the MapReduce Shuffle Address
- Configuring the Cluster Administrator List
- Introduction to MapReduce Logs
- MapReduce Performance Tuning
-
Common Issues About MapReduce
- Why Does It Take a Long Time to Run a Task Upon ResourceManager Active/Standby Switchover?
- Why Does a MapReduce Task Stay Unchanged for a Long Time?
- Why Does the Client Hang During Job Running?
- Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
- MapReduce Job Failed in Multiple NameService Environment
- Why Is a Faulty MapReduce Node Not Blacklisted?
- Using OpenTSDB
-
Using Oozie
- Using Oozie from Scratch
- Using the Oozie Client
- Using Oozie Client to Submit an Oozie Job
-
Using Hue to Submit an Oozie Job
- Creating a Workflow
-
Submitting a Workflow Job
- Submitting a Hive2 Job
- Submitting a Spark2x Job
- Submitting a Java Job
- Submitting a Loader Job
- Submitting a MapReduce Job
- Submitting a Sub-workflow Job
- Submitting a Shell Job
- Submitting an HDFS Job
- Submitting a Streaming Job
- Submitting a DistCp Job
- Example of Mutual Trust Operations
- Submitting an SSH Job
- Submitting a Hive Script
- Submitting a Coordinator Periodic Scheduling Job
- Submitting a Bundle Batch Processing Job
- Querying the Operation Results
- Oozie Log Overview
- Common Issues About Oozie
- Using Presto
-
Using Ranger (MRS 3.x)
- Logging In to the Ranger Web UI
- Enabling Ranger Authentication
- Configuring Component Permission Policies
- Viewing Ranger Audit Information
- Configuring a Security Zone
- Changing the Ranger Data Source to LDAP for a Normal Cluster
- Viewing Ranger Permission Information
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark2x
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for Storm
- Ranger Log Overview
-
Common Issues About Ranger
- Why Does Ranger Startup Fail During Cluster Installation?
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Can't a New User Log In to Ranger After Changing the Password?
- When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
- Using Spark
-
Using Spark2x
- Precautions
-
Basic Operation
- Getting Started
- Configuring Parameters Rapidly
- Common Parameters
- Spark on HBase Overview and Basic Applications
- Spark on HBase V2 Overview and Basic Applications
- SparkSQL Permission Management (Security Mode)
-
Scenario-Specific Configuration
- Configuring Multi-active Instance Mode
- Configuring the Multi-tenant Mode
- Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
- Configuring the Size of the Event Queue
- Configuring Executor Off-Heap Memory
- Enhancing Stability in a Limited Memory Condition
- Viewing Aggregated Container Logs on the Web UI
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Configuring the Default Number of Data Blocks Divided by SparkSQL
- Configuring the Compression Format of a Parquet Table
- Configuring the Number of Lost Executors Displayed in WebUI
- Setting the Log Level Dynamically
- Configuring Whether Spark Obtains HBase Tokens
- Configuring LIFO for Kafka
- Configuring Reliability for Connected Kafka
- Configuring Streaming Reading of Driver Execution Results
- Filtering Partitions without Paths in Partitioned Tables
- Configuring Spark2x Web UI ACLs
- Configuring Vector-based ORC Data Reading
- Broadening Support for Hive Partition Pruning Predicate Pushdown
- Hive Dynamic Partition Overwriting Syntax
- Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
- Configuring Local Disk Cache for JobHistory
- Configuring Spark SQL to Enable the Adaptive Execution Feature
- Configuring Event Log Rollover
- Adapting to the Third-party JDK When Ranger Is Used
- Spark2x Logs
- Obtaining Container Logs of a Running Spark Application
- Small File Combination Tools
- Using CarbonData for First Query
-
Spark2x Performance Tuning
- Spark Core Tuning
-
Spark SQL and DataFrame Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the INSERT...SELECT Operation
- Multiple JDBC Clients Concurrently Connecting to JDBCServer
- Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- Optimizing SQL Query of Data of Multiple Sources
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Tuning
-
Common Issues About Spark2x
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
- Why Can't I Exit the Driver Process?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does the Stage Retry Due to the Crash of the Executor?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
- Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How Do I Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How Do I Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Does 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- Why Does the Operation Fail When the Table Name Is TABLE?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Do I Fail to Modify MetaData by Running the Hive Command?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Should I Do If the JDBCServer Process is Mistakenly Killed During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
- Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
- Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
- Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Are Some Functions Not Available when Another JDBCServer Is Connected?
- Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
- Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
- Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
-
Spark Streaming
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does a Spark Streaming Application Fail to Restart from a Checkpoint When It Creates an Input Stream Without Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
- Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Why Is an Application Not Displayed When I Run the Application with an Empty Part File?
- Why Does Spark2x Fail to Export a Table with the Same Field Name?
- Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native Spark2x UI
- How Does Spark2x Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- What Should I Do If the Native Page of a Spark2x JobHistory Application Fails to Be Displayed?
- Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
- Spark Shuffle Exception Handling
-
Spark Core
-
Using Sqoop
- Using Sqoop from Scratch
- Adapting Sqoop 1.4.7 to MRS 3.x Clusters
- Common Sqoop Commands and Parameters
-
Common Issues About Sqoop
- What Should I Do If Class QueryProvider Is Unavailable?
- What Should I Do If PostgreSQL or GaussDB Fails to Connect?
- What Should I Do If Data Failed to Be Synchronized to a Hive Table on the OBS Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
- What Should I Do If Data Failed to Be Synchronized Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
- What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
-
Using Storm
- Using Storm from Scratch
- Using the Storm Client
- Submitting Storm Topologies on the Client
- Accessing the Storm Web UI
- Managing Storm Topologies
- Querying Storm Topology Logs
- Storm Common Parameters
- Configuring a Storm Service User Password Policy
- Migrating Storm Services to Flink
- Storm Log Introduction
- Performance Tuning
- Using Tez
-
Using Yarn
- Common YARN Parameters
- Creating Yarn Roles
- Using the YARN Client
- Configuring Resources for a NodeManager Role Instance
- Changing NodeManager Storage Directories
- Configuring Strict Permission Control for Yarn
- Configuring Container Log Aggregation
- Using CGroups with YARN
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
- Configuring the Access Channel Protocol
- Configuring Memory Usage Detection
- Configuring the Additional Scheduler WebUI
- Configuring Yarn Restart
- Configuring ApplicationMaster Work Preserving
- Configuring the Localized Log Levels
- Configuring Users That Run Tasks
- Yarn Log Overview
- Yarn Performance Tuning
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Completes When CGroups Is Used?
- Why Does the Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Two Times?
- Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
- Why Does Yarn Not Release the Blacklist Even Though All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Common ZooKeeper Parameters
- Using a ZooKeeper Client
- Configuring the ZooKeeper Permissions
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Can't the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
-
Component Operation Guide (LTS) (Paris Region)
-
Using CarbonData
- Overview
- Configuration Reference
- CarbonData Operation Guide
- CarbonData Performance Tuning
- CarbonData Access Control
- CarbonData Syntax Reference
- CarbonData Troubleshooting
-
CarbonData FAQ
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How Do I Avoid Minor Compaction for Historical Data?
- How Do I Change the Default Group Name for CarbonData Data Loading?
- Why Does the INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Does Data Load Performance Decrease Due to Bad Records?
- Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect and Why Are Fewer Tasks Opened Than Available Executors When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
- Why Does Data Loading Fail When Off-Heap Memory Is Used?
- Why Do I Fail to Create a Hive Table?
- Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-Owners?
- How Do I Logically Split Data Across Different Namespaces?
- Why Is a Missing Privileges Exception Reported When I Perform a Drop Operation on Databases?
- Why Can't the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
- Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
-
Using ClickHouse
- Using ClickHouse from Scratch
-
Common ClickHouse SQL Syntax
- CREATE DATABASE: Creating a Database
- CREATE TABLE: Creating a Table
- INSERT INTO: Inserting Data into a Table
- SELECT: Querying Table Data
- ALTER TABLE: Modifying a Table Structure
- DESC: Querying a Table Structure
- DROP: Deleting a Table
- SHOW: Displaying Information About Databases and Tables
- Importing and Exporting File Data
- User Management and Authentication
- ClickHouse Table Engine Overview
- Creating a ClickHouse Table
- Using the ClickHouse Data Migration Tool
- Monitoring of Slow ClickHouse Query Statements and Replication Table Data Synchronization
- Adaptive MV Usage in ClickHouse
- ClickHouse Log Overview
- Using DBService
-
Using Flink
- Using Flink from Scratch
- Viewing Flink Job Information
- Flink Configuration Management
- Security Configuration
- Security Hardening
- Security Statement
-
Using the Flink Web UI
- Overview
- FlinkServer Permissions Management
- Accessing the Flink Web UI
- Creating an Application on the Flink Web UI
- Creating a Cluster Connection on the Flink Web UI
- Creating a Data Connection on the Flink Web UI
- Managing Tables on the Flink Web UI
- Managing Jobs on the Flink Web UI
- Managing UDFs on the Flink Web UI
- Interconnecting FlinkServer with External Components
- Deleting Residual Information About Flink Tasks
- Flink Log Overview
- Flink Performance Tuning
- Common Flink Shell Commands
-
Using Flume
- Using Flume from Scratch
- Overview
- Installing the Flume Client on Clusters
- Viewing Flume Client Logs
- Stopping or Uninstalling the Flume Client
- Using the Encryption Tool of the Flume Client
- Flume Service Configuration Guide
- Flume Configuration Parameter Description
- Using Environment Variables in the properties.properties File
-
Non-Encrypted Transmission
- Configuring Non-encrypted Transmission
- Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
- Encrypted Transmission
- Viewing Flume Client Monitoring Information
- Connecting Flume to Kafka in Security Mode
- Connecting Flume with Hive in Security Mode
- Configuring the Flume Service Model
- Introduction to Flume Logs
- Flume Client Cgroup Usage Guide
- Secondary Development Guide for Flume Third-Party Plug-ins
- Common Issues About Flume
-
Using HBase
- Using HBase from Scratch
- Creating HBase Roles
- Using an HBase Client
- Configuring HBase Replication
- Enabling Cross-Cluster Copy
- Supporting Full-Text Index
- Using the ReplicationSyncUp Tool
- Using HIndex
- Configuring HBase DR
- Performing an HBase DR Service Switchover
- Configuring HBase Data Compression and Encoding
- Performing an HBase DR Active/Standby Cluster Switchover
- Community BulkLoad Tool
- Configuring the MOB
- Configuring Secure HBase Replication
- Configuring Region In Transition Recovery Chore Service
- Using a Secondary Index
- HBase Log Overview
- HBase Performance Tuning
-
Common Issues About HBase
- Why Does a Client Keep Failing to Connect to a Server for a Long Time?
- Operation Failures Occur in Stopping BulkLoad On the Client
- Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
- Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
- Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
- How Do I Restore a Region in the RIT State for a Long Time?
- Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- Why Can Modified and Deleted Data Still Be Queried Using the Scan Command?
- Why Is a "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown When Starting the HBase Shell?
- When Do the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Get Cleared?
- Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
- What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
- How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
- Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
- Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Meta Is Rebuilt Using the OfflineMetaRepair Tool, Causing Startup to Fail?
- Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
- Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as the Region Server but a Different Kerberos User Is Used?
- Insufficient Rights When a Tenant Accesses Phoenix
- What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
- How Do I Fix Region Overlapping?
- Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
- Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
- Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
- How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
- Why Is a Message Indicating Insufficient Permission Displayed When CTBase Connects to the Ranger Plug-in?
-
Using HDFS
- Configuring Memory Management
- Creating an HDFS Role
- Using the HDFS Client
- Running the DistCp Command
- Overview of HDFS File System Directories
- Changing the DataNode Storage Directory
- Configuring HDFS Directory Permission
- Configuring NFS
- Planning HDFS Capacity
- Configuring ulimit for HBase and HDFS
- Balancing DataNode Capacity
- Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
- Configuring the Number of Files in a Single HDFS Directory
- Configuring the Recycle Bin Mechanism
- Setting Permissions on Files and Directories
- Setting the Maximum Lifetime and Renewal Interval of a Token
- Configuring the Damaged Disk Volume
- Configuring Encrypted Channels
- Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
- Configuring the NameNode Blacklist
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Configuring Reserved Percentage of Disk Usage on DataNodes
- Configuring HDFS NodeLabel
- Configuring HDFS DiskBalancer
- Performing Concurrent Operations on HDFS Files
- Introduction to HDFS Logs
- HDFS Performance Tuning
-
FAQ
- NameNode Startup Is Slow
- Why Do MapReduce Tasks Fail in an Environment with Multiple NameServices?
- DataNode Is Normal but Cannot Report Data Blocks
- HDFS WebUI Cannot Properly Update Information About Damaged Data
- Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
- Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
- Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?
- Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
- Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
- Why Does Array Border-crossing Occur During FileInputFormat Split?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
- Can I Delete or Modify the Data Storage Directory in DataNode?
- Blocks Miss on the NameNode UI After the Successful Rollback
- Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
-
Using HetuEngine
- Using HetuEngine from Scratch
- HetuEngine Permission Management
- Creating HetuEngine Compute Instances
- Configuring Data Sources
- Managing Data Sources
-
Managing Compute Instances
- Configuring Resource Groups
- Adjusting the Number of Worker Nodes
- Managing a HetuEngine Compute Instance
- Importing and Exporting Compute Instance Configurations
- Viewing the Instance Monitoring Page
- Viewing Coordinator and Worker Logs
- Using Resource Labels to Specify on Which Node Coordinators Should Run
- Using the HetuEngine Client
- Using the HetuEngine Cross-Source Function
- Using HetuEngine Cross-Domain Function
- Using a Third-Party Visualization Tool to Access HetuEngine
- Function & UDF Development and Application
- Introduction to HetuEngine Logs
- HetuEngine Performance Tuning
- Common Issues About HetuEngine
-
Using Hive
- Using Hive from Scratch
- Configuring Hive Parameters
- Hive SQL
- Permission Management
- Using a Hive Client
- Using HDFS Colocation to Store Hive Tables
- Using the Hive Column Encryption Function
- Customizing Row Separators
- Deleting Single-Row Records from Hive on HBase
- Configuring HTTPS/HTTP-based REST APIs
- Enabling or Disabling the Transform Function
- Access Control of a Dynamic Table View on Hive
- Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
- Using Hive to Read Data in a Relational Database
- Supporting Traditional Relational Database Syntax in Hive
- Creating User-Defined Hive Functions
- Enhancing beeline Reliability
- Viewing Table Structures Using the show create Statement as Users with the select Permission
- Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
- Inserting Data to a Directory That Does Not Exist
- Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
- Disabling of Specifying the location Keyword When Creating an Internal Hive Table
- Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
- Authorizing Over 32 Roles in Hive
- Restricting the Maximum Number of Maps for Hive Tasks
- HiveServer Lease Isolation
- Hive Supporting Transactions
- Switching the Hive Execution Engine to Tez
- Connecting Hive with External RDS
- Redis-based CacheStore of HiveMetaStore
- Hive Materialized View
- Hive Supporting Reading Hudi Tables
- Hive Supporting Cold and Hot Storage of Partitioned Metadata
- Hive Supporting ZSTD Compression Formats
- Hive Log Overview
- Hive Performance Tuning
-
Common Issues About Hive
- How Do I Delete UDFs on Multiple HiveServers at the Same Time?
- Why Can't the DROP Operation Be Performed on a Backed-Up Hive Table?
- How Do I Perform Operations on Local Files with Hive User-Defined Functions?
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- How Do I Monitor the Hive Table Size?
- How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
- Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
- Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Description of Hive Table Location (Either Be an OBS or HDFS Path)
- Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?
- Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
- Why Does Hive Not Support Vectorized Query?
- Hive Configuration Problems
- Using Hudi
-
Using Hue
- Using Hue from Scratch
- Accessing the Hue Web UI
- Hue Common Parameters
- Using HiveQL Editor on the Hue Web UI
- Using the Metadata Browser on the Hue Web UI
- Using File Browser on the Hue Web UI
- Using Job Browser on the Hue Web UI
- Using HBase on the Hue Web UI
- Typical Scenarios
- Hue Log Overview
-
Common Issues About Hue
- How Do I Solve the Problem that HQL Fails to Be Executed in Hue Using Internet Explorer?
- Why Does the use database Statement Become Invalid When Hive Is Used?
- What Can I Do If HDFS Files Fail to Be Accessed Using Hue WebUI?
- What Should I Do If a Large File Fails to Be Uploaded on the Hue Page?
- Hue Page Cannot Be Displayed When the Hive Service Is Not Installed in a Cluster
-
Using Kafka
- Using Kafka from Scratch
- Managing Kafka Topics
- Querying Kafka Topics
- Managing Kafka User Permissions
- Managing Messages in Kafka Topics
- Creating a Kafka Role
- Kafka Common Parameters
- Safety Instructions on Using Kafka
- Kafka Specifications
- Using the Kafka Client
- Configuring Kafka HA and High Reliability Parameters
- Changing the Broker Storage Directory
- Checking the Consumption Status of a Consumer Group
- Kafka Balancing Tool Instructions
- Kafka Token Authentication Mechanism Tool Usage
- Kafka Feature Description
- Using Kafka UI
- Introduction to Kafka Logs
- Performance Tuning
- Common Issues About Kafka
-
Using Loader
- Common Loader Parameters
- Creating a Loader Role
- Managing Loader Links
-
Importing Data
- Overview
- Importing Data Using Loader
- Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
- Typical Scenario: Importing Data from an SFTP Server to HBase
- Typical Scenario: Importing Data from an SFTP Server to Hive
- Typical Scenario: Importing Data from an SFTP Server to Spark
- Typical Scenario: Importing Data from an FTP Server to HBase
- Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
- Typical Scenario: Importing Data from a Relational Database to HBase
- Typical Scenario: Importing Data from a Relational Database to Hive
- Typical Scenario: Importing Data from a Relational Database to Spark
- Typical Scenario: Importing Data from HDFS or OBS to HBase
- Typical Scenario: Importing Data from a Relational Database to ClickHouse
- Typical Scenario: Importing Data from HDFS to ClickHouse
-
Exporting Data
- Overview
- Using Loader to Export Data
- Typical Scenario: Exporting Data from HDFS/OBS to an SFTP Server
- Typical Scenario: Exporting Data from HBase to an SFTP Server
- Typical Scenario: Exporting Data from Hive to an SFTP Server
- Typical Scenario: Exporting Data from Spark to an SFTP Server
- Typical Scenario: Exporting Data from HDFS/OBS to a Relational Database
- Typical Scenario: Exporting Data from HBase to a Relational Database
- Typical Scenario: Exporting Data from Hive to a Relational Database
- Typical Scenario: Exporting Data from Spark to a Relational Database
- Typical Scenario: Importing Data from HBase to HDFS/OBS
- Job Management
- Operator Help
-
Client Tool Description
- Running a Loader Job by Using Commands
- loader-tool Usage Guide
- loader-tool Usage Example
- schedule-tool Usage Guide
- schedule-tool Usage Example
- Using loader-backup to Back Up Job Data
- Open Source sqoop-shell Tool Usage Guide
- Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
- Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
- Loader Log Overview
- Common Issues About Loader
-
Using MapReduce
- Converting MapReduce from the Single Instance Mode to the HA Mode
- Configuring the Log Archiving and Clearing Mechanism
- Reducing Client Application Failure Rate
- Transmitting MapReduce Tasks from Windows to Linux
- Configuring the Distributed Cache
- Configuring the MapReduce Shuffle Address
- Configuring the Cluster Administrator List
- Introduction to MapReduce Logs
- MapReduce Performance Tuning
-
Common Issues About MapReduce
- Why Does It Take a Long Time to Run a Task Upon ResourceManager Active/Standby Switchover?
- Why Does a MapReduce Task Stay Unchanged for a Long Time?
- Why Does the Client Hang During Job Running?
- Why Can't HDFS_DELEGATION_TOKEN Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
- MapReduce Job Fails in a Multi-NameService Environment
- Why Is a Faulty MapReduce Node Not Blacklisted?
-
Using Oozie
- Using Oozie from Scratch
- Using the Oozie Client
- Enabling Oozie High Availability (HA)
- Using Oozie Client to Submit an Oozie Job
-
Using Hue to Submit an Oozie Job
- Creating a Workflow
-
Submitting a Workflow Job
- Submitting a Hive2 Job
- Submitting a Spark2x Job
- Submitting a Java Job
- Submitting a Loader Job
- Submitting a MapReduce Job
- Submitting a Sub-workflow Job
- Submitting a Shell Job
- Submitting an HDFS Job
- Submitting a DistCp Job
- Example of Mutual Trust Operations
- Submitting an SSH Job
- Submitting a Hive Script
- Submitting an Email Job
- Submitting a Coordinator Periodic Scheduling Job
- Submitting a Bundle Batch Processing Job
- Querying the Operation Results
- Oozie Log Overview
- Common Issues About Oozie
-
Using Ranger
- Logging In to the Ranger Web UI
- Enabling Ranger Authentication
- Configuring Component Permission Policies
- Viewing Ranger Audit Information
- Configuring a Security Zone
- Changing the Ranger Data Source to LDAP for a Normal Cluster
- Viewing Ranger Permission Information
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark2x
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for HetuEngine
- Ranger Log Overview
-
Common Issues About Ranger
- Why Does Ranger Startup Fail During Cluster Installation?
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Can't a New User Log In to Ranger After Changing the Password?
- When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
-
Using Spark2x
-
Basic Operation
- Getting Started
- Configuring Parameters Rapidly
- Common Parameters
- Spark on HBase Overview and Basic Applications
- Spark on HBase V2 Overview and Basic Applications
- SparkSQL Permission Management (Security Mode)
-
Scenario-Specific Configuration
- Configuring Multi-active Instance Mode
- Configuring the Multi-tenant Mode
- Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
- Configuring the Size of the Event Queue
- Configuring Executor Off-Heap Memory
- Enhancing Stability in a Limited Memory Condition
- Viewing Aggregated Container Logs on the Web UI
- Configuring Whether to Display Spark SQL Statements Containing Sensitive Words
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Configuring the Default Number of Data Blocks Divided by SparkSQL
- Configuring the Compression Format of a Parquet Table
- Configuring the Number of Lost Executors Displayed in WebUI
- Setting the Log Level Dynamically
- Configuring Whether Spark Obtains HBase Tokens
- Configuring LIFO for Kafka
- Configuring Reliability for Connected Kafka
- Configuring Streaming Reading of Driver Execution Results
- Filtering Partitions without Paths in Partitioned Tables
- Configuring Spark2x Web UI ACLs
- Configuring Vector-based ORC Data Reading
- Broadening Support for Hive Partition Pruning Predicate Pushdown
- Hive Dynamic Partition Overwriting Syntax
- Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
- Configuring Local Disk Cache for JobHistory
- Configuring Spark SQL to Enable the Adaptive Execution Feature
- Configuring Event Log Rollover
- Adapting to the Third-party JDK When Ranger Is Used
- Spark2x Logs
- Obtaining Container Logs of a Running Spark Application
- Small File Combination Tools
- Using CarbonData for First Query
-
Spark2x Performance Tuning
- Spark Core Tuning
-
Spark SQL and DataFrame Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the INSERT...SELECT Operation
- Multiple JDBC Clients Concurrently Connecting to JDBCServer
- Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- Optimizing SQL Query of Data of Multiple Sources
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Tuning
- Spark on OBS Tuning
-
Common Issues About Spark2x
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
- Why Can't the Driver Process Exit?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does the Stage Retry Due to the Crash of the Executor?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?
- Why Does the Realm Information Fail to Be Obtained When SparkBench is Run on HiBench for the Cluster in Security Mode?
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How Do I Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How Do I Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- Why Does the Operation Fail When the Table Name Is TABLE?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Do I Fail to Modify MetaData by Running the Hive Command?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Should I Do If the JDBCServer Process is Mistakenly Killed During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
- Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
- Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
- Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Are Some Functions Not Available when Another JDBCServer Is Connected?
- Why Does an Exception Occur When I Drop Functions Created Using the Add Jar Statement?
- Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
- Why Does Spark-beeline Fail to Run and the Error Message "Failed to create ThriftService instance" Is Displayed?
-
Spark Streaming
- Streaming Task Prints the Same DAG Log Twice
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does a Spark Streaming Application Fail to Restart from Checkpoint When It Creates an Input Stream Without Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
- Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Why Is an Application Not Displayed When I Run the Application with an Empty Part File?
- Why Does Spark2x Fail to Export a Table with the Same Field Name?
- Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native Spark2x UI
- How Does Spark2x Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- What Should I Do If the Native Page of a Spark2x JobHistory Application Fails to Be Displayed?
- Spark Shuffle Exception Handling
-
Spark Core
-
Basic Operation
- Using Tez
-
Using Yarn
- Common Yarn Parameters
- Creating Yarn Roles
- Using the Yarn Client
- Configuring Resources for a NodeManager Role Instance
- Changing NodeManager Storage Directories
- Configuring Strict Permission Control for Yarn
- Configuring Container Log Aggregation
- Using CGroups with YARN
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
- Configuring the Access Channel Protocol
- Configuring Memory Usage Detection
- Configuring the Additional Scheduler WebUI
- Configuring Yarn Restart
- Configuring ApplicationMaster Work Preserving
- Configuring the Localized Log Levels
- Configuring Users That Run Tasks
- Yarn Log Overview
- Yarn Performance Tuning
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Completes When CGroups Are Used?
- Why Does the Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Twice?
- Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
- Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- What Is the Queue Replacement Policy?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature Is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Common ZooKeeper Parameters
- Using a ZooKeeper Client
- Configuring the ZooKeeper Permissions
- Changing the ZooKeeper Storage Directory
- Configuring the ZooKeeper Connection
- Configuring ZooKeeper Response Timeout Interval
- Binding the Client to an IP Address
- Configuring the Port Range Bound to the Client
- Performing Special Configuration on ZooKeeper Clients in the Same JVM
- Configuring a Quota for a Znode
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Can't the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
-
Using CarbonData
-
API Reference (Paris Region)
- Before You Start
- API Overview
- Calling APIs
- Application Cases
- API V2
- API V1.1
- Out-of-Date APIs
- Permissions Policies and Supported Actions
- Appendix
- Change History
-
User Guide (Kuala Lumpur Region)
-
Overview
- What Is MRS?
- Advantages of MRS Compared with Self-Built Hadoop
- Application Scenarios
-
Components
- Alluxio
- CarbonData
- ClickHouse
- DBService
- Flink
- Flume
- HBase
- HDFS
- Hive
- Hue
- Impala
- Kafka
- KafkaManager
- KrbServer and LdapServer
- Kudu
- Loader
- Manager
- MapReduce
- Oozie
- OpenTSDB
- Presto
- Ranger
- Spark
- Spark2x
- Storm
- Tez
- Yarn
- ZooKeeper
- Functions
- Constraints
- Technical Support
- Permissions Management
- Related Services
- Common Concepts
- MRS Quick Start
- Preparing a User
-
Configuring a Cluster
- Methods of Creating MRS Clusters
- Quick Creation of a Cluster
- Creating a Custom Cluster
- Creating a Custom Topology Cluster
- Adding a Tag to a Cluster
- Communication Security Authorization
- Configuring an Auto Scaling Rule
- Managing Data Connections
- Installing the Third-Party Software Using Bootstrap Actions
- Viewing Failed MRS Tasks
- Viewing Information of a Historical Cluster
-
Managing Clusters
- Logging In to a Cluster
- Cluster Overview
- Cluster O&M
- Managing Nodes
-
Job Management
- Introduction to MRS Jobs
- Running a MapReduce Job
- Running a SparkSubmit Job
- Running a HiveSQL Job
- Running a SparkSql Job
- Running a Flink Job
- Running a Kafka Job
- Viewing Job Configuration and Logs
- Stopping a Job
- Deleting a Job
- Using Encrypted OBS Data for Job Running
- Configuring Job Notification Rules
-
Component Management
- Object Management
- Viewing Configuration
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configuration
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Starting and Stopping a Cluster
- Synchronizing Cluster Configuration
- Exporting Cluster Configuration
- Performing Rolling Restart
- Alarm Management
- Patch Management
-
Tenant Management
- Before You Start
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Using an MRS Client
- Configuring a Cluster with Storage and Compute Decoupled
- Accessing Web Pages of Open Source Components Managed in MRS Clusters
- Accessing Manager
-
FusionInsight Manager Operation Guide (Applicable to 3.x)
- Getting Started
- Homepage
-
Cluster
- Cluster Management
- Managing a Service
- Instance Management
- Hosts
- O&M
- Audit
- Tenant Resources
- System Configuration
- Cluster Management
- Log Management
- Backup and Recovery Management
-
Security Management
- Security Overview
- Account Management
-
Security Hardening
- Hardening Policy
- Configuring a Trusted IP Address to Access LDAP
- HFile and WAL Encryption
- Security Configuration
- Configuring an IP Address Whitelist for Modifications Allowed by HBase
- Updating a Key for a Cluster
- Hardening the LDAP
- Configuring Kafka Data Encryption During Transmission
- Configuring HDFS Data Encryption During Transmission
- Encrypting the Communication Between Controller and Agent
- Updating SSH Keys for User omm
- Security Maintenance
- Security Statement
-
Alarm Reference (Applicable to MRS 3.x)
- ALM-12001 Audit Log Dumping Failure
- ALM-12004 OLdap Resource Abnormal
- ALM-12005 OKerberos Resource Abnormal
- ALM-12006 Node Fault
- ALM-12007 Process Fault
- ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes
- ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes
- ALM-12014 Partition Lost
- ALM-12015 Partition Filesystem Readonly
- ALM-12016 CPU Usage Exceeds the Threshold
- ALM-12017 Insufficient Disk Capacity
- ALM-12018 Memory Usage Exceeds the Threshold
- ALM-12027 Host PID Usage Exceeds the Threshold
- ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold
- ALM-12033 Slow Disk Fault
- ALM-12034 Periodical Backup Failure
- ALM-12035 Unknown Data Status After Recovery Task Failure
- ALM-12038 Monitoring Indicator Dumping Failure
- ALM-12039 Active/Standby OMS Databases Not Synchronized
- ALM-12040 Insufficient System Entropy
- ALM-12041 Incorrect Permission on Key Files
- ALM-12042 Incorrect Configuration of Key Files
- ALM-12045 Network Read Packet Dropped Rate Exceeds the Threshold
- ALM-12046 Network Write Packet Dropped Rate Exceeds the Threshold
- ALM-12047 Network Read Packet Error Rate Exceeds the Threshold
- ALM-12048 Network Write Packet Error Rate Exceeds the Threshold
- ALM-12049 Network Read Throughput Rate Exceeds the Threshold
- ALM-12050 Network Write Throughput Rate Exceeds the Threshold
- ALM-12051 Disk Inode Usage Exceeds the Threshold
- ALM-12052 TCP Temporary Port Usage Exceeds the Threshold
- ALM-12053 Host File Handle Usage Exceeds the Threshold
- ALM-12054 Invalid Certificate File
- ALM-12055 The Certificate File Is About to Expire
- ALM-12057 Metadata Not Configured with the Task to Periodically Back Up Data to a Third-Party Server
- ALM-12061 Process Usage Exceeds the Threshold
- ALM-12062 OMS Parameter Configurations Mismatch with the Cluster Scale
- ALM-12063 Unavailable Disk
- ALM-12064 Host Random Port Range Conflicts with Cluster Used Port
- ALM-12066 Trust Relationships Between Nodes Become Invalid
- ALM-12067 Tomcat Resource Is Abnormal
- ALM-12068 ACS Resource Is Abnormal
- ALM-12069 AOS Resource Is Abnormal
- ALM-12070 Controller Resource Is Abnormal
- ALM-12071 Httpd Resource Is Abnormal
- ALM-12072 FloatIP Resource Is Abnormal
- ALM-12073 CEP Resource Is Abnormal
- ALM-12074 FMS Resource Is Abnormal
- ALM-12075 PMS Resource Is Abnormal
- ALM-12076 GaussDB Resource Is Abnormal
- ALM-12077 User omm Expired
- ALM-12078 Password of User omm Expired
- ALM-12079 User omm Is About to Expire
- ALM-12080 Password of User omm Is About to Expire
- ALM-12081 User ommdba Expired
- ALM-12082 User ommdba Is About to Expire
- ALM-12083 Password of User ommdba Is About to Expire
- ALM-12084 Password of User ommdba Expired
- ALM-12085 Service Audit Log Dump Failure
- ALM-12087 System Is in the Upgrade Observation Period
- ALM-12089 Inter-Node Network Is Abnormal
- ALM-12101 AZ Unhealthy
- ALM-12102 AZ HA Component Is Not Deployed Based on DR Requirements
- ALM-12110 Failed to Get ECS Temporary AK/SK
- ALM-13000 ZooKeeper Service Unavailable
- ALM-13001 Available ZooKeeper Connections Are Insufficient
- ALM-13002 ZooKeeper Direct Memory Usage Exceeds the Threshold
- ALM-13003 GC Duration of the ZooKeeper Process Exceeds the Threshold
- ALM-13004 ZooKeeper Heap Memory Usage Exceeds the Threshold
- ALM-13005 Failed to Set the Quota of Top Directories of ZooKeeper Components
- ALM-13006 Znode Number or Capacity Exceeds the Threshold
- ALM-13007 Available ZooKeeper Client Connections Are Insufficient
- ALM-13008 ZooKeeper Znode Usage Exceeds the Threshold
- ALM-13009 ZooKeeper Znode Capacity Usage Exceeds the Threshold
- ALM-13010 Znode Usage of a Directory with Quota Configured Exceeds the Threshold
- ALM-14000 HDFS Service Unavailable
- ALM-14001 HDFS Disk Usage Exceeds the Threshold
- ALM-14002 DataNode Disk Usage Exceeds the Threshold
- ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold
- ALM-14006 Number of HDFS Files Exceeds the Threshold
- ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold
- ALM-14008 DataNode Heap Memory Usage Exceeds the Threshold
- ALM-14009 Number of Dead DataNodes Exceeds the Threshold
- ALM-14010 NameService Service Is Abnormal
- ALM-14011 DataNode Data Directory Is Not Configured Properly
- ALM-14012 JournalNode Is Out of Synchronization
- ALM-14013 Failed to Update the NameNode FsImage File
- ALM-14014 NameNode GC Time Exceeds the Threshold
- ALM-14015 DataNode GC Time Exceeds the Threshold
- ALM-14016 DataNode Direct Memory Usage Exceeds the Threshold
- ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold
- ALM-14018 NameNode Non-heap Memory Usage Exceeds the Threshold
- ALM-14019 DataNode Non-heap Memory Usage Exceeds the Threshold
- ALM-14020 Number of Entries in the HDFS Directory Exceeds the Threshold
- ALM-14021 NameNode Average RPC Processing Time Exceeds the Threshold
- ALM-14022 NameNode Average RPC Queuing Time Exceeds the Threshold
- ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
- ALM-14024 Tenant Space Usage Exceeds the Threshold
- ALM-14025 Tenant File Object Usage Exceeds the Threshold
- ALM-14026 Blocks on DataNode Exceed the Threshold
- ALM-14027 DataNode Disk Fault
- ALM-14028 Number of Blocks to Be Supplemented Exceeds the Threshold
- ALM-14029 Number of Blocks in a Replica Exceeds the Threshold
- ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold
- ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold
- ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold
- ALM-16003 Background Thread Usage Exceeds the Threshold
- ALM-16004 Hive Service Unavailable
- ALM-16005 The Heap Memory Usage of the Hive Process Exceeds the Threshold
- ALM-16006 The Direct Memory Usage of the Hive Process Exceeds the Threshold
- ALM-16007 Hive GC Time Exceeds the Threshold
- ALM-16008 Non-Heap Memory Usage of the Hive Process Exceeds the Threshold
- ALM-16009 Map Number Exceeds the Threshold
- ALM-16045 Hive Data Warehouse Is Deleted
- ALM-16046 Hive Data Warehouse Permission Is Modified
- ALM-16047 HiveServer Has Been Deregistered from ZooKeeper
- ALM-16048 Tez or Spark Library Path Does Not Exist
- ALM-17003 Oozie Service Unavailable
- ALM-17004 Oozie Heap Memory Usage Exceeds the Threshold
- ALM-17005 Oozie Non Heap Memory Usage Exceeds the Threshold
- ALM-17006 Oozie Direct Memory Usage Exceeds the Threshold
- ALM-17007 Garbage Collection (GC) Time of the Oozie Process Exceeds the Threshold
- ALM-18000 Yarn Service Unavailable
- ALM-18002 NodeManager Heartbeat Lost
- ALM-18003 NodeManager Unhealthy
- ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold
- ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold
- ALM-18010 ResourceManager GC Time Exceeds the Threshold
- ALM-18011 NodeManager GC Time Exceeds the Threshold
- ALM-18012 JobHistoryServer GC Time Exceeds the Threshold
- ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold
- ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold
- ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold
- ALM-18016 Non Heap Memory Usage of ResourceManager Exceeds the Threshold
- ALM-18017 Non Heap Memory Usage of NodeManager Exceeds the Threshold
- ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold
- ALM-18019 Non Heap Memory Usage of JobHistoryServer Exceeds the Threshold
- ALM-18020 Yarn Task Execution Timeout
- ALM-18021 MapReduce Service Unavailable
- ALM-18022 Insufficient Yarn Queue Resources
- ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold
- ALM-18024 Pending Yarn Memory Usage Exceeds the Threshold
- ALM-18025 Number of Terminated Yarn Tasks Exceeds the Threshold
- ALM-18026 Number of Failed Yarn Tasks Exceeds the Threshold
- ALM-19000 HBase Service Unavailable
- ALM-19006 HBase Replication Sync Failed
- ALM-19007 HBase GC Time Exceeds the Threshold
- ALM-19008 Heap Memory Usage of the HBase Process Exceeds the Threshold
- ALM-19009 Direct Memory Usage of the HBase Process Exceeds the Threshold
- ALM-19011 RegionServer Region Number Exceeds the Threshold
- ALM-19012 HBase System Table Directory or File Lost
- ALM-19013 Duration of Regions in Transition State Exceeds the Threshold
- ALM-19014 Capacity Quota Usage on ZooKeeper Exceeds the Threshold Severely
- ALM-19015 Quantity Quota Usage on ZooKeeper Exceeds the Threshold
- ALM-19016 Quantity Quota Usage on ZooKeeper Exceeds the Threshold Severely
- ALM-19017 Capacity Quota Usage on ZooKeeper Exceeds the Threshold
- ALM-19018 HBase Compaction Queue Exceeds the Threshold
- ALM-19019 Number of HBase HFiles to Be Synchronized Exceeds the Threshold
- ALM-19020 Number of HBase WAL Files to Be Synchronized Exceeds the Threshold
- ALM-20002 Hue Service Unavailable
- ALM-24000 Flume Service Unavailable
- ALM-24001 Flume Agent Exception
- ALM-24003 Flume Client Connection Interrupted
- ALM-24004 Exception Occurs When Flume Reads Data
- ALM-24005 Exception Occurs When Flume Transmits Data
- ALM-24006 Heap Memory Usage of Flume Server Exceeds the Threshold
- ALM-24007 Flume Server Direct Memory Usage Exceeds the Threshold
- ALM-24008 Flume Server Non-Heap Memory Usage Exceeds the Threshold
- ALM-24009 Flume Server Garbage Collection (GC) Time Exceeds the Threshold
- ALM-24010 Flume Certificate File Is Invalid or Damaged
- ALM-24011 Flume Certificate File Is About to Expire
- ALM-24012 Flume Certificate File Has Expired
- ALM-24013 Flume MonitorServer Certificate File Is Invalid or Damaged
- ALM-24014 Flume MonitorServer Certificate Is About to Expire
- ALM-24015 Flume MonitorServer Certificate File Has Expired
- ALM-25000 LdapServer Service Unavailable
- ALM-25004 Abnormal LdapServer Data Synchronization
- ALM-25005 nscd Service Exception
- ALM-25006 Sssd Service Exception
- ALM-25500 KrbServer Service Unavailable
- ALM-26051 Storm Service Unavailable
- ALM-26052 Number of Available Supervisors of the Storm Service Is Less Than the Threshold
- ALM-26053 Storm Slot Usage Exceeds the Threshold
- ALM-26054 Nimbus Heap Memory Usage Exceeds the Threshold
- ALM-27001 DBService Service Unavailable
- ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes
- ALM-27004 Data Inconsistency Between Active and Standby DBServices
- ALM-27005 Database Connections Usage Exceeds the Threshold
- ALM-27006 Disk Space Usage of the Data Directory Exceeds the Threshold
- ALM-27007 Database Enters the Read-Only Mode
- ALM-29000 Impala Service Unavailable
- ALM-29004 Impalad Process Memory Usage Exceeds the Threshold
- ALM-29005 Number of JDBC Connections to Impalad Exceeds the Threshold
- ALM-29006 Number of ODBC Connections to Impalad Exceeds the Threshold
- ALM-29100 Kudu Service Unavailable
- ALM-29104 Tserver Process Memory Usage Exceeds the Threshold
- ALM-29106 Tserver Process CPU Usage Exceeds the Threshold
- ALM-29107 Tserver Process Memory Usage Exceeds the Threshold
- ALM-38000 Kafka Service Unavailable
- ALM-38001 Insufficient Kafka Disk Capacity
- ALM-38002 Kafka Heap Memory Usage Exceeds the Threshold
- ALM-38004 Kafka Direct Memory Usage Exceeds the Threshold
- ALM-38005 GC Duration of the Broker Process Exceeds the Threshold
- ALM-38006 Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold
- ALM-38007 Status of Kafka Default User Is Abnormal
- ALM-38008 Abnormal Kafka Data Directory Status
- ALM-38009 Busy Broker Disk I/Os
- ALM-38010 Topics with Single Replica
- ALM-43001 Spark2x Service Unavailable
- ALM-43006 Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold
- ALM-43007 Non-Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold
- ALM-43008 The Direct Memory Usage of the JobHistory2x Process Exceeds the Threshold
- ALM-43009 JobHistory2x Process GC Time Exceeds the Threshold
- ALM-43010 Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
- ALM-43011 Non-Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
- ALM-43012 Direct Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold
- ALM-43013 JDBCServer2x Process GC Time Exceeds the Threshold
- ALM-43017 JDBCServer2x Process Full GC Number Exceeds the Threshold
- ALM-43018 JobHistory2x Process Full GC Number Exceeds the Threshold
- ALM-43019 Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold
- ALM-43020 Non-Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold
- ALM-43021 Direct Memory Usage of the IndexServer2x Process Exceeds the Threshold
- ALM-43022 IndexServer2x Process GC Time Exceeds the Threshold
- ALM-43023 IndexServer2x Process Full GC Number Exceeds the Threshold
- ALM-44004 Presto Coordinator Resource Group Queuing Tasks Exceed the Threshold
- ALM-44005 Presto Coordinator Process GC Time Exceeds the Threshold
- ALM-44006 Presto Worker Process GC Time Exceeds the Threshold
- ALM-45175 Average Time for Calling OBS Metadata APIs Is Greater than the Threshold
- ALM-45176 Success Rate of Calling OBS Metadata APIs Is Lower than the Threshold
- ALM-45177 Success Rate of Calling OBS Data Read APIs Is Lower than the Threshold
- ALM-45178 Success Rate of Calling OBS Data Write APIs Is Lower Than the Threshold
- ALM-45275 Ranger Service Unavailable
- ALM-45276 Abnormal RangerAdmin Status
- ALM-45277 RangerAdmin Heap Memory Usage Exceeds the Threshold
- ALM-45278 RangerAdmin Direct Memory Usage Exceeds the Threshold
- ALM-45279 RangerAdmin Non Heap Memory Usage Exceeds the Threshold
- ALM-45280 RangerAdmin GC Duration Exceeds the Threshold
- ALM-45281 UserSync Heap Memory Usage Exceeds the Threshold
- ALM-45282 UserSync Direct Memory Usage Exceeds the Threshold
- ALM-45283 UserSync Non Heap Memory Usage Exceeds the Threshold
- ALM-45284 UserSync Garbage Collection (GC) Time Exceeds the Threshold
- ALM-45285 TagSync Heap Memory Usage Exceeds the Threshold
- ALM-45286 TagSync Direct Memory Usage Exceeds the Threshold
- ALM-45287 TagSync Non Heap Memory Usage Exceeds the Threshold
- ALM-45288 TagSync Garbage Collection (GC) Time Exceeds the Threshold
- ALM-45425 ClickHouse Service Unavailable
- ALM-45426 ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold
- ALM-45427 ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold
- ALM-45736 Guardian Service Unavailable
-
MRS Manager Operation Guide (Applicable to 2.x and Earlier Versions)
- Introduction to MRS Manager
- Checking Running Tasks
- Monitoring Management
- Alarm Management
-
Object Management
- Managing Objects
- Viewing Configurations
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configurations
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Managing a Host
- Isolating a Host
- Canceling Host Isolation
- Starting or Stopping a Cluster
- Synchronizing Cluster Configurations
- Exporting Configuration Data of a Cluster
- Log Management
-
Health Check Management
- Performing a Health Check
- Viewing and Exporting a Health Check Report
- Configuring the Number of Health Check Reports to Be Reserved
- Managing Health Check Reports
- DBService Health Check Indicators
- Flume Health Check Indicators
- HBase Health Check Indicators
- Host Health Check Indicators
- HDFS Health Check Indicators
- Hive Health Check Indicators
- Kafka Health Check Indicators
- KrbServer Health Check Indicators
- LdapServer Health Check Indicators
- Loader Health Check Indicators
- MapReduce Health Check Indicators
- OMS Health Check Indicators
- Spark Health Check Indicators
- Storm Health Check Indicators
- Yarn Health Check Indicators
- ZooKeeper Health Check Indicators
- Static Service Pool Management
-
Tenant Management
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Backup and Restoration
-
Security Management
- Default Users of Clusters with Kerberos Authentication Disabled
- Default Users of Clusters with Kerberos Authentication Enabled
- Changing the Password of an OS User
- Changing the Password of User admin
- Changing the Password of the Kerberos Administrator
- Changing the Passwords of the LDAP Administrator and the LDAP User
- Changing the Password of a Component Running User
- Changing the Password of the OMS Database Administrator
- Changing the Password of the Data Access User of the OMS Database
- Changing the Password of a Component Database User
- Updating Cluster Keys
- Permissions Management
-
MRS Multi-User Permission Management
- Users and Permissions of MRS Clusters
- Default Users of Clusters with Kerberos Authentication Enabled
- Creating a Role
- Creating a User Group
- Creating a User
- Modifying User Information
- Locking a User
- Unlocking a User
- Deleting a User
- Changing the Password of an Operation User
- Initializing the Password of a System User
- Downloading a User Authentication File
- Modifying a Password Policy
- Configuring Cross-Cluster Mutual Trust Relationships
- Configuring Users to Access Resources of a Trusted Cluster
- Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS
- Patch Operation Guide
- Restoring Patches for the Isolated Hosts
- Rolling Restart
-
MRS Cluster Component Operation Guide
- Using Alluxio
- Using CarbonData (for Versions Earlier Than MRS 3.x)
-
Using CarbonData (for MRS 3.x or Later)
- Overview
- Configuration Reference
- CarbonData Operation Guide
- CarbonData Performance Tuning
- CarbonData Access Control
- CarbonData Syntax Reference
- CarbonData Troubleshooting
-
CarbonData FAQ
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How to Avoid Minor Compaction for Historical Data?
- How to Change the Default Group Name for CarbonData Data Loading?
- Why Does INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Does Data Load Performance Decrease Due to Bad Records?
- Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect, with Fewer Tasks Opened Than Available Executors, When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
- Why Does Data Loading Fail When Off-Heap Memory Is Used?
- Why Do I Fail to Create a Hive Table?
- Why Do CarbonData Tables Created in V100R002C50RC1 Not Reflect the Privileges Provided in Hive Privileges for Non-Owners?
- How Do I Logically Split Data Across Different Namespaces?
- Why Is a Missing Privileges Exception Reported When I Perform a Drop Operation on Databases?
- Why Cannot the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
- Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
- Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
- Using ClickHouse
- Using DBService
-
Using Flink
- Using Flink from Scratch
- Viewing Flink Job Information
- Flink Configuration Management
- Security Configuration
- Security Hardening
- Security Statement
- Using the Flink Web UI
- Flink Log Overview
- Flink Performance Tuning
- Common Flink Shell Commands
- Reference
-
Using Flume
- Using Flume from Scratch
- Overview
- Installing the Flume Client
- Viewing Flume Client Logs
- Stopping or Uninstalling the Flume Client
- Using the Encryption Tool of the Flume Client
- Flume Service Configuration Guide
- Flume Configuration Parameter Description
- Using Environment Variables in the properties.properties File
-
Non-Encrypted Transmission
- Configuring Non-encrypted Transmission
- Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
- Encrypted Transmission
- Viewing Flume Client Monitoring Information
- Connecting Flume to Kafka in Security Mode
- Connecting Flume with Hive in Security Mode
- Configuring the Flume Service Model
- Introduction to Flume Logs
- Flume Client Cgroup Usage Guide
- Secondary Development Guide for Flume Third-Party Plug-ins
- Common Issues About Flume
-
Using HBase
- Using HBase from Scratch
- Using an HBase Client
- Creating HBase Roles
- Configuring HBase Replication
- Configuring HBase Parameters
- Enabling Cross-Cluster Copy
- Using the ReplicationSyncUp Tool
- Using HIndex
- Configuring HBase DR
- Configuring HBase Data Compression and Encoding
- Performing an HBase DR Service Switchover
- Performing an HBase DR Active/Standby Cluster Switchover
- Community BulkLoad Tool
- Configuring the MOB
- Configuring Secure HBase Replication
- Configuring Region In Transition Recovery Chore Service
- Using a Secondary Index
- HBase Log Overview
- HBase Performance Tuning
-
Common Issues About HBase
- Why Does a Client Keep Failing to Connect to a Server for a Long Time?
- Operation Failures Occur When Stopping BulkLoad on the Client
- Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
- Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
- Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
- How Do I Restore a Region in the RIT State for a Long Time?
- Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- Why Can Modified and Deleted Data Still Be Queried by Using the Scan Command?
- Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown When the HBase Shell Is Started?
- When Are the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Cleared?
- Why Are Different Query Results Returned After I Use the Same Query Criteria to Query Data Successfully Imported by HBase BulkLoad?
- What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
- How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
- Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
- Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Meta Is Rebuilt Using the OfflineMetaRepair Tool, Causing Startup Failures?
- Why Are Messages Containing FileNotFoundException and "no lease" Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
- Why Does the ImportTsv Tool Display "Permission denied" When the Same Linux User as the RegionServer but a Different Kerberos User Is Used?
- Insufficient Rights When a Tenant Accesses Phoenix
- What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
- How Do I Fix Region Overlapping?
- Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
- Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
- Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
- How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
- Why Is a Message Indicating Insufficient Permission Displayed When CTBase Connects to the Ranger Plug-ins?
-
Using HDFS
- Using Hadoop from Scratch
- Configuring Memory Management
- Creating an HDFS Role
- Using the HDFS Client
- Running the DistCp Command
- Overview of HDFS File System Directories
- Changing the DataNode Storage Directory
- Configuring HDFS Directory Permission
- Configuring NFS
- Planning HDFS Capacity
- Configuring ulimit for HBase and HDFS
- Balancing DataNode Capacity
- Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
- Configuring the Number of Files in a Single HDFS Directory
- Configuring the Recycle Bin Mechanism
- Setting Permissions on Files and Directories
- Setting the Maximum Lifetime and Renewal Interval of a Token
- Configuring the Damaged Disk Volume
- Configuring Encrypted Channels
- Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
- Configuring the NameNode Blacklist
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Configuring Reserved Percentage of Disk Usage on DataNodes
- Configuring HDFS NodeLabel
- Configuring HDFS Mover
- Using HDFS AZ Mover
- Configuring HDFS DiskBalancer
- Configuring the Observer NameNode to Process Read Requests
- Performing Concurrent Operations on HDFS Files
- Introduction to HDFS Logs
- HDFS Performance Tuning
-
FAQ
- NameNode Startup Is Slow
- DataNode Is Normal but Cannot Report Data Blocks
- HDFS WebUI Cannot Properly Update Information About Damaged Data
- Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception?
- Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?
- Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?
- Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage
- Why Is Data in the Buffer Lost If a Power Outage Occurs During Storage of Small Files?
- Why Does Array Border-crossing Occur During FileInputFormat Split?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time
- Can I Delete or Modify the Data Storage Directory in DataNode?
- Blocks Miss on the NameNode UI After the Successful Rollback
- Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
-
Using Hive
- Using Hive from Scratch
- Configuring Hive Parameters
- Hive SQL
- Permission Management
- Using a Hive Client
- Using HDFS Colocation to Store Hive Tables
- Using the Hive Column Encryption Function
- Customizing Row Separators
- Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
- Deleting Single-Row Records from Hive on HBase
- Configuring HTTPS/HTTP-based REST APIs
- Enabling or Disabling the Transform Function
- Access Control of a Dynamic Table View on Hive
- Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
- Using Hive to Read Data in a Relational Database
- Supporting Traditional Relational Database Syntax in Hive
- Creating User-Defined Hive Functions
- Enhancing beeline Reliability
- Viewing Table Structures Using the show create Statement as Users with the select Permission
- Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
- Inserting Data to a Directory That Does Not Exist
- Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
- Disabling Specification of the location Keyword When Creating an Internal Hive Table
- Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
- Authorizing Over 32 Roles in Hive
- Restricting the Maximum Number of Maps for Hive Tasks
- HiveServer Lease Isolation
- Hive Supporting Transactions
- Switching the Hive Execution Engine to Tez
- Hive Materialized View
- Hive Log Overview
- Hive Performance Tuning
-
Common Issues About Hive
- How Do I Delete UDFs on Multiple HiveServers at the Same Time?
- Why Cannot the DROP Operation Be Performed on a Backed-Up Hive Table?
- How to Perform Operations on Local Files with Hive User-Defined Functions
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- How Do I Monitor the Hive Table Size?
- How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
- Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
- Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Description of Hive Table Location (Either an OBS or HDFS Path)
- Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?
- Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
- Why Does Hive Not Support Vectorized Query?
- Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
- How Do I Disable the Logging Function of Hive?
- Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
- Hive Configuration Problems
- Using Hue (Versions Earlier Than MRS 3.x)
-
Using Hue (MRS 3.x or Later)
- Using Hue from Scratch
- Accessing the Hue Web UI
- Hue Common Parameters
- Using HiveQL Editor on the Hue Web UI
- Using the SparkSql Editor on the Hue Web UI
- Using the Metadata Browser on the Hue Web UI
- Using File Browser on the Hue Web UI
- Using Job Browser on the Hue Web UI
- Using HBase on the Hue Web UI
- Typical Scenarios
- Hue Log Overview
-
Common Issues About Hue
- How Do I Solve the Problem that HQL Fails to Be Executed in Hue Using Internet Explorer?
- Why Does the use database Statement Become Invalid When Hive Is Used?
- What Can I Do If HDFS Files Fail to Be Accessed Using Hue WebUI?
- What Can I Do If a Large File Fails to Be Uploaded on the Hue Page?
- Why Cannot the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- Using Impala
-
Using Kafka
- Using Kafka from Scratch
- Managing Kafka Topics
- Querying Kafka Topics
- Managing Kafka User Permissions
- Managing Messages in Kafka Topics
- Synchronizing Binlog-based MySQL Data to the MRS Cluster
- Creating a Kafka Role
- Kafka Common Parameters
- Safety Instructions on Using Kafka
- Kafka Specifications
- Using the Kafka Client
- Configuring Kafka HA and High Reliability Parameters
- Changing the Broker Storage Directory
- Checking the Consumption Status of a Consumer Group
- Kafka Balancing Tool Instructions
- Balancing Data After Kafka Node Scale-Out
- Kafka Token Authentication Mechanism Tool Usage
- Introduction to Kafka Logs
- Performance Tuning
- Kafka Feature Description
- Migrating Data Between Kafka Nodes
- Common Issues About Kafka
- Using KafkaManager
- Using Kudu
-
Using Loader
- Using Loader from Scratch
- How to Use Loader
- Loader Link Configuration
- Managing Loader Links (Versions Earlier Than MRS 3.x)
- Source Link Configurations of Loader Jobs
- Destination Link Configurations of Loader Jobs
- Managing Loader Jobs
- Preparing a Driver for MySQL Database Link
- Loader Log Overview
- Example: Using Loader to Import Data from OBS to HDFS
- Common Issues About Loader
-
Using MapReduce
- Configuring the Log Archiving and Clearing Mechanism
- Reducing Client Application Failure Rate
- Transmitting MapReduce Tasks from Windows to Linux
- Configuring the Distributed Cache
- Configuring the MapReduce Shuffle Address
- Configuring the Cluster Administrator List
- Introduction to MapReduce Logs
- MapReduce Performance Tuning
-
Common Issues About MapReduce
- Why Does It Take a Long Time to Run a Task Upon ResourceManager Active/Standby Switchover?
- Why Does a MapReduce Task Stay Unchanged for a Long Time?
- Why Does the Client Hang During Job Running?
- Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
- MapReduce Job Failed in Multiple NameService Environment
- Why Is a Faulty MapReduce Node Not Blacklisted?
-
Using Oozie
- Using Oozie from Scratch
- Using the Oozie Client
- Using Oozie Client to Submit an Oozie Job
-
Using Hue to Submit an Oozie Job
- Creating a Workflow
-
Submitting a Workflow Job
- Submitting a Hive2 Job
- Submitting a Spark2x Job
- Submitting a Java Job
- Submitting a Loader Job
- Submitting a MapReduce Job
- Submitting a Sub-workflow Job
- Submitting a Shell Job
- Submitting an HDFS Job
- Submitting a Streaming Job
- Submitting a DistCp Job
- Example of Mutual Trust Operations
- Submitting an SSH Job
- Submitting a Hive Script
- Submitting a Coordinator Periodic Scheduling Job
- Submitting a Bundle Batch Processing Job
- Querying the Operation Results
- Oozie Log Overview
- Common Issues About Oozie
- Using Presto
-
Using Ranger (MRS 3.x)
- Logging In to the Ranger Web UI
- Enabling Ranger Authentication
- Configuring Component Permission Policies
- Viewing Ranger Audit Information
- Configuring a Security Zone
- Changing the Ranger Data Source to LDAP for a Normal Cluster
- Viewing Ranger Permission Information
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark2x
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for Storm
- Ranger Log Overview
-
Common Issues About Ranger
- Why Does Ranger Startup Fail During Cluster Installation?
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Cannot a New User Log In to Ranger After Changing the Password?
- When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
- Using Spark
-
Using Spark2x
- Precautions
-
Basic Operation
- Getting Started
- Configuring Parameters Rapidly
- Common Parameters
- Spark on HBase Overview and Basic Applications
- Spark on HBase V2 Overview and Basic Applications
- SparkSQL Permission Management (Security Mode)
-
Scenario-Specific Configuration
- Configuring Multi-active Instance Mode
- Configuring the Multi-tenant Mode
- Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
- Configuring the Size of the Event Queue
- Configuring Executor Off-Heap Memory
- Enhancing Stability in a Limited Memory Condition
- Viewing Aggregated Container Logs on the Web UI
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Configuring the Default Number of Data Blocks Divided by SparkSQL
- Configuring the Compression Format of a Parquet Table
- Configuring the Number of Lost Executors Displayed in WebUI
- Setting the Log Level Dynamically
- Configuring Whether Spark Obtains HBase Tokens
- Configuring LIFO for Kafka
- Configuring Reliability for Connected Kafka
- Configuring Streaming Reading of Driver Execution Results
- Filtering Partitions without Paths in Partitioned Tables
- Configuring Spark2x Web UI ACLs
- Configuring Vector-based ORC Data Reading
- Broaden Support for Hive Partition Pruning Predicate Pushdown
- Hive Dynamic Partition Overwriting Syntax
- Configuring the Column Statistics Histogram to Enhance the CBO Accuracy
- Configuring Local Disk Cache for JobHistory
- Configuring Spark SQL to Enable the Adaptive Execution Feature
- Configuring Event Log Rollover
- Adapting to the Third-party JDK When Ranger Is Used
- Spark2x Logs
- Obtaining Container Logs of a Running Spark Application
- Small File Combination Tools
- Using CarbonData for First Query
-
Spark2x Performance Tuning
- Spark Core Tuning
-
Spark SQL and DataFrame Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the INSERT...SELECT Operation
- Multiple JDBC Clients Concurrently Connecting to JDBCServer
- Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- Optimizing SQL Query of Data of Multiple Sources
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Tuning
-
Common Issues About Spark2x
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Is the Return Code of Driver Inconsistent with Application State Displayed on ResourceManager WebUI?
- Why Cannot I Exit the Driver Process?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does the Stage Retry Due to the Crash of the Executor?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- Why Does the Out-of-Memory Error Occur in NodeManager During the Execution of Spark Applications?
- Why Does the Realm Information Fail to Be Obtained When SparkBench Is Run on HiBench for the Cluster in Security Mode?
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How to Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How to Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Do 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- Why Does the Operation Fail When the Table Name Is TABLE?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a Parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Do I Fail to Modify MetaData by Running the Hive Command?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Should I Do If the JDBCServer Process is Mistakenly Killed During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
- Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
- Why Does the "Permission denied" Exception Occur When I Create a Temporary Table or View in Spark-beeline?
- Why Is the "Code of method ... grows beyond 64 KB" Error Message Displayed When I Run Complex SQL Statements?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Are Some Functions Not Available when Another JDBCServer Is Connected?
- Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5?
- Why Does Spark-beeline Fail to Run, with the Error Message "Failed to create ThriftService instance" Displayed?
- Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
-
Spark Streaming
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does a Spark Streaming Application Fail to Restart from a Checkpoint When It Creates an Input Stream Without Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
- Why Is the Job Information Obtained from the RESTful Interface of an Ended Spark Application Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Why Is an Application Not Displayed When I Run It with an Empty Part File?
- Why Does Spark2x Fail to Export a Table with the Same Field Name?
- Why Does a JRE Fatal Error Occur After a Spark Application Is Run Multiple Times?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native Spark2x UI
- How Does Spark2x Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- What Should I Do If the Native Page of a Spark2x JobHistory Application Fails to Display?
- Why Do I Fail to Create a Table in the Specified Location on OBS After Logging In to spark-beeline?
- Spark Shuffle Exception Handling
-
Using Sqoop
- Using Sqoop from Scratch
- Adapting Sqoop 1.4.7 to MRS 3.x Clusters
- Common Sqoop Commands and Parameters
-
Common Issues About Sqoop
- What Should I Do If Class QueryProvider Is Unavailable?
- What Should I Do If PostgreSQL or GaussDB Failed to Be Connected?
- What Should I Do If Data Failed to Be Synchronized to a Hive Table on the OBS Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to an ORC or Parquet Table Using hive-table?
- What Should I Do If Data Failed to Be Synchronized Using hive-table?
- What Should I Do If Data Failed to Be Synchronized to a Hive Parquet Table Using HCatalog?
- What Should I Do If the Data Type of Fields timestamp and data Is Incorrect During Data Synchronization Between Hive and MySQL?
-
Using Storm
- Using Storm from Scratch
- Using the Storm Client
- Submitting Storm Topologies on the Client
- Accessing the Storm Web UI
- Managing Storm Topologies
- Querying Storm Topology Logs
- Storm Common Parameters
- Configuring a Storm Service User Password Policy
- Migrating Storm Services to Flink
- Storm Log Introduction
- Performance Tuning
- Using Tez
-
Using Yarn
- Common YARN Parameters
- Creating Yarn Roles
- Using the YARN Client
- Configuring Resources for a NodeManager Role Instance
- Changing NodeManager Storage Directories
- Configuring Strict Permission Control for Yarn
- Configuring Container Log Aggregation
- Using CGroups with YARN
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust the Allocated Memory
- Configuring the Access Channel Protocol
- Configuring Memory Usage Detection
- Configuring the Additional Scheduler WebUI
- Configuring Yarn Restart
- Configuring ApplicationMaster Work Preserving
- Configuring the Localized Log Levels
- Configuring Users That Run Tasks
- Yarn Log Overview
- Yarn Performance Tuning
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Is Complete When CGroups Are Used?
- Why Does the Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Does the Task Not Fail Even Though AppAttempts Restarts More Than Twice?
- Why Is an Application Moved Back to the Original Queue After ResourceManager Restarts?
- Why Does Yarn Not Release the Blacklist Even When All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why May a Single NodeManager Fault Cause MapReduce Task Failures in the Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature Is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Common ZooKeeper Parameters
- Using a ZooKeeper Client
- Configuring the ZooKeeper Permissions
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Do Four-Letter Commands Not Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Can't the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
- Security Description
- High-Risk Operations Overview
-
FAQs
-
MRS Overview
- What Is MRS Used For?
- What Types of Distributed Storage Does MRS Support?
- How Do I Create an MRS Cluster Using a Custom Security Group?
- How Do I Use MRS?
- How Does MRS Ensure Security of Data and Services?
- Can I Configure a Phoenix Connection Pool?
- Does MRS Support Change of the Network Segment?
- Can I Downgrade the Specifications of an MRS Cluster Node?
- What Is the Relationship Between Hive and Other Components?
- Does an MRS Cluster Support Hive on Spark?
- What Are the Differences Between Hive Versions?
- Which MRS Cluster Version Supports Hive Connection and User Synchronization?
- What Are the Differences Between OBS and HDFS in Data Storage?
- How Do I Obtain the Hadoop Pressure Test Tool?
- What Is the Relationship Between Impala and Other Components?
- Statement About the Public IP Addresses in the Open-Source Third-Party SDK Integrated by MRS
- What Is the Relationship Between Kudu and HBase?
- Does MRS Support Running Hive on Kudu?
- What Are the Solutions for Processing 1 Billion Data Records?
- Can I Change the IP Address of DBService?
- Can I Clear MRS sudo Logs?
- Is the Storm Log Also Limited to 20 GB in MRS Cluster 2.1.0?
- What Is Spark ThriftServer?
- What Access Protocols Are Supported by Kafka?
- What Is the Compression Ratio of zstd?
- Why Are the HDFS, YARN, and MapReduce Components Unavailable When an MRS Cluster Is Created?
- Why Is the ZooKeeper Component Unavailable When an MRS Cluster Is Created?
- Which Python Versions Are Supported by Spark Tasks in an MRS 3.1.0 Cluster?
- How Do I Enable Different Service Programs to Use Different YARN Queues?
- Differences and Relationships Between the MRS Management Console and Cluster Manager
- How Do I Unbind an EIP from an MRS Cluster Node?
- Account and Password
-
Accounts and Permissions
- Does an MRS Cluster Support Access Permission Control If Kerberos Authentication Is not Enabled?
- How Do I Assign Tenant Management Permission to a New Account?
- How Do I Customize an MRS Policy?
- Why Is the Manage User Function Unavailable on the System Page on MRS Manager?
- Does Hue Support Account Permission Configuration?
- Client Usage
-
Web Page Access
- How Do I Change the Session Timeout Duration for an Open Source Component Web UI?
- Why Can't I Refresh the Dynamic Resource Plan Page on the MRS Tenant Tab?
- What Do I Do If the Kafka Topic Monitoring Tab Is Unavailable on Manager?
- What Do I Do If an Error Is Reported or Some Functions Are Unavailable When I Access the Web UIs of HDFS, Hue, YARN, and Flink?
-
Alarm Monitoring
- In an MRS Streaming Cluster, Can the Kafka Topic Monitoring Function Send Alarm Notifications?
- Where Can I View the Running Resource Queues When the Alarm "ALM-18022 Insufficient Yarn Queue Resources" Is Reported?
- How Do I Understand the Multi-Level Chart Statistics in the HBase Operation Requests Metric?
- Performance Tuning
-
Job Development
- How Do I Get My Data into OBS or HDFS?
- What Types of Spark Jobs Can Be Submitted in a Cluster?
- Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Are Changed to 0?
- What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
- How Do I View MRS Job Logs?
- What Do I Do If the Message "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page." Is Displayed?
- LauncherJob Execution Fails and the Error Message "jobPropertiesMap is null." Is Displayed
- What Do I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
- What Do I Do If a SparkStreaming Job Fails After Running for Dozens of Hours and the OBS Access 403 Error Is Reported?
- What Do I Do If an Alarm Is Reported Indicating That the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
- What Do I Do If the Error Message "java.io.IOException: Connection reset by peer" Is Displayed During the Execution of a Spark Job?
- What Do I Do If the Error Message "requestId=4971883851071737250" Is Displayed When a Spark Job Accesses OBS?
- Why Does DataArtsStudio Occasionally Fail to Schedule Spark Jobs, and Why Does Rescheduling Also Fail?
- What Do I Do If a Flink Job Fails to Execute and the Error Message "java.lang.NoSuchFieldError: SECURITY_SSL_ENCRYPT_ENABLED" Is Displayed?
- Why Can't a Submitted Yarn Job Be Viewed on the Web UI?
- How Do I Modify the HDFS NameSpace (fs.defaultFS) of an Existing Cluster?
- What Do I Do If the launcher-job Queue Is Stopped by YARN Due to Insufficient Heap Size When I Submit a Flink Job on the Management Plane?
- What Do I Do If the Error Message "slot request timeout" Is Displayed When I Submit a Flink Job?
- Data Import and Export of DistCP Jobs
- Cluster Upgrade/Patching
- Cluster Access
-
Big Data Service Development
- Can MRS Run Multiple Flume Tasks at a Time?
- How Do I Change FlumeClient Logs to Standard Logs?
- Where Are the .jar Files and Environment Variables of Hadoop Located?
- What Compression Algorithms Does HBase Support?
- Can MRS Write Data to HBase Through the HBase External Table of Hive?
- How Do I View HBase Logs?
- How Do I Set the TTL for an HBase Table?
- How Do I Balance HDFS Data?
- How Do I Change the Number of HDFS Replicas?
- What Is the Port for Accessing HDFS Using Python?
- How Do I Modify the HDFS Active/Standby Switchover Class?
- What Is the Recommended Number Type of DynamoDB in Hive Tables?
- Can the Hive Driver Be Interconnected with DBCP2?
- How Do I View the Hive Table Created by Another User?
- Can I Export the Query Result of Hive Data?
- What Do I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
- What Do I Do If a "hivesql/hivescript" Job Fails to Submit After Hive Is Added?
- What Do I Do If an Excel File Downloaded on Hue Fails to Open?
- What Do I Do If Sessions Are Not Released After Hue Connects to HiveServer and the Error Message "over max user connections" Is Displayed?
- How Do I Reset Kafka Data?
- How Do I Obtain the Client Version of MRS Kafka?
- What Access Protocols Are Supported by Kafka?
- What Do I Do If the Error Message "Not Authorized to access group xxx" Is Displayed When a Kafka Topic Is Consumed?
- What Compression Algorithms Does Kudu Support?
- How Do I View Kudu Logs?
- How Do I Handle the Kudu Service Exceptions Generated During Cluster Creation?
- Does OpenTSDB Support Python APIs?
- How Do I Configure Other Data Sources on Presto?
- How Do I Connect to Spark Shell from MRS?
- How Do I Connect to Spark Beeline from MRS?
- Where Are the Execution Logs of Spark Jobs Stored?
- How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
- How Do I Check Whether the ResourceManager Configuration of Yarn Is Correct?
- How Do I Modify the allow_drop_detached Parameter of ClickHouse?
- What Do I Do If an Alarm Indicating Insufficient Memory Is Reported During Spark Task Execution?
- What Do I Do If ClickHouse Consumes Excessive CPU Resources?
- How Do I Enable the Map Type on ClickHouse?
- A Large Number of OBS APIs Are Called When Spark SQL Accesses Hive Partitioned Tables
- API
-
Cluster Management
- How Do I View All Clusters?
- How Do I View Log Information?
- How Do I View Cluster Configuration Information?
- How Do I Install Kafka and Flume in an MRS Cluster?
- How Do I Stop an MRS Cluster?
- Can I Expand Data Disk Capacity for MRS?
- Can I Add Components to an Existing Cluster?
- Can I Delete Components Installed in an MRS Cluster?
- Can I Change MRS Cluster Nodes on the MRS Console?
- How Do I Shield Cluster Alarm/Event Notifications?
- Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
- How Do I Configure the knox Memory?
- What Is the Python Version Installed for an MRS Cluster?
- How Do I View the Configuration File Directory of Each Component?
- What Do I Do If the Time on MRS Nodes Is Incorrect?
- How Do I Query the Startup Time of an MRS Node?
- What Do I Do If Trust Relationships Between Nodes Are Abnormal?
- How Do I Adjust the Memory Size of the manager-executor Process?
-
Kerberos Usage
- How Do I Change the Kerberos Authentication Status of a Created MRS Cluster?
- What Are the Ports of the Kerberos Authentication Service?
- How Do I Deploy the Kerberos Service in a Running Cluster?
- How Do I Access Hive in a Cluster with Kerberos Authentication Enabled?
- How Do I Access Presto in a Cluster with Kerberos Authentication Enabled?
- How Do I Access Spark in a Cluster with Kerberos Authentication Enabled?
- How Do I Prevent Kerberos Authentication Expiration?
- Metadata Management
-
MRS Overview
-
Troubleshooting
- Accessing the Web Pages
-
Cluster Management
- Failed to Reduce Task Nodes
- Adding a New Disk to an MRS Cluster
- Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
- Replacing a Disk in an MRS Cluster (Applicable to 3.x)
- MRS Backup Failure
- Inconsistency Between df and du Command Output on the Core Node
- Disassociating a Subnet from the ACL Network
- MRS Becomes Abnormal After hostname Modification
- DataNode Restarts Unexpectedly
- Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
- Failed to Download the MRS Cluster Client
- Failed to Scale Out an MRS Cluster
- Error Occurs When MRS Executes the Insert Command Using Beeline
- How Do I Upgrade EulerOS to Fix Vulnerabilities in an MRS Cluster?
- Using CDM to Migrate Data to HDFS
- Alarms Are Frequently Generated in the MRS Cluster
- Memory Usage of the PMS Process Is High
- High Memory Usage of the Knox Process
- It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
- How Do I Locate a Job Submission Failure?
- OS Disk Space Is Insufficient Due to Oversized HBase Log Files
- Failed to Delete a New Tenant on FusionInsight Manager
- Using Alluxio
- Using ClickHouse
-
Using DBService
- DBServer Instance Is in Abnormal Status
- DBServer Instance Remains in the Restoring State
- Default Port 20050 or 20051 Is Occupied
- DBServer Instance Is Always in the Restoring State Due to Incorrect /tmp Directory Permission
- DBService Backup Failure
- Components Failed to Connect to DBService in Normal State
- DBServer Failed to Start
- DBService Backup Failed Because the Floating IP Address Is Unreachable
- DBService Failed to Start Due to the Loss of the DBService Configuration File
-
Using Flink
- "IllegalConfigurationException: Error while parsing YAML configuration file: security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
- "IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
- The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
- Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
- Flink Service Program Fails to Read Files on the NFS Disk
- Failed to Customize the Flink Log4j Log Level
- Using Flume
-
Using HBase
- Slow Response to HBase Connection
- Failed to Authenticate the HBase User
- RegionServer Failed to Start Because the Port Is Occupied
- HBase Failed to Start Due to Insufficient Node Memory
- HBase Service Unavailable Due to Poor HDFS Performance
- HBase Failed to Start Due to Inappropriate Parameter Settings
- RegionServer Failed to Start Due to Residual Processes
- HBase Failed to Start Due to a Quota Set on HDFS
- HBase Failed to Start Due to Corrupted Version Files
- High CPU Usage Caused by Zero-Loaded RegionServer
- HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
- The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
- RegionServer Instance Is in the Restoring State
- HBase Failed to Start in a Newly Installed Cluster
- HBase Failed to Start Due to the Loss of the ACL Table Directory
- HBase Failed to Start After the Cluster Is Powered Off and On
- Failed to Import HBase Data Due to Oversized File Blocks
- Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
- Failed to Run the hbase shell Command on the MRS Cluster Client
- Disordered Information Display on the HBase Shell Client Console Due to Printing of the INFO Information
- HBase Failed to Start Due to Insufficient RegionServer Memory
-
Using HDFS
- All NameNodes Enter the Standby State After the NameNode RPC Port of HDFS Is Changed
- An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
- Failed to Use Python to Remotely Connect to the Port of HDFS
- HDFS Capacity Usage Reaches 100%, Causing Unavailable Upper-layer Services Such as HBase and Spark
- An Error Is Reported During HDFS and Yarn Startup
- HDFS Permission Setting Error
- A DataNode of HDFS Is Always in the Decommissioning State
- HDFS Failed to Start Due to Insufficient Memory
- A Large Number of Blocks Are Lost in HDFS due to the Time Change Using ntpdate
- CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
- Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
- Common File Read/Write Faults
- Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
- A Client File Fails to Be Closed After Data Writing
- File Fails to Be Uploaded to HDFS Due to File Errors
- After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
- Failed to Read Files, and "FileNotFoundException" Is Displayed
- Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
- Adjusting the Log Level of the Shell Client
- File Read Fails, and "No common protection layer" Is Displayed
- Failed to Write Files Because the HDFS Directory Quota Is Insufficient
- Balancing Fails, and "Source and target differ in block-size" Is Displayed
- A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
- Uneven Data Distribution Due to Non-HDFS Data Residuals
- Uneven Data Distribution Due to the Client Installation on the DataNode
- Handling Unbalanced DataNode Disk Usage on Nodes
- Locating Common Balance Problems
- HDFS Displays Insufficient Disk Space But 10% Disk Space Remains
- An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
- Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
- Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
- HDFS Client Failed to Delete Overlong Directories
- An Error Is Reported When a Node Outside the Cluster Accesses MRS HDFS
-
Using Hive
- Content Recorded in Hive Logs
- Causes of Hive Startup Failure
- "Cannot modify xxx at runtime" Is Reported When the set Command Is Executed in a Security Cluster
- How to Specify a Queue When Hive Submits a Job
- How to Set Map and Reduce Memory on the Client
- Specifying the Output File Compression Format When Importing a Table
- desc Table Cannot Be Completely Displayed
- NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
- A Newly Created User Has No Query Permissions
- An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
- An Error Is Reported When the "load data inpath" Command Is Executed
- An Error Is Reported When the "load data local inpath" Command Is Executed
- An Error Is Reported When the "create external table" Command Is Executed
- An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
- Insufficient Permissions to Execute the set role admin Command
- An Error Is Reported When UDF Is Created Using Beeline
- Difference Between Hive Service Health Status and Hive Instance Health Status
- Hive Alarms and Triggering Conditions
- "authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
- Failed to Access ZooKeeper from the Client
- "Invalid function" Is Displayed When a UDF Is Used
- Hive Service Status Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Concerning
- Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
- Hive Task Failed to Run on the Client But Successful on Yarn
- An Error Is Reported When the select Statement Is Executed
- Failed to Drop a Large Number of Partitions
- Failed to Start a Local Task
- Failed to Start WebHCat
- Sample Code Error for Hive Secondary Development After Domain Switching
- MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
- "Failed to execute session hooks: over max connections" Reported by Beeline
- beeline Reports the "OutOfMemoryError" Error
- Task Execution Fails Because the Input File Number Exceeds the Threshold
- Task Execution Fails Because of Stack Memory Overflow
- Task Failed Due to Concurrent Writes to One Table or Partition
- Hive Task Failed Due to a Lack of HDFS Directory Permission
- Failed to Load Data to Hive Tables
- HiveServer and HiveHCat Process Faults
- An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
- Timeout Reported When Adding the Hive Table Field
- Failed to Restart the Hive Service
- Hive Failed to Delete a Table
- An Error Is Reported When msck repair table table_name Is Run on Hive
- How Do I Release Disk Space After Dropping a Table in Hive?
- Connection Timeout During SQL Statement Execution on the Client
- WebHCat Failed to Start Due to Abnormal Health Status
- WebHCat Failed to Start Because the mapred-default.xml File Cannot Be Parsed
- Using Hue
- Using Impala
-
Using Kafka
- An Error Is Reported When Kafka Is Run to Obtain a Topic
- Flume Normally Connects to Kafka But Fails to Send Messages
- Producer Failed to Send Data and Threw "NullPointerException"
- Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
- Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
- Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
- Consumer Fails to Consume Data and Remains in the Waiting State
- SparkStreaming Fails to Consume Kafka Messages, and "Error getting partition metadata" Is Displayed
- Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
- SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
- Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
- Checking Whether Data Consumed by a Consumer Is Lost
- Failed to Start a Component Due to Account Lock
- Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
- Kafka Topics Cannot Be Deleted
- Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
- When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
- Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
- When a Kafka Topic Fails to Be Created, "NoNode for /brokers/ids" Is Displayed
- When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
- Consumer Repeatedly Consumes Data
- Leader for the Created Kafka Topic Partition Is Displayed as none
- Safety Instructions on Using Kafka
- Obtaining Kafka Consumer Offset Information
- Adding or Deleting Configurations for a Topic
- Reading the Content of the __consumer_offsets Internal Topic
- Configuring Logs for Shell Commands on the Client
- Obtaining Topic Distribution Information
- Kafka HA Usage Description
- Kafka Producer Writes Oversized Records
- Kafka Consumer Reads Oversized Records
- High Usage of Multiple Disks on a Kafka Cluster Node
- Using Oozie
- Using Presto
-
Using Spark
- An Error Occurs When the Split Size Is Changed in a Spark Application
- An Error Is Reported When Spark Is Used
- A Spark Job Fails to Run Due to Incorrect JAR File Import
- A Spark Job Is Pending Due to Insufficient Memory
- An Error Is Reported During Spark Running
- "Executor Memory Reaches the Threshold" Is Displayed in the Driver
- Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
- Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
- ApplicationMaster Failed to Start Twice in Yarn-client Mode
- Failed to Connect to ResourceManager When a Spark Task Is Submitted
- DataArts Studio Failed to Schedule Spark Jobs
- Submission Status of the Spark Job API Is Error
- Alarm 43006 Is Repeatedly Generated in the Cluster
- Failed to Create or Delete a Table in Spark Beeline
- Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
- Large Number of Shuffle Results Are Lost During Spark Task Execution
- Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
- Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
- Spark Task Submission Failure
- Spark Task Execution Failure
- JDBCServer Connection Failure
- Failed to View Spark Task Logs
- Authentication Fails When Spark Connects to Other Services
- An Error Occurs When Spark Connects to Redis
- An Error Is Reported When spark-beeline Is Used to Query a Hive View
-
Using Sqoop
- Connecting Sqoop to MySQL
- Failed to Find the HBaseAdmin.<init> Method When Sqoop Reads Data from the MySQL Database to HBase
- Failed to Export HBase Data to HDFS Through Hue's Sqoop Task
- A Format Error Is Reported When Sqoop Is Used to Export Data from Hive to MySQL 8.0
- An Error Is Reported When sqoop import Is Executed to Import PostgreSQL Data to Hive
- Sqoop Failed to Read Data from MySQL and Write Parquet Files to OBS
-
Using Storm
- Invalid Hyperlink of Events on the Storm UI
- Failed to Submit a Topology
- Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
- The Worker Log Is Empty After a Topology Is Submitted
- Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
- "well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
- Data Cannot Be Written to Bolts When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts
- The GC Parameter Configured for the Service Topology Does Not Take Effect
- Internal Server Error Is Displayed When the User Queries Information on the UI
- Using Ranger
-
Using Yarn
- Plenty of Jobs Are Found After Yarn Is Started
- "GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
- Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
- Temporary Files Are Not Deleted When an MR Job Is Abnormal
- ResourceManager of Yarn (Port 8032) Throws Error "connection refused"
- Failed to View Job Logs on the Yarn Web UI
- An Error Is Reported When a Queue Name Is Clicked on the Yarn Page
- Using ZooKeeper
- Accessing OBS
- Appendix
-
Overview
-
API Reference (Kuala Lumpur Region)
- Before You Start
- API Overview
- Calling APIs
- Application Cases
- API V2
- API V1.1
- Permissions Policies and Supported Actions
- Appendix
-
User Guide (Ankara Region)
-
Overview
- What Is MRS?
- Application Scenarios
-
Components
- CarbonData
- ClickHouse
- Containers
- CDL
- DBService
- Doris
- Elasticsearch
- Flink
- Flume
- FTP-Server
- GraphBase
- Guardian
- HBase
- HDFS
- HetuEngine
- Hive
- Hudi
- Hue
- IoTDB
- Kafka
- KafkaManager
- KMS
- KrbServer and LdapServer
- LakeSearch
- Loader
- Manager
- MapReduce
- MemArtsCC
- Metadata
- MOTService
- Oozie
- Ranger
- Redis
- RTDService
- Solr
- Spark
- Tez
- YARN
- ZooKeeper
- Functions
- Constraints
- Permissions Management
- Related Services
- Preparing a User
- Getting Started
-
Configuring a Cluster
- How to Create an MRS Cluster
- Quick Configuration
- Creating a Custom Cluster
- Configuring Custom Topology
- Adding a Tag to a Cluster/Node
- Communication Security Authorization
-
Configuring Auto Scaling Rules
- Overview
- Configuring Auto Scaling During Cluster Creation
- Creating an Auto Scaling Policy for an Existing Cluster
- Scenario 1: Using Auto Scaling Rules Alone
- Scenario 2: Using Resource Plans Alone
- Scenario 3: Using Both Auto Scaling Rules and Resource Plans
- Modifying an Auto Scaling Policy
- Deleting an Auto Scaling Policy
- Enabling or Disabling an Auto Scaling Policy
- Viewing an Auto Scaling Policy
- Configuring Automation Scripts
- Configuring Auto Scaling Metrics
- Managing Data Connections
- Installing Third-Party Software Using Bootstrap Actions
- Viewing Failed MRS Tasks
- Viewing Information of a Historical Cluster
-
Managing Clusters
- Logging In to a Cluster
- Cluster Overview
- Viewing and Customizing Cluster Monitoring Metrics
- Cluster O&M
- Managing Nodes
- Job Management
-
Component Management
- Object Management
- Viewing Configuration
- Managing Services
- Configuring Service Parameters
- Configuring Customized Service Parameters
- Synchronizing Service Configuration
- Managing Role Instances
- Configuring Role Instance Parameters
- Synchronizing Role Instance Configuration
- Decommissioning and Recommissioning a Role Instance
- Starting and Stopping a Cluster
- Performing Rolling Restart
- Alarm Management
-
Tenant Management
- Overview
- Creating a Tenant
- Creating a Sub-tenant
- Deleting a Tenant
- Managing a Tenant Directory
- Restoring Tenant Data
- Creating a Resource Pool
- Modifying a Resource Pool
- Deleting a Resource Pool
- Configuring a Queue
- Configuring the Queue Capacity Policy of a Resource Pool
- Clearing Configuration of a Queue
- Bootstrap Actions
- Using an MRS Client
-
Configuring a Cluster with Decoupled Storage and Compute
- MRS Storage-Compute Decoupling
- Interconnecting with OBS Using the Cluster Agency Mechanism
- Interconnecting with OBS Using the Guardian Service
- Accessing Web Pages of Open Source Components Managed in MRS Clusters
- Accessing FusionInsight Manager
-
FusionInsight Manager Operation Guide
- Getting Started
- Home Page
-
Cluster
- Cluster Management
- Managing a Service
- Instance Management
- Hosts
- O&M
- Audit
- Tenant Resources
- System Configuration
- Cluster Management
- Log Management
-
Backup and Recovery Management
- Introduction
-
Backing Up Data
- Backing Up Manager Data
- Backing Up CDL Data
- Backing Up Containers Metadata
- Backing Up ClickHouse Metadata
- Backing Up ClickHouse Service Data
- Backing Up DBService Data
- Backing Up Flink Metadata
- Backing Up HBase Metadata
- Backing Up HBase Service Data
- Backing Up Elasticsearch Service Data
- Backing Up MOTService Service Data
- Backing Up NameNode Data
- Backing Up HDFS Service Data
- Backing Up Hive Service Data
- Backing Up IoTDB Metadata
- Backing Up IoTDB Service Data
- Backing Up Kafka Metadata
- Backing Up Redis Data
- Backing Up RTDService Metadata
- Backing Up Solr Metadata
- Backing Up Solr Service Data
-
Recovering Data
- Restoring Manager Data
- Restoring CDL Data
- Restoring Containers Metadata
- Restoring ClickHouse Metadata
- Restoring ClickHouse Service Data
- Restoring DBService Data
- Restoring Flink Metadata
- Restoring HBase Metadata
- Restoring HBase Service Data
- Restoring Elasticsearch Service Data
- Restoring MOTService Service Data
- Restoring NameNode Data
- Restoring HDFS Service Data
- Restoring Hive Service Data
- Restoring IoTDB Metadata
- Restoring IoTDB Service Data
- Restoring Kafka Metadata
- Restoring Redis Data
- Restoring RTDService Metadata
- Restoring Solr Metadata
- Restoring Solr Service Data
- Enabling Cross-Cluster Replication
- Managing Local Quick Restoration Tasks
- Modifying a Backup Task
- Viewing Backup and Restoration Tasks
- SQL Inspector
-
Security Management
- Security Overview
- Account Management
-
Security Hardening
- Hardening Policies
- Configuring a Trusted IP Address to Access LDAP
- HFile and WAL Encryption
- Configuring Hadoop Security Parameters
- Configuring an IP Address Whitelist for Modification Allowed by HBase
- Updating a Key for a Cluster
- Changing the Cluster Encryption Mode
- Hardening the LDAP
- Configuring Kafka Data Encryption During Transmission
- Configuring HDFS Data Encryption During Transmission
- Configuring HetuEngine Data Encryption During Transmission
- Configuring RTD Data Encryption During Transmission
- Configuring IoTDB Data Encryption During Transmission
- ClickHouse Security Hardening
- Hive Metastore Security Hardening
- Configuring ZooKeeper SSL
- Encrypting the Communication Between the Controller and the Agent
- Updating SSH Keys for User omm
- Changing the Timeout Duration of the Manager Page
- Resetting Sessions During Secondary Authentication Configuration
- Security Maintenance
- Security Statement
-
Alarm Reference
- ALM-12001 Audit Log Dumping Failure
- ALM-12004 Manager OLdap Resource Abnormal
- ALM-12005 Manager OKerberos Resource Abnormal
- ALM-12006 NodeAgent Process Is Abnormal
- ALM-12007 Process Fault
- ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes
- ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes
- ALM-12014 Device Partition Lost
- ALM-12015 Partition Filesystem Readonly
- ALM-12016 CPU Usage Exceeds the Threshold
- ALM-12017 Insufficient Disk Capacity
- ALM-12018 Memory Usage Exceeds the Threshold
- ALM-12027 Host PID Usage Exceeds the Threshold
- ALM-12028 Number of Processes in the D State on a Host Exceeds the Threshold
- ALM-12033 Slow Disk Fault
- ALM-12034 Periodical Backup Failure
- ALM-12035 Unknown Data Status After Recovery Task Failure
- ALM-12038 Monitoring Indicator Dumping Failure
- ALM-12039 Active/Standby OMS Databases Not Synchronized
- ALM-12040 Insufficient OS Entropy
- ALM-12041 Incorrect Permission on Key Files
- ALM-12042 Incorrect Configuration of Key Files
- ALM-12045 Read Packet Dropped Rate Exceeds the Threshold
- ALM-12046 Write Packet Dropped Rate Exceeds the Threshold
- ALM-12047 Read Packet Error Rate Exceeds the Threshold
- ALM-12048 Write Packet Error Rate Exceeds the Threshold
- ALM-12049 Network Read Throughput Rate Exceeds the Threshold
- ALM-12050 Network Write Throughput Rate Exceeds the Threshold
- ALM-12051 Disk Inode Usage Exceeds the Threshold
- ALM-12052 TCP Temporary Port Usage Exceeds the Threshold
- ALM-12053 Host File Handle Usage Exceeds the Threshold
- ALM-12054 Invalid Certificate File
- ALM-12055 The Certificate File Is About to Expire
- ALM-12057 Metadata Not Configured with the Task to Periodically Back Up Data to a Third-Party Server
- ALM-12061 Process Usage Exceeds the Threshold
- ALM-12062 OMS Parameter Configurations Mismatch with the Cluster Scale
- ALM-12063 Unavailable Disk
- ALM-12064 Host Random Port Range Conflicts with Cluster Used Port
- ALM-12066 Trust Relationships Between Nodes Become Invalid
- ALM-12067 Abnormal Tomcat Resources of Manager
- ALM-12068 Abnormal ACS Resources of Manager
- ALM-12069 Abnormal AOS Resources of Manager
- ALM-12070 Controller Resource Is Abnormal
- ALM-12071 Httpd Resource Is Abnormal
- ALM-12072 FloatIP Resource Is Abnormal
- ALM-12073 CEP Resource Is Abnormal
- ALM-12074 FMS Resource Is Abnormal
- ALM-12075 PMS Resource Is Abnormal
- ALM-12076 GaussDB Resource Is Abnormal
- ALM-12077 User omm Expired
- ALM-12078 Password of User omm Expired
- ALM-12079 User omm Is About to Expire
- ALM-12080 Password of User omm Is About to Expire
- ALM-12081 User ommdba Expired
- ALM-12082 User ommdba Is About to Expire
- ALM-12083 Password of User ommdba Is About to Expire
- ALM-12084 Password of User ommdba Expired
- ALM-12085 Service Audit Log Dump Failure
- ALM-12087 System Is in the Upgrade Observation Period
- ALM-12089 Network Connections Between Nodes Are Abnormal
- ALM-12099 Core Dump for Cluster Processes
- ALM-12101 AZ Unhealthy
- ALM-12102 AZ HA Component Is Not Deployed Based on DR Requirements
- ALM-12110 Failed to Get ECS Temporary AK/SK
- ALM-12180 Suspended Disk I/O
- ALM-12190 Number of Knox Connections Exceeds the Threshold
- ALM-12191 Disk I/O Usage Exceeds the Threshold
- ALM-12192 Host Load Exceeds the Threshold
- ALM-12200 Password Is About to Expire
- ALM-12201 Process CPU Usage Exceeds the Threshold
- ALM-12202 Process Memory Usage Exceeds the Threshold
- ALM-12203 Process Full GC Duration Exceeds the Threshold
- ALM-12204 Wait Duration of a Disk Read Exceeds the Threshold
- ALM-12205 Wait Duration of a Disk Write Exceeds the Threshold
- ALM-12206 Password Has Expired
- ALM-13000 ZooKeeper Service Unavailable
- ALM-13001 Available ZooKeeper Connections Are Insufficient
- ALM-13002 ZooKeeper Direct Memory Usage Exceeds the Threshold
- ALM-13003 GC Duration of the ZooKeeper Process Exceeds the Threshold
- ALM-13004 ZooKeeper Heap Memory Usage Exceeds the Threshold
- ALM-13005 Failed to Set the Quota of Top Directories of ZooKeeper Components
- ALM-13006 Znode Number or Capacity Exceeds the Threshold
- ALM-13007 Available ZooKeeper Client Connections Are Insufficient
- ALM-13008 ZooKeeper Znode Usage Exceeds the Threshold
- ALM-13009 ZooKeeper Znode Capacity Usage Exceeds the Threshold
- ALM-13010 Znode Usage of a Directory with Quota Configured Exceeds the Threshold
- ALM-14000 HDFS Service Unavailable
- ALM-14001 HDFS Disk Usage Exceeds the Threshold
- ALM-14002 DataNode Disk Usage Exceeds the Threshold
- ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold
- ALM-14006 Number of HDFS Files Exceeds the Threshold
- ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold
- ALM-14008 DataNode Heap Memory Usage Exceeds the Threshold
- ALM-14009 Number of Dead DataNodes Exceeds the Threshold
- ALM-14010 NameService Service Is Abnormal
- ALM-14011 DataNode Data Directory Is Not Configured Properly
- ALM-14012 JournalNode Is Out of Synchronization
- ALM-14013 Failed to Update the NameNode FsImage File
- ALM-14014 NameNode GC Time Exceeds the Threshold
- ALM-14015 DataNode GC Time Exceeds the Threshold
- ALM-14016 DataNode Direct Memory Usage Exceeds the Threshold
- ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold
- ALM-14018 NameNode Non-heap Memory Usage Exceeds the Threshold
- ALM-14019 DataNode Non-heap Memory Usage Exceeds the Threshold
- ALM-14020 Number of Entries in the HDFS Directory Exceeds the Threshold
- ALM-14021 NameNode Average RPC Processing Time Exceeds the Threshold
- ALM-14022 NameNode Average RPC Queuing Time Exceeds the Threshold
- ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
- ALM-14024 Tenant Space Usage Exceeds the Threshold
- ALM-14025 Tenant File Object Usage Exceeds the Threshold
- ALM-14026 Blocks on DataNode Exceed the Threshold
- ALM-14027 DataNode Disk Fault
- ALM-14028 Number of Blocks to Be Supplemented Exceeds the Threshold
- ALM-14029 Number of Blocks in a Replica Exceeds the Threshold
- ALM-14030 HDFS Allows Write of Single-Replica Data
- ALM-14031 DataNode Process Is Abnormal
- ALM-14032 JournalNode Process Is Abnormal
- ALM-14033 ZKFC Process Is Abnormal
- ALM-14034 Router Process Is Abnormal
- ALM-14035 HttpFS Process Is Abnormal
- ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold
- ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold
- ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold
- ALM-16003 Background Thread Usage Exceeds the Threshold
- ALM-16004 Hive Service Unavailable
- ALM-16005 The Heap Memory Usage of the Hive Process Exceeds the Threshold
- ALM-16006 The Direct Memory Usage of the Hive Process Exceeds the Threshold
- ALM-16007 Hive GC Time Exceeds the Threshold
- ALM-16008 Non-Heap Memory Usage of the Hive Process Exceeds the Threshold
- ALM-16009 Map Number Exceeds the Threshold
- ALM-16045 Hive Data Warehouse Is Deleted
- ALM-16046 Hive Data Warehouse Permission Is Modified
- ALM-16047 HiveServer Has Been Deregistered from ZooKeeper
- ALM-16048 Tez or Spark Library Path Does Not Exist
- ALM-16051 Percentage of Sessions Connected to MetaStore Exceeds the Threshold
- ALM-17003 Oozie Service Unavailable
- ALM-17004 Oozie Heap Memory Usage Exceeds the Threshold
- ALM-17005 Oozie Non Heap Memory Usage Exceeds the Threshold
- ALM-17006 Oozie Direct Memory Usage Exceeds the Threshold
- ALM-17007 Garbage Collection (GC) Time of the Oozie Process Exceeds the Threshold
- ALM-17008 Abnormal Connection Between Oozie and ZooKeeper
- ALM-17009 Abnormal Connection Between Oozie and DBService
- ALM-17010 Abnormal Connection Between Oozie and HDFS
- ALM-17011 Abnormal Connection Between Oozie and Yarn
- ALM-18000 Yarn Service Unavailable
- ALM-18002 NodeManager Heartbeat Lost
- ALM-18003 NodeManager Unhealthy
- ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold
- ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold
- ALM-18010 ResourceManager GC Time Exceeds the Threshold
- ALM-18011 NodeManager GC Time Exceeds the Threshold
- ALM-18012 JobHistoryServer GC Time Exceeds the Threshold
- ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold
- ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold
- ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold
- ALM-18016 Non Heap Memory Usage of ResourceManager Exceeds the Threshold
- ALM-18017 Non Heap Memory Usage of NodeManager Exceeds the Threshold
- ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold
- ALM-18019 Non Heap Memory Usage of JobHistoryServer Exceeds the Threshold
- ALM-18020 Yarn Task Execution Timeout
- ALM-18021 Mapreduce Service Unavailable
- ALM-18022 Insufficient YARN Queue Resources
- ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold
- ALM-18024 Pending Yarn Memory Usage Exceeds the Threshold
- ALM-18025 Number of Terminated Yarn Tasks Exceeds the Threshold
- ALM-18026 Number of Failed Yarn Tasks Exceeds the Threshold
- ALM-19000 HBase Service Unavailable
- ALM-19006 HBase Replication Sync Failed
- ALM-19007 HBase GC Time Exceeds the Threshold
- ALM-19008 Heap Memory Usage of the HBase Process Exceeds the Threshold
- ALM-19009 Direct Memory Usage of the HBase Process Exceeds the Threshold
- ALM-19011 RegionServer Region Number Exceeds the Threshold
- ALM-19012 HBase System Table Directory or File Lost
- ALM-19013 Duration of Regions in transition State Exceeds the Threshold
- ALM-19014 Capacity Quota Usage on ZooKeeper Exceeds the Threshold Severely
- ALM-19015 Quantity Quota Usage on ZooKeeper Exceeds the Threshold
- ALM-19016 Quantity Quota Usage on ZooKeeper Exceeds the Threshold Severely
- ALM-19017 Capacity Quota Usage on ZooKeeper Exceeds the Threshold
- ALM-19018 HBase Compaction Queue Size Exceeds the Threshold
- ALM-19019 Number of HBase HFiles to Be Synchronized Exceeds the Threshold
- ALM-19020 Number of HBase WAL Files to Be Synchronized Exceeds the Threshold
- ALM-19022 HBase Hotspot Detection Is Unavailable
- ALM-19023 Region Traffic Restriction for HBase
- ALM-19024 RPC Requests P99 Latency on RegionServer Exceeds the Threshold
- ALM-19025 Damaged StoreFile in HBase
- ALM-19026 Damaged WAL Files in HBase
- ALM-19030 P99 Latency of RegionServer RPC Request Exceeds the Threshold
- ALM-19031 Number of RegionServer RPC Connections Exceeds the Threshold
- ALM-19032 Number of Tasks in the RegionServer RPC Write Queue Exceeds the Threshold
- ALM-19033 Number of Tasks in the RegionServer RPC Read Queue Exceeds the Threshold
- ALM-19034 Number of RegionServer WAL Write Timeouts Exceeds the Threshold
- ALM-19035 Size of the RegionServer Call Queue Exceeds the Threshold
- ALM-20002 Hue Service Unavailable
- ALM-23001 Loader Service Unavailable
- ALM-23003 Loader Task Execution Failed
- ALM-23004 Loader Heap Memory Usage Exceeds the Threshold
- ALM-23005 Loader Non-Heap Memory Usage Exceeds the Threshold
- ALM-23006 Loader Direct Memory Usage Exceeds the Threshold
- ALM-23007 GC Duration of the Loader Process Exceeds the Threshold
- ALM-24000 Flume Service Unavailable
- ALM-24001 Flume Agent Exception
- ALM-24003 Flume Client Connection Interrupted
- ALM-24004 Exception Occurs When Flume Reads Data
- ALM-24005 Exception Occurs When Flume Transmits Data
- ALM-24006 Heap Memory Usage of Flume Server Exceeds the Threshold
- ALM-24007 Flume Server Direct Memory Usage Exceeds the Threshold
- ALM-24008 Flume Server Non Heap Memory Usage Exceeds the Threshold
- ALM-24009 Flume Server Garbage Collection (GC) Duration Exceeds the Threshold
- ALM-24010 Flume Certificate File Is Invalid or Damaged
- ALM-24011 Flume Certificate File Is About to Expire
- ALM-24012 Flume Certificate File Has Expired
- ALM-24013 Flume MonitorServer Certificate File Is Invalid or Damaged
- ALM-24014 Flume MonitorServer Certificate Is About to Expire
- ALM-24015 Flume MonitorServer Certificate File Has Expired
- ALM-25000 LdapServer Service Unavailable
- ALM-25004 Abnormal LdapServer Data Synchronization
- ALM-25005 nscd Service Exception
- ALM-25006 Sssd Service Exception
- ALM-25500 KrbServer Service Unavailable
- ALM-25501 Too Many KerberosServer Requests
- ALM-27001 DBService Is Unavailable
- ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes
- ALM-27004 Data Inconsistency Between Active and Standby DBServices
- ALM-27005 Database Connection Usage Exceeds the Threshold
- ALM-27006 Data Directory Disk Usage Exceeds the Threshold
- ALM-27007 Database Enters the Read-Only Mode
- ALM-33004 BLU Instance Health Status of Containers Is Abnormal
- ALM-33005 Maximum Number of Concurrent Containers Requests Exceeds the Threshold
- ALM-33006 Failure Rate of Containers Calls Exceeds the Threshold
- ALM-33007 ALB TPS of Containers Exceeds the Threshold
- ALM-33008 Average Latency of Containers Exceeds the Threshold
- ALM-33009 Containers Heap Memory Usage Exceeds the Threshold
- ALM-33010 Containers Non-Heap Memory Usage Exceeds the Threshold
- ALM-33011 Containers Metaspace Usage Exceeds the Threshold
- ALM-33012 Containers' ZooKeeper Client Is Disconnected
- ALM-38000 Kafka Service Unavailable
- ALM-38001 Insufficient Kafka Disk Space
- ALM-38002 Kafka Heap Memory Usage Exceeds the Threshold
- ALM-38004 Kafka Direct Memory Usage Exceeds the Threshold
- ALM-38005 GC Duration of the Broker Process Exceeds the Threshold
- ALM-38006 Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold
- ALM-38007 Status of Kafka Default User Is Abnormal
- ALM-38008 Abnormal Kafka Data Directory Status
- ALM-38009 Busy Broker Disk I/Os
- ALM-38010 Topics with Single Replica
- ALM-38011 User Connection Usage on Broker Exceeds the Threshold
- ALM-41007 RTDService Unavailable
- ALM-43001 Spark Service Unavailable
- ALM-43006 Heap Memory Usage of the JobHistory Process Exceeds the Threshold
- ALM-43007 Non-Heap Memory Usage of the JobHistory Process Exceeds the Threshold
- ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold
- ALM-43009 JobHistory Process GC Duration Exceeds the Threshold
- ALM-43010 Heap Memory Usage of the JDBCServer Process Exceeds the Threshold
- ALM-43011 Non-Heap Memory Usage of the JDBCServer Process Exceeds the Threshold
- ALM-43012 Direct Memory Usage of the JDBCServer Process Exceeds the Threshold
- ALM-43013 JDBCServer Process GC Duration Exceeds the Threshold
- ALM-43017 JDBCServer Process Full GC Times Exceeds the Threshold
- ALM-43018 JobHistory Process Full GC Times Exceeds the Threshold
- ALM-43019 Heap Memory Usage of the IndexServer Process Exceeds the Threshold
- ALM-43020 Non-Heap Memory Usage of the IndexServer Process Exceeds the Threshold
- ALM-43021 Direct Memory Usage of the IndexServer Process Exceeds the Threshold
- ALM-43022 IndexServer Process GC Time Exceeds the Threshold
- ALM-43023 IndexServer Process Full GC Number Exceeds the Threshold
- ALM-43200 Elasticsearch Service Unavailable
- ALM-43201 Heap Memory Usage of Elasticsearch Exceeds the Threshold
- ALM-43202 Indices in the Yellow State Exist in Elasticsearch
- ALM-43203 Indices in the Red State Exist in Elasticsearch
- ALM-43204 GC Duration of the Elasticsearch Process Exceeds the Threshold
- ALM-43205 Elasticsearch Stored Shard Data Volume Exceeds the Threshold
- ALM-43206 Elasticsearch Shard Document Number Exceeds the Threshold
- ALM-43207 Elasticsearch Has Indexes Without Replicas
- ALM-43208 Elasticsearch Data Directory Usage Exceeds the Threshold
- ALM-43209 Total Number of Elasticsearch Instance Shards Exceeds the Threshold
- ALM-43210 Total Number of Elasticsearch Shards Exceeds the Threshold
- ALM-43600 GraphBase Service Unavailable
- ALM-43605 Number of Real-Time Requests on a GraphBase Node Exceeds the Threshold
- ALM-43607 Nginx Fault in GraphBase
- ALM-43608 Floating IP Address of GraphBase Is Faulty
- ALM-43609 TaskManager of GraphBase Is Faulty
- ALM-43610 GC Time of the Old-Generation GraphServer Process Exceeds the Threshold
- ALM-43611 Number of GC Times of the Old-Generation GraphServer Process Exceeds the Threshold
- ALM-43612 GC Duration of the Young-Generation GraphServer Process Exceeds the Threshold
- ALM-43613 Number of GC Times of the Young-Generation GraphServer Process Exceeds the Threshold
- ALM-43614 Time Spent on a GraphBase Path Query Request Exceeds the Threshold
- ALM-43615 Time Spent on a Line Expansion Query Request in GraphBase Exceeds the Threshold
- ALM-43616 GraphBase-related Yarn Jobs Are Abnormal
- ALM-43617 Number of Waiting Queues for Real-Time Data Import to GraphBase Exceeds the Threshold
- ALM-43618 GraphServer Heap Memory Usage Exceeds the Threshold
- ALM-43619 Invalid GraphBase HA Certificate Files
- ALM-43620 GraphBase HA Certificates Are About to Expire
- ALM-43621 GraphBase HA Certificate Files Have Expired
- ALM-43850 KMS Service Unavailable
- ALM-45000 HetuEngine Service Unavailable
- ALM-45001 Faulty HetuEngine Compute Instances
- ALM-45003 HetuEngine QAS Disk Capacity Is Insufficient
- ALM-45004 Tasks Stacked on HetuEngine Compute Instance
- ALM-45005 CPU Usage of HetuEngine Compute Instance Exceeded the Threshold
- ALM-45006 Memory Usage of a HetuEngine Compute Instance Exceeded the Threshold
- ALM-45007 Number of Workers of a HetuEngine Compute Instance Is Less Than the Threshold
- ALM-45191 Failed to Obtain ECS Metadata
- ALM-45192 Failed to Obtain the IAM Security Token
- ALM-45275 Ranger Service Unavailable
- ALM-45276 Abnormal RangerAdmin Status
- ALM-45277 RangerAdmin Heap Memory Usage Exceeds the Threshold
- ALM-45278 RangerAdmin Direct Memory Usage Exceeds the Threshold
- ALM-45279 RangerAdmin Non-Heap Memory Usage Exceeds the Threshold
- ALM-45280 RangerAdmin GC Duration Exceeds the Threshold
- ALM-45281 UserSync Heap Memory Usage Exceeds the Threshold
- ALM-45282 UserSync Direct Memory Usage Exceeds the Threshold
- ALM-45283 UserSync Non-Heap Memory Usage Exceeds the Threshold
- ALM-45284 UserSync Garbage Collection (GC) Time Exceeds the Threshold
- ALM-45285 TagSync Heap Memory Usage Exceeds the Threshold
- ALM-45286 TagSync Direct Memory Usage Exceeds the Threshold
- ALM-45287 TagSync Non-Heap Memory Usage Exceeds the Threshold
- ALM-45288 TagSync Garbage Collection (GC) Time Exceeds the Threshold
- ALM-45289 PolicySync Heap Memory Usage Exceeds the Threshold
- ALM-45290 PolicySync Direct Memory Usage Exceeds the Threshold
- ALM-45291 PolicySync Non-Heap Memory Usage Exceeds the Threshold
- ALM-45292 PolicySync GC Duration Exceeds the Threshold
- ALM-45293 Ranger User Synchronization Exception
- ALM-45425 ClickHouse Service Unavailable
- ALM-45426 ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold
- ALM-45427 ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold
- ALM-45428 ClickHouse Disk I/O Exception
- ALM-45429 Table Metadata Synchronization Failed on the Added ClickHouse Node
- ALM-45430 Permission Metadata Synchronization Failed on the Added ClickHouse Node
- ALM-45434 A Single Replica Exists in the ClickHouse Data Table
- ALM-45440 Inconsistency Between ClickHouse Replicas
- ALM-45441 ZooKeeper Disconnected
- ALM-45442 Too Many Concurrent SQL Statements
- ALM-45443 Slow SQL Queries in the Cluster
- ALM-45444 Abnormal ClickHouse Process
- ALM-45445 Failed to Send Data Files to Remote Shards When ClickHouse Writes Data to a Distributed Table
- ALM-45446 Mutation Task of ClickHouse Is Not Complete for a Long Time
- ALM-45585 IoTDB Service Unavailable
- ALM-45586 IoTDBServer Heap Memory Usage Exceeds the Threshold
- ALM-45587 IoTDBServer GC Duration Exceeds the Threshold
- ALM-45588 IoTDBServer Direct Memory Usage Exceeds the Threshold
- ALM-45589 ConfigNode Heap Memory Usage Exceeds the Threshold
- ALM-45590 ConfigNode GC Duration Exceeds the Threshold
- ALM-45591 ConfigNode Direct Memory Usage Exceeds the Threshold
- ALM-45592 IoTDBServer RPC Execution Duration Exceeds the Threshold
- ALM-45593 IoTDBServer Flush Execution Duration Exceeds the Threshold
- ALM-45594 IoTDBServer Intra-Space Merge Duration Exceeds the Threshold
- ALM-45595 IoTDBServer Cross-Space Merge Duration Exceeds the Threshold
- ALM-45596 Procedure Execution Failed
- ALM-45615 CDL Service Unavailable
- ALM-45616 CDL Job Execution Exception
- ALM-45617 Data Queued in the CDL Replication Slot Exceeds the Threshold
- ALM-45635 FlinkServer Job Execution Failure
- ALM-45636 Number of Consecutive Checkpoint Failures of a Flink Job Exceeds the Threshold
- ALM-45637 Continuous Back Pressure Time of a Flink Job Exceeds the Threshold
- ALM-45638 Number of Restarts After Flink Job Failures Exceeds the Threshold
- ALM-45639 Checkpointing of a Flink Job Times Out
- ALM-45640 FlinkServer Heartbeat Interruption Between the Active and Standby Nodes
- ALM-45641 Data Synchronization Exception Between the Active and Standby FlinkServer Nodes
- ALM-45642 RocksDB Continuously Triggers Write Traffic Limiting
- ALM-45643 MemTable Size of RocksDB Continuously Exceeds the Threshold
- ALM-45644 Number of SST Files at Level 0 of RocksDB Continuously Exceeds the Threshold
- ALM-45645 Pending Flush Size of RocksDB Continuously Exceeds the Threshold
- ALM-45646 Pending Compaction Size of RocksDB Continuously Exceeds the Threshold
- ALM-45647 Estimated Pending Compaction Size of RocksDB Continuously Exceeds the Threshold
- ALM-45648 RocksDB Frequently Encounters Write-Stopped
- ALM-45649 P95 Latency of RocksDB Get Requests Continuously Exceeds the Threshold
- ALM-45650 P95 Latency of RocksDB Write Requests Continuously Exceeds the Threshold
- ALM-45652 Flink Service Unavailable
- ALM-45653 Invalid Flink HA Certificate File
- ALM-45654 Flink HA Certificate Is About to Expire
- ALM-45655 Flink HA Certificate File Has Expired
- ALM-45736 Guardian Service Unavailable
- ALM-45737 Guardian TokenServer Heap Memory Usage Exceeds the Threshold
- ALM-45738 Guardian TokenServer Direct Memory Usage Exceeds the Threshold
- ALM-45739 Guardian TokenServer Non-Heap Memory Usage Exceeds the Threshold
- ALM-45740 Guardian TokenServer GC Duration Exceeds the Threshold
- ALM-45741 Guardian Failed to Call the ECS securitykey API
- ALM-45742 Guardian Failed to Call the ECS Metadata API
- ALM-45743 Guardian Failed to Call the IAM API
- ALM-46001 MOTService Unavailable
- ALM-46003 MOTService Heartbeat Interruption Between the Active and Standby Nodes
- ALM-46004 Data Inconsistency Between Active and Standby MOTService Nodes
- ALM-46005 MOTService Database Connection Usage Exceeds the Threshold
- ALM-46006 Disk Space Usage of the MOTService Data Directory Exceeds the Threshold
- ALM-46007 MOTService Database Enters the Read-Only Mode
- ALM-46008 MOTService Memory Usage Exceeds the Threshold
- ALM-46009 MOTService CPU Usage Exceeds the Threshold
- ALM-46010 MOTService Certificate File Is About to Expire
- ALM-46011 MOTService Certificate File Has Expired
- ALM-46012 Abnormal Nginx of MOTService
- ALM-47000 MemArtsCC Instance Unavailable
- ALM-47002 MemArtsCC Disk Fault
- ALM-50201 Doris Service Unavailable
- ALM-50202 FE CPU Usage Exceeds the Threshold
- ALM-50203 FE Memory Usage Exceeds the Threshold
- ALM-50205 BE CPU Usage Exceeds the Threshold
- ALM-50206 BE Memory Usage Exceeds the Threshold
- ALM-50207 Ratio of Connections to the FE MySQL Port to the Maximum Connections Allowed Exceeds the Threshold
- ALM-50208 Failures to Clear Historical Metadata Image Files Exceed the Threshold
- ALM-50209 Failures to Generate Metadata Image Files Exceed the Threshold
- ALM-50210 Maximum Compaction Score of All BE Nodes Exceeds the Threshold
- ALM-50211 FE Queue Length of BE Periodic Report Tasks Exceeds the Threshold
- ALM-50212 Accumulated Old-Generation GC Duration of the FE Process Exceeds the Threshold
- ALM-50213 Number of Tasks Queuing in the FE Thread Pool for Interacting with BE Exceeds the Threshold
- ALM-50214 Number of Tasks Queuing in the FE Thread Pool for Task Processing Exceeds the Threshold
- ALM-50215 Longest Duration of RPC Requests Received by Each FE Thrift Method Exceeds the Threshold
- ALM-50216 Memory Usage of the FE Node Exceeds the Threshold
- ALM-50217 Heap Memory Usage of the FE Node Exceeds the Threshold
- ALM-50219 Length of the Queue in the Thread Pool for Query Execution Exceeds the Threshold
- ALM-50220 Error Rate of TCP Packet Receiving Exceeds the Threshold
- ALM-50221 BE Data Disk Usage Exceeds the Threshold
- ALM-50222 Disk Status of a Specified Data Directory on BE Is Abnormal
- ALM-50223 Maximum Memory Required by BE Is Greater Than the Remaining Memory of the Machine
- ALM-50224 Failures of a Certain Task Type on BE Are Increasing
- ALM-50225 Unavailable FE Instances
- ALM-50226 Unavailable BE Instances
- ALM-50227 Concurrent Doris Tenant Queries Exceeds the Threshold
- ALM-50228 Memory Usage of a Doris Tenant Exceeds the Threshold
- ALM-50229 Doris FE Failed to Connect to OBS
- ALM-50230 Doris BE Cannot Connect to OBS
- ALM-50401 Number of JobServer Waiting Tasks Exceeds the Threshold
- ALM-50402 JobGateway Service Unavailable
- ALM-51201 LakeSearch Unavailable
- ALM-51202 LakeSearch Heap Memory Usage Exceeds the Threshold
- ALM-51203 GC Duration of the LakeSearch Instance Exceeds the Threshold
- Security Description
- High-Risk Operations
- Interconnecting Jupyter Notebook with MRS Using Custom Python
-
FAQs
- Client Usage
- Web Page Access
- Alarm Monitoring
- Performance Tuning
-
Job Development
- How Do I Get My Data into OBS or HDFS?
- What Types of Spark Jobs Can Be Submitted in a Cluster?
- Can I Run Multiple Spark Tasks at the Same Time After the Minimum Tenant Resources of an MRS Cluster Is Changed to 0?
- What Are the Differences Between the Client Mode and Cluster Mode of Spark Jobs?
- How Do I View MRS Job Logs?
- How Do I Do If the Message "The current user does not exist on MRS Manager. Grant the user sufficient permissions on IAM and then perform IAM user synchronization on the Dashboard tab page." Is Displayed?
- LauncherJob Execution Fails and the Error Message "jobPropertiesMap is null." Is Displayed
- How Do I Do If the Flink Job Status on the MRS Console Is Inconsistent with That on Yarn?
- How Do I Do If a SparkStreaming Job Fails After Being Executed Dozens of Hours and the OBS Access 403 Error Is Reported?
- How Do I Do If an Alarm Is Reported Indicating that the Memory Is Insufficient When I Execute a SQL Statement on the ClickHouse Client?
- Why Can't a Submitted Yarn Job Be Viewed on the Web UI?
- How Do I Modify the HDFS NameSpace (fs.defaultFS) of an Existing Cluster?
- How Do I Do If the launcher-job Queue Is Stopped by YARN due to Insufficient Heap Size When I Submit a Flink Job on the Management Plane?
- Cluster Upgrade/Patching
- Cluster Access
-
Big Data Service Development
- Can MRS Run Multiple Flume Tasks at a Time?
- How Do I Change FlumeClient Logs to Standard Logs?
- Where Are the .jar Files and Environment Variables of Hadoop Located?
- What Compression Algorithms Does HBase Support?
- Can MRS Write Data to HBase Through the HBase External Table of Hive?
- How Do I View HBase Logs?
- How Do I Set the TTL for an HBase Table?
- How Do I Balance HDFS Data?
- How Do I Change the Number of HDFS Replicas?
- How Do I Modify the HDFS Active/Standby Switchover Class?
- What Is the Recommended Number Type of DynamoDB in Hive Tables?
- Can the Hive Driver Be Interconnected with DBCP2?
- Can I Export the Query Result of Hive Data?
- How Do I Do If an Error Occurs When Hive Runs the beeline -e Command to Execute Multiple Statements?
- How Do I Do If a "hivesql/hivescript" Job Fails to Submit After Hive Is Added?
- How Do I Reset Kafka Data?
- How Do I Obtain the Client Version of MRS Kafka?
- What Access Protocols Are Supported by Kafka?
- How Do I Do If Error Message "Not Authorized to access group xxx" Is Displayed When a Kafka Topic Is Consumed?
- What Are the Differences Between Sample Project Building and Application Development? Is Python Code Supported?
- How Do I Connect to Spark Shell from MRS?
- How Do I Connect to Spark Beeline from MRS?
- Where Are the Execution Logs of Spark Jobs Stored?
- How Do I Specify a Log Path When Submitting a Task in an MRS Storm Cluster?
- How Do I Check Whether the ResourceManager Configuration of Yarn Is Correct?
- How Do I Modify the allow_drop_detached Parameter of ClickHouse?
- API
-
Cluster Management
- How Do I View All Clusters?
- How Do I View Log Information?
- How Do I View Cluster Configuration Information?
- How Do I Install Kafka and Flume in an MRS Cluster?
- How Do I Stop an MRS Cluster?
- Can I Change MRS Cluster Nodes on the MRS Console?
- How Do I Shield Cluster Alarm/Event Notifications?
- Why Is the Resource Pool Memory Displayed in the MRS Cluster Smaller Than the Actual Cluster Memory?
- How Do I Configure the Knox Memory?
- What Is the Python Version Installed for an MRS Cluster?
- How Do I View the Configuration File Directory of Each Component?
- How Do I Do If the Time on MRS Nodes Is Incorrect?
- How Do I Do If Trust Relationships Between Nodes Are Abnormal?
- How Do I Adjust the Memory Size of the manager-executor Process?
-
Kerberos Usage
- How Do I Change the Kerberos Authentication Status of a Created MRS Cluster?
- What Are the Ports of the Kerberos Authentication Service?
- How Do I Deploy the Kerberos Service in a Running Cluster?
- How Do I Access Hive in a Cluster with Kerberos Authentication Enabled?
- How Do I Access Spark in a Cluster with Kerberos Authentication Enabled?
- How Do I Prevent Kerberos Authentication Expiration?
- Metadata Management
-
Troubleshooting
- Accessing the Web Pages
-
Cluster Management
- Replacing a Disk in an MRS Cluster
- MRS Backup Failure
- Inconsistency Between df and du Command Output on the Core Node
- Disassociating a Subnet from the ACL Network
- MRS Becomes Abnormal After hostname Modification
- DataNode Restarts Unexpectedly
- Network Is Unreachable When Using pip3 to Install the Python Package in an MRS Cluster
- Failed to Download the MRS Cluster Client
- Scale-Out Failure
- Error Occurs When MRS Executes the Insert Command Using Beeline
- Using CDM to Migrate Data to HDFS
- Alarms Are Frequently Generated in the MRS Cluster
- Memory Usage of the PMS Process Is High
- High Memory Usage of the Knox Process
- It Takes a Long Time to Access HBase from a Client Installed on a Node Outside the Security Cluster
- How Do I Locate a Job Submission Failure?
- OS Disk Space Is Insufficient Due to Oversized HBase Log Files
- Using ClickHouse
-
Using DBService
- DBServer Instance Is in Abnormal Status
- DBServer Instance Remains in the Restoring State
- Default Port 20050 or 20051 Is Occupied
- DBServer Instance Is Always in the Restoring State Due to Incorrect /tmp Directory Permission
- DBService Backup Failure
- Components Failed to Connect to DBService in Normal State
- DBServer Failed to Start
- DBService Backup Failed Because the Floating IP Address Is Unreachable
- DBService Failed to Start Due to the Loss of the DBService Configuration File
-
Using Flink
- "IllegalConfigurationException: Error while parsing YAML configuration file: security.kerberos.login.keytab" Is Displayed When a Command Is Executed on an Installed Client
- "IllegalConfigurationException: Error while parsing YAML configuration file" Is Displayed When a Command Is Executed After Configurations of the Installed Client Are Changed
- The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created
- Failed to Create a Cluster by Executing the yarn-session Command When a Different User Is Used
- Flink Service Program Fails to Read Files on the NFS Disk
- Using Flume
-
Using HBase
- Slow Response to HBase Connection
- RegionServer Failed to Start Because the Port Is Occupied
- HBase Failed to Start Due to Insufficient Node Memory
- HBase Failed to Start Due to Inappropriate Parameter Settings
- RegionServer Failed to Start Due to Residual Processes
- HBase Failed to Start Due to a Quota Set on HDFS
- HBase Failed to Start Due to Corrupted Version Files
- High CPU Usage Caused by Zero-Loaded RegionServer
- HBase Failed to Start with "FileNotFoundException" in RegionServer Logs
- The Number of RegionServers Displayed on the Native Page Is Greater Than the Actual Number After HBase Is Started
- RegionServer Instance Is in the Restoring State
- HBase Failed to Start in a Newly Installed Cluster
- HBase Failed to Start Due to the Loss of the ACL Table Directory
- HBase Failed to Start After the Cluster Is Powered Off and On
- Failed to Import HBase Data Due to Oversized File Blocks
- Failed to Load Data to the Index Table After an HBase Table Is Created Using Phoenix
-
Using HDFS
- All NameNodes Enter the Standby State After the NameNode RPC Port of HDFS Is Changed
- An Error Is Reported When the HDFS Client Is Used After the Host Is Connected Using a Public Network IP Address
- Failed to Use Python to Remotely Connect to the Port of HDFS
- An Error Is Reported During HDFS and Yarn Startup
- HDFS Permission Setting Error
- A DataNode of HDFS Is Always in the Decommissioning State
- HDFS Failed to Start Due to Insufficient Memory
- A Large Number of Blocks Are Lost in HDFS due to the Time Change Using ntpdate
- CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)
- Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time
- Common File Read/Write Faults
- Maximum Number of File Handles Is Set to a Too Small Value, Causing File Reading and Writing Exceptions
- File Fails to Be Uploaded to HDFS Due to File Errors
- After dfs.blocksize Is Configured and Data Is Put, Block Size Remains Unchanged
- Failed to Read Files, and "FileNotFoundException" Is Displayed
- Failed to Write Files to HDFS, and "item limit of / is exceeded" Is Displayed
- Adjusting the Log Level of the Shell Client
- File Read Fails, and "No common protection layer" Is Displayed
- Failed to Write Files Because the HDFS Directory Quota Is Insufficient
- Balancing Fails, and "Source and target differ in block-size" Is Displayed
- A File Fails to Be Queried or Deleted, and the File Can Be Viewed in the Parent Directory (Invisible Characters)
- Uneven Data Distribution Due to Non-HDFS Data Residuals
- Uneven Data Distribution Due to the Client Installation on the DataNode
- Handling Unbalanced DataNode Disk Usage on Nodes
- Locating Common Balance Problems
- An Error Is Reported When the HDFS Client Is Installed on the Core Node in a Common Cluster
- Client Installed on a Node Outside the Cluster Fails to Upload Files Using hdfs
- Insufficient Number of Replicas Is Reported During High Concurrent HDFS Writes
-
Using Hive
- Content Recorded in Hive Logs
- Causes of Hive Startup Failure
- How to Specify a Queue When Hive Submits a Job
- How to Set Map and Reduce Memory on the Client
- Specifying the Output File Compression Format When Importing a Table
- desc Table Cannot Be Completely Displayed
- NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
- A Newly Created User Has No Query Permissions
- An Error Is Reported When SQL Is Executed to Submit a Task to a Specified Queue
- An Error Is Reported When the "load data inpath" Command Is Executed
- An Error Is Reported When the "load data local inpath" Command Is Executed
- An Error Is Reported When the "create external table" Command Is Executed
- An Error Is Reported When the dfs -put Command Is Executed on the Beeline Client
- Insufficient Permissions to Execute the set role admin Command
- An Error Is Reported When UDF Is Created Using Beeline
- Difference Between Hive Service Health Status and Hive Instance Health Status
- Hive Alarms and Triggering Conditions
- "authentication failed" Is Displayed During an Attempt to Connect to the Shell Client
- Failed to Access ZooKeeper from the Client
- "Invalid function" Is Displayed When a UDF Is Used
- Hive Service Status Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Unknown
- Health Status of a HiveServer or MetaStore Instance Is Concerning
- Garbled Characters Returned upon a select Query If Text Files Are Compressed Using ARC4
- Hive Task Fails to Run on the Client But Succeeds on Yarn
- An Error Is Reported When the select Statement Is Executed
- Failed to Drop a Large Number of Partitions
- Failed to Start a Local Task
- Failed to Start WebHCat
- Sample Code Error for Hive Secondary Development After Domain Switching
- MetaStore Exception Occurs When the Number of DBService Connections Exceeds the Upper Limit
- "Failed to execute session hooks: over max connections" Reported by Beeline
- beeline Reports the "OutOfMemoryError" Error
- Task Execution Fails Because the Input File Number Exceeds the Threshold
- Task Execution Fails Because of Stack Memory Overflow
- Task Failed Due to Concurrent Writes to One Table or Partition
- Failed to Load Data to Hive Tables
- HiveServer and HiveHCat Process Faults
- An Error Occurs When the INSERT INTO Statement Is Executed on Hive But the Error Message Is Unclear
- Timeout Reported When Adding the Hive Table Field
- Failed to Restart the Hive Service
- Hive Failed to Delete a Table
- An Error Is Reported When msck repair table table_name Is Run on Hive
- Using Hue
-
Using Kafka
- An Error Is Reported When Kafka Is Run to Obtain a Topic
- Flume Normally Connects to Kafka But Fails to Send Messages
- Producer Failed to Send Data and Threw "NullPointerException"
- Producer Fails to Send Data and "TOPIC_AUTHORIZATION_FAILED" Is Thrown
- Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"
- Consumer Is Initialized Successfully, But the Specified Topic Message Cannot Be Obtained from Kafka
- Consumer Fails to Consume Data and Remains in the Waiting State
- Consumer Fails to Consume Data in a Newly Created Cluster, and the Message "GROUP_COORDINATOR_NOT_AVAILABLE" Is Displayed
- SparkStreaming Fails to Consume Kafka Messages, and the Message "Couldn't find leader offsets" Is Displayed
- Consumer Fails to Consume Data and the Message "SchemaException: Error reading field 'brokers'" Is Displayed
- Checking Whether Data Consumed by a Customer Is Lost
- Kafka Broker Reports Abnormal Processes and the Log Shows "IllegalArgumentException"
- Error "AdminOperationException" Is Displayed When a Kafka Topic Is Deleted
- When a Kafka Topic Fails to Be Created, "NoAuthException" Is Displayed
- Failed to Set an ACL for a Kafka Topic, and "NoAuthException" Is Displayed
- When a Kafka Topic Fails to Be Created, "replication factor larger than available brokers" Is Displayed
- Consumer Repeatedly Consumes Data
- Leader for the Created Kafka Topic Partition Is Displayed as none
- Safety Instructions on Using Kafka
- Obtaining Kafka Consumer Offset Information
- Adding or Deleting Configurations for a Topic
- Reading the Content of the __consumer_offsets Internal Topic
- Configuring Logs for Shell Commands on the Client
- Obtaining Topic Distribution Information
- Kafka HA Usage Description
- High Usage of Multiple Disks on a Kafka Cluster Node
- Using Oozie
-
Using Spark
- An Error Occurs When the Split Size Is Changed in a Spark Application
- An Error Is Reported When Spark Is Used
- A Spark Job Fails to Run Due to Incorrect JAR File Import
- An Error Is Reported During Spark Running
- "Executor Memory Reaches the Threshold" Is Displayed in Driver
- Message "Can't get the Kerberos realm" Is Displayed in Yarn-cluster Mode
- Failed to Start spark-sql and spark-shell Due to JDK Version Mismatch
- ApplicationMaster Failed to Start Twice in Yarn-client Mode
- Submission Status of the Spark Job API Is Error
- Alarm 43006 Is Repeatedly Generated in the Cluster
- Failed to Create or Delete a Table in Spark Beeline
- Failed to Connect to the Driver When a Node Outside the Cluster Submits a Spark Job to Yarn
- Large Number of Shuffle Results Are Lost During Spark Task Execution
- Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
- Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell
- Spark Task Submission Failure
- Spark Task Execution Failure
- JDBCServer Connection Failure
- Failed to View Spark Task Logs
- Authentication Fails When Spark Connects to Other Services
- Using Sqoop
-
Using Storm
- Invalid Hyperlink of Events on the Storm UI
- Failed to Submit a Topology
- Topology Submission Fails and the Message "Failed to check principle for keytab" Is Displayed
- Worker Runs Abnormally After a Topology Is Submitted and Error "Failed to bind to:host:ip" Is Displayed
- "well-known file is not secure" Is Displayed When the jstack Command Is Used to Check the Process Stack
- Data Cannot Be Written When the Storm-JDBC Plug-in Is Used to Develop Oracle Write Bolts
- The GC Parameter Configured for the Service Topology Does Not Take Effect
- Internal Server Error Is Displayed When the User Queries Information on the UI
- Using Ranger
-
Using Yarn
- A Large Number of Jobs Are Found After Yarn Is Started
- "GC overhead" Is Displayed on the Client When Tasks Are Submitted Using the Hadoop Jar Command
- Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn
- Temporary Files Are Not Deleted When a MapReduce Job Is Abnormal
- Failed to View Job Logs on the Yarn Web UI
- Using ZooKeeper
- Accessing OBS
- Appendix
-
Overview
-
Component Operation Guide (LTS) (Ankara Region)
-
Using CarbonData
- Overview
- Common CarbonData Parameters
- CarbonData Operation Guide
- CarbonData Performance Tuning
- CarbonData Access Control
- CarbonData Syntax Reference
- CarbonData Troubleshooting
-
CarbonData FAQ
- Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values?
- How to Avoid Minor Compaction for Historical Data?
- How to Change the Default Group Name for CarbonData Data Loading?
- Why Does the INSERT INTO CARBON TABLE Command Fail?
- Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters?
- Why Is INSERT INTO/LOAD DATA Task Distribution Incorrect, with Fewer Tasks Opened Than Available Executors, When the Number of Initial Executors Is Zero?
- Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?
- Why Do I Fail to Create a Hive Table?
- How Do I Logically Split Data Across Different Namespaces?
- Why Cannot the UPDATE Command Be Executed in Spark Shell?
- How Do I Configure Unsafe Memory in CarbonData?
- Why Does an Exception Occur in CarbonData When a Disk Space Quota Is Set for the Storage Directory in HDFS?
- Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed?
- Why Do Files of a Carbon Table Exist in the Recycle Bin Even If the drop table Command Is Not Executed When Mis-deletion Prevention Is Enabled?
- How Do I Restore the Latest tablestatus File That Has Been Lost or Damaged When TableStatus Versioning Is Enabled?
-
Using CDL
- Instructions for Using CDL
- Supported Data Formats
- Using CDL from Scratch
- Creating a CDL User
- Encrypting Data
- Preparing for Creating a CDL Job
-
Creating a CDL Job
- Creating a CDL Data Synchronization Job
- Creating a CDL Data Comparison Job
-
Common CDL Jobs
- Importing Data from MySQL to HDFS
- Importing Data from Oracle to HDFS
- Synchronizing Data from PgSQL to Kafka
- Synchronizing Data from Oracle to Hudi
- Synchronizing Data from MySQL to Hudi
- Synchronizing Data from PgSQL to Hudi
- Synchronizing Data from OpenGauss to Hudi
- Synchronizing drs-opengauss-json Database from ThirdKafka to Hudi
- Synchronizing drs-oracle-json Database from ThirdKafka to Hudi
- Synchronizing drs-oracle-avro Database from ThirdKafka to Hudi
- Synchronizing Open-Source Debezium JSON Data from ThirdKafka to Hudi
- Synchronizing Data from Hudi to GaussDB(DWS)
- Synchronizing Data from Hudi to ClickHouse
- DDL Operations
- Creating a CDL Job
- Common CDL Service APIs
- CDL Log Overview
-
CDL FAQs
- Error ORA-01284 Is Reported When an Oracle Job Is Started
- Hudi Does Not Receive Data After a CDL Job Is Executed
- Error 104 or 143 Is Reported After a CDL Job Runs for a Period of Time
- An Error Is Reported When the Job Capturing Data from PgSQL to Hudi Is Started
- Error 403 Is Reported When a CDL Job Is Stopped
- When Ranger Authentication Is Enabled, Why Can a User Still Perform Operations on Tasks They Created After All Permissions of the User Are Deleted?
- How Do I Capture Data from a Specified Location When a MySQL Link Task Is Started?
- Why Is the Value of Tasks Configured for the OGG Source Different from the Actual Number of Running Tasks When Data Is Synchronized from OGG to Hudi?
- Why Are There Too Many Topic Partitions Corresponding to the CDL Synchronization Task Names?
- What Should I Do If, When a CDL Task Synchronizes Data to Hudi, an Error Message Indicates That the Current User Does Not Have the Permission to Create Tables in a Database Created by Another User?
- What Should I Do If a CDL Task Fails When I Perform DDL Operations?
- What Should I Do If a CDL Data Synchronization Task Fails and the YARN Task Waits for More Than 10 Minutes Before Running Again?
-
Using ClickHouse
- Using ClickHouse from Scratch
- ClickHouse Permission Management
- ClickHouse Table Engine Overview
- Creating a ClickHouse Table
-
Common ClickHouse SQL Syntax
- CREATE DATABASE: Creating a Database
- CREATE TABLE: Creating a Table
- INSERT INTO: Inserting Data into a Table
- DELETE: Lightweight Deleting Table Data
- SELECT: Querying Table Data
- ALTER TABLE: Modifying a Table Schema
- ALTER TABLE: Modifying Table Data
- DESC: Querying a Table Structure
- DROP: Deleting a Table
- SHOW: Displaying Information About Databases and Tables
- UPSERT: Writing Data
- Migrating ClickHouse Data
- Adaptive MV Usage in ClickHouse
- Configuring Interconnection Between ClickHouse and HDFS
- Configuring Interconnection Between ClickHouse and Kafka
- Configuring the Connection Between ClickHouse and Open-Source ClickHouse
- Configuring Strong Data Consistency Between ClickHouse Replicas
- Configuring the Support for Transactions on ClickHouse
- Pre-Caching ClickHouse Metadata to the Memory
- Collecting Dumping Logs of the ClickHouse System Tables
- ClickHouse Log Overview
-
ClickHouse FAQ
- What Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal?
- How Do I Quickly Restore the Status of a Logical Cluster in a Scale-in Fault Scenario?
- What Should I Do If a File System Error Is Reported and Core Dump Occurs During Process Startup and part Loading After a ClickHouseServer Instance Node Is Power Cycled?
- What Should I Do If an Exception Occurs in the replication_queue and Data Is Inconsistent Between Replicas After a ClickHouse Cluster Is Powered On After a Sudden Power-off?
-
Using Containers
- Introduction to Containers
- Adding or Deleting an Application
- Monitoring Applications
- Starting and Stopping Applications
- Adjusting Application Resources
- Modifying Application Configurations
- Service Governance
- Upgrade Example for a Fixed Version Number on the Consumer Side
- Creating a Containers Role
- Deploying an ALB
- Introduction to Containers Logs
- Using DBService
-
Using Doris
- Installing a MySQL Client
- Using Doris from Scratch
- Permissions Management
- Multi-Tenancy
- Native Web UI
- Doris Data Model
- Doris Cold and Hot Data Separation
- Data Operations
- Typical SQL Syntax
- Backing Up and Restoring Data
- Hive Data Analysis
- Ecosystem
-
Doris FAQs
- What Should I Do If "Failed to find enough host with storage medium and tag" Occasionally Occurs During Table Creation Due to the Configuration of the SSD and HDD Data Directories?
- What Should I Do If a Query Is Performed on the BE Node Where Some Copies Are Lost or Damaged and an Error Is Reported?
- What Should I Do If RPC Timeout Error Is Reported When Stream Load Is Used?
- How Do I Restore the FE Service from a Fault?
- What Do I Do If the Error Message "plugin not enabled" Is Displayed When the MySQL Client Is Used to Connect to the Doris Database?
- How Do I Handle the FE Startup Failure?
- How Do I Handle the Startup Failure Due to Incorrect IP Address Matching for the BE Instance?
- What Should I Do If Error Message "Read timed out" Is Displayed When the MySQL Client Connects to Doris?
- What Should I Do If an Error Is Reported When the BE Runs a Data Import or Query Task?
- What Should I Do If a Timeout Error Is Reported When Broker Load Imports Data?
- What Should I Do If the Data Volume of a Broker Load Import Task Exceeds the Threshold?
- What Should I Do If an Error Message Is Displayed When Broker Load Is Used to Import Data?
- How Do I Rectify the Serialization Exception Reported When Data Is Imported Using Spark Load?
- What Should I Do If an App ID Cannot Be Obtained When Spark Load Imports Data?
- Doris Logs
-
Using Elasticsearch
- Using Elasticsearch from Scratch
- Elasticsearch Usage Suggestions
- Elasticsearch Authentication Mode
- Using the Elasticsearch Client
- Running curl Commands in Linux
- In-House Plug-Ins
- API Authentication Whitelist Configuration
- SSL Encrypted Transmission Configuration
- Custom Data Directory
- Traffic Control
- Index Lifecycle Management
- Using UDFs in SQL Queries
- Connecting Elasticsearch to Other Components
- Switching the Elasticsearch Security Mode
- Synchronizing Index Owner Group
- Migrating Data
- Elasticsearch Log Overview
- Elasticsearch Performance Tuning
-
Common Issues About Elasticsearch
- Common Problems About the Reindex Tool
- What Can I Do If the Query Speed Is Slow in Full-Text Retrieval Scenarios?
- What Can I Do If High Read I/O Occurs When Document IDs Are Specified After the Data Written into the Database Reaches a Certain Volume?
- Custom Elasticsearch Plug-in Installation Guide
- What Can I Do If the Status of Elasticsearch Shards (Unassigned Shards) Becomes Down?
- Elasticsearch Fails to Be Started Because of Inconsistent Xms and Xmx Memory Configurations
- What Can I Do If "vm.max_map_count is too low" Is Reported When Elasticsearch Fails to Be Started?
- What Can I Do If an Instance Fails to Start Due to the Configuration File During Elasticsearch Startup?
- What Can I Do If an Elasticsearch Instance Fault Occurs Due to Insufficient Directory Permission?
- What Can I Do If the Speed of Writing Data into Elasticsearch Is Slow Due to Fault on an Elasticsearch Node?
- What Can I Do If Two Different Values Are Returned for hits.total When the Same Statement Is Used to Query Data in Elasticsearch Under the Same Condition Twice?
- What Can I Do If the Heap Memory of an EsNode Instance Overflows During the Running of Elasticsearch?
- What Can I Do If Data Fails to Be Written Because the Type of the Data to Be Written Is Different from That of the Existing Data?
- What Can I Do If Authentication Fails When Accessing Index Data?
- EsMaster Memory Overflows During Elasticsearch Cluster Restart
-
Using Flink
- Using Flink from Scratch
- Viewing Flink Job Information
- Configuring Flink Service Parameters
- Configuring Flink Security Features
-
Configuring and Developing a Flink Visualization Job
- Introduction to Flink Web UI
- Flink Web UI Permission Management
- Creating a FlinkServer Role
- Accessing the Flink Web UI
- Creating an Application
- Creating a Cluster Connection
- Creating a Data Connection
- Creating a Stream Table
- Creating a Job
- Restoring a Job
- Configuring Dependency Management
- Configuring and Managing UDFs
- Configuring the FlinkServer UDF Sandbox
- Reusing Flink UDFs
- Importing and Exporting Jobs
- Verifying Flink's Job Inspection
-
Configuring Interconnection Between FlinkServer and Other Components
- Interconnecting FlinkServer with ClickHouse
- Interconnecting FlinkServer with Elasticsearch
- Interconnecting FlinkServer with GaussDB(DWS)
- Interconnecting FlinkServer with JDBC
- Interconnecting FlinkServer with HBase
- Interconnecting FlinkServer with HDFS
- Interconnecting FlinkServer with Hive
- Interconnecting FlinkServer with Hudi
- Interconnecting FlinkServer with Kafka
- Interconnecting FlinkServer with Redis
- Flink Log Overview
- Flink Performance Tuning
- Common Flink Shell Commands
- Reference
- Flink Restart Policy
-
Enhancements to Flink SQL
- Using the DISTRIBUTEBY Feature
- Supporting Late Data in Flink SQL Window Functions
- Configuring Table-Level Time To Live (TTL) for Joining Multiple Flink Streams
- Verifying SQL Statements with the FlinkSQL Client
- Submitting a Job on the FlinkSQL Client
- Joining Big and Small Tables
- Deduplicating Data When Joining Big and Small Tables
- Setting Source Parallelism
- Limiting Read Rate for Flink SQL Kafka and Upsert-Kafka Connector
- Consuming Data in drs-json Format with FlinkSQL Kafka Connector
- Using ignoreDelete in JDBC Data Writes
- Join-To-Live
- Flink on Hudi Development Specifications
-
Using Flume
- Using Flume from Scratch
- Overview
- Installing the Flume Client
- Viewing Flume Client Logs
- Stopping or Uninstalling the Flume Client
- Using the Encryption Tool of the Flume Client
- Flume Service Configuration Guide
- Flume Configuration Parameter Description
- Using Environment Variables in the properties.properties File
-
Non-Encrypted Transmission
- Configuring Non-encrypted Transmission
- Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS
- Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client
- Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase
- Encrypted Transmission
- Viewing Flume Client Monitoring Information
- Connecting Flume to Kafka in Security Mode
- Connecting Flume with Hive in Security Mode
- Configuring the Flume Service Model
- Introduction to Flume Logs
- Flume Client Cgroup Usage Guide
- Secondary Development Guide for Flume Third-Party Plug-ins
- Common Issues About Flume
-
Using HBase
- Using HBase from Scratch
- Using an HBase Client
- Creating HBase Roles
- Configuring HBase Replication
- Configuring HBase Parameters
- Enabling Cross-Cluster Copy
- Using the ReplicationSyncUp Tool
- GeoMesa Command Line
- Using HIndex
- Using Global Secondary Indexes
- Configuring HBase DR
- Configuring HBase Data Compression and Encoding
- Performing an HBase DR Service Switchover
- Performing an HBase DR Active/Standby Cluster Switchover
- Community BulkLoad Tool
- Configuring Secure HBase Replication
- Configuring Region In Transition Recovery Chore Service
- Enabling the HBase Compaction
- Using a Secondary Index
- Hot-Cold Data Separation
- Configuring HBase Table-Level Overload Control
- HBase Log Overview
- HBase Performance Tuning
-
Common Issues About HBase
- Why Does a Client Keep Failing to Connect to a Server for a Long Time?
- Operation Failures Occur When Stopping BulkLoad on the Client
- Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively?
- Why Do Other Services Become Unstable If HBase Sets Up a Large Number of Connections over the Network Port?
- Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail?
- How Do I Restore a Region in the RIT State for a Long Time?
- Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?
- Why Does SocketTimeoutException Occur When a Client Queries HBase?
- Why Can Modified and Deleted Data Still Be Queried Using the Scan Command?
- Why Is the "java.lang.UnsatisfiedLinkError: Permission denied" Exception Thrown While Starting the HBase Shell?
- When Are the RegionServers Listed Under "Dead Region Servers" on the HMaster WebUI Cleared?
- Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload?
- What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions?
- How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper?
- Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS?
- Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Rebuilding Meta Using the OfflineMetaRepair Tool, and Startup Fails?
- Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?
- Insufficient Rights When a Tenant Accesses Phoenix
- What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"?
- How Do I Fix Region Overlapping?
- Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB?
- Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches?
- Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used?
- How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool?
- Why Is a Message Indicating Insufficient Permission Displayed When CTBase Connects to the Ranger Plug-ins?
- How Do I View Regions in the CLOSED State in an ENABLED Table?
- How Can I Quickly Recover the Service When HBase Files Are Damaged Due to a Cluster Power-Off?
- How Do I Disable HDFS Hedged Read on HBase?
-
Using HetuEngine
- Using HetuEngine from Scratch
- HetuEngine Permission Management
- Creating a HetuEngine User
- Creating a HetuEngine Compute Instance
-
Managing HetuEngine Compute Instances
- Configuring Resource Groups
- Configuring the Number of Worker Nodes
- Configuring a HetuEngine Maintenance Instance
- Configuring the Nodes on Which Coordinator Is Running
- Importing and Exporting Compute Instance Configurations
- Viewing the Instance Monitoring Page
- Viewing Coordinator and Worker Logs
- Configuring Query Fault Tolerance Execution
- Using the HetuEngine Client
- Using the HetuEngine Cross-Source Function
- Using the HetuEngine Cross-Domain Function
-
Configuring Data Sources
- Before You Start
- Configuring a Hive Data Source
- Configuring a Hudi Data Source
- Configuring a ClickHouse Data Source
- Configuring an Elasticsearch Data Source
- Configuring a GaussDB Data Source
- Configuring an HBase Data Source
- Configuring a HetuEngine Data Source
- Configuring an IoTDB Data Source
- Configuring a MySQL Data Source
- Managing Configured Data Sources
-
Using HetuEngine Materialized Views
- Overview of Materialized Views
- SQL Statement Example of Materialized Views
- Configuring Rewriting of Materialized Views
- Configuring Recommendation of Materialized Views
- Configuring Caching of Materialized Views
- Configuring the Validity Period and Data Update of Materialized Views
- Configuring Intelligent Materialized Views
- Viewing Automatic Tasks of Materialized Views
- Using HetuEngine SQL Diagnosis
- Using a Third-Party Visualization Tool to Access HetuEngine
- Developing and Applying Functions and UDFs
- HetuEngine Logs
- HetuEngine Performance Tuning
-
HetuEngine FAQ
- How Do I Perform Operations After the Domain Name Is Changed?
- What Do I Do If Starting a Cluster on the Client Times Out?
- How Do I Handle Data Source Loss?
- How Do I Handle HetuEngine Alarms?
- What Do I Do If an Error Is Reported Indicating That Python Does Not Exist When a Compute Instance Fails to Start?
- What Do I Do If a Compute Instance Fails 30 Seconds After It Is Started?
- What Do I Do If Data Fails to Be Written to a Table Because the Namespace of the Table Is Different from That of the /tmp Directory in the Federation Scenario?
- How Do I Configure HetuEngine SQL Inspection?
-
Using HDFS
- Using Hadoop from Scratch
- Configuring Memory Management
- Creating an HDFS Role
- Using the HDFS Client
- Running the DistCp Command
- Overview of HDFS File System Directories
- Changing the DataNode Storage Directory
- Configuring HDFS Directory Permission
- Configuring NFS
- Planning HDFS Capacity
- Configuring ulimit for HBase and HDFS
- Configuring HDFS DataNode Data Balancing
- Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes
- Configuring the Number of Files in a Single HDFS Directory
- Configuring the Recycle Bin Mechanism
- Setting Permissions on Files and Directories
- Setting the Maximum Lifetime and Renewal Interval of a Token
- Configuring the Damaged Disk Volume
- Configuring Encrypted Channels
- Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable
- Configuring the NameNode Blacklist
- Optimizing HDFS NameNode RPC QoS
- Optimizing HDFS DataNode RPC QoS
- Configuring Reserved Percentage of Disk Usage on DataNodes
- Configuring HDFS NodeLabel
- Configuring HDFS Mover
- Using HDFS AZ Mover
- Configuring HDFS DiskBalancer
- Configuring the Observer NameNode to Process Read Requests
- Performing Concurrent Operations on HDFS Files
- Introduction to HDFS Logs
- HDFS Performance Tuning
-
FAQ
- NameNode Startup Is Slow
- DataNode Is Normal but Cannot Report Data Blocks
- HDFS WebUI Cannot Properly Update Information About Damaged Data
- Why Do DistCp Commands Fail to Run in a Security Cluster and Exceptions Are Thrown?
- How Do I Rectify the Fault If a DataNode Fails to Be Started When the Number of Disks Defined in dfs.datanode.data.dir Equals the Value of dfs.datanode.failed.volumes.tolerated?
- Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
- Standby NameNode Fails to Be Restarted When the System Is Powered Off During Metadata (Namespace) Storage
- What Should I Do If Data in the Cache Is Lost When the System Is Powered Off During Small File Storage?
- Why Does Array Border-crossing Occur During FileInputFormat Split?
- Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST?
- How Do I Handle the Problem That the HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time?
- Can I Delete or Modify the Data Storage Directory in DataNode?
- Blocks Miss on the NameNode UI After the Successful Rollback
- Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS?
- Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?
- When Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?
- "This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI
- NameNode Fails to Be Restarted Due to EditLog Discontinuity
-
Using Hive
- Using Hive from Scratch
- Configuring Hive Parameters
- Hive SQL
- Permission Management
- Using a Hive Client
- Using HDFS Colocation to Store Hive Tables
- Using the Hive Column Encryption Function
- Customizing Row Separators
- Configuring Hive on HBase Across Clusters with Mutual Trust Enabled
- Deleting Single-Row Records from Hive on HBase
- Configuring HTTPS/HTTP-based REST APIs
- Enabling or Disabling the Transform Function
- Access Control of a Dynamic Table View on Hive
- Specifying Whether the ADMIN Permission Is Required for Creating Temporary Functions
- Using Hive to Read Data in a Relational Database
- Supporting Traditional Relational Database Syntax in Hive
- Creating User-Defined Hive Functions
- Enhancing beeline Reliability
- Viewing Table Structures Using the show create Statement as Users with the select Permission
- Writing a Directory into Hive with the Old Data Removed to the Recycle Bin
- Inserting Data to a Directory That Does Not Exist
- Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator
- Disabling Specification of the location Keyword When Creating an Internal Hive Table
- Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read
- Authorizing Over 32 Roles in Hive
- Restricting the Maximum Number of Maps for Hive Tasks
- HiveServer Lease Isolation
- Hive Supports Isolation of Metastore Instances Based on Components
- Switching the Hive Execution Engine to Tez
- Hive Supporting Reading Hudi Tables
- Hive Supporting Cold and Hot Storage of Partitioned Metadata
- Hive Supporting ZSTD Compression Formats
- Locating Abnormal Hive Files
- Using the ZSTD_JNI Compression Algorithm to Compress Hive ORC Tables
- Load Balancing for Hive MetaStore Client Connection
- Data Import and Export in Hive
- Hive Log Overview
- Hive Performance Tuning
-
Common Issues About Hive
- How Do I Delete UDFs on Multiple HiveServers at the Same Time?
- Why Cannot the DROP Operation Be Performed on a Backed-up Hive Table?
- How to Perform Operations on Local Files with Hive User-Defined Functions
- How Do I Forcibly Stop MapReduce Jobs Executed by Hive?
- How Do I Monitor the Hive Table Size?
- How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement?
- Why Is Hive on Spark Task Freezing When HBase Is Not Installed?
- Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
- Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client?
- Description of Hive Table Location (Either Be an OBS or HDFS Path)
- Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?
- Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?
- Why Does Hive Not Support Vectorized Query?
- Why Does Metadata Still Exist When the HDFS Data Directory of the Hive Table Is Deleted by Mistake?
- How Do I Disable the Logging Function of Hive?
- Why Do Hive Tables in the OBS Directory Fail to Be Deleted?
- Hive Configuration Problems
- How Do I Handle the Error Reported When Setting hive.exec.stagingdir on the Hive Client?
-
Using Hudi
- Getting Started
- Common Hudi Parameters
- Basic Operations
- Hudi SQL Syntax Reference
- Setting Default Values for Hudi Columns
- Hudi Performance Tuning
-
Common Issues About Hudi
-
Data Write
- A Parquet/Avro Schema Exception Is Reported When Updated Data Is Written
- UnsupportedOperationException Is Reported When Updated Data Is Written
- SchemaCompatabilityException Is Reported When Updated Data Is Written
- What Should I Do If Hudi Consumes Much Space in a Temporary Folder During Upsert?
- Hudi Fails to Write Decimal Data with Lower Precision
- Data in ro and rt Tables Cannot Be Synchronized to a MOR Table Recreated After Being Deleted Using Spark SQL
- Data Collection
- Hive Synchronization
-
Using Hue
- Using Hue from Scratch
- Accessing the Hue Web UI
- Hue Common Parameters
- Using HiveQL Editor on the Hue Web UI
- Using the SparkSql Editor on the Hue Web UI
- Using the Metadata Browser on the Hue Web UI
- Using File Browser on the Hue Web UI
- Using Job Browser on the Hue Web UI
- Using HBase on the Hue Web UI
- Typical Scenarios
- Hue Log Overview
-
Common Issues About Hue
- Why Do HQL Statements Fail to Execute in Hue Using Internet Explorer?
- Why Does the use database Statement Become Invalid in Hive?
- Why Do HDFS Files Fail to Be Accessed Through the Hue Web UI?
- Why Do Large Files Fail to Upload on the Hue Page?
- Why Can't the Hue Native Page Be Properly Displayed If the Hive Service Is Not Installed in a Cluster?
- What Should I Do If It Takes a Long Time to Access the Native Hue UI and the File Browser Reports "Read timed out"?
- Using IoTDB
- Using JobGateway
-
Using Kafka
- Using Kafka from Scratch
- Managing Kafka Topics
- Querying Kafka Topics
- Managing Kafka User Permissions
- Managing Messages in Kafka Topics
- Synchronizing Binlog-based MySQL Data to the MRS Cluster
- Creating a Kafka Role
- Kafka Common Parameters
- Safety Instructions on Using Kafka
- Kafka Specifications
- Using the Kafka Client
- Configuring Kafka HA and High Reliability Parameters
- Changing the Broker Storage Directory
- Checking the Consumption Status of Consumer Group
- Kafka Balancing Tool Instructions
- Kafka Token Authentication Mechanism Tool Usage
- Kafka Encryption and Decryption
- Using Kafka UI
- Kafka Logs
- Performance Tuning
- Kafka Feature Description
- Migrating Data Between Kafka Nodes
- Common Issues About Kafka
- Using KMS
- Using LakeSearch
-
Using Loader
- Common Loader Parameters
- Creating a Loader Role
- Managing Loader Links
- Preparing a Driver for MySQL Database Link
-
Importing Data
- Overview
- Importing Data Using Loader
- Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS
- Typical Scenario: Importing Data from an SFTP Server to HBase
- Typical Scenario: Importing Data from an SFTP Server to Hive
- Typical Scenario: Importing Data from an FTP Server to HBase
- Typical Scenario: Importing Data from a Relational Database to HDFS or OBS
- Typical Scenario: Importing Data from a Relational Database to HBase
- Typical Scenario: Importing Data from a Relational Database to Hive
- Typical Scenario: Importing Data from HDFS or OBS to HBase
- Typical Scenario: Importing Data from a Relational Database to ClickHouse
-
Exporting Data
- Overview
- Using Loader to Export Data
- Typical Scenario: Exporting Data from HDFS or OBS to an SFTP Server
- Typical Scenario: Exporting Data from HBase to an SFTP Server
- Typical Scenario: Exporting Data from Hive to an SFTP Server
- Typical Scenario: Exporting Data from HDFS or OBS to a Relational Database
- Typical Scenario: Exporting Data from HDFS to MOTService
- Typical Scenario: Exporting Data from HBase to a Relational Database
- Typical Scenario: Exporting Data from Hive to a Relational Database
- Typical Scenario: Importing Data from HBase to HDFS or OBS
- Typical Scenario: Exporting Data from HDFS to ClickHouse
- Managing Jobs
- Operator Help
-
Client Tools
- Running a Loader Job Through CLI
- loader-tool Usage Guide
- loader-tool Usage Example
- schedule-tool Usage Guide
- schedule-tool Usage Example
- Using loader-backup to Back Up Job Data
- Open Source sqoop-shell Tool Usage Guide
- Example for Using the Open-Source sqoop-shell Tool (SFTP-HDFS)
- Example for Using the Open-Source sqoop-shell Tool (Oracle-HBase)
- Loader Log Overview
- Common Issues About Loader
-
Using MapReduce
- Configuring the Log Archiving and Clearing Mechanism
- Reducing Client Application Failure Rate
- Transmitting MapReduce Tasks from Windows to Linux
- Configuring the Distributed Cache
- Configuring the MapReduce Shuffle Address
- Configuring the Cluster Administrator List
- Introduction to MapReduce Logs
- MapReduce Performance Tuning
-
Common Issues About MapReduce
- How Do I Handle the Problem That a MapReduce Task Makes No Progress for a Long Time?
- Why Does the Client Hang During Job Running?
- Why Cannot HDFS_DELEGATION_TOKEN Be Found in the Cache?
- How Do I Set the Task Priority When Submitting a MapReduce Task?
- Why Does Physical Memory Overflow Occur If a MapReduce Task Fails?
- After the Address of MapReduce JobHistoryServer Is Changed, Why Is the Wrong Page Displayed When I Click the Tracking URL on the ResourceManager WebUI?
- MapReduce Job Fails in a Multi-NameService Environment
- Why Is a Faulty MapReduce Node Not Blacklisted?
- Using MemArtsCC
- Using Metadata
-
Using MOTService
- Using MOTService from Scratch
- MOTService Permissions Management
- Creating an MOTService User
- Using the MOTService Client
- Introduction to the MOTService Maintenance Tool gs_om
- MOTService Data Backup and Restoration
- Reinstalling a MOTService Host
- MOTService SQL Coverage and Limitations
- MOTService Data Aging Configuration
- Introduction to MOTService Logs
-
Using Oozie
- Using Oozie from Scratch
- Using the Oozie Client
- Checking ShareLib
- Using Oozie Client to Submit an Oozie Job
-
Using Hue to Submit an Oozie Job
- Creating a Workflow
-
Submitting a Workflow Job
- Submitting a Hive2 Job
- Submitting a Spark Job
- Submitting a Java Job
- Submitting a Loader Job
- Submitting a MapReduce Job
- Submitting a Sub-workflow Job
- Submitting a Shell Job
- Submitting an HDFS Job
- Submitting a Streaming Job
- Submitting a DistCp Job
- Example of Mutual Trust Operations
- Submitting an SSH Job
- Submitting a Hive Script
- Submitting a Coordinator Periodic Scheduling Job
- Submitting a Bundle Batch Processing Job
- Querying the Operation Results
- Oozie Log Overview
-
Common Issues About Oozie
- What Should I Do If Oozie Scheduled Tasks Are Not Executed on Time?
- Why Does an Update of the Oozie share lib Directory on HDFS Not Take Effect?
- Common Oozie Troubleshooting Methods
- What Should I Do If the User Who Submits Jobs on the Oozie Client in a Normal Cluster Is Inconsistent with the User Displayed on the Yarn Web UI?
-
Using Ranger
- Logging In to the Ranger Web UI
- Enabling Ranger Authentication
- Configuring Component Permission Policies
- Viewing Ranger Audit Information
- Configuring a Security Zone
- Changing the Ranger Data Source to LDAP for a Normal Cluster
- Viewing Ranger Permission Information
- Adding a Ranger Access Permission Policy for CDL
- Adding a Ranger Access Permission Policy for HDFS
- Adding a Ranger Access Permission Policy for HBase
- Adding a Ranger Access Permission Policy for Hive
- Adding a Ranger Access Permission Policy for Yarn
- Adding a Ranger Access Permission Policy for Spark
- Adding a Ranger Access Permission Policy for Kafka
- Adding a Ranger Access Permission Policy for HetuEngine
- Adding a Ranger Access Permission Policy for Storm
- Adding a Ranger Access Permission Policy for Elasticsearch
- Adding a Ranger Access Permission Policy for OBS
- Hive Tables Supporting Cascading Authorization
- Configuring Multi-Instance for RangerKMS
- Using the RangerKMS Native UI to Manage Permissions and Keys
- Ranger Log Overview
-
Common Issues About Ranger
- Why Does Ranger Fail to Start During Cluster Installation?
- How Do I Determine Whether the Ranger Authentication Is Used for a Service?
- Why Can't a New User Log In to Ranger After Changing the Password?
- When an HBase Policy Is Added or Modified on Ranger, Wildcard Characters Cannot Be Used to Search for Existing HBase Tables
- How Do I Rectify the Problem that RangerKMS Authentication Fails and the KMS Tab Is Not Displayed on the Ranger Management Page?
- Using Redis
-
Using RTDService
- Overview
- RTDService Permission Management
- Accessing the RTDService Web UI
- Tenant Management
-
Service Management
- Configuring Analysis Dimensions
- Adding an Event Source
- Configuring Event Variables
- Adding a Batch Variable
- Adding a Real-Time Query Variable
- Window Variable Management
- Adding a Scoring Model
- Adding an Inference Variable
- Adding a Stored Procedure Rule
- Adding a Blacklist/Whitelist Rule
- Adding a Decision Engine
- Database Tools
- Managing a Template
- Importing and Exporting RTDService Metadata
- Modifying the Event Source Executor Configuration on the Web UI
- RTDService Logs
-
Using Solr
- Using Solr from Scratch
- Creating a Solr Role
- Using the Solr Client
-
Common Service Operations About Solr
- Solr Overview
- Configuration File managed-schema in the Solr Config Set
- Configuration File solrconfig.xml in the Solr Config Set
- Shell Client Operation Commands
- Operations on the Solr Admin UI
- Solr over HDFS
- Solr over HBase
- curl Commands in Linux
- REST Messages Sent in URLs Through Browsers
- Solr User Permission Configuration and Management
- Word Filter Customization
- HBase Full-Text Index
- Sensitive Word Filtering
- Including Collection Names in Query Results
- Solr Multi-System Mutual Trust
- Solr Rich Text Indexing
- (Recommended) Changing the Collection Data Storage Mode from HDFS to Local Disk
- Changing the Index Data Storage Mode from Local Disk to HDFS
- Restoring Data Using Solr
- Solr Log Overview
- Solr Performance Tuning
- Common Issues About Solr
-
Using Spark
-
Basic Operation
- Getting Started
- Configuring Parameters Rapidly
- Common Parameters
- Spark on HBase Overview and Basic Applications
- Spark on HBase V2 Overview and Basic Applications
- SparkSQL Permission Management (Security Mode)
-
Scenario-Specific Configuration
- Configuring Multi-active Instance Mode
- Configuring the Multi-Tenant Mode
- Configuring the Switchover Between the Multi-active Instance Mode and the Multi-tenant Mode
- Configuring the Size of the Event Queue
- Configuring Executor Off-Heap Memory
- Enhancing Stability in a Limited Memory Condition
- Viewing Aggregated Container Logs on the Web UI
- Configuring Environment Variables in Yarn-Client and Yarn-Cluster Modes
- Configuring the Default Number of Data Blocks Divided by SparkSQL
- Configuring the Compression Format of a Parquet Table
- Configuring the Number of Lost Executors Displayed in WebUI
- Setting the Log Level Dynamically
- Configuring Whether Spark Obtains HBase Tokens
- Configuring LIFO for Kafka
- Configuring Reliability for Connected Kafka
- Configuring Streaming Reading of Driver Execution Results
- Filtering Partitions without Paths in Partitioned Tables
- Configuring Spark Web UI ACLs
- Configuring Vector-based ORC Data Reading
- Broadening Support for Hive Partition Pruning Predicate Pushdown
- Hive Dynamic Partition Overwriting Syntax
- Configuring the Column Statistics Histogram for Higher CBO Accuracy
- Configuring Local Disk Cache for JobHistory
- Configuring Spark SQL to Enable the Adaptive Execution Feature
- Configuring Event Log Rollover
- Configuring the Spark Native Engine
- Configuring Automatic Merging of Small Files
- Adapting to the Third-party JDK When Ranger Is Used
- Spark Log Overview
- Obtaining Container Logs of a Running Spark Application
- Small File Combination Tools
- Using CarbonData for First Query
-
Spark Performance Tuning
- Spark Core Tuning
-
Spark SQL and DataFrame Tuning
- Optimizing the Spark SQL Join Operation
- Improving Spark SQL Calculation Performance Under Data Skew
- Optimizing Spark SQL Performance in the Small File Scenario
- Optimizing the INSERT...SELECT Operation
- Multiple JDBC Clients Concurrently Connecting to JDBCServer
- Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables
- Optimizing Small Files
- Optimizing the Aggregate Algorithms
- Optimizing Datasource Tables
- Merging CBO
- Optimizing SQL Query of Data of Multiple Sources
- SQL Optimization for Multi-level Nesting and Hybrid Join
- Spark Streaming Tuning
- Spark on OBS Tuning
-
Spark FAQ
-
Spark Core
- How Do I View Aggregated Spark Application Logs?
- Why Can't the Driver Process Exit?
- Why Does FetchFailedException Occur When the Network Connection Times Out?
- How Do I Configure the Event Queue Size If the Event Queue Overflows?
- What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?
- What Can I Do If "Connection to ip:port has been quiet for xxx ms while there are outstanding requests" Is Reported When Spark Executes an Application and the Application Ends?
- Why Do Executors Fail to Be Removed After the NodeManager Is Shut Down?
- What Can I Do If the Message "Password cannot be null if SASL is enabled" Is Displayed?
- What Should I Do If the Message "Failed to CREATE_FILE" Is Displayed in the Restarted Tasks When Data Is Inserted Into the Dynamic Partition Table?
- Why Do Tasks Fail When Hash Shuffle Is Used?
- What Can I Do If the Error Message "DNS query failed" Is Displayed When I Access the Aggregated Logs Page of Spark Applications?
- What Can I Do If Shuffle Fetch Fails Due to the "Timeout Waiting for Task" Exception?
- Why Does a Stage Retry Due to an Executor Crash?
- Why Do the Executors Fail to Register Shuffle Services During the Shuffle of a Large Amount of Data?
- NodeManager OOM Occurs During Spark Application Execution
- Why Does the Realm Information Fail to Be Obtained When SparkBench Is Run on HiBench for a Cluster in Security Mode?
-
Spark SQL and DataFrame
- What Do I Have to Note When Using Spark SQL ROLLUP and CUBE?
- Why Is Spark SQL Displayed as a Temporary Table in Different Databases?
- How Do I Assign a Parameter Value in a Spark Command?
- What Directory Permissions Do I Need to Create a Table Using SparkSQL?
- Why Do I Fail to Delete the UDF Using Another Service?
- Why Cannot I Query Newly Inserted Data in a Parquet Hive Table Using SparkSQL?
- How Do I Use Cache Table?
- Why Are Some Partitions Empty During Repartition?
- Why Does 16 Terabytes of Text Data Fail to Be Converted into 4 Terabytes of Parquet Data?
- How Do I Rectify the Exception Occurred When I Perform an Operation on the Table Named table?
- Why Is a Task Suspended When the ANALYZE TABLE Statement Is Executed and Resources Are Insufficient?
- If I Access a parquet Table on Which I Do Not Have Permission, Why Is a Job Run Before "Missing Privileges" Is Displayed?
- Why Do I Fail to Modify MetaData by Running the Hive Command?
- Why Is "RejectedExecutionException" Displayed When I Exit Spark SQL?
- What Do I Do If I Accidentally Kill the JDBCServer Process During a Health Check?
- Why Is No Result Found When 2016-6-30 Is Set in the Date Field as the Filter Condition?
- Why Does the "--hivevar" Option I Specified in the Command for Starting spark-beeline Fail to Take Effect?
- Why Is Memory Insufficient if 10 Terabytes of TPCDS Test Suites Are Consecutively Run in Beeline/JDBCServer Mode?
- Why Are Some Functions Not Available when ThriftJDBCServers Are Connected?
- Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed?
- Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL?
-
Spark Streaming
- What Can I Do If Spark Streaming Tasks Are Blocked?
- What Should I Pay Attention to When Optimizing Spark Streaming Task Parameters?
- Why Does the Spark Streaming Application Fail to Be Submitted After the Token Validity Period Expires?
- Why Does the Spark Streaming Application Fail to Be Started from the Checkpoint When the Input Stream Has No Output Logic?
- Why Is the Input Size Corresponding to Batch Time on the Web UI Set to 0 Records When Kafka Is Restarted During Spark Streaming Running?
- Spark Ranger FAQ
- Why Is the RESTful Interface Information Obtained by Accessing Spark Incorrect?
- Why Cannot I Switch from the Yarn Web UI to the Spark Web UI?
- What Can I Do If an Error Occurs when I Access the Application Page Because the Application Cached by HistoryServer Is Recycled?
- Why Isn't an Application Displayed When I Run an Application with an Empty Part File?
- Why Does Spark Fail to Export a Table with Duplicate Field Names?
- Why Does a JRE Fatal Error Occur After a Spark Application Runs Multiple Times?
- Why Is "This page can't be displayed" Displayed or an Error Reported When I Use Internet Explorer to Access the Native Web UI of Spark?
- How Does Spark Access External Cluster Components?
- Why Does the Foreign Table Query Fail When Multiple Foreign Tables Are Created in the Same Directory?
- Why Is an Error Reported When I Access the Native Page of an Application in Spark JobHistory?
- Why Do I Fail to Create a Table in the Specified Location on OBS After Logging to spark-beeline?
- Spark Shuffle Exception Handling
- Why Can't Common Users Log In to the Spark Client When There Are Multiple Service Scenarios in Spark?
- Why Does the Cluster Port Fail to Connect When a Client Outside the Cluster Is Installed or Used?
- How Do I Handle the Exception Occurred When I Query Datasource Avro Formats?
- What Should I Do If Statistics of Hudi or Hive Tables Created Using Spark SQLs Are Empty Before Data Is Inserted?
- Failed to Query Table Statistics by Partition Using a Non-Standard Time Format When the Partition Column in the Table Creation Statement Is timestamp
- How Do I Use Special Characters with TIMESTAMP and DATE?
- What Should I Do If Recycle Bin Version I Set on the Spark Client Does Not Take Effect?
- How Do I Change the Log Level to INFO When Using Spark yarn-client?
- Using Tez
-
Using YARN
- Common YARN Parameters
- Creating Yarn Roles
- Using the YARN Client
- Configuring Resources for a NodeManager Role Instance
- Changing NodeManager Storage Directories
- Configuring Strict Permission Control for Yarn
- Configuring Container Log Aggregation
- Using CGroups with YARN
- Configuring the Number of ApplicationMaster Retries
- Configuring the ApplicationMaster to Automatically Adjust Allocated Memory
- Configuring the Access Channel Protocol
- Configuring Memory Usage Detection
- Configuring the Additional Scheduler WebUI
- Configuring Yarn Restart
- Configuring ApplicationMaster Work Preserving
- Configuring the Localized Log Levels
- Configuring Users That Run Tasks
- Yarn Log Overview
- Yarn Performance Tuning
-
Common Issues About Yarn
- Why Is the Mounted Directory for a Container Not Cleared After the Job Completes When CGroups Are Used?
- Why Does a Job Fail with an HDFS_DELEGATION_TOKEN Expired Exception?
- Why Are Local Logs Not Deleted After YARN Is Restarted?
- Why Doesn't the Task Fail Even Though AppAttempts Restarts More Than Twice?
- Why Is an Application Moved Back to Its Original Queue After the ResourceManager Restarts?
- Why Doesn't Yarn Release the Blacklist Even When All Nodes Are Added to the Blacklist?
- Why Does the Switchover of ResourceManager Occur Continuously?
- Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
- Why Does an Error Occur When I Query the ApplicationID of a Completed or Non-existing Application Using the RESTful APIs?
- Why Might a Single NodeManager Fault Cause MapReduce Task Failures in Superior Scheduling Mode?
- Why Are Applications Suspended After They Are Moved From Lost_and_Found Queue to Another Queue?
- How Do I Limit the Size of Application Diagnostic Messages Stored in the ZKstore?
- Why Does a MapReduce Job Fail to Run When a Non-ViewFS File System Is Configured as ViewFS?
- Why Do Reduce Tasks Fail to Run in Some OSs After the Native Task Feature is Enabled?
-
Using ZooKeeper
- Using ZooKeeper from Scratch
- Common ZooKeeper Parameters
- Using a ZooKeeper Client
- Configuring the ZooKeeper Permissions
- ZooKeeper Log Overview
-
Common Issues About ZooKeeper
- Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?
- Why Does the ZooKeeper Server Display the java.io.IOException: Len Error Log?
- Why Don't Four-Letter Commands Work with the Linux netcat Command When Secure Netty Configurations Are Enabled on the ZooKeeper Server?
- How Do I Check Which ZooKeeper Instance Is a Leader?
- Why Can't the Client Connect to ZooKeeper Using the IBM JDK?
- What Should I Do When the ZooKeeper Client Fails to Refresh a TGT?
- Why Is the Message "Node does not exist" Displayed When a Large Number of Znodes Are Deleted Using the deleteall Command?
- Appendix
-
Using CarbonData
-
API Reference (Ankara Region)
- Before You Start
- API Overview
- Selecting an API Type
- Calling APIs
- Application Cases
- API V2
- API V1.1
- Out-of-Date APIs
- Permissions Policies and Supported Actions
- Appendix
-
User Guide (ME-Abu Dhabi Region)
ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
Description
The system checks the percentage of total disk space reserved for replicas (Total disk space reserved for replicas/(Total disk space reserved for replicas + Total remaining disk space)) every 30 seconds and compares the actual percentage with the threshold (90% by default). This alarm is generated when the percentage of total disk space reserved for replicas exceeds the threshold for several consecutive checks (Trigger Count).
This alarm is cleared in either of the following scenarios: Trigger Count is 1 and the percentage of total disk space reserved for replicas is less than or equal to the threshold; Trigger Count is greater than 1 and the percentage of total disk space reserved for replicas is less than or equal to 90% of the threshold.
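The trigger and clear conditions above can be sketched as follows. This is an illustrative model only, not MRS code: the function names are invented, while the 30-second sampling, the 90% default threshold, the consecutive-sample Trigger Count, and the "90% of the threshold" clear rule come from the text.

```python
def reserved_replica_pct(reserved_bytes: float, remaining_bytes: float) -> float:
    """Metric from the description: reserved / (reserved + remaining) * 100."""
    total = reserved_bytes + remaining_bytes
    return 100.0 * reserved_bytes / total if total else 0.0

def alarm_triggered(samples_pct: list, threshold: float = 90.0,
                    trigger_count: int = 1) -> bool:
    # Fires when the percentage exceeds the threshold for
    # `trigger_count` consecutive 30-second checks.
    recent = samples_pct[-trigger_count:]
    return len(recent) == trigger_count and all(p > threshold for p in recent)

def alarm_cleared(current_pct: float, threshold: float = 90.0,
                  trigger_count: int = 1) -> bool:
    # Trigger Count == 1: cleared at <= threshold.
    # Trigger Count > 1: cleared at <= 90% of the threshold.
    limit = threshold if trigger_count == 1 else 0.9 * threshold
    return current_pct <= limit
```

For example, with the default 90% threshold and Trigger Count of 2, a node at 85% reserved space still keeps the alarm active, because clearing requires dropping to 81% or below.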
Attribute

Alarm ID | Alarm Severity | Auto Clear
---|---|---
14023 | Minor | Yes
Parameters

Name | Meaning
---|---
Source | Specifies the cluster for which the alarm is generated.
ServiceName | Specifies the service for which the alarm is generated.
RoleName | Specifies the role for which the alarm is generated.
NameServiceName | Specifies the NameService for which the alarm is generated.
Trigger condition | Specifies the threshold for triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.
Impact on the System
HDFS data write performance is affected. If all remaining DataNode space is reserved for replicas, writing HDFS data fails.
Possible Causes
- The alarm threshold is improperly configured.
- The disk space configured for the HDFS cluster is insufficient.
- The volume of services accessing HDFS is too large; as a result, DataNode is overloaded.
Procedure
Check whether the alarm threshold is appropriate.
- On FusionInsight Manager, choose O&M > Alarm > Thresholds > Name of the desired cluster > HDFS > Disk > Percentage of Reserved Space for Replicas of Unused Space to check whether the alarm threshold is appropriate. (The default threshold is 90%. Users can change it as required.)
- Choose O&M > Alarm > Thresholds > Name of the desired cluster > HDFS > Disk > Percentage of Reserved Space for Replicas of Unused Space and click Modify to change the threshold based on actual usage.
Figure 1 Modifying thresholds
- Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 4.
Check whether an alarm indicating insufficient disk space is generated.
- On FusionInsight Manager, check whether ALM-14001 HDFS Disk Usage Exceeds the Threshold or ALM-14002 DataNode Disk Usage Exceeds the Threshold exists on the O&M > Alarm > Alarms page.
- Handle the alarm by referring to the instructions in ALM-14001 HDFS Disk Usage Exceeds the Threshold and ALM-14002 DataNode Disk Usage Exceeds the Threshold, and check whether this alarm is cleared.
- Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 7.
Expand the DataNode capacity.
- Expand the DataNode capacity.
- Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 9.
Collect fault information.
- On FusionInsight Manager, choose O&M > Log > Download.
- Select HDFS in the required cluster from the Service list.
- Click in the upper right corner, and set Start Date and End Date for log collection to 20 minutes before and after the alarm generation time, respectively. Then click Download.
- Contact O&M personnel and send the collected logs.
Alarm Clearing
After the fault is rectified, the system automatically clears this alarm.
Related Information
None