
Hive

Updated on 2024-11-29 GMT+08:00

By connecting to Hive Metastore, or a metadata service compatible with Hive Metastore, Doris can automatically obtain Hive database and table information and query the data.

In addition to Hive, many other systems use Hive Metastore to store metadata. Through a Hive catalog, Doris can access not only Hive itself but also systems that use Hive Metastore as their metadata store, such as Iceberg and Hudi.

NOTE:
  • Managed tables are supported.
  • Hive and Hudi metadata stored in Hive Metastore can be identified.
  • To access a catalog created by another user, you must be granted the permission to operate on the OBS path where the catalog data is stored.
  • The Hive table format can only be Parquet, ORC, or TextFile.

Prerequisites

  • A cluster containing the Doris service has been created, and all services in the cluster are running properly.
  • The nodes to be connected to the Doris database can communicate with the MRS cluster.
  • A user with Doris management permission has been created.
    • Kerberos authentication is enabled for the cluster (the cluster is in security mode)

      Log in to FusionInsight Manager, create a human-machine user, for example, dorisuser, create a role with Doris administrator permissions, and bind the role to the user.

      Log in to FusionInsight Manager as the new user dorisuser and change the initial password.

    • Kerberos authentication is disabled for the cluster (the cluster is in normal mode)

      After connecting to Doris as user admin, create a role with administrator permissions, and bind the role to the user.

  • The MySQL client has been installed. For details, see Installing a MySQL Client.

Hive Table Operations

  1. Perform the following operations to read Hive data stored in OBS with Doris:

    1. Log in to the MRS management console. Move the cursor to the username in the upper right corner and select My Credentials from the drop-down list.
    2. Click Access Keys, click Create Access Key, and enter the verification code or password. Click OK to generate an access key, and download it.

      Obtain the values of obs.access_key and obs.secret_key required for creating a catalog from the .csv file. The mapping is as follows:

      • The value of obs.access_key is the value in the Access Key Id column of the .csv file.
      • The value of obs.secret_key is the value in the Secret Access Key column of the .csv file.
      NOTE:
      • Keep the CSV file secure. You can download the file only immediately after the access key is created. If you cannot find the file, create a new access key.
      • Keep your access keys secure and change them periodically for security purposes.
    3. You can obtain the value of obs.region from .
    4. Log in to the OBS management console, click Parallel File System, click the name of the OBS parallel file system where the Hive table is stored, and view the value of Endpoint on the overview page. This value is the same as that of obs.endpoint set during catalog creation.

  2. Log in to the node where MySQL is installed and connect to the Doris database.

    If Kerberos authentication is enabled for the cluster (the cluster is in security mode), run the following command to connect to the Doris database:

    export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1

    mysql -u<database login username> -p<database login password> -P<connection port for FE queries> -h<IP address of the Doris FE instance>

    NOTE:
    • To obtain the query connection port of the Doris FE instance, you can log in to FusionInsight Manager, choose Cluster > Services > Doris > Configurations, and query the value of query_port of the Doris service.
    • To obtain the IP address of the Doris FE instance, log in to FusionInsight Manager of the MRS cluster and choose Cluster > Services > Doris > Instances to view the IP address of any FE instance.
    • You can also use MySQL connection software or the Doris web UI to connect to the database.
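
    For example, a minimal sketch of the connection in security mode, assuming a hypothetical FE instance at 192.168.67.78 with query port 29982 and the dorisuser account created in Prerequisites (replace all three values with your own; -p without a value makes the client prompt for the password):

    export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1
    mysql -udorisuser -p -P29982 -h192.168.67.78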

  3. Create a catalog.

    • If the Hive table data is stored in HDFS, run the following command to create a catalog:
      • Kerberos authentication is enabled for the cluster (the cluster is in security mode):

        CREATE CATALOG hive_catalog PROPERTIES (
            'type'='hms',
            'hive.metastore.uris' = 'thrift://192.168.67.161:21088',
            'hive.metastore.sasl.enabled' = 'true',
            'hive.server2.thrift.sasl.qop' = 'auth-conf',
            'hive.server2.authentication' = 'KERBEROS',
            'dfs.nameservices'='hacluster',
            'dfs.ha.namenodes.hacluster'='24,25',
            'dfs.namenode.rpc-address.hacluster.24'='<IP address of the active NameNode>:<RPC communication port>',
            'dfs.namenode.rpc-address.hacluster.25'='<IP address of the standby NameNode>:<RPC communication port>',
            'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
            'hive.version' = '3.1.0',
            'yarn.resourcemanager.address' = '192.168.67.78:26004',
            'yarn.resourcemanager.principal' = 'mapred/hadoop.hadoop.com@HADOOP.COM',
            'hive.metastore.kerberos.principal' = 'hive/hadoop.hadoop.com@HADOOP.COM',
            'hadoop.security.authentication' = 'kerberos',
            'hadoop.kerberos.keytab' = '${BIGDATA_HOME}/FusionInsight_Doris_8.3.1/install/FusionInsight-Doris-2.0.3/doris-be/bin/doris.keytab',
            'hadoop.kerberos.principal' = 'doris/hadoop.hadoop.com@HADOOP.COM',
            'java.security.krb5.conf' = '${BIGDATA_HOME}/FusionInsight_BASE_*/1_16_KerberosClient/etc/krb5.conf',
            'hadoop.rpc.protection' = 'privacy'
        );

      • Kerberos authentication is disabled for the cluster (the cluster is in normal mode):

        CREATE CATALOG hive_catalog PROPERTIES (
            'type'='hms',
            'hive.metastore.uris' = 'thrift://192.168.67.161:21088',
            'hive.version' = '3.1.0',
            'hadoop.username' = 'hive',
            'yarn.resourcemanager.address' = '192.168.67.78:26004',
            'dfs.nameservices'='hacluster',
            'dfs.ha.namenodes.hacluster'='24,25',
            'dfs.namenode.rpc-address.hacluster.24'='192.168.67.172:25000',
            'dfs.namenode.rpc-address.hacluster.25'='192.168.67.78:25000',
            'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
        );

      NOTE:
      • hive.metastore.uris: URI of Hive Metastore, in the format thrift://<IP address of Hive Metastore>:<port number>. Multiple values are supported, separated by commas (,).
      • dfs.nameservices: NameService name of the cluster. The value can be found in hdfs-site.xml, which is in the ${BIGDATA_HOME}/FusionInsight_HD_*/1_*_NameNode/etc directory on the node where NameNode is deployed.
      • dfs.ha.namenodes.hacluster: IDs of the NameNodes in the NameService, which contains two values. The value can be found in hdfs-site.xml in the same directory.
      • dfs.namenode.rpc-address.hacluster.xx1: RPC communication address of the active NameNode. Search for this configuration item in hdfs-site.xml in the same directory. xx1 is the first value of dfs.ha.namenodes.hacluster.
      • dfs.namenode.rpc-address.hacluster.xx2: RPC communication address of the standby NameNode. Search for this configuration item in hdfs-site.xml in the same directory. xx2 is the second value of dfs.ha.namenodes.hacluster.
      • dfs.client.failover.proxy.provider.hacluster: Java class used by the HDFS client to connect to the active NameNode. The value is org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
      • hive.version: Hive version. To obtain the version, log in to FusionInsight Manager, choose Cluster > Services > Hive, and view the version on the Dashboard page.
      • yarn.resourcemanager.address: IP address of the active ResourceManager instance. On FusionInsight Manager, choose Cluster > Services > Yarn > Instances to view the service IP address of the active ResourceManager instance.
      • hadoop.rpc.protection: whether to encrypt the RPC stream of each Hadoop module. The default value is privacy. To obtain the value, log in to FusionInsight Manager, choose Cluster > Services > HDFS > Configurations, and search for hadoop.rpc.protection.
      • Kerberos authentication is enabled for the cluster (the cluster is in security mode):
        • hive.metastore.sasl.enabled: whether to enable MetaStore management permission. The value is true.
        • hive.server2.thrift.sasl.qop: whether to encrypt the interaction between HiveServer2 and the client. The value is auth-conf.
        • hive.server2.authentication: security authentication for accessing HiveServer. The value is KERBEROS.
        • yarn.resourcemanager.principal: Principal for accessing the Yarn cluster. The value is mapred/hadoop.hadoop.com@HADOOP.COM.
        • hive.metastore.kerberos.principal: Principal for accessing the Hive cluster. The value is hive/hadoop.hadoop.com@HADOOP.COM.
        • hadoop.security.authentication: security authentication for accessing Hadoop. The value is KERBEROS.
        • hadoop.kerberos.keytab: keytab for accessing the Hadoop cluster. The value is the path of the ${BIGDATA_HOME}/FusionInsight_Doris_*/install/FusionInsight-Doris-*/doris-be/bin/doris.keytab file.
        • hadoop.kerberos.principal: Principal for accessing the Hadoop cluster. The value is doris/hadoop.hadoop.com@HADOOP.COM.
        • java.security.krb5.conf: krb5 file. The value is the path of the ${BIGDATA_HOME}/FusionInsight_BASE_*/1_*_KerberosClient/etc/krb5.conf file.
      • Kerberos authentication is disabled for the cluster (the cluster is in normal mode):

        hadoop.username: username for accessing the Hadoop cluster. The value is hdfs.

    • If the Hive table data is stored in OBS, run the following command to create a catalog. For details about the related parameter values, see 1.

      CREATE CATALOG hive_obs_catalog PROPERTIES (
          'type'='hms',
          'hive.version' = '3.1.0',
          'hive.metastore.uris' = 'thrift://192.168.67.161:21088',
          'obs.access_key' = 'AK',
          'obs.secret_key' = 'SK',
          'obs.endpoint' = '<endpoint address of the OBS parallel file system>',
          'obs.region' = 'sa-fb-1'
      );
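
      To confirm that a catalog was created with the expected properties, you can display its creation statement. This is a sketch assuming SHOW CREATE CATALOG is available in this Doris build (it exists in open-source Doris since 1.2; verify against your cluster):

      show create catalog hive_obs_catalog;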

  4. Query the Hive table:

    • Query catalogs:

      show catalogs;

    • Query the databases in the catalog:

      show databases from hive_catalog;

    • Switch the catalog and access the database:

      switch hive_catalog;

      use default;

    • Query all tables in a database in the catalog:

      show tables from `hive_catalog`.`default`;

      Query a specified table:

      select * from `hive_catalog`.`default`.`test_table`;

      View the schema of the table:

      DESC test_table;

  5. After creating a Hive table or modifying its data, refresh the metadata cached in Doris:

    refresh catalog hive_catalog;
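
    Refreshing the whole catalog invalidates all of its cached metadata. If only one database or table changed, a finer-grained refresh is usually cheaper; a sketch, assuming the database- and table-level REFRESH variants of open-source Doris 2.x are available in this build:

    refresh database hive_catalog.default;
    refresh table hive_catalog.default.test_table;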

  6. Perform a join query with tables in other catalogs:

    SELECT h.h_shipdate FROM hive_catalog.default.htable h WHERE h.h_partkey IN (SELECT p_partkey FROM internal.db1.part) LIMIT 10;

    NOTE:
    • Identify a table by its fully qualified name in the form catalog.database.table, for example, internal.db1.part.
    • catalog and database can be omitted. If omitted, the catalog and database selected by the most recent SWITCH and USE commands are used.
    • You can run the INSERT INTO command to insert table data from the Hive catalog into an internal table in the internal catalog, as shown in the sketch below.
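
    For example, a minimal sketch of copying Hive data into the internal table from the query above, assuming the p_partkey column of internal.db1.part is type-compatible with h_partkey (adjust the column lists to your actual schemas):

    INSERT INTO internal.db1.part (p_partkey)
    SELECT h.h_partkey FROM hive_catalog.default.htable h;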
