Configuring a Co-deployed Hive Data Source

Updated on 2024-11-29 GMT+08:00

Scenario

This section describes how to add, on HSConsole, a Hive data source that is in the same Hadoop cluster as HetuEngine.

  • Currently, HetuEngine supports Hive data sources in the following data formats: AVRO, TEXT, RCTEXT, ORC, Parquet, and SequenceFile.
  • When HetuEngine interconnects with Hive, you cannot specify multiple delimiters during table creation. However, if a multi-delimiter table in text format is created in Hive with the MultiDelimitSerDe class specified as its serialization class, the table can be queried through HetuEngine (see the sketch after this list).
  • The Hive data source interconnected with HetuEngine supports Hudi table redirection: Hudi table access requests are redirected to the Hudi connector, so the advanced functions of the Hudi connector are available. To use this function, configure the target Hudi data source, ensure that the Metastore URL of the Hudi data source is the same as that of the current Hive data source, and enable Hudi redirection for the Hive data source.
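
The multi-delimiter case can be sketched as follows. This is illustrative only: the table and column names are made up, the SerDe class path may differ between Hive versions, and the co-deployed catalog is assumed to keep its default name hive.

    -- Run in Hive (beeline): create a TEXT-format table whose fields are
    -- separated by the multi-character delimiter "|#|". Table and column
    -- names are illustrative; the SerDe class path may vary by Hive version.
    CREATE TABLE demo_multi_delim (
      id   INT,
      name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    WITH SERDEPROPERTIES ("field.delim" = "|#|")
    STORED AS TEXTFILE;

    -- The table can then be queried from HetuEngine through the co-deployed
    -- catalog (named hive by default):
    SELECT id, name FROM hive.default.demo_multi_delim;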
NOTE:

During HetuEngine installation, the co-deployed Hive data source is interconnected by default. The data source is named hive and cannot be deleted. Some default configurations, such as the data source name, data source type, server principal, and client principal, cannot be modified. When the environment configuration changes (for example, the local domain name of the cluster is changed), restarting the HetuEngine service automatically synchronizes the configurations of the co-deployed Hive data source, such as the server principal and client principal.

Prerequisites

  • A HetuEngine compute instance has been created.
  • To use the isolation function of Hive Metastore, configure HIVE_METASTORE_URI_HETU on Hive and restart the HSBroker instances of the HetuEngine service to update the Hive Metastore URI.

Procedure

  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine.
  2. In the Basic Information area on the Dashboard page, click the link next to HSConsole WebUI.
  3. On HSConsole, choose Data Source. Locate the row that contains the target Hive data source, click Edit in the Operation column, and modify the configurations. The following parameters can be modified.

    Enable Data Source Authentication

    Description: Whether to use the permission policy of the Hive data source for authentication. When this function is enabled, HetuEngine uses SQL standard-based Hive authorization.

    • Clusters with Kerberos authentication disabled (normal mode): HetuEngine uses the default Hive authorization, and this parameter is unavailable.
    • Clusters with Kerberos authentication enabled (security mode): if Ranger is enabled, Ranger authentication is applied in addition to SQL standard-based Hive authorization; if Ranger is disabled, only SQL standard-based Hive authorization is used.

    Example value: No

    Hudi Redirection

    Description: Redirects Hudi table access requests to the Hudi connector so that the advanced functions of the Hudi connector can be used. This parameter is available only when the Metastore URL of the target Hudi data source is the same as that of the current Hive data source.

    Example value: No

    Hudi Data Source

    Description: Required when Hudi redirection is enabled. All configured Hudi data sources are displayed in the drop-down list; select the one whose Metastore URL is the same as that of the current Hive data source.

    Example value: -

    Enable Connection Pool

    Description: Whether to enable the connection pool when accessing the Hive Metastore. The default value is Yes.

    Example value: Yes

    Maximum Connections

    Description: Maximum number of connections in the connection pool when accessing the Hive Metastore. Value range: 20–200.

    Example value: 50

  4. (Optional) If you need to add parameters under Custom Configuration, complete them by referring to 6.g and click OK to save the configurations. A quick verification sketch follows this procedure.
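
After saving the configurations, you can check that the data source responds by running a few statements in a HetuEngine SQL session. This is a minimal sketch: it assumes the default catalog name hive, and the schema and table names are examples only.

    -- List the schemas exposed by the co-deployed Hive data source
    -- (hive is the default data source name, per the note above).
    SHOW SCHEMAS FROM hive;

    -- List tables in a schema and run a sanity query.
    SHOW TABLES FROM hive.default;
    SELECT * FROM hive.default.demo_multi_delim LIMIT 10;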

Data Type Mapping

Currently, Hive data sources support the following data types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, REAL, DOUBLE, DECIMAL, NUMERIC, DEC, VARCHAR, VARCHAR(X), CHAR, CHAR(X), STRING, DATE, TIMESTAMP, TIME, TIME WITH TIME ZONE, TIMESTAMP WITH TIME ZONE, ARRAY, MAP, STRUCT, and ROW.
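
As an illustration of the mapping (table and column names are hypothetical), a Hive table covering several of the supported types can be created in Hive and then inspected from HetuEngine, where Hive STRING surfaces as VARCHAR and the complex types surface as ARRAY, MAP, and ROW:

    -- Run in Hive (beeline); the column types are drawn from the supported
    -- list above.
    CREATE TABLE type_demo (
      flag    BOOLEAN,
      qty     SMALLINT,
      total   BIGINT,
      amount  DECIMAL(10, 2),
      name    VARCHAR(64),
      note    STRING,
      born    DATE,
      updated TIMESTAMP,
      tags    ARRAY<STRING>,
      attrs   MAP<STRING, STRING>
    );

    -- Inspect the mapped types from a HetuEngine session:
    DESCRIBE hive.default.type_demo;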

Performance Optimization

  • Metadata caching

    Hive connectors support metadata caching so that metadata requests for various operations are served faster. For details, see Adjusting Metadata Cache.

  • Dynamic filtering

    Enabling dynamic filtering helps optimize the calculation of the Join operator of Hive connectors. For details, see Enabling Dynamic Filtering.

  • Query with partition conditions

    Creating a partitioned table and querying with partition filter criteria allows unneeded partitions to be skipped, improving performance (see the sketch after this list).

  • INSERT statement optimization

    You can improve insert performance by setting task.writer-count to 1 and choosing a larger value for hive.max-partitions-per-writers. For details, see Optimizing INSERT Statements.
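
The following sketch illustrates the partition-pruning point; the table and column names are hypothetical. The INSERT-related properties are restated only as a comment, since they are configuration properties described in Optimizing INSERT Statements rather than SQL statements.

    -- Run in Hive (beeline): a table partitioned by sale_date, stored as ORC.
    CREATE TABLE sales (
      order_id BIGINT,
      amount   DECIMAL(10, 2)
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS ORC;

    -- From HetuEngine, filtering on the partition column lets the engine skip
    -- all other partitions, so only the matching data is scanned:
    SELECT sum(amount)
    FROM hive.default.sales
    WHERE sale_date = '2024-11-01';

    -- INSERT tuning reminder (configuration properties, not SQL):
    --   task.writer-count = 1
    --   hive.max-partitions-per-writers = <a larger value>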

Constraints

  • The DELETE syntax can be used only to delete data from an entire table or from an entire partition of a partitioned table (see the sketch after this list).
  • The Hive metastore does not support schema renaming, that is, the ALTER SCHEMA RENAME syntax is not supported.
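
A short sketch of the DELETE constraint, reusing the hypothetical sales table from the performance sketch above:

    -- Allowed: delete one entire partition of a partitioned table.
    DELETE FROM hive.default.sales WHERE sale_date = '2024-11-01';

    -- Allowed: delete all data in the table.
    DELETE FROM hive.default.sales;

    -- Not allowed per the constraint above: a row-level delete filtered on a
    -- non-partition column, for example:
    --   DELETE FROM hive.default.sales WHERE amount > 100;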
