Configuring Spark to Read HBase Data

Updated on 2024-12-13 GMT+08:00

Scenario

Spark on HBase allows you to query HBase tables in Spark SQL and to write data to HBase tables through the Beeline tool. You can also use HBase APIs to create tables, read data from them, and insert data into them.
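
Both procedures below assume that the underlying HBase tables already exist and contain the column family used in the column mappings (cf1 in this topic's examples). If they do not, the following is a minimal sketch in the HBase shell, using the table names assumed by the two procedures:

  hbase shell
  # Create the tables referenced in this topic, each with the
  # column family cf1 used in the colsMapping examples.
  create 'table1', 'cf1'
  create 'table2', 'cf1'
  # Optional: confirm the table layout.
  describe 'table1'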

Spark on HBase

  1. Log in to Manager and choose Cluster > Cluster Properties to check whether the cluster is in security mode.

    • If yes, go to 2.
    • If no, go to 5.

  2. Choose Cluster > Services > Spark2x. Click Configurations, click All Configurations, and choose JDBCServer2x > Default. Modify the following parameter:

    Table 1 Parameter list 1

    Parameter                                      Default Value   Changed To
    spark.yarn.security.credentials.hbase.enabled  false           true

    NOTE:

    To ensure that Spark2x can access HBase for a long time, do not modify the following parameters of the HBase and HDFS services:

    • dfs.namenode.delegation.token.renew-interval
    • dfs.namenode.delegation.token.max-lifetime
    • hbase.auth.key.update.interval
    • hbase.auth.token.max.lifetime (The value is fixed to 604800000 ms, that is, 7 days.)

    If these parameters must be modified based on service requirements, ensure that the value of the HDFS parameter dfs.namenode.delegation.token.renew-interval is not greater than the values of the HBase parameters hbase.auth.key.update.interval and hbase.auth.token.max.lifetime, or of the HDFS parameter dfs.namenode.delegation.token.max-lifetime.

  3. Choose SparkResource2x > Default and modify the following parameter.

    Table 2 Parameter list 2

    Parameter                                      Default Value   Changed To
    spark.yarn.security.credentials.hbase.enabled  false           true

  4. Restart the Spark2x service for the configuration to take effect.

    NOTE:

    To use the Spark on HBase function on the Spark2x client, you need to download and install the Spark2x client again.

  5. On the Spark2x client, use spark-sql or spark-beeline to query tables created by Hive on HBase. You can run SQL commands to create an HBase table mapping, or create an external table to associate with an existing HBase table. Before creating tables, ensure that the corresponding table exists in HBase. The HBase table table1 is used as an example.

    1. Run the following command in the Beeline tool to create a Spark table that maps to the HBase table:

      create table hbaseTable
      (
        id string,
        name string,
        age int
      )
      using org.apache.spark.sql.hbase.HBaseSource
      options(
        hbaseTableName "table1",
        keyCols "id",
        colsMapping "name=cf1.cq1,age=cf1.cq2");

      NOTE:
      • hbaseTable: name of the Spark table to create
      • id string, name string, age int: field names and types of the Spark table
      • table1: name of the HBase table
      • id: row key column of the HBase table
      • name=cf1.cq1, age=cf1.cq2: mapping between columns in the Spark table and columns in the HBase table. The name column of the Spark table maps to the cq1 column in the cf1 column family of the HBase table, and the age column maps to the cq2 column in the same column family.
    2. Import data to the HBase table using a CSV file.

      hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf1:cq1,cf1:cq2,cf1:cq3,cf1:cq4,cf1:cq5 table1 /hperson

      table1 is the name of the HBase table, and /hperson is the path to the CSV file (a sample file layout is sketched after this procedure).

    3. Query data in spark-sql or spark-beeline. hbaseTable is the corresponding Spark table name. The command is as follows:

      select * from hbaseTable;
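
    For reference, ImportTsv assigns each delimited field to the columns listed in -Dimporttsv.columns, in order: the first field becomes the row key and the rest populate cf1:cq1 through cf1:cq5. A hypothetical /hperson file matching that column list (the values are illustrative only) might look like this:

      1,Alice,25,F,Shanghai,2024-01-01
      2,Bob,31,M,Beijing,2024-01-02

    Only cf1:cq1 (name) and cf1:cq2 (age) are visible through the Spark table defined above; the remaining columns exist only in HBase.

    The Scenario section notes that data can also be stored through the Beeline tool. A minimal sketch, assuming the mapped table accepts standard Spark SQL INSERT syntax:

      insert into hbaseTable values ('3', 'Carol', 28);
      select name, age from hbaseTable where id = '3';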

Spark on HBaseV2

  1. Log in to Manager and choose Cluster > Cluster Properties to check whether the cluster is in security mode.

    • If yes, go to 2.
    • If no, go to 5.

  2. Click Cluster and click the name of the desired cluster. Choose Services > Spark2x, click Configurations, click All Configurations, and choose JDBCServer2x > Default. Modify the following parameter.

    Table 3 Parameter list 1

    Parameter                                      Default Value   Changed To
    spark.yarn.security.credentials.hbase.enabled  false           true

    NOTE:

    To ensure that Spark2x can access HBase for a long time, do not modify the following parameters of the HBase and HDFS services:

    • dfs.namenode.delegation.token.renew-interval
    • dfs.namenode.delegation.token.max-lifetime
    • hbase.auth.key.update.interval
    • hbase.auth.token.max.lifetime (The value is fixed to 604800000 ms, that is, 7 days.)

    If these parameters must be modified based on service requirements, ensure that the value of the HDFS parameter dfs.namenode.delegation.token.renew-interval is not greater than the values of the HBase parameters hbase.auth.key.update.interval and hbase.auth.token.max.lifetime, or of the HDFS parameter dfs.namenode.delegation.token.max-lifetime.

  3. Choose SparkResource2x > Default and modify the following parameter.

    Table 4 Parameter list 2

    Parameter                                      Default Value   Changed To
    spark.yarn.security.credentials.hbase.enabled  false           true

  4. Restart the Spark2x service for the configuration to take effect.

    NOTE:

    If you need to use the Spark on HBase function on the Spark2x client, download and install the Spark2x client again.

  5. On the Spark2x client, use spark-sql or spark-beeline to query tables created by Hive on HBase. You can run SQL commands to create an HBase table mapping, or create an external table to associate with an existing HBase table. The HBase table table2 is used as an example.

    1. Run the following command in spark-beeline to create a Spark table that maps to the HBase table:

      create table hbaseTable1
      (id string, name string, age int)
      using org.apache.spark.sql.hbase.HBaseSourceV2
      options(
        hbaseTableName "table2",
        keyCols "id",
        colsMapping "name=cf1.cq1,age=cf1.cq2");

      NOTE:
      • hbaseTable1: name of the Spark table to create
      • id string, name string, age int: field names and types of the Spark table
      • table2: name of the HBase table
      • id: row key column of the HBase table
      • name=cf1.cq1, age=cf1.cq2: mapping between columns in the Spark table and columns in the HBase table. The name column of the Spark table maps to the cq1 column in the cf1 column family of the HBase table, and the age column maps to the cq2 column in the same column family.
    2. Import data to the HBase table using a CSV file.

      hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf1:cq1,cf1:cq2,cf1:cq3,cf1:cq4,cf1:cq5 table2 /hperson

      table2 is the name of the HBase table, and /hperson is the path to the CSV file (the file layout is the same as in the previous procedure).

    3. Query data in spark-sql or spark-beeline. hbaseTable1 indicates the corresponding Spark table name.

      select * from hbaseTable1;
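
    To confirm that imported or inserted rows actually landed in HBase, you can scan the underlying table from the HBase shell. A minimal sketch:

      hbase shell
      # Show up to 10 rows of the underlying HBase table.
      scan 'table2', {LIMIT => 10}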
