Updated on 2022-08-16 GMT+08:00

Python Example Code

Function

Use Spark to call HBase APIs to read HBase table1, analyze its data, and write the analysis result to HBase table2.

Example Code

PySpark does not provide HBase-related APIs. This example therefore uses Python to invoke a Java implementation through the Py4J gateway that PySpark exposes.

The following code snippet is provided as an example. For the complete code, see SparkHbasetoHbasePythonExample.

# -*- coding:utf-8 -*-

from py4j.java_gateway import java_import
from pyspark.sql import SparkSession

# Create a SparkSession and enable Kryo serialization.
spark = SparkSession\
        .builder\
        .appName("SparkHbasetoHbase") \
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .config("spark.kryo.registrator", "com.huawei.bigdata.spark.examples.MyRegistrator") \
        .getOrCreate()

# Import the Java class to be run into the JVM view (spark._jvm).
java_import(spark._jvm, 'com.huawei.bigdata.spark.examples.SparkHbasetoHbase')

# Create a class instance and invoke the method.
spark._jvm.SparkHbasetoHbase().hbasetohbase(spark._jsc)

# Stop the SparkSession
spark.stop()