Interconnecting Hudi with OBS Using an IAM Agency

After configuring decoupled storage and compute for a cluster as described in Interconnecting an MRS Cluster with OBS Using an IAM Agency, you can create Hudi copy-on-write (COW) tables in spark-shell and store them in OBS.

Interconnecting Hudi with OBS

Log in to the client installation node as the client installation user.
Run the following commands to configure environment variables. Replace Client installation directory with the actual installation path of the client:

```
source Client installation directory/bigdata_env
source Client installation directory/Hudi/component_env
```
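The two source commands above can be wrapped in a small helper that fails early when either script is missing. This is only a sketch: the function name load_client_env and the example path /opt/client are hypothetical and do not come from the product.

```shell
# Hypothetical helper: source the client and Hudi environment scripts.
# $1 stands in for the client installation directory.
load_client_env() {
  client_dir="$1"
  # Check that both scripts exist before sourcing anything.
  for env_file in "$client_dir/bigdata_env" "$client_dir/Hudi/component_env"; do
    [ -f "$env_file" ] || { echo "missing environment script: $env_file" >&2; return 1; }
  done
  # Source in the current shell so that exported variables persist.
  . "$client_dir/bigdata_env"
  . "$client_dir/Hudi/component_env"
}
```

For example, if the client were installed in /opt/client (an assumed location), you would run load_client_env /opt/client in the same shell session that later starts spark-shell.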
Modify the configuration file:

```
vim Client installation directory/Hudi/hudi/conf/hdfs-site.xml
```

Set the following property in the file:
```
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
```
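After saving the file, you may want to confirm that the property is really set to false. The sketch below is one way to do that from the shell. The function name acls_disabled is hypothetical, and the check assumes the value element directly follows the name element, as in the fragment above.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given hdfs-site.xml
# sets dfs.namenode.acls.enabled to false, and fails otherwise.
acls_disabled() {
  conf_file="$1"
  # Remove all whitespace so <name> and <value> sit next to each other,
  # then look for the exact name/value pair.
  tr -d '[:space:]' < "$conf_file" |
    grep -q '<name>dfs\.namenode\.acls\.enabled</name><value>false</value>'
}
```

Called with the path to Hudi/hudi/conf/hdfs-site.xml under the client installation directory, the function returns a non-zero status if the property is absent or set to another value, so it can guard later steps in a script.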
For a security cluster (Kerberos authentication enabled), run the following command to authenticate the user, replacing Username with the name of the user. Skip this step if Kerberos authentication is not enabled for the cluster.

```
kinit Username
```
Start spark-shell and run the following commands to create a COW table and save it in OBS:

```
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

// Table name and the OBS location where the table is stored
val tableName = "hudi_cow_table"
val basePath = "obs://testhudi/cow_table/"

// Generate 10 sample records with the Hudi quickstart data generator
val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))

// Write the records as a Hudi table, replacing any existing data at basePath
df.write.format("org.apache.hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  mode(Overwrite).
  save(basePath);
```

In this example, obs://testhudi/cow_table/ is the OBS path and testhudi is the name of the parallel file system. Replace both with the values for your environment.
Use the DataSource API to check that the table was created and that the data is correct:

```
// Load the table through the Hudi data source and register it as a temporary view
val roViewDF = spark.
  read.
  format("org.apache.hudi").
  load(basePath + "/*/*/*/*")
roViewDF.createOrReplaceTempView("hudi_ro_table")

// Display the table contents
spark.sql("select * from hudi_ro_table").show()
```

Figure 1 Viewing table data
Run :q to exit the spark-shell CLI.