Interconnecting Spark with OBS Using an IAM Agency
After you configure decoupled storage and compute for a cluster as described in Interconnecting an MRS Cluster with OBS Using an IAM Agency, you can create tables on the Spark client whose location is an OBS path.
Verifying OBS Access with Spark Beeline
- Log in to FusionInsight Manager and choose Cluster > Services > Spark2x > Configurations > All Configurations.
In the left navigation tree, choose JDBCServer2x > Customization. Add dfs.namenode.acls.enabled to the spark.hdfs-site.customized.configs parameter and set its value to false.
Figure 1 Adding Spark custom parameters
- Search for the spark.sql.statistics.fallBackToHdfs parameter and set its value to false.
Figure 2 Setting spark.sql.statistics.fallBackToHdfs
- Save the configurations and restart the JDBCServer2x instance.
- Log in to the client installation node as the client installation user.
- Run the following command to configure environment variables:
source Client installation directory/bigdata_env
- For a security cluster, run the following command to perform user authentication. If Kerberos authentication is not enabled for the current cluster, you do not need to run this command.
kinit Username
- Access OBS using Spark beeline. The following example creates a table named test in the obs://mrs-word001/table/ directory.
create table test(id int) location 'obs://mrs-word001/table/';
- Run the following command to query all tables. If table test is returned, OBS access is successful.
show tables;
Figure 3 Returned table names
- Press Ctrl+C to exit Spark beeline.
Verifying OBS Access with Spark SQL
- Log in to the client installation node as the client installation user.
- Run the following command to configure environment variables:
source Client installation directory/bigdata_env
- Edit the hdfs-site.xml file on the client and set dfs.namenode.acls.enabled to false:
vim Client installation directory/Spark2x/spark/conf/hdfs-site.xml
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>false</value>
</property>
- For a security cluster, run the following command to perform user authentication. If Kerberos authentication is not enabled for the current cluster, you do not need to run this command.
kinit Username
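- Access OBS using the Spark SQL CLI. For example, log in to the Spark SQL CLI and create a table named test in the obs://mrs-word001/table/ directory:
cd Client installation directory/Spark2x/spark/bin
./spark-sql
create table test(id int) location 'obs://mrs-word001/table/';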
- Run the show tables; command to confirm that the table is created successfully.
- Run exit; to exit the Spark SQL CLI.
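The hdfs-site.xml change in the steps above can be checked before starting the Spark SQL CLI or Spark Shell. The following is a minimal sketch, not part of the product tooling: the HDFS_SITE variable and the acls_disabled function are hypothetical names introduced here, and the check assumes the property is written in the plain <name>/<value> form shown in this section.

```shell
#!/bin/sh
# Sketch: check that dfs.namenode.acls.enabled is set to false in hdfs-site.xml.
# HDFS_SITE is a hypothetical variable; point it at
# <Client installation directory>/Spark2x/spark/conf/hdfs-site.xml.
HDFS_SITE="${HDFS_SITE:-./hdfs-site.xml}"

acls_disabled() {
    # Join the file into one line so <name> and <value> may sit on separate lines,
    # then look for the name/value pair with optional whitespace between tags.
    tr -d '\n' < "$1" | grep -Eq '<name>[[:space:]]*dfs\.namenode\.acls\.enabled[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*false[[:space:]]*</value>'
}

if [ -f "$HDFS_SITE" ] && acls_disabled "$HDFS_SITE"; then
    echo "dfs.namenode.acls.enabled=false found in $HDFS_SITE"
else
    echo "dfs.namenode.acls.enabled=false NOT found in $HDFS_SITE"
fi
```

This is a text match, not an XML parse, so a property that is commented out or has extra child elements between the name and value tags would need a closer look by hand.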
If the OBS client prints a large number of logs, read and write performance may be affected. To reduce the log volume, adjust the log level of the OBS client as follows:
cd Client installation directory/Spark2x/spark/conf
vi log4j.properties
Add the OBS log level configuration to the file as follows:
log4j.logger.org.apache.hadoop.fs.obs=WARN
log4j.logger.com.obs=WARN
Figure 4 Adding an OBS log level
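If the log level change has to be applied on several client nodes, the manual edit can be scripted. The following is a minimal sketch under stated assumptions: CONF_FILE and add_log_level are hypothetical names introduced here, and the script only appends a line when that exact line is not already present.

```shell
#!/bin/sh
# Sketch: append the OBS logger levels to log4j.properties only if absent.
# CONF_FILE is a hypothetical variable; point it at
# <Client installation directory>/Spark2x/spark/conf/log4j.properties.
CONF_FILE="${CONF_FILE:-./log4j.properties}"

add_log_level() {
    # $1 = logger key, $2 = level.
    # -x and -F make grep match the whole line as a fixed string.
    if grep -qxF -- "$1=$2" "$CONF_FILE" 2>/dev/null; then
        echo "already set: $1=$2"
    else
        echo "$1=$2" >> "$CONF_FILE"
        echo "added: $1=$2"
    fi
}

add_log_level "log4j.logger.org.apache.hadoop.fs.obs" WARN
add_log_level "log4j.logger.com.obs" WARN
```

Running the script a second time leaves the file unchanged. If the same logger key already exists with a different level, the script appends a second entry instead of replacing the first, so such files are better edited by hand.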
Using Spark Shell to Read OBS Files
- Log in to the client installation node as the client installation user.
- Run the following command to configure environment variables:
source Client installation directory/bigdata_env
- Edit the hdfs-site.xml file on the client and set dfs.namenode.acls.enabled to false:
vim Client installation directory/Spark2x/spark/conf/hdfs-site.xml
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>false</value>
</property>
- For a security cluster, run the following command to perform user authentication. If Kerberos authentication is not enabled for the current cluster, you do not need to run this command.
kinit Username
- Create an OBS file.
- Run the following commands to log in to the Spark SQL CLI:
cd Client installation directory/Spark2x/spark/bin
./spark-sql
- Run the following commands to create a table and import data to the table:
create database test location "obs://Parallel file system path/test";
use test;
create table test1(a int,b int) using parquet;
insert into test1 values(1,2);
desc formatted test1;
Figure 5 Checking the location of the table
- Exit the Spark SQL CLI (exit;), then run the following commands to go to the Spark bin directory and log in to the Spark Shell CLI:
cd Client installation directory/Spark2x/spark/bin
./spark-shell
- In the Spark Shell CLI, run the following command to query the test1 table created above. If the path differs, use the Location value returned by desc formatted test1.
spark.read.format("parquet").load("obs://Parallel file system path/test1").show();
Figure 6 Viewing table data
- Run the :quit command to exit the Spark Shell CLI.