Accessing Alluxio Using a Data Application
The Alluxio file system is accessed through port 19998 at the address alluxio://<Master node IP address of Alluxio>:19998/<PATH>. This section uses examples to show how data applications (Spark, Hive, Hadoop MapReduce, and Presto) access the Alluxio file system.
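The address format above can be assembled with a small shell helper. This is a minimal sketch, not part of the MRS client: the function name alluxio_uri is hypothetical, and 192.0.2.10 is a placeholder documentation address standing in for the actual Master node IP address.

```shell
#!/bin/sh
# Hypothetical helper: builds an Alluxio URI from a Master node IP and a path.
# 19998 is the port stated above; the function name is illustrative only.
alluxio_uri() {
  master_ip="$1"
  path="$2"
  # Add a leading slash if the path does not already start with one
  case "$path" in
    /*) ;;
    *) path="/$path" ;;
  esac
  printf 'alluxio://%s:19998%s\n' "$master_ip" "$path"
}

# Example call with a placeholder IP address
alluxio_uri 192.0.2.10 /input
```

Building the address in one place avoids retyping the IP address and port in every command that reads from or writes to Alluxio.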
Using Alluxio as the Input and Output of a Spark Application
- Log in to the Master node in a cluster as user root using the password set during cluster creation.
- Run the following command to configure environment variables:
source /opt/client/bigdata_env
- If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:
kinit <MRS cluster user>
Example: kinit admin
- Prepare an input file and copy local data to the Alluxio file system.
For example, prepare the input file test_input.txt in the local /home directory, and run the following command to save the test_input.txt file to Alluxio:
alluxio fs copyFromLocal /home/test_input.txt /input
- Run the following command to start spark-shell:
spark-shell
- Run the following commands in spark-shell (replace <Master node IP address of Alluxio> with the actual IP address):
val s = sc.textFile("alluxio://<Master node IP address of Alluxio>:19998/input")
val double = s.map(line => line + line)
double.saveAsTextFile("alluxio://<Master node IP address of Alluxio>:19998/output")
- Run the alluxio fs ls / command and check that the root directory of Alluxio contains the /output directory, which holds the content of the input file with each line duplicated.
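The final check in the steps above can be scripted instead of read by eye. The following is a sketch only: the function name alluxio_path_listed is hypothetical, and it assumes that the path is the last whitespace-separated column of each line printed by alluxio fs ls, which may differ between Alluxio versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if `alluxio fs ls <dir>` lists the given path.
# Assumption: the path is the last whitespace-separated column of each line.
alluxio_path_listed() {
  parent_dir="$1"
  wanted="$2"
  alluxio fs ls "$parent_dir" |
    awk -v p="$wanted" '$NF == p { found = 1 } END { exit !found }'
}

# Usage on a node where the client environment has been sourced:
#   alluxio_path_listed / /output && echo "/output exists"
```

Comparing the whole last column, rather than searching for a substring, keeps a directory such as /output_old from being mistaken for /output.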
Creating a Hive Table on Alluxio
- Log in to the Master node in a cluster as user root using the password set during cluster creation.
- Run the following command to configure environment variables:
source /opt/client/bigdata_env
- If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:
kinit <MRS cluster user>
Example: kinit admin
- Prepare an input file. For example, prepare the hive_load.txt input file in the local /home directory. The file content is as follows:
1, Alice, company A
2, Bob, company B
- Run the following command to import the hive_load.txt file to Alluxio:
alluxio fs copyFromLocal /home/hive_load.txt /hive_input
- Run the following command to start the Hive beeline:
beeline
- Run the following commands in the beeline (replace <Master node IP address of Alluxio> with the actual IP address) to create a table and load the input file in Alluxio into it:
>CREATE TABLE u_user(id INT, name STRING, company STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
>LOAD DATA INPATH 'alluxio://<Master node IP address of Alluxio>:19998/hive_input' INTO TABLE u_user;
- Run the following command to view the created table:
select * from u_user;
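The input file used in this procedure can be created from the shell. This is a minimal sketch that assumes the sample content consists of two comma-separated records, one per line; the function name write_hive_sample is hypothetical, and the sketch writes to a temporary file so that it can be tried anywhere.

```shell
#!/bin/sh
# Hypothetical helper: writes the two sample records, one per line.
write_hive_sample() {
  target="$1"
  cat > "$target" <<'EOF'
1, Alice, company A
2, Bob, company B
EOF
}

# Write to a temporary file so that the sketch can run on any machine;
# the procedure above uses /home/hive_load.txt instead.
sample_file="$(mktemp)"
write_hive_sample "$sample_file"

# Count the records that were written
wc -l < "$sample_file"
```

On the Master node, calling write_hive_sample /home/hive_load.txt would produce the file that the alluxio fs copyFromLocal step expects.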
Running Hadoop Wordcount in Alluxio
- Log in to the Master node in a cluster as user root using the password set during cluster creation.
- Run the following command to configure environment variables:
source /opt/client/bigdata_env
- If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:
kinit <MRS cluster user>
Example: kinit admin
- Prepare an input file and copy local data to the Alluxio file system.
For example, prepare the input file test_input.txt in the local /home directory, and run the following command to save the test_input.txt file to Alluxio:
alluxio fs copyFromLocal /home/test_input.txt /input
- Run the wordcount job using yarn jar. (Replace <Master node IP address of Alluxio>, <Hadoop version>, and <MRS cluster version> with the actual values.)
yarn jar /opt/share/hadoop-mapreduce-examples-<Hadoop version>-mrs-<MRS cluster version>/hadoop-mapreduce-examples-<Hadoop version>-mrs-<MRS cluster version>.jar wordcount alluxio://<Master node IP address of Alluxio>:19998/input alluxio://<Master node IP address of Alluxio>:19998/output
- Run the alluxio fs ls / command to check whether the output directory /output containing the wordcount result exists in the root directory of Alluxio.
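Because the version strings appear twice in the JAR path of the yarn jar command, the path is easy to mistype. The sketch below builds it from the two values; the function name examples_jar is hypothetical, and the version numbers in the example call are placeholders, not values confirmed for any particular cluster.

```shell
#!/bin/sh
# Hypothetical helper: builds the examples JAR path used in the yarn jar step
# from the Hadoop version and the MRS cluster version.
examples_jar() {
  hadoop_version="$1"
  mrs_version="$2"
  name="hadoop-mapreduce-examples-${hadoop_version}-mrs-${mrs_version}"
  # The directory and the JAR file share the same base name
  printf '/opt/share/%s/%s.jar\n' "$name" "$name"
}

# Example call with placeholder versions; replace them with the actual values
examples_jar 3.1.1 2.0
```

The printed path can then be passed to yarn jar, followed by wordcount and the Alluxio input and output addresses.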
Using Presto to Query Tables in Alluxio
- Log in to the Master node in a cluster as user root using the password set during cluster creation.
- Run the following command to configure environment variables:
source /opt/client/bigdata_env
- If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:
kinit <MRS cluster user>
Example: kinit admin
- Start the Hive beeline to create a table in Alluxio. (Replace <Master node IP address of Alluxio> with the actual IP address.)
beeline
>CREATE TABLE u_user (id int, name string, company string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'alluxio://<Master node IP address of Alluxio>:19998/u_user';
>insert into u_user values(1,'Alice','Company A'),(2, 'Bob', 'Company B');
- Start the Presto client. For details, see steps 2 to 8 in Using a Client to Execute Query Statements.
- On the Presto client, run the select * from hive.default.u_user; statement to query the table created in Alluxio.
Figure 1 Using Presto to query the table created in Alluxio
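The CREATE TABLE statement in the steps above embeds the Master node IP address in its LOCATION clause. As a sketch, the statement can be generated with the address filled in and then pasted into the beeline; the function name u_user_ddl is hypothetical, and 192.0.2.10 is a placeholder address.

```shell
#!/bin/sh
# Hypothetical helper: prints the CREATE TABLE statement from the steps above
# with the Master node IP address substituted into the LOCATION clause.
u_user_ddl() {
  master_ip="$1"
  printf "CREATE TABLE u_user (id int, name string, company string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'alluxio://%s:19998/u_user';\n" "$master_ip"
}

# Example call with a placeholder IP address
u_user_ddl 192.0.2.10
```

Generating the statement this way keeps the address, port, and table path consistent with the alluxio:// format used throughout this section.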