Obtaining the Output of an SQL Node
This section describes how to obtain the output of an SQL node and use it in subsequent nodes or conditional judgments during job development.
Scenario
When you use EL expression #{Job.getNodeOutput("Name of the previous node")} to obtain the output of the previous node, the output is a two-dimensional array, for example, [["Dean",...,"08"],...,["Smith",...,"53"]]. To obtain the values in the array, use either of the methods provided in Table 1.
Method | Key Configuration | Application Scenario Requirements |
---|---|---|
Obtaining Output Value Using StringUtil | If the output of the SQL node contains only one field, for example [["11"]], you can use the StringUtil EL expression with an embedded object to split the two-dimensional array and obtain the field value in the output of the previous node. #{StringUtil.split(StringUtil.split(StringUtil.split(Job.getNodeOutput("Name of the previous node"),"]")[0],"[")[0],"\\"")[0]} | This method is easy to use, but it requires that the output of the SQL node contain only one field, for example, [["11"]]. |
Obtaining Output Values Using the For Each Node | Use the For Each node to cyclically obtain the values in the two-dimensional array in the dataset. | This method is applicable to more scenarios, though jobs need to be split into main jobs and subjobs. |
Obtaining Output Value Using StringUtil
Scenario
The StringUtil EL expression with an embedded object is used to split the two-dimensional array result and obtain the output field value of the previous node, which is a string.
To make it easy to view the obtained value, this example uses the Kafka Client node. In practice, you can select a subsequent node type as needed. By using a StringUtil EL expression with an embedded object on the node, you can obtain the data value returned by the previous node.
#{StringUtil.split(StringUtil.split(StringUtil.split(Job.getNodeOutput("count95"),"]")[0],"[")[0],"\\"")[0]}
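The nested expression can be traced step by step with a small Python sketch. This is not the DataArts Studio implementation: it assumes that the node output reaches the expression as a string and that StringUtil.split discards empty tokens, so that index [0] is the first non-empty piece. The helper names split and first_field are hypothetical.

```python
def split(text: str, separator: str) -> list[str]:
    """Split on a separator and drop empty tokens (assumed StringUtil.split behaviour)."""
    return [token for token in text.split(separator) if token]


def first_field(node_output: str) -> str:
    """Mirror the three nested split calls of the EL expression, innermost first."""
    before_close = split(node_output, "]")[0]   # '[["2"]]' -> '[["2"'
    after_open = split(before_close, "[")[0]    # '[["2"'   -> '"2"'
    return split(after_open, '"')[0]            # '"2"'     -> '2'


print(first_field('[["2"]]'))
```

Under the same assumption, a multi-row output would yield only the first field of the first row, which is why this method suits single-field results.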
Configuration Method
- Log in to the DataArts Studio console, locate the target DataArts Studio instance, and click Access on the instance card.
- Click the Workspaces tab. In the workspace list, locate the target workspace and click DataArts Factory.
- Create table student_score. Create a temporary Hive SQL script, select a Hive connection and database, paste the following SQL statement, and run the script. After the script is successfully executed, delete it.
CREATE TABLE `student_score` (`name` String COMMENT '', `score` INT COMMENT '');
INSERT INTO student_score VALUES ('ZHAO', '90'), ('QIAN', '88'), ('SUN', '93'), ('LI', '94'), ('ZHOU', '85'), ('WU', '79'), ('ZHENG', '87'), ('WANG', '97'), ('FENG', '83'), ('CHEN', '99');
- Create the Hive SQL script to be invoked by the MRS Hive SQL node. Create a Hive SQL script named count95, select a Hive connection and database, paste the following SQL statement, and submit a version.
-- Obtain the number of students whose scores are higher than 95 from the student_score table.
SELECT count(*) FROM student_score WHERE score > "95";
- On the Develop Job page, create a data development job. Drag an MRS Hive SQL node and a Kafka Client node and drop them on the canvas. Click and hold to connect the nodes, as shown in Figure 1.
- Configure parameters for the MRS Hive SQL node. Set SQL script to the count95 script submitted in step 4, and select a Hive connection and database.
Figure 2 Configuring parameters for an MRS Hive SQL node
- Configure parameters for the Kafka Client node. Set Sent Content to #{StringUtil.split(StringUtil.split(StringUtil.split(Job.getNodeOutput("count95"),"]")[0],"[")[0],"\\"")[0]} and select a Kafka connection and a topic name.
Figure 3 Configuring parameters for the Kafka Client node
- After the node configuration is complete, click Test. After the job test is successful, right-click the Kafka Client node to view its log. The log shows that the two-dimensional array [["2"]] returned by the MRS Hive SQL node has been converted to 2.
To verify the raw output, set Sent Content of the Kafka Client node to #{Job.getNodeOutput("count95")} and run the job. The log of the Kafka Client node then shows that the MRS Hive SQL node returns the two-dimensional array [["2"]].
Figure 4 Viewing the Kafka Client node log
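The expected count can be checked outside the platform. The following sketch uses Python's built-in sqlite3 module as a stand-in for Hive, which is an assumption made purely for illustration, and wraps the count in the two-dimensional array shape described in this section. The function names count_above and as_node_output are hypothetical.

```python
import json
import sqlite3

# Sample rows matching the student_score table in this example.
ROWS = [
    ("ZHAO", 90), ("QIAN", 88), ("SUN", 93), ("LI", 94), ("ZHOU", 85),
    ("WU", 79), ("ZHENG", 87), ("WANG", 97), ("FENG", 83), ("CHEN", 99),
]


def count_above(threshold: int) -> int:
    """Count students whose score is strictly greater than the threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE student_score (name TEXT, score INTEGER)")
        conn.executemany("INSERT INTO student_score VALUES (?, ?)", ROWS)
        row = conn.execute(
            "SELECT count(*) FROM student_score WHERE score > ?", (threshold,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


def as_node_output(value: int) -> str:
    """Render a single value in the two-dimensional array form, e.g. [["2"]]."""
    return json.dumps([[str(value)]])


print(as_node_output(count_above(95)))
```

Two rows (97 and 99) exceed 95, so the sketch prints [["2"]], matching the value seen in the Kafka Client node log.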
Obtaining Output Values Using the For Each Node
Scenario
You can use the For Each node and the EL expression #{Loop.current[0]} with a Loop embedded object to cyclically obtain the output values of the previous node.
To make it easy to view the obtained values, this example uses the Kafka Client node as the subjob node of the For Each node. In practice, you can select a subjob node type as needed. By using an EL expression with an embedded Loop object on the node, you can obtain the values returned by the previous node of the For Each node. The key properties of the For Each node are as follows:
- Dataset: Enter the execution result of the select statement on the Hive SQL node. Use the #{Job.getNodeOutput("select95")} expression, where select95 is the name of the previous node.
- Subjob Parameter Name: Enter the parameter names defined in the subjob. The main job passes values to the subjob through these parameters. In this example, the subjob parameters are name and score, whose values are the first and second columns of the dataset, respectively, obtained with the EL expressions #{Loop.current[0]} and #{Loop.current[1]}.
For the subjobs selected for the For Each node, you must set their parameter names so that the main job can identify the parameter definitions.
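The loop semantics described above can be pictured with a short Python sketch. It only illustrates how each row of the dataset maps to the subjob parameters. It assumes the dataset arrives as a JSON-style two-dimensional array of strings; the function name for_each and the sample rows are hypothetical.

```python
import json


def for_each(dataset_json: str) -> list[dict[str, str]]:
    """Return the subjob parameters produced for each loop iteration."""
    runs = []
    for current in json.loads(dataset_json):  # 'current' plays the role of Loop.current
        runs.append({
            "name": current[0],   # corresponds to #{Loop.current[0]}
            "score": current[1],  # corresponds to #{Loop.current[1]}
        })
    return runs


# Illustrative rows in the shape returned by the previous node.
dataset = '[["WANG","97"],["CHEN","99"]]'
for params in for_each(dataset):
    print(params)
```

Each dictionary stands for one subjob run: the For Each node invokes the subjob once per row and fills the name and score parameters from that row.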
Configuration Method
Developing a Subjob
- Log in to the DataArts Studio console, locate the target DataArts Studio instance, and click Access on the instance card.
- Click the Workspaces tab. In the workspace list, locate the target workspace and click DataArts Factory.
- On the Develop Job page, create a data development subjob named EL_test_slave. Select a Kafka Client node, configure job parameters, and orchestrate the job shown in Figure 6.
Set the parameter names to name and score. These parameters are only used by the For Each node in the main job to identify the subjob parameters. You do not need to set their values.
- Configure parameters for the Kafka Client node. Set Sent Content to ${name}: ${score} and select a Kafka connection and a topic name.
Do not use the #{Job.getParam("job_param_name")} EL expression here. It obtains only the values of parameters configured in the current job; it cannot obtain parameter values passed from the parent job or global variables configured in the workspace.
To obtain the parameter values passed from the parent job and the global variables configured for the workspace, you are advised to use the ${job_param_name} expression.
Figure 7 Configuring parameters for the Kafka Client node
- Submit the subjob after the configuration is complete.
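To picture how the Sent Content value ${name}: ${score} is filled in on each run, the sketch below uses Python's string.Template, which happens to use the same ${...} placeholder syntax. It is only an analogy under that assumption, not the platform's parameter engine, and the function name render is hypothetical.

```python
from string import Template

# Same text as the Sent Content field of the Kafka Client node in the subjob.
SENT_CONTENT = Template("${name}: ${score}")


def render(params: dict[str, str]) -> str:
    """Substitute the parameter values passed in by the parent job."""
    return SENT_CONTENT.substitute(params)


print(render({"name": "WANG", "score": "97"}))
```

With the values shown, the sketch prints WANG: 97, which is the kind of message the subjob sends for one loop iteration.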
Developing a Main Job
- Go to the Develop Script page.
- Create table student_score. Create a temporary Hive SQL script, select a Hive connection and database, paste the following SQL statement, and run the script. After the script is successfully executed, delete it.
CREATE TABLE `student_score` (`name` String COMMENT '', `score` INT COMMENT '');
INSERT INTO student_score VALUES ('ZHAO', '90'), ('QIAN', '88'), ('SUN', '93'), ('LI', '94'), ('ZHOU', '85'), ('WU', '79'), ('ZHENG', '87'), ('WANG', '97'), ('FENG', '83'), ('CHEN', '99');
- Create the Hive SQL script to be invoked by the MRS Hive SQL node. Create a Hive SQL script named select95, select a Hive connection and database, paste the following SQL statement, and submit a version.
-- Display the names and scores of students whose scores are higher than 95 in the student_score table.
SELECT * FROM student_score WHERE score > "95";
- On the Develop Job page, create a data development job named EL_test_master. Drag an MRS Hive SQL node and a For Each node and drop them on the canvas. Click and hold to connect the nodes, as shown in Figure 5.
- Configure parameters for the MRS Hive SQL node. Select the select95 script submitted in Step 3 for SQL script and select a Hive connection and database.
Figure 8 Configuring parameters for an MRS Hive SQL node
- Configure properties for the For Each node.
- Subjob in a Loop: Select EL_test_slave, the subjob that has been developed.
- Dataset: Enter the execution result of the select statement on the Hive SQL node. Use the #{Job.getNodeOutput("select95")} expression, where select95 is the name of the previous node.
- Subjob Parameter Name: Enter the parameter names defined in the subjob. The main job passes values to the subjob through these parameters. In this example, the subjob parameters are name and score, whose values are the first and second columns of the dataset, respectively, obtained with the EL expressions #{Loop.current[0]} and #{Loop.current[1]}.
Figure 9 Configuring properties for the For Each node
- Save the job.
Testing the Main Job
- Click Test above the canvas of the main job EL_test_master to test the job. When the main job runs, the For Each node invokes the subjob EL_test_slave once for each row in the dataset.
- In the navigation pane on the left, choose Monitor Instance to view the job execution result.
- After the job is executed, view the cyclic execution result of the subjob EL_test_slave on the Monitor Instance page.
Figure 10 Execution result of the subjob
- View the log of the cyclic execution of subjob EL_test_slave. The log shows that the output values of the previous node of the For Each node were obtained through the For Each node and the EL expression with a Loop embedded object.
Figure 11 Viewing the log