Notebook
Functions
The Notebook node is used to execute a Notebook job predefined in DLI.
Constraints
This function depends on OBS.
Parameters
Table 1 and Table 2 describe the parameters of the Notebook node.
Table 1 Notebook node parameters

| Parameter | Mandatory | Description |
| --- | --- | --- |
| Node Name | Yes | Name of the node. The name must contain 1 to 128 characters, including only letters, numbers, underscores (_), hyphens (-), slashes (/), less-than signs (<), and greater-than signs (>). |
| Spark Job Name | Yes | Name of the DLI Spark job. The name must contain 1 to 64 characters, including only letters, numbers, and underscores (_). The default value is the same as the node name. |
| Data Lake Insight Queue | Yes | Select a queue from the drop-down list box. |
| Job Type | No | Type of the Spark image used by the job. You can set this parameter as needed after selecting a DLI queue. |
| Spark Versions | Yes | Select a Spark version. This parameter is mandatory when a DLI queue is selected. |
| Job Running Resources | No | Select the running resource specifications of the job. |
| Input Directory | Yes | Select a path in the OBS bucket for running the notebook file. The absolute path of the input directory can contain a maximum of 1,024 characters. |
| Input Notebook File | Yes | Select the notebook file (.ipynb format) in the OBS input directory. The absolute path can contain a maximum of 2,048 characters. |
| Notebook File Output Directory | Yes | Select a path in the OBS bucket for storing the running result of the notebook file. The absolute path of the output directory can contain a maximum of 1,024 characters. |
| Output Notebook File Name | Yes | Enter the name of the output notebook file (.ipynb format). The name can contain a maximum of 256 characters. |
| Input Notebook Job Parameters | No | Configure the parameters for running the notebook job. |
| Spark program resource package | No | Enter a parameter in the format of key=value. Press Enter to separate multiple key-value pairs. For details about the parameters, see Spark Configuration. These parameters can be replaced by global variables. For example, if you create a global variable custom_class on the Global Configuration > Global Variables page, you can use "spark.sql.catalog"={{custom_class}} to replace a parameter with this variable after the job is submitted. NOTE: The JVM garbage collection algorithm cannot be customized for Spark jobs. |
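For reference, the following sketch shows how the file and Spark parameter fields above might be filled in. The bucket name, directory names, file names, and the spark.executor.memory setting are hypothetical examples; only the spark.sql.catalog={{custom_class}} substitution is taken from the description above.

```
Input Directory:                obs://my-dli-bucket/notebooks/input/
Input Notebook File:            obs://my-dli-bucket/notebooks/input/sales_report.ipynb
Notebook File Output Directory: obs://my-dli-bucket/notebooks/output/
Output Notebook File Name:      sales_report_result.ipynb

# Spark parameters, one key=value pair per line (press Enter between pairs)
spark.executor.memory=4G
spark.sql.catalog={{custom_class}}
```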
Table 2 Advanced parameters

| Parameter | Mandatory | Description |
| --- | --- | --- |
| Node Status Polling Interval (s) | Yes | How often the system checks whether the node execution is complete. The value ranges from 1 to 60 seconds. |
| Max. Node Execution Duration | Yes | Execution timeout interval for the node. If retry is configured and the execution is not complete within the timeout interval, the node will be executed again. |
| Retry upon Failure | Yes | Whether to re-execute a node if it fails to be executed. |
| Policy for Handling Subsequent Nodes If the Current Node Fails | Yes | Operation that will be performed on subsequent nodes if the current node fails to be executed. |
| Enable Dry Run | No | If you select this option, the node will not be executed, and a success message will be returned. |
| Task Groups | No | Select a task group. A task group lets you control the maximum number of concurrent nodes in a fine-grained manner, which is useful when a job contains multiple nodes, a data patching task is ongoing, or a job is rerunning. |