Using a Notebook Instance to Develop and Debug Code Online

This section describes how to develop tasks using a notebook instance.

Prerequisites

A running notebook instance is available. For details about how to create one, see Creating a Notebook Instance.

Constraints

You can create a maximum of 10 notebooks in the current workspace.
Only MRS Spark and Fabric SQL data types are supported.
You can develop notebook jobs only in notebook instances that you created.

Developing a Notebook

On the DataArts Factory page, click the notebook instance name in the lower right corner. The names of all notebook instances are displayed. Click the name of a notebook instance. The notebook environment directory tree is displayed under the job directory tree in the middle of the DataArts Factory page.

Notebook environment directories are visible only to you.
Right-click My Files and select New Notebook. In the displayed dialog box, enter the file name and select the path for storing the notebook.
- The file name can contain only letters, digits, hyphens (-), underscores (_), and periods (.). It can contain a maximum of 64 characters.
- The default Path is /My Files.
- Click or right-click My Files to create a folder. Then, right-click the blank area in the folder and select New Notebook.
  - The folder name can contain only letters, digits, hyphens (-), and underscores (_). The folder name can contain a maximum of 64 characters.
  - The file path and file name can contain a maximum of 768 characters in total.
Figure 1 Creating a notebook
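The naming rules above can be checked before you type a name into the dialog box. The following is a minimal sketch, not part of the product: the helper names are hypothetical, "letters" is assumed to mean ASCII letters, and the 768-character total is assumed to be the length of the folder path plus the length of the file name.

```python
import re

# Limits taken from the naming rules above.
MAX_NAME_LEN = 64
MAX_TOTAL_LEN = 768

# Assumption: "letters" means ASCII letters only.
FILE_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")    # periods allowed in file names
FOLDER_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")   # periods not allowed in folder names


def is_valid_file_name(name):
    """Check a notebook file name against the documented character set and length."""
    return len(name) <= MAX_NAME_LEN and FILE_NAME_RE.fullmatch(name) is not None


def is_valid_folder_name(name):
    """Check a folder name against the documented character set and length."""
    return len(name) <= MAX_NAME_LEN and FOLDER_NAME_RE.fullmatch(name) is not None


def is_valid_notebook_location(folders, file_name, root="/My Files"):
    """Check every folder, the file name, and the combined length.

    Assumption: the total is len(folder path) + len(file name).
    """
    path = "/".join([root, *folders])
    return (
        all(is_valid_folder_name(folder) for folder in folders)
        and is_valid_file_name(file_name)
        and len(path) + len(file_name) <= MAX_TOTAL_LEN
    )


print(is_valid_file_name("etl_job-1.v2"))               # allowed characters only
print(is_valid_folder_name("reports.2024"))             # period is not allowed in folders
print(is_valid_notebook_location(["demo"], "nb_test"))  # folder, name, and total length
```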
Click OK. The notebook development page is displayed.
On the notebook development page, enter and debug your code. You can select a compute engine in the upper right corner.
- By default, the Python engine is used, which does not depend on compute resources.
- You can also choose the Fabric SQL or MRS Spark compute engine. Click the default Python engine to view the compute engines that have been created, and select the one you need. A notebook file supports only one compute engine, so all cells in the file use the engine you select.
- You need to bind compute resources to compute engines. For details about how to bind compute resources, see Binding Compute Resources.
- The following magic commands are supported: %mrs_spark, %%spark, and %aura_frame.
Click Save to save the developed code.
Click the run icon of a cell and select Run cell. The code in the entire cell is executed. If the execution succeeds, a success message is displayed and you can view the execution result. If the execution fails, the execution result shows the possible cause of the failure.
```
%spark info
```
Figure 2 Execution result example 1
```
%%spark
spark.sql("SHOW TABLES").show()
```
Figure 3 Execution result example 2
```
# Multiplying a string by an integer repeats the string that many times.
a = "123111" * 2
print(a)
a = "12311" * 3
print(a)
a = "123eee" * 5
print(a)
a = "123aa1122" * 2
print(a)
a = "123absg3wut235456&&&" * 2
print(a)
```
Figure 4 Execution result example 3

Figure 5 Execution result example 4
```
%%spark config
{
    "driverMemory": "2G",
    "driverCores": 1,
    "executorMemory": "2G",
    "executorCores": 1,
    "conf": {
        "spark.pyspark.python": "./temp_env/bin/python",
        "spark.yarn.dist.archives":"obs://notebook-dataarts/python_env/pyspark_conda_env.tar.gz#temp_env",
        "spark.jars": "obs://notebook-dataarts/mrs_jars/spark-utils-1.0-SNAPSHOT.jar"
    }
}
```
Figure 6 Execution result example 5
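The body of the configuration cell must be valid JSON. As a hedged sketch, you can build the same body in plain Python and let the json module serialize it, which avoids quoting and comma mistakes. The values are copied from the example above. The helper check_archive_alias is a hypothetical name. It relies on the usual Spark on YARN behavior that an archive URI ending in #alias is unpacked into a directory named alias, which is why spark.pyspark.python points into ./temp_env.

```python
import json

# Values copied from the configuration example above; adjust them for your own cluster.
session_config = {
    "driverMemory": "2G",
    "driverCores": 1,
    "executorMemory": "2G",
    "executorCores": 1,
    "conf": {
        "spark.pyspark.python": "./temp_env/bin/python",
        "spark.yarn.dist.archives": "obs://notebook-dataarts/python_env/pyspark_conda_env.tar.gz#temp_env",
        "spark.jars": "obs://notebook-dataarts/mrs_jars/spark-utils-1.0-SNAPSHOT.jar",
    },
}


def check_archive_alias(config):
    """Return True if the Python path lives under the alias given after '#'.

    Hypothetical helper: it only compares strings and does not contact the cluster.
    """
    conf = config["conf"]
    alias = conf["spark.yarn.dist.archives"].rsplit("#", 1)[-1]
    return conf["spark.pyspark.python"].startswith("./" + alias + "/")


# Serialize the dictionary; paste the printed text below the config magic line.
config_body = json.dumps(session_config, indent=4)
print(config_body)
print(check_archive_alias(session_config))
```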
```
%%spark
import pandas
print(pandas.__version__)
```
Create a package that contains pandas so that pandas can be referenced.

Figure 7 Execution result example 6
```
%mrs_spark
```
Figure 8 Execution result example 7
```
%%spark
import my_script
print(my_script.add_numbers(10, 20))
rdd = sc.parallelize(["apple", "banana", "cherry"])
result = rdd.map(my_script.format_string).collect()
for item in result:
    print(item)
```
Figure 9 Execution result example 8
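The preceding example imports a module named my_script, whose contents are not shown in this section. The following is a minimal sketch of what such a module might look like. Only the function names add_numbers and format_string come from the example; their bodies are assumptions made for illustration. How the module is made available to the Spark session is not covered here.

```python
# my_script.py (hypothetical contents; this section does not show the real module)


def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b


def format_string(text):
    """Return the text in upper case, wrapped in square brackets (assumed behavior)."""
    return "[" + text.upper() + "]"


if __name__ == "__main__":
    # Local check without Spark: the built-in map() stands in for rdd.map(...).collect().
    print(add_numbers(10, 20))
    for item in map(format_string, ["apple", "banana", "cherry"]):
        print(item)
```

Functions passed to rdd.map run on the executors, so keeping them free of side effects and of references to the driver-side session is a reasonable design choice.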
- Python is displayed in the upper right corner of each cell.
- The following operations are supported: Run All (running all cells), Save (saving all cells), Submit to Project Directory, New Cell (creating a cell), Clear Outputs (clearing the outputs of all cells), and More (Restart Kernel and Kill Kernel). These operations apply to the entire notebook file.
- You can copy, paste, and cut cells, add a cell above or below the current cell, move a cell up or down, and clear execution results.
- You can debug part of the code in a cell. Select the code, click the run icon, and select Run selected text to run only the selected code. You can also run code in the following ways:
  - Select Run all above to run all cells above the current cell.
  - Select Run all below to run the current cell and all cells below it.
- Delete icon: Delete a cell.
- Maximize icon: Maximize the code development window. After the window is maximized, only the current cell is displayed.
- Minimize icon: Minimize the code development window. After the window is minimized, you can view all cells.
- Previous icon: After maximizing the window, click this icon to view the previous cell.
- Next icon: After maximizing the window, click this icon to view the next cell.