Developing a Python Script
This section describes how to develop and execute a Python script using DataArts Factory.
Preparing the Environment
- An ECS named ecs-dgc has been created.
In this example, the ECS uses the CentOS 8.0 64bit with ARM (40 GB) public image and has a Python environment. To verify the Python environment, log in to the ECS and run the python command.
- You have enabled the DataArts Migration incremental package and created a CDM cluster named cdm-dlfpyhthon. The cluster provides an agent for the DataArts Factory module to communicate with the ECS.
- Ensure that the ECS can communicate with the CDM cluster. Connectivity depends on the following conditions:
- If the CDM cluster and the ECS are in the same region, VPC, subnet, and security group, they can communicate with each other by default. If they are in the same VPC but in different subnets or security groups, you must configure routing rules and security group rules. For details about how to configure security group rules, see configuring security group rules.
- If the CDM cluster and the ECS are in different regions, a public network or a dedicated connection is required for communication between them. If the Internet is used, ensure that an EIP has been bound to the CDM cluster, the ECS can access the Internet, and the port used by the connection has been enabled in the firewall rules.
- The ECS and the CDM cluster must belong to the same enterprise project. If they do not, modify the enterprise project of the workspace.
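Before creating the data connection, you may want to confirm on the ECS that Python is available and that something accepts TCP connections on the SSH port. The following is a minimal sketch using only the Python standard library; the function names python_environment and port_open are hypothetical. A successful check against 127.0.0.1 only shows that the SSH service listens locally. It does not prove that routing and security group rules let the CDM cluster reach the ECS, which you still need to verify in the console. To check the network path, run port_open with the ECS private IP address from another host in the same VPC.

```python
"""Pre-flight check to run on the ECS (a sketch, standard library only)."""
import platform
import socket
import sys


def python_environment() -> dict:
    """Return basic facts about the interpreter running this file."""
    return {
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),  # typically "aarch64" on a 64-bit ARM Linux ECS
        "executable": sys.executable,
    }


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for key, value in python_environment().items():
        print("{}: {}".format(key, value))
    # 22 is the default SSH port; use the value you will enter for Port in Table 1.
    print("SSH port accepts connections:", port_open("127.0.0.1", 22))
```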
Constraints
- Python scripts do not support script parameters or job parameters.
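Because a Python script cannot receive script or job parameters, any values it needs must be defined elsewhere. A minimal sketch, assuming the script runs on the ECS through the host connection and can read files there: keep defaults as constants in the script and optionally override them from a JSON file stored on the ECS. DEFAULTS, CONFIG_PATH, load_settings, and the file location are all hypothetical.

```python
"""Sketch: supplying values to a Python script that cannot take parameters."""
import json
import os

# Hypothetical defaults hard-coded in the script body.
DEFAULTS = {"company_prefix": "Company", "domain_suffix": ".com"}

# Hypothetical location of an optional JSON file kept on the ECS.
CONFIG_PATH = "/opt/dlf_scripts/python_test.json"


def load_settings(path: str = CONFIG_PATH) -> dict:
    """Return a copy of DEFAULTS, overridden by keys from the JSON file if it exists."""
    settings = dict(DEFAULTS)
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            settings.update(json.load(f))
    return settings


if __name__ == "__main__":
    print(load_settings())
```

Editing the JSON file on the ECS then changes the script's behavior without resubmitting the script.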
Creating an ECS Data Connection
Before developing a Python script, you need to create a connection to the ECS.
- On the DataArts Studio console, locate a workspace and click Management Center.
- In the navigation pane, choose Manage Data Connections.
Figure 1 Manage Data Connections
- Click Create Data Connection.
Figure 2 Create Data Connection
- Configure parameters by referring to Table 1 and create a data connection named python_test.
Table 1 Host connection parameters
- Data Connection Name (mandatory): Name of the host connection. The value can contain only letters, digits, hyphens (-), and underscores (_).
- Tag (optional): Attribute of the data connection to create. Tags make management easier. You can enter a tag or select one created in Tags from the drop-down list.
  NOTE: The tag name can contain letters, digits, and underscores (_), cannot start with an underscore, and can contain up to 100 characters.
- Host Address (mandatory): IP address of the Linux host. For details, see Viewing Details About an ECS.
- Agent (mandatory): Agent provided by the CDM cluster. This parameter is required if Connection Type is set to Proxy connection.
- Port (mandatory): SSH port number of the host.
- Username (mandatory): Username of the host.
- Login Mode (mandatory): Mode for logging in to the host. The options are Key pair and Password.
- Key Pair (mandatory): Available only when Login Mode is set to Key pair. Obtain the private key file, upload it to OBS, and select its OBS path.
  NOTE: The private key file must be in PEM format, and it must belong to the same key pair as the public key configured on the host.
- Key Pair Password (optional): Password of the key pair. If no password is set for the key pair, leave this parameter blank.
- Password (mandatory): Password for logging in to the host.
- KMS Key (mandatory): Key created on Key Management Service (KMS) and used to encrypt and decrypt user passwords and key pairs. Select a created key from KMS.
- Host Connection Description (optional): Description of the host connection.
Figure 3 Creating a host connection
The key parameters are as follows:
- Host Address: Enter the IP address of the ECS ecs-dgc.
- Agent: Select the CDM cluster cdm-dlfpyhthon.
- Click Test to test the connectivity of the data connection.
- If the test is successful, click OK. The system creates the data connection.
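If the connectivity test fails, one cause can be a value that breaks the rules in Table 1. The following minimal sketch pre-checks some of those rules locally before you enter them in the console. It is not the console's own validation: the dictionary keys, the check_connection function, and the sample values are hypothetical; "letters" is assumed to mean ASCII letters; and Password is assumed to apply only when Login Mode is Password. Agent and KMS Key are not checked.

```python
"""Sketch: pre-checking host connection values against rules from Table 1."""
import ipaddress
import re

# Data Connection Name: letters, digits, hyphens (-), and underscores (_) only.
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")
# Tag: letters, digits, and underscores; must not start with an underscore.
TAG_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")


def check_connection(cfg: dict) -> list:
    """Return a list of problems found in cfg (empty if none were found)."""
    problems = []
    if not NAME_RE.fullmatch(cfg.get("name", "")):
        problems.append("invalid data connection name")
    tag = cfg.get("tag")
    if tag is not None and (len(tag) > 100 or not TAG_RE.fullmatch(tag)):
        problems.append("invalid tag")
    try:
        ipaddress.ip_address(cfg.get("host", ""))
    except ValueError:
        problems.append("host address is not an IP address")
    port = cfg.get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("port must be an integer from 1 to 65535")
    if not cfg.get("username"):
        problems.append("username is required")
    mode = cfg.get("login_mode")
    if mode == "Key pair" and not cfg.get("key_pair_obs_path"):
        problems.append("Key pair login needs the OBS path of the PEM private key")
    elif mode == "Password" and not cfg.get("password"):
        problems.append("Password login needs a password")
    elif mode not in ("Key pair", "Password"):
        problems.append("login mode must be 'Key pair' or 'Password'")
    return problems


python_test = {
    "name": "python_test",
    "host": "192.168.0.10",  # placeholder for the private IP address of ecs-dgc
    "port": 22,
    "username": "root",
    "login_mode": "Password",
    "password": "********",
}
print(check_connection(python_test))  # prints []
```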
Developing a Python Script
- Choose DataArts Factory > Develop Script and create a Python script named python_test.
Figure 4 Creating a Python script
- Enter the Python statements in the editor, select the host connection, and click Submit and Unlock.
- This example defines a string template for saving company information and uses the template to output information about different companies.
template = 'No.:{:0>9s} \t CompanyName:{:s} \t Website:https://www.{:s}.com'  # No. is zero-padded to 9 characters
context1 = template.format('1', 'CompanyXXX', 'companyxxx')  # first company record
context2 = template.format('2', 'CompanyYYY', 'companyyyy')  # second company record
print(context1)
print(context2)
- The script development area in Figure 5 is a temporary debugging area. After you close the script tab, the development area will be cleared.
- Connection: Select the data connection python_test created in Creating an ECS Data Connection.
- Click Execute to run the Python script.
- View the script execution result. The output contains one formatted line for each company.
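To output more companies without repeating format calls, the example script can be wrapped in a function and driven by a list of records. This is a minimal sketch; TEMPLATE, company_line, and companies are hypothetical names, and the two records are the ones from the example. In the format specification {:0>9s}, 0 is the fill character, > right-aligns the value, 9 is the field width, and s requires a string.

```python
"""Sketch: the company template from the example, wrapped in a function."""

TEMPLATE = "No.:{:0>9s} \t CompanyName:{:s} \t Website:https://www.{:s}.com"


def company_line(number: str, name: str, domain: str) -> str:
    """Format one record; {:0>9s} right-aligns number in 9 characters, padded with '0'."""
    return TEMPLATE.format(number, name, domain)


# The same two records as the example: (number, company name, domain label).
companies = [
    ("1", "CompanyXXX", "companyxxx"),
    ("2", "CompanyYYY", "companyyyy"),
]

for record in companies:
    print(company_line(*record))
```

Because the s type only accepts strings, pass the number as a string: company_line(1, ...) raises ValueError, while company_line("1", ...) produces No.:000000001.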