
Submitting Streaming Training Jobs

Function

This API is used to submit streaming training jobs.

URI

POST /v1/{project_id}/stream-etl-job

Table 1 describes the URI parameters.

Table 1 URI parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID. |
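
As an illustration of how the URI is formed, the following minimal Python sketch fills project_id into the path. The endpoint host and the project ID shown are hypothetical placeholders, not values from this reference.

```python
from urllib.parse import quote

# Hypothetical endpoint host; replace it with the RES endpoint of your region.
RES_ENDPOINT = "https://res.example.com"


def build_submit_url(endpoint: str, project_id: str) -> str:
    """Return the URL for POST /v1/{project_id}/stream-etl-job."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path segment so unexpected characters cannot alter the path.
    return f"{endpoint.rstrip('/')}/v1/{quote(project_id, safe='')}/stream-etl-job"


# The project ID below is a made-up placeholder.
print(build_submit_url(RES_ENDPOINT, "0123456789abcdef"))
```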

Request

Table 2 describes the request parameters.

Table 2 Request parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workspace_id | No | String | Workspace ID. The default value is 0. |
| job_name | Yes | String | Training job name. The value can contain a maximum of 20 characters. |
| job_description | No | String | Training job description. The value can contain a maximum of 256 characters. |
| nearline_platform | Yes | JSON | Nearline computing platform. For details, see Table 3. |
| strategy | Yes | JSON | Strategy information. For details, see Table 5. |
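
The top-level constraints above (mandatory fields and maximum lengths) can be checked on the client before the request is sent. The following Python sketch is one possible way to do so. The function name validate_top_level is hypothetical, and the service performs its own validation regardless.

```python
def validate_top_level(body: dict) -> list[str]:
    """Return a list of problems found in the top-level request fields."""
    errors = []
    # Mandatory top-level parameters according to the request parameter table.
    for key in ("job_name", "nearline_platform", "strategy"):
        if key not in body or body[key] in (None, ""):
            errors.append(f"{key} is mandatory")
    name = body.get("job_name")
    if isinstance(name, str) and len(name) > 20:
        errors.append("job_name can contain a maximum of 20 characters")
    description = body.get("job_description")
    if isinstance(description, str) and len(description) > 256:
        errors.append("job_description can contain a maximum of 256 characters")
    return errors


# A body without job_name is reported before any request is made.
print(validate_top_level({"nearline_platform": {"platform": "DLI"}, "strategy": {"name": "s"}}))
```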

Table 3 nearline_platform parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 4. |
| computing_resource | No | String | Resource specifications required to run the DLI jobs. |
| config_load_path | Yes | String | OBS path that stores the files generated by the selected configurations. |

Table 4 platform_parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_name | Yes | String | Cluster name. |
| cluster_id | No | String | Cluster ID. |

Table 5 strategy parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| strategy_type | Yes | String | Strategy type. The only supported value is nearline. |
| name | Yes | String | Strategy alias. The value can contain a maximum of 60 characters. |
| algorithm_type | Yes | String | Algorithm type. The only supported value is NEARLINE_ONLINE_TRAINING. |
| parameter | Yes | JSON | Algorithm parameter. For details, see Table 6. |

Table 6 parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_source | Yes | JSON | Data source parameter. For details, see Table 7. The standard recommendation data supported by the real-time streaming nearline job comes from List of User Behaviors. |
| data_source_config | Yes | JSON | Data source configuration. For details, see Table 10. |
| algorithm_config | Yes | JSON | Algorithm configuration. For details, see Table 11. |

Table 7 data_source parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. Currently, only DIS is supported. Data required by real-time nearline jobs is written to DIS, and RES reads it from there for nearline computing tasks. |
| in_stream_conf | Yes | JSON | Input stream configuration. For details, see Table 8. |
| out_stream_conf | Yes | JSON | Output stream configuration. For details, see Table 9. |

Table 8 in_stream_conf parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| stream_name | No | String | Name of the DIS stream. The stream is used to receive nearline behavior data. |
| starting_offsets | Yes | String | Start position for reading DIS data. Options: LATEST (the latest data is read first) and EARLIEST (the earliest data is read first). |

Table 9 out_stream_conf parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| stream_name | No | String | Name of the DIS stream. The stream is used to store the ranking preprocessing data generated by the calculation of behavior data and profile libraries for model training. Data in the stream is intermediate data generated by streaming training jobs. You only need to specify the stream name and do not need to send or obtain data from the stream. |
| starting_offsets | Yes | String | Start position for reading DIS data. LATEST indicates that the latest data is read first. |
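
To show how the input and output stream settings nest inside data_source, the following Python sketch assembles the object and rejects unsupported offsets. The function name and the stream names are hypothetical, and the sketch assumes that offset values are written in uppercase as listed in the tables.

```python
# Offsets listed for in_stream_conf; out_stream_conf only documents LATEST.
IN_STREAM_OFFSETS = {"LATEST", "EARLIEST"}


def build_data_source(in_stream: str, out_stream: str, in_offsets: str = "LATEST") -> dict:
    """Assemble a data_source object with its in_stream_conf and out_stream_conf."""
    if in_offsets not in IN_STREAM_OFFSETS:
        raise ValueError(f"starting_offsets must be one of {sorted(IN_STREAM_OFFSETS)}")
    return {
        "platform": "DIS",  # currently the only supported platform
        "in_stream_conf": {
            "stream_name": in_stream,
            "starting_offsets": in_offsets,
        },
        "out_stream_conf": {
            "stream_name": out_stream,
            "starting_offsets": "LATEST",
        },
    }


# Both stream names are made-up placeholders.
print(build_data_source("dis-behavior-in", "dis-ranking-out", "EARLIEST"))
```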

Table 10 data_source_config parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| interval | Yes | Integer | Interval at which nearline jobs run, in seconds. For example, the value 10 indicates that the nearline strategy performs its computing tasks, including stream data reading and processing, every 10 seconds. |

Table 11 algorithm_config parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| online_job_uuid | Yes | String | UUID of the associated online service. |
| flow_name | Yes | String | Name of an online process of the associated online service. The behavior parameters, model file path, and data preprocessing information required by the streaming training job are obtained from the online process. |
| online_training_config | Yes | JSON | Online training configuration. For details, see Table 12. |
| bad_record_log | No | String | Path to the error data log. Folders that contain the error data are placed in this path. |

Table 12 online_training_config parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| spec_id | Yes | Integer | Resource specification ID of a ranking job. Before using ModelArts, query the access keys by referring to Querying the Access Keys of ModelArts and associate the access keys with ModelArts by referring to Associating the AK/SK with ModelArts. Then, obtain the value returned by the spec_id parameter by referring to Querying the Compute Node Specifications of ModelArts. |
| optimize_parameters | Yes | JSON | Optimizer parameters. For details, see Table 13. |
| update_interval | Yes | Integer | Interval for updating the ranking model, in minutes. For example, the value 10 indicates that the ranking model is saved to OBS every 10 minutes. |

Table 13 optimize_parameters parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| type | Yes | String | Optimizer type. The only supported value is ftrl. |
| initial_accumulator_value | Yes | Double | Initial value of the accumulator, which dynamically adjusts the learning step size. The value ranges from 0 (exclusive) to 1. The default value is 0.1. |
| lambda1 | Yes | Double | L1 regularization coefficient. It is applied to the L1 norm of the model parameters to limit their magnitude and prevent overfitting. The value ranges from 0 to 1. The default value is 0. |
| lambda2 | Yes | Double | L2 regularization coefficient. It is applied to the L2 norm of the model parameters to limit their magnitude and prevent overfitting. The value ranges from 0 to 1. The default value is 0. |
| learning_rate | Yes | Double | Hyperparameter that controls the step size of the optimizer in the optimization direction. The value ranges from 0 (exclusive) to 1. The default value is 0.1. |
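
The nesting described in Tables 2 to 13 can be hard to picture from the tables alone. The following Python sketch assembles one request body that follows that nesting and applies the documented optimizer ranges. All concrete values (job name, cluster name, stream names, UUID, flow name, spec_id, intervals) are hypothetical placeholders, and treating the upper bound 1 as inclusive is an assumption.

```python
import json


def build_optimize_parameters(initial_accumulator_value=0.1, lambda1=0.0,
                              lambda2=0.0, learning_rate=0.1):
    """Return an optimize_parameters object after checking the documented ranges."""
    # Assumption: the upper bound 1 is inclusive for all four values.
    for name, value in (("initial_accumulator_value", initial_accumulator_value),
                        ("learning_rate", learning_rate)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be greater than 0 and at most 1")
    for name, value in (("lambda1", lambda1), ("lambda2", lambda2)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    return {
        "type": "ftrl",  # the only optimizer type listed
        "initial_accumulator_value": initial_accumulator_value,
        "lambda1": lambda1,
        "lambda2": lambda2,
        "learning_rate": learning_rate,
    }


def build_request_body() -> dict:
    """Assemble a request body shaped like the request parameter tables."""
    return {
        "job_name": "stream-train-demo",  # placeholder, at most 20 characters
        "job_description": "",
        "nearline_platform": {
            "platform": "DLI",
            "platform_parameter": {"cluster_name": "dli-demo"},  # placeholder
            "config_load_path": "<OBS path for storing the configuration files>",
        },
        "strategy": {
            "strategy_type": "nearline",
            "name": "online-training-demo",  # placeholder alias
            "algorithm_type": "NEARLINE_ONLINE_TRAINING",
            "parameter": {
                "data_source": {
                    "platform": "DIS",
                    "in_stream_conf": {"stream_name": "dis-in", "starting_offsets": "LATEST"},
                    "out_stream_conf": {"stream_name": "dis-out", "starting_offsets": "LATEST"},
                },
                "data_source_config": {"interval": 10},  # seconds
                "algorithm_config": {
                    "online_job_uuid": "<UUID of the associated online service>",
                    "flow_name": "<Name of the online process>",
                    "online_training_config": {
                        "spec_id": 1,  # placeholder; query ModelArts for real IDs
                        "optimize_parameters": build_optimize_parameters(),
                        "update_interval": 10,  # minutes
                    },
                },
            },
        },
    }


print(json.dumps(build_request_body(), indent=2))
```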

Response

Table 14 describes the response parameters.

Table 14 Response parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| is_success | Yes | Boolean | Whether the request is successful |
| nearline_uuid | Yes | String | Candidate set ID |
| job_id | Yes | String | Job ID |

Example

  • Example request
    {
    	"job_name": "Nearline-update",
    	"job_description": "",
    	"nearline_platform": {
    		"platform": "DLI",
    		"platform_parameter": {
    			"cluster_name": "dli-1"
    		},
    		"config_load_path": "<OBS path for storing the configuration files>",
    		"computing_resource": ""
    	},
    	"storage": {
    		"user_profile_storage": {
    			"platform": "CloudTable",
    			"platform_parameter": {
    				"cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
    				"cluster_name": "cloudtable-62d2",
    				"table_name": "write-profile-user"
    			}
    		},
    		"item_profile_storage": {
    			"platform": "CloudTable",
    			"platform_parameter": {
    				"cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
    				"cluster_name": "cloudtable-62d2",
    				"table_name": "write-profile-item"
    			}
    		},
    		"filter_set_storage": {
    			"platform": "CloudTable",
    			"platform_parameter": {
    				"cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
    				"cluster_name": "cloudtable-62d2",
    				"table_name": "write-profile-filter"
    			}
    		}
    	},
    	"strategy": {
    		"name": "Update user profiles based on behavior data",
    		"algorithm_type": "NEARLINE_UPDATE_USER_PORTRAIT",
    		"strategy_type": "nearline",
    		"parameter": {
    			"data_source_config": {
    				"behavior_type": ["view", "click", "collect", "uncollect", "search_click", "comment", "share", "like", "dislike", "grade", "consume", "use"],
    				"interval": "10"
    			},
    			"data_source": {
    				"platform": "DIS",
    				"platform_parameter": {
    					"stream_name": "dis-evan",
    					"starting_offsets": "latest"
    				}
    			},
    			"algorithm_config": {
    				"update_context": true,
    				"update_item_hotvalue_flag": true,
    				"filter_history_flag": true,
    				"max_history_num": 100,
    				"result_path": "<Path for storing the real-time sample data>",
    				"global_features_information_path": "<Path for storing the global configuration tables>",
    				"bad_record_log": "<Path for storing exception data logs>"
    			}
    		}
    	}
    }
  • Example of a successful response
    {
        "is_success": true,
        "job_id": "cdf49df766f2499586685b08212fd03f",
        "nearline_uuid": "61496485f0ba4a77b02b4f66f3c11078"
    }
  • Example of a failed response
    {
        "is_success": false,
        "error_code": "res.1008",
        "error_msg": "The request parameter(job_name) is null."
    }
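
A client typically branches on is_success when handling the two response shapes above. The following Python sketch shows one way to do this using only the fields that appear in the example responses. The exception class and function name are hypothetical.

```python
import json


class SubmitJobError(Exception):
    """Raised when the response reports is_success = false."""

    def __init__(self, error_code, error_msg):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg


def parse_submit_response(text: str) -> tuple[str, str]:
    """Return (job_id, nearline_uuid) on success, or raise SubmitJobError."""
    data = json.loads(text)
    if data.get("is_success") is True:
        return data["job_id"], data["nearline_uuid"]
    raise SubmitJobError(data.get("error_code"), data.get("error_msg"))


# The payload is copied from the example of a successful response.
ok = '{"is_success": true, "job_id": "cdf49df766f2499586685b08212fd03f", "nearline_uuid": "61496485f0ba4a77b02b4f66f3c11078"}'
job_id, nearline_uuid = parse_submit_response(ok)
print(job_id, nearline_uuid)
```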

Status Code

For details about status codes, see Status Codes.