Submitting Streaming Training Jobs
Function
This API is used to submit streaming training jobs.
URI
POST /v1/{project_id}/stream-etl-job
Table 1 describes the URI parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID. |
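As a minimal sketch of how the URI above is assembled, the following helper substitutes project_id into the path. The endpoint host used in the usage line is a placeholder assumption, not a value from this document, and the helper name build_submit_url is hypothetical.

```python
from urllib.parse import quote


def build_submit_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v1/{project_id}/stream-etl-job.

    `endpoint` is the service endpoint (scheme and host); it is an
    assumption here and must be replaced with the real endpoint.
    """
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so it always forms a single, safe path segment.
    segment = quote(project_id, safe="")
    return f"{endpoint.rstrip('/')}/v1/{segment}/stream-etl-job"


# Usage with a placeholder endpoint and project ID (both hypothetical).
url = build_submit_url("https://res.example.com", "my-project-id")
```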
Request
Table 2 describes the request parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workspace_id | No | String | Workspace ID. The default value is 0. |
| job_name | Yes | String | Training job name. The value can contain a maximum of 20 characters. |
| job_description | No | String | Training job description. The value can contain a maximum of 256 characters. |
| nearline_platform | Yes | JSON | Offline computing platform. For details, see Table 3. |
| strategy | Yes | JSON | Strategy information. For details, see Table 5. |
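The constraints in Table 2 can be checked on the client before the request is sent. The sketch below is one possible way to do that; the function name check_top_level is hypothetical, and it only covers the mandatory flags and length limits stated in the table.

```python
def check_top_level(body: dict) -> list:
    """Return a list of problems found in the top-level request fields.

    Only the rules stated in Table 2 are checked: mandatory fields and
    the maximum lengths of job_name and job_description.
    """
    problems = []
    for field in ("job_name", "nearline_platform", "strategy"):
        if body.get(field) in (None, ""):
            problems.append(f"{field} is mandatory")
    if len(body.get("job_name") or "") > 20:
        problems.append("job_name exceeds 20 characters")
    if len(body.get("job_description") or "") > 256:
        problems.append("job_description exceeds 256 characters")
    return problems
```

An empty list means the top-level fields satisfy the documented limits; the nested objects still need their own checks.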
Table 3 describes the fields of nearline_platform.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 4. |
| computing_resource | No | String | Resource specifications required to run the DLI jobs. |
| config_load_path | Yes | String | OBS path that stores the files generated by the selected configurations. |
Table 4 describes the fields of platform_parameter.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_name | Yes | String | Cluster name. |
| cluster_id | No | String | Cluster ID. |
Table 5 describes the fields of strategy.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| strategy_type | Yes | String | Strategy type. The only valid value is nearline. |
| name | Yes | String | Strategy alias. The value can contain a maximum of 60 characters. |
| algorithm_type | Yes | String | Algorithm type. The only option is NEARLINE_ONLINE_TRAINING. |
| parameter | Yes | JSON | Algorithm parameter. For details, see Table 6. |
Table 6 describes the fields of parameter.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_source | Yes | JSON | Data source parameter. For details, see Table 7. The standard recommendation data supported by the real-time streaming nearline job comes from List of User Behaviors. |
| data_source_config | Yes | JSON | Data source configuration. For details, see Table 10. |
| algorithm_config | Yes | JSON | Algorithm configuration. For details, see Table 11. |
Table 7 describes the fields of data_source.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. Currently, only DIS is supported. The data required by real-time nearline jobs is written to DIS, and RES reads it from there for nearline computing tasks. |
| in_stream_conf | Yes | JSON | Input stream configuration. For details, see Table 8. |
| out_stream_conf | Yes | JSON | Output stream configuration. For details, see Table 9. |
Table 8 describes the fields of in_stream_conf.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| stream_name | No | String | Name of the DIS stream. The stream is used to receive nearline behavior data. |
| starting_offsets | Yes | String | Start position for reading DIS data. |
Table 9 describes the fields of out_stream_conf.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| stream_name | No | String | Name of the DIS stream. The stream is used to store the ranking preprocessing data generated by the calculation of behavior data and profile libraries for model training. Data in the stream is intermediate data generated by streaming training jobs. You only need to specify the stream name and do not need to send or obtain data from the stream. |
| starting_offsets | Yes | String | Start position for reading DIS data. LATEST indicates that the latest data is read first. |
Table 10 describes the fields of data_source_config.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| interval | Yes | Integer | Time interval for the running of nearline jobs, in seconds. For example, the value 10 indicates that the nearline strategy performs the computing tasks every 10 seconds, including stream data reading and processing. |
Table 11 describes the fields of algorithm_config.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| online_job_uuid | Yes | String | UUID of the associated online service. |
| flow_name | Yes | String | Name of an online process of an associated online service. The behavior parameters, model file path, and data preprocessing information required by the streaming training job are obtained from the online process. |
| online_training_config | Yes | JSON | Online training configuration. For details, see Table 12. |
| bad_record_log | No | String | Path of the error data log. Folders that contain the error data are placed in this path. |
Table 12 describes the fields of online_training_config.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| spec_id | Yes | Integer | Resource specification ID of a ranking job. Before using ModelArts, query the access keys by referring to Querying the Access Keys of ModelArts and associate them with ModelArts by referring to Associating the AK/SK with ModelArts. Then, obtain the spec_id value by referring to Querying the Compute Node Specifications of ModelArts. |
| optimize_parameters | Yes | JSON | Optimizer parameters. For details, see Table 13. |
| update_interval | Yes | Integer | Interval for updating the ranking model, in minutes. For example, the value 10 indicates that the ranking model is saved to OBS every 10 minutes. |
Table 13 describes the fields of optimize_parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| type | Yes | String | Optimizer type. The option is as follows: |
| initial_accumulator_value | Yes | Double | Parameter that can adjust the learning step dynamically. The value ranges from 0 (0 is not included) to 1. The default value is 0.1. |
| lambda1 | Yes | Double | L1 regularization coefficient. It is applied to the L1 norm of the model to limit the model values and prevent overfitting. The value ranges from 0 to 1. The default value is 0. |
| lambda2 | Yes | Double | L2 regularization coefficient. It is applied to the L2 norm of the model to limit the model values and prevent overfitting. The value ranges from 0 to 1. The default value is 0. |
| learning_rate | Yes | Double | Hyper-parameter that controls the step size of the optimizer in the optimization direction. The value ranges from 0 (0 is not included) to 1. The default value is 0.1. |
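The nesting described by the strategy tables can be easier to follow as code. The sketch below assembles a strategy object from the documented fields and checks the documented value ranges. It is an illustration under assumptions, not a reference client: the function names are hypothetical, the upper bound 1 is assumed to be inclusive, the optimizer type is left to the caller because its allowed value is not listed here, and LATEST is used as the default offset because it is the only value the tables mention.

```python
def build_optimize_parameters(opt_type, learning_rate=0.1,
                              initial_accumulator_value=0.1,
                              lambda1=0.0, lambda2=0.0):
    """Build optimize_parameters, enforcing the documented value ranges."""
    # learning_rate and initial_accumulator_value: greater than 0, at most 1.
    for name, value in (("learning_rate", learning_rate),
                        ("initial_accumulator_value", initial_accumulator_value)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    # lambda1 and lambda2: from 0 to 1, with 0 allowed.
    for name, value in (("lambda1", lambda1), ("lambda2", lambda2)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1]")
    return {
        "type": opt_type,
        "initial_accumulator_value": initial_accumulator_value,
        "lambda1": lambda1,
        "lambda2": lambda2,
        "learning_rate": learning_rate,
    }


def build_strategy(name, in_stream, out_stream, interval_seconds,
                   online_job_uuid, flow_name, spec_id,
                   optimize_parameters, update_interval_minutes,
                   starting_offsets="LATEST"):
    """Assemble the strategy object from the documented nested fields."""
    if len(name) > 60:
        raise ValueError("name exceeds 60 characters")
    return {
        "strategy_type": "nearline",
        "name": name,
        "algorithm_type": "NEARLINE_ONLINE_TRAINING",
        "parameter": {
            "data_source": {
                "platform": "DIS",
                "in_stream_conf": {"stream_name": in_stream,
                                   "starting_offsets": starting_offsets},
                "out_stream_conf": {"stream_name": out_stream,
                                    "starting_offsets": starting_offsets},
            },
            # interval is in seconds.
            "data_source_config": {"interval": interval_seconds},
            "algorithm_config": {
                "online_job_uuid": online_job_uuid,
                "flow_name": flow_name,
                "online_training_config": {
                    "spec_id": spec_id,
                    "optimize_parameters": optimize_parameters,
                    # update_interval is in minutes.
                    "update_interval": update_interval_minutes,
                },
            },
        },
    }
```

Note the two different units: interval is given in seconds, while update_interval is given in minutes.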
Response
Table 14 describes the response parameters.
Example
- Example request
{
  "job_name": "Nearline-update",
  "job_description": "",
  "nearline_platform": {
    "platform": "DLI",
    "platform_parameter": {
      "cluster_name": "dli-1"
    },
    "config_load_path": "<OBS path for storing the configuration files>",
    "computing_resource": ""
  },
  "storage": {
    "user_profile_storage": {
      "platform": "CloudTable",
      "platform_parameter": {
        "cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
        "cluster_name": "cloudtable-62d2",
        "table_name": "write-profile-user"
      }
    },
    "item_profile_storage": {
      "platform": "CloudTable",
      "platform_parameter": {
        "cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
        "cluster_name": "cloudtable-62d2",
        "table_name": "write-profile-item"
      }
    },
    "filter_set_storage": {
      "platform": "CloudTable",
      "platform_parameter": {
        "cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
        "cluster_name": "cloudtable-62d2",
        "table_name": "write-profile-filter"
      }
    }
  },
  "strategy": {
    "name": "Update user profiles based on behavior data",
    "algorithm_type": "NEARLINE_UPDATE_USER_PORTRAIT",
    "strategy_type": "nearline",
    "parameter": {
      "data_source_config": {
        "behavior_type": ["view", "click", "collect", "uncollect", "search_click", "comment", "share", "like", "dislike", "grade", "consume", "use"],
        "interval": "10"
      },
      "data_source": {
        "platform": "DIS",
        "platform_parameter": {
          "stream_name": "dis-evan",
          "starting_offsets": "latest"
        }
      },
      "algorithm_config": {
        "update_context": true,
        "update_item_hotvalue_flag": true,
        "filter_history_flag": true,
        "max_history_num": 100,
        "result_path": "<Path for storing the real-time sample data>",
        "global_features_information_path": "<Path for storing the global configuration tables>",
        "bad_record_log": "<Path for storing exception data logs>"
      }
    }
  }
}
- Example of a successful response
{
  "is_success": true,
  "job_id": "cdf49df766f2499586685b08212fd03f",
  "nearline_uuid": "61496485f0ba4a77b02b4f66f3c11078"
}
- Example of a failed response
{
  "is_success": false,
  "error_code": "res.1008",
  "error_msg": "The request parameter(job_name) is null."
}
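A caller typically needs to tell the two response shapes apart. The following sketch parses a response body and either returns the job identifiers or raises an exception carrying error_code and error_msg. The function and exception names are hypothetical, and only the fields visible in the example responses are assumed to exist.

```python
import json


class JobSubmissionError(Exception):
    """Raised when the response reports is_success = false."""

    def __init__(self, error_code, error_msg):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg


def parse_submit_response(text: str) -> dict:
    """Return job_id and nearline_uuid, or raise JobSubmissionError."""
    payload = json.loads(text)
    if payload.get("is_success"):
        return {
            "job_id": payload.get("job_id"),
            "nearline_uuid": payload.get("nearline_uuid"),
        }
    raise JobSubmissionError(payload.get("error_code"),
                             payload.get("error_msg"))
```

Raising on failure keeps the success path free of status checks; a caller that prefers return values can catch JobSubmissionError and inspect its error_code attribute.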
Status Code
For details about status codes, see Status Codes.