Submitting Realtime Streaming Nearline Jobs
Function
This API is used to submit real-time streaming nearline jobs and perform nearline computing tasks.
URI
POST /v1/{project_id}/nearline-job
Table 1 describes the URI parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID. |
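
As a minimal sketch, the request URI can be assembled from the project ID as follows. The endpoint host `res.example-region.example.com` and the helper name `nearline_job_url` are placeholders invented for illustration; substitute the actual RES endpoint for your region.

```python
from urllib.parse import quote

# Placeholder host (assumption): replace with the RES endpoint of your region.
RES_ENDPOINT = "https://res.example-region.example.com"


def nearline_job_url(project_id: str, endpoint: str = RES_ENDPOINT) -> str:
    """Return the URL for POST /v1/{project_id}/nearline-job."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so it always forms a single path segment.
    segment = quote(project_id, safe="")
    return f"{endpoint.rstrip('/')}/v1/{segment}/nearline-job"


print(nearline_job_url("0123456789abcdef"))
# https://res.example-region.example.com/v1/0123456789abcdef/nearline-job
```
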
Request
Table 2 describes the request parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workspace_id | No | String | Workspace ID. The default value is 0. |
| job_name | Yes | String | Job name. The value can contain a maximum of 20 characters. |
| job_description | No | String | Job description. The value can contain a maximum of 256 characters. |
| nearline_platform | Yes | JSON | Nearline computing platform. For details, see Table 3. |
| storage | Yes | JSON | Storage information. For details, see Table 5. |
| strategy | Yes | JSON | Strategy information. For details, see Table 8. |
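
The mandatory fields and length limits in Table 2 can be checked on the client before the request is sent. The sketch below is a hypothetical helper (`build_job_body` is not part of any RES SDK); it enforces only the constraints listed in the table, and the service remains the authority on validation.

```python
from typing import Optional


def build_job_body(
    job_name: str,
    nearline_platform: dict,
    storage: dict,
    strategy: dict,
    job_description: Optional[str] = None,
    workspace_id: Optional[str] = None,
) -> dict:
    """Assemble the top-level request body, enforcing the limits in Table 2."""
    if not job_name:
        raise ValueError("job_name is mandatory")
    if len(job_name) > 20:
        raise ValueError("job_name can contain a maximum of 20 characters")
    if job_description is not None and len(job_description) > 256:
        raise ValueError("job_description can contain a maximum of 256 characters")
    # The three nested objects are mandatory JSON objects.
    for field, value in (
        ("nearline_platform", nearline_platform),
        ("storage", storage),
        ("strategy", strategy),
    ):
        if not isinstance(value, dict):
            raise ValueError(f"{field} is mandatory and must be a JSON object")
    body = {
        "job_name": job_name,
        "nearline_platform": nearline_platform,
        "storage": storage,
        "strategy": strategy,
    }
    # Optional fields are sent only when set (Table 2 gives 0 as the workspace_id default).
    if job_description is not None:
        body["job_description"] = job_description
    if workspace_id is not None:
        body["workspace_id"] = workspace_id
    return body
```
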

Table 3 nearline_platform fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 4. |
| computing_resource | No | String | Resource specifications required for the normal running of the DLI jobs. |
| config_load_path | Yes | String | OBS path for storing the files generated from the selected configurations. |

Table 4 platform_parameter fields (DLI)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_name | Yes | String | Cluster name |
| cluster_id | No | String | Cluster ID |
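
Tables 3 and 4 together describe the nearline_platform object. A minimal sketch of a builder for it follows; the function name and the OBS path used in testing are hypothetical, and DLI is hard-coded because it is the only platform the table lists.

```python
from typing import Optional


def build_nearline_platform(
    cluster_name: str,
    config_load_path: str,
    cluster_id: Optional[str] = None,
    computing_resource: Optional[str] = None,
) -> dict:
    """Build the nearline_platform object (Tables 3 and 4)."""
    if not cluster_name:
        raise ValueError("platform_parameter.cluster_name is mandatory")
    if not config_load_path:
        raise ValueError("config_load_path is mandatory")
    platform_parameter = {"cluster_name": cluster_name}
    if cluster_id:
        platform_parameter["cluster_id"] = cluster_id
    result = {
        # DLI is currently the only supported platform.
        "platform": "DLI",
        "platform_parameter": platform_parameter,
        "config_load_path": config_load_path,
    }
    if computing_resource is not None:
        result["computing_resource"] = computing_resource
    return result
```
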

Table 5 storage fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| user_profile_storage | No | JSON | User profile storage. This parameter is mandatory if the algorithm_type in the strategy field is set to NEARLINE_WRITE_USER_PROFILE, NEARLINE_UPDATE_USER_PORTRAIT, or NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
| item_profile_storage | No | JSON | Item profile storage. This parameter is mandatory if algorithm_type in the strategy field is set to NEARLINE_WRITE_ITEM_PROFILE, NEARLINE_UPDATE_USER_PORTRAIT, or NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
| filter_set_storage | No | JSON | Historical record storage. This parameter is optional if algorithm_type in the strategy field is set to NEARLINE_UPDATE_USER_PORTRAIT or NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
| candidate_set_storage | No | JSON | Candidate set storage. This parameter is mandatory if algorithm_type in the strategy field is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
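
Table 5 ties each storage object to the algorithm_type in the strategy field. A hypothetical client-side check (the names `REQUIRED_STORAGE` and `missing_storage` are illustrative) could encode those rules as a lookup table:

```python
# Storage objects that Table 5 marks as mandatory for each algorithm_type.
REQUIRED_STORAGE = {
    "NEARLINE_WRITE_USER_PROFILE": {"user_profile_storage"},
    "NEARLINE_WRITE_ITEM_PROFILE": {"item_profile_storage"},
    "NEARLINE_UPDATE_USER_PORTRAIT": {"user_profile_storage", "item_profile_storage"},
    "NEARLINE_UPDATE_USER_CANDIDATE_SET": {
        "user_profile_storage",
        "item_profile_storage",
        "candidate_set_storage",
    },
}


def missing_storage(algorithm_type: str, storage: dict) -> list:
    """Return the mandatory storage objects absent from `storage`, sorted by name."""
    if algorithm_type not in REQUIRED_STORAGE:
        raise ValueError(f"unknown algorithm_type: {algorithm_type}")
    # filter_set_storage is never mandatory, so it is never reported as missing.
    return sorted(REQUIRED_STORAGE[algorithm_type] - set(storage))
```

For the example request in this article (NEARLINE_UPDATE_USER_PORTRAIT with user, item, and filter set storage), the check reports nothing missing.
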

Table 6 Storage object fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. Currently, only CloudTable is supported. |
| platform_parameter | Yes | JSON | Table 7 describes platform parameters. |

Table 7 platform_parameter fields (CloudTable)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_id | Yes | String | Cluster ID |
| table_name | Yes | String | Table name. The value can contain a maximum of 64 characters. |
| cluster_name | No | String | Cluster name |
| data_version | No | String | Data version. The options are V1 and V2. |
| region_info | No | JSON | Pre-partition information. Set this parameter only when data_version is V2; it is not needed when data_version is V1. For details, see Table 15. |
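
Tables 6 and 7 describe one CloudTable-backed storage object. The sketch below builds one, reading the region_info rule as "V2 only"; the function name is hypothetical and the limits come straight from the table.

```python
from typing import Optional


def build_cloudtable_storage(
    cluster_id: str,
    table_name: str,
    cluster_name: Optional[str] = None,
    data_version: Optional[str] = None,
    region_info: Optional[dict] = None,
) -> dict:
    """Build one storage object (Tables 6 and 7) backed by CloudTable."""
    if not cluster_id:
        raise ValueError("cluster_id is mandatory")
    if not table_name or len(table_name) > 64:
        raise ValueError("table_name is mandatory and can contain at most 64 characters")
    if data_version not in (None, "V1", "V2"):
        raise ValueError("data_version must be V1 or V2")
    # Pre-partition information applies only to V2 data.
    if region_info is not None and data_version != "V2":
        raise ValueError("region_info can be set only when data_version is V2")
    params = {"cluster_id": cluster_id, "table_name": table_name}
    for key, value in (
        ("cluster_name", cluster_name),
        ("data_version", data_version),
        ("region_info", region_info),
    ):
        if value is not None:
            params[key] = value
    # CloudTable is currently the only supported storage platform.
    return {"platform": "CloudTable", "platform_parameter": params}
```
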

Table 8 strategy fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| strategy_type | Yes | String | Strategy type. Currently, only nearline is supported. |
| name | Yes | String | Strategy alias. The value can contain a maximum of 60 characters. |
| algorithm_type | Yes | String | Algorithm type. The options are as follows: NEARLINE_WRITE_USER_PROFILE: writes user profiles based on user information logs. NEARLINE_WRITE_ITEM_PROFILE: writes item profiles based on item information logs. NEARLINE_UPDATE_USER_PORTRAIT: updates user profiles based on behavior logs. NEARLINE_UPDATE_USER_CANDIDATE_SET: updates user candidate sets based on behavior logs. |
| parameter | Yes | JSON | Algorithm parameter. For details, see Table 9. |
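
A minimal sketch of the strategy object from Table 8, with the four documented algorithm types as an allow-list. The helper name `build_strategy` is hypothetical.

```python
ALGORITHM_TYPES = {
    "NEARLINE_WRITE_USER_PROFILE",
    "NEARLINE_WRITE_ITEM_PROFILE",
    "NEARLINE_UPDATE_USER_PORTRAIT",
    "NEARLINE_UPDATE_USER_CANDIDATE_SET",
}


def build_strategy(name: str, algorithm_type: str, parameter: dict) -> dict:
    """Build the strategy object described in Table 8."""
    if not name or len(name) > 60:
        raise ValueError("name is mandatory and can contain at most 60 characters")
    if algorithm_type not in ALGORITHM_TYPES:
        raise ValueError(f"unsupported algorithm_type: {algorithm_type}")
    return {
        "strategy_type": "nearline",  # the only value listed for strategy_type
        "name": name,
        "algorithm_type": algorithm_type,
        "parameter": parameter,
    }
```
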

Table 9 parameter fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_source | Yes | JSON | Data source parameters. For details, see Table 10. For the standard recommendation data that real-time streaming nearline jobs support, see List of User Behaviors. |
| data_source_config | Yes | JSON | Data source configuration. For details, see Table 12. |
| algorithm_config | Yes | JSON | Algorithm configuration. For details, see Table 13. |

Table 10 data_source fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. Currently, only DIS is supported. Data required by real-time nearline jobs is written to DIS, from which RES reads it to perform nearline computing tasks. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 11. |

Table 11 platform_parameter fields (DIS)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| stream_name | No | String | DIS stream name |
| starting_offsets | Yes | String | Start position for reading DIS data, for example, latest. |
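
Tables 10 and 11 describe a DIS data source. The hypothetical helper below builds that object; called with the stream name and offset from the example request, it reproduces the request's data_source exactly.

```python
from typing import Optional


def build_dis_data_source(starting_offsets: str, stream_name: Optional[str] = None) -> dict:
    """Build the data_source object (Tables 10 and 11) that reads from DIS."""
    if not starting_offsets:
        raise ValueError("starting_offsets is mandatory")
    params = {}
    # stream_name is optional in Table 11, so it is included only when given.
    if stream_name:
        params["stream_name"] = stream_name
    params["starting_offsets"] = starting_offsets
    # DIS is currently the only supported data source platform.
    return {"platform": "DIS", "platform_parameter": params}
```
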

Table 12 data_source_config fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| behavior_type | No | List<String> | Behavior types to process, for example, view and click. |
| interval | Yes | Integer | Interval at which the nearline job runs, in seconds. For example, the value 10 indicates that the nearline strategy reads and processes stream data every 10 seconds. |
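
The sketch below builds data_source_config (Table 12) and combines it with the other two members of parameter (Table 9). The helper names are hypothetical. It follows the table in typing interval as an integer, whereas the example request in this article passes "10" as a string; requiring a positive value is an assumption of this sketch.

```python
from typing import Iterable, Optional


def build_data_source_config(interval: int, behavior_types: Optional[Iterable[str]] = None) -> dict:
    """Build data_source_config (Table 12); interval is in seconds."""
    # bool is a subclass of int in Python, so reject it explicitly.
    # Requiring interval > 0 is an assumption, not a documented limit.
    if isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0:
        raise ValueError("interval must be a positive integer number of seconds")
    config = {"interval": interval}
    if behavior_types is not None:
        config["behavior_type"] = list(behavior_types)
    return config


def build_strategy_parameter(data_source: dict, data_source_config: dict, algorithm_config: dict) -> dict:
    """Combine the three mandatory members of parameter (Table 9)."""
    return {
        "data_source": data_source,
        "data_source_config": data_source_config,
        "algorithm_config": algorithm_config,
    }
```
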

Table 13 algorithm_config fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| update_context | No | Boolean | Whether to update contextual information. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT. |
| update_item_hotvalue_flag | No | Boolean | Whether to update item popularity. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT. |
| filter_history_flag | No | Boolean | Whether to save or filter a user's historical records. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT or NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| max_history_num | No | Int | Maximum number of saved historical records. This parameter is mandatory if filter_history_flag is set to true. |
| result_path | No | String | Path for storing real-time data samples. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT. |
| rank_type | No | String | Ranking mode of candidate sets. The value can be HOT, RANDOM, or TIME. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| max_candidate_number | No | Int | Maximum length of the retrieved candidate set. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| recall_type | No | String | Retrieval mode of candidate sets. The value can be TAG_BASE or ACTION_BASE. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| use_tag_nums | No | Int | Number of interest tags. A larger value yields a richer variety of items in the retrieved candidate sets. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| time_name | No | String | Name of a field that indicates a time feature in item data. This parameter is mandatory if rank_type is set to TIME. |
| rec_day | No | Int | Time window for data collection, in days: data from the last N days before the current time is used. This parameter is mandatory if rank_type is set to TIME. |
| global_features_information_path | Yes | String | Path that stores the global feature file |
| bad_record_log | No | String | Path for storing error data logs. Folders containing the error data are placed under this path. |
| advanced_search | No | Map<String, List<String>> | Custom search criteria. During retrieval, each key is forcibly converted to its values. |
| candidate | No | JSON | Candidate set parameters. For details, see Table 14. |
| tag_reduce_rate | No | Double | Attenuation parameter of interest tags. A smaller value indicates stronger attenuation, and a larger value indicates weaker attenuation. If the value is 0, no attenuation occurs. |
| tags_mainten_length | No | Int | Maximum length of an interest tag in each tag system. |
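
Most algorithm_config fields are conditionally mandatory. As a hedged sketch (the names `REQUIRED_BY_ALGORITHM` and `check_algorithm_config` are hypothetical), the rules in Table 13 can be encoded as follows. The check reports missing fields and rejects rank_type or recall_type values the table does not list.

```python
# Fields of algorithm_config that Table 13 marks as mandatory per algorithm_type.
REQUIRED_BY_ALGORITHM = {
    "NEARLINE_UPDATE_USER_PORTRAIT": {
        "update_context",
        "update_item_hotvalue_flag",
        "filter_history_flag",
        "result_path",
    },
    "NEARLINE_UPDATE_USER_CANDIDATE_SET": {
        "filter_history_flag",
        "rank_type",
        "max_candidate_number",
        "recall_type",
        "use_tag_nums",
    },
}
RANK_TYPES = {"HOT", "RANDOM", "TIME"}
RECALL_TYPES = {"TAG_BASE", "ACTION_BASE"}


def check_algorithm_config(algorithm_type: str, config: dict) -> list:
    """Return the sorted list of mandatory fields missing from `config`.

    Raises ValueError if rank_type or recall_type has an unlisted value.
    """
    # Absent enum fields pass here; absence is reported by the checks below.
    if config.get("rank_type", "HOT") not in RANK_TYPES:
        raise ValueError(f"unsupported rank_type: {config['rank_type']}")
    if config.get("recall_type", "TAG_BASE") not in RECALL_TYPES:
        raise ValueError(f"unsupported recall_type: {config['recall_type']}")
    # global_features_information_path is mandatory for every algorithm type.
    required = {"global_features_information_path"}
    required |= REQUIRED_BY_ALGORITHM.get(algorithm_type, set())
    # Conditional rules that depend on other field values.
    if config.get("filter_history_flag") is True:
        required.add("max_history_num")
    if config.get("rank_type") == "TIME":
        required |= {"time_name", "rec_day"}
    return sorted(required - set(config))
```

Applied to the algorithm_config in the example request, the check returns an empty list.
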

Table 14 candidate fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| time_feature | No | String | Time feature, as a 10-digit (second-precision) timestamp |
| max_size | Yes | Int | Maximum length of a candidate set |
| retain_days | No | Int | Number of days for which candidate sets are retained (the latest N days) |

Table 15 region_info fields
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| region_num | Yes | Integer | Number of pre-partitions. The recommended value is 8. |
| index_region_num | No | Integer | Number of pre-partitions in an index table. This parameter needs to be set only for the Update User Profile Based on User Data strategy and the Update Item Profile Based on Item Data strategy. |
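
A minimal sketch of region_info from Table 15, defaulting region_num to the recommended 8. Two points are assumptions, not documented rules: that the "Update User Profile Based on User Data" and "Update Item Profile Based on Item Data" strategies correspond to NEARLINE_WRITE_USER_PROFILE and NEARLINE_WRITE_ITEM_PROFILE, and that region_num must be at least 1. The helper name is hypothetical.

```python
from typing import Optional

# Assumption: the "Update User Profile Based on User Data" and "Update Item
# Profile Based on Item Data" strategies correspond to these algorithm types.
WRITE_PROFILE_TYPES = {"NEARLINE_WRITE_USER_PROFILE", "NEARLINE_WRITE_ITEM_PROFILE"}


def build_region_info(algorithm_type: str, region_num: int = 8,
                      index_region_num: Optional[int] = None) -> dict:
    """Build region_info (Table 15); 8 pre-partitions is the recommended value."""
    # Assumption: at least one pre-partition is required.
    if region_num < 1:
        raise ValueError("region_num must be at least 1")
    info = {"region_num": region_num}
    if index_region_num is not None:
        if algorithm_type not in WRITE_PROFILE_TYPES:
            raise ValueError("index_region_num applies only to the profile-writing strategies")
        info["index_region_num"] = index_region_num
    return info
```
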
Response
Table 16 describes the response parameters.
Example
- Example request
{
  "job_name": "Nearline-update",
  "job_description": "",
  "nearline_platform": {
    "platform": "DLI",
    "platform_parameter": {
      "cluster_name": "dli-1"
    },
    "config_load_path": "<OBS path for storing the configuration files>",
    "computing_resource": ""
  },
  "storage": {
    "user_profile_storage": {
      "platform": "CloudTable",
      "platform_parameter": {
        "cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
        "cluster_name": "cloudtable-62d2",
        "table_name": "write-profile-user"
      }
    },
    "item_profile_storage": {
      "platform": "CloudTable",
      "platform_parameter": {
        "cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
        "cluster_name": "cloudtable-62d2",
        "table_name": "write-profile-item"
      }
    },
    "filter_set_storage": {
      "platform": "CloudTable",
      "platform_parameter": {
        "cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
        "cluster_name": "cloudtable-62d2",
        "table_name": "write-profile-filter"
      }
    }
  },
  "strategy": {
    "name": "Update user profiles based on behavior data",
    "algorithm_type": "NEARLINE_UPDATE_USER_PORTRAIT",
    "strategy_type": "nearline",
    "parameter": {
      "data_source_config": {
        "behavior_type": ["view", "click", "collect", "uncollect", "search_click", "comment", "share", "like", "dislike", "grade", "consume", "use"],
        "interval": "10"
      },
      "data_source": {
        "platform": "DIS",
        "platform_parameter": {
          "stream_name": "dis-evan",
          "starting_offsets": "latest"
        }
      },
      "algorithm_config": {
        "update_context": true,
        "update_item_hotvalue_flag": true,
        "filter_history_flag": true,
        "max_history_num": 100,
        "result_path": "<Path for storing the real-time sample data>",
        "global_features_information_path": "<Path for storing the global configuration tables>",
        "bad_record_log": "<Path for storing exception data logs>"
      }
    }
  }
}
- Example of a successful response
{
  "is_success": true,
  "job_id": "cdf49df766f2499586685b08212fd03f",
  "nearline_uuid": "61496485f0ba4a77b02b4f66f3c11078"
}
- Example of a failed response
{
  "is_success": false,
  "error_code": "res.1008",
  "error_msg": "The request parameter(job_name) is null."
}
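
The responses above can be handled with a small client. This sketch uses only the Python standard library. The function and class names are hypothetical, and sending an IAM token in an X-Auth-Token header is an assumption to confirm against the authentication section of the API reference.

```python
import json
import urllib.error
import urllib.request


class NearlineJobError(Exception):
    """Raised when the service reports is_success = false."""

    def __init__(self, error_code, error_msg):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg


def parse_submit_response(raw: str) -> dict:
    """Return job_id and nearline_uuid, or raise NearlineJobError."""
    data = json.loads(raw)
    if not data.get("is_success"):
        raise NearlineJobError(data.get("error_code"), data.get("error_msg"))
    return {"job_id": data["job_id"], "nearline_uuid": data["nearline_uuid"]}


def submit_nearline_job(url: str, token: str, body: dict, timeout: float = 30.0) -> dict:
    """POST the body as JSON (the X-Auth-Token header is an assumption)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Error responses may still carry a JSON body like the failed-response example.
        raw = err.read().decode("utf-8")
    return parse_submit_response(raw)
```

If an error response has a non-JSON body, `json.loads` raises `json.JSONDecodeError`, which this sketch lets propagate.
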
Status Code
For details about status codes, see Status Codes.