Submitting Realtime Streaming Nearline Jobs

Function

This API is used to submit real-time streaming nearline jobs and perform nearline computing tasks.

URI

POST /v1/{project_id}/nearline-job

Table 1 describes the URI parameters.

Table 1 URI parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID. |
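
As a minimal sketch of how the URI is assembled, the following Python snippet substitutes project_id into the request path. The endpoint host and the helper name build_nearline_job_url are hypothetical placeholders that do not come from this reference, and authentication is deliberately left out.

```python
from urllib.parse import quote

# Hypothetical endpoint (assumption); replace it with the RES endpoint of your region.
ENDPOINT = "https://res.example.com"


def build_nearline_job_url(project_id: str, endpoint: str = ENDPOINT) -> str:
    """Return the URL for POST /v1/{project_id}/nearline-job."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so that it always forms a single path segment.
    return f"{endpoint.rstrip('/')}/v1/{quote(project_id, safe='')}/nearline-job"


print(build_nearline_job_url("my-project-id"))
```

The request itself would then be sent to this URL with the POST method and a JSON body built from the request parameters.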

Request

Table 2 describes the request parameters.

Table 2 Request parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workspace_id | No | String | Workspace ID. The default value is 0. |
| job_name | Yes | String | Job name. The value can contain a maximum of 20 characters. |
| job_description | No | String | Job description. The value can contain a maximum of 256 characters. |
| nearline_platform | Yes | JSON | Nearline computing platform. For details, see Table 3. |
| storage | Yes | JSON | Storage information. For details, see Table 5. |
| strategy | Yes | JSON | Strategy information. For details, see Table 8. |
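
The mandatory flags and length limits in Table 2 can be checked on the client side before a request is sent. The snippet below is only a sketch: check_top_level is a hypothetical helper name, and it covers the top-level fields only, not the nested objects.

```python
# Top-level fields that Table 2 marks as mandatory.
MANDATORY_FIELDS = ("job_name", "nearline_platform", "storage", "strategy")
# Maximum lengths stated in Table 2.
MAX_LENGTHS = {"job_name": 20, "job_description": 256}


def check_top_level(body: dict) -> list:
    """Return human-readable problems found in the top-level request fields."""
    problems = [f"{name} is mandatory" for name in MANDATORY_FIELDS if name not in body]
    for name, limit in MAX_LENGTHS.items():
        value = body.get(name)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{name} exceeds {limit} characters")
    return problems
```

An empty list means that the top-level fields satisfy the constraints listed in Table 2. The service may apply further checks that this sketch does not model.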

Table 3 nearline_platform parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 4. |
| computing_resource | No | String | Resource specifications required for the normal running of the DLI jobs. |
| config_load_path | Yes | String | OBS path that stores the files generated by the selected configurations. |

Table 4 platform_parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_name | Yes | String | Cluster name |
| cluster_id | No | String | Cluster ID |

Table 5 storage parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| user_profile_storage | No | JSON | User profile storage. This parameter is mandatory if algorithm_type in the strategy field is set to NEARLINE_WRITE_USER_PROFILE, NEARLINE_UPDATE_USER_PORTRAIT, or NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
| item_profile_storage | No | JSON | Item profile storage. This parameter is mandatory if algorithm_type in the strategy field is set to NEARLINE_WRITE_ITEM_PROFILE, NEARLINE_UPDATE_USER_PORTRAIT, or NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
| filter_set_storage | No | JSON | Historical record storage. This parameter is optional if algorithm_type in the strategy field is set to NEARLINE_UPDATE_USER_PORTRAIT or NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
| candidate_set_storage | No | JSON | Candidate set storage. This parameter is mandatory if algorithm_type in the strategy field is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. For details, see Table 6. |
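
Because the mandatory storage fields depend on algorithm_type, it can help to encode the rules of Table 5 as a lookup. The following Python sketch does that. The names REQUIRED_STORAGE and missing_storage are illustrative and not part of the API, and filter_set_storage is left out because the table describes it as optional.

```python
# Storage fields that Table 5 marks as mandatory for each algorithm_type.
REQUIRED_STORAGE = {
    "NEARLINE_WRITE_USER_PROFILE": {"user_profile_storage"},
    "NEARLINE_WRITE_ITEM_PROFILE": {"item_profile_storage"},
    "NEARLINE_UPDATE_USER_PORTRAIT": {"user_profile_storage", "item_profile_storage"},
    "NEARLINE_UPDATE_USER_CANDIDATE_SET": {
        "user_profile_storage",
        "item_profile_storage",
        "candidate_set_storage",
    },
}


def missing_storage(algorithm_type: str, storage: dict) -> list:
    """Return the sorted names of mandatory storage fields absent from `storage`."""
    if algorithm_type not in REQUIRED_STORAGE:
        raise ValueError(f"unknown algorithm_type: {algorithm_type}")
    return sorted(REQUIRED_STORAGE[algorithm_type] - set(storage))
```

For example, calling missing_storage with NEARLINE_UPDATE_USER_PORTRAIT and a storage object that contains only user_profile_storage reports item_profile_storage as missing.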

Table 6 Storage information

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. Currently, only CloudTable is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 7. |

Table 7 platform_parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_id | Yes | String | Cluster ID |
| table_name | Yes | String | Table name. The value can contain a maximum of 64 characters. |
| cluster_name | No | String | Cluster name |
| data_version | No | String | Data version. The options are V1 and V2. |
| region_info | No | JSON | Pre-partition information. Set this parameter only when the data version is V2. No pre-partition information is needed when the data version is V1. For details, see Table 15. |

Table 8 strategy parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| strategy_type | Yes | String | Strategy type. The only valid value is nearline. |
| name | Yes | String | Strategy alias. The value can contain a maximum of 60 characters. |
| algorithm_type | Yes | String | Algorithm type. The options are as follows: NEARLINE_WRITE_USER_PROFILE (writes user profiles based on user information logs), NEARLINE_WRITE_ITEM_PROFILE (writes item profiles based on item information logs), NEARLINE_UPDATE_USER_PORTRAIT (updates user profiles based on behavior logs), and NEARLINE_UPDATE_USER_CANDIDATE_SET (updates user candidate sets based on behavior logs). |
| parameter | Yes | JSON | Algorithm parameter. For details, see Table 9. |

Table 9 parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_source | Yes | JSON | Data source parameter. For details, see Table 10. The standard recommendation data supported by the real-time streaming nearline job comes from List of User Behaviors. |
| data_source_config | Yes | JSON | Data source configuration. For details, see Table 12. |
| algorithm_config | Yes | JSON | Algorithm configuration. For details, see Table 13. |

Table 10 data_source parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. Currently, only DIS is supported. The data required by real-time nearline jobs is written to DIS, and RES reads the data from DIS for nearline computing tasks. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 11. |

Table 11 platform_parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| stream_name | No | String | DIS stream name |
| starting_offsets | Yes | String | Start position for reading DIS data. LATEST: The latest data is read first. EARLIEST: The earliest data is read first. |

Table 12 data_source_config parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| behavior_type | No | List<String> | Behavior type |
| interval | Yes | Integer | Time interval for the running of nearline jobs, in seconds. For example, the value 10 indicates that the nearline strategy performs the computing tasks every 10 seconds, including stream data reading and processing. |

Table 13 algorithm_config parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| update_context | No | Boolean | Whether to update contextual information. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT. |
| update_item_hotvalue_flag | No | Boolean | Whether to update item popularity. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT. |
| filter_history_flag | No | Boolean | Whether to save the history records of a user or filter the records. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT or NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| max_history_num | No | Integer | Maximum length of a saved historical record. This parameter is mandatory if filter_history_flag is set to true. |
| result_path | No | String | Path for storing real-time data samples. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_PORTRAIT. |
| rank_type | No | String | Ranking mode of candidate sets. The value can be HOT, RANDOM, or TIME. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| max_candidate_number | No | Integer | Maximum length of the retrieved candidate set. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| recall_type | No | String | Retrieval mode of candidate sets. The value can be TAG_BASE or ACTION_BASE. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| use_tag_nums | No | Integer | Number of interest tags. A larger value yields richer items in the retrieved candidate sets. This parameter is mandatory if algorithm_type is set to NEARLINE_UPDATE_USER_CANDIDATE_SET. |
| time_name | No | String | Name of a field that indicates a time feature in item data. This parameter is mandatory if rank_type is set to TIME. |
| rec_day | No | Integer | Time period during which data is collected. The value is N days before the current time. This parameter is mandatory if rank_type is set to TIME. |
| global_features_information_path | Yes | String | Path that stores the global feature file |
| bad_record_log | No | String | Path to access the error data log. Folders that house the error data are placed in the path. |
| advanced_search | No | Map<String, List<String>> | Custom search criteria. Each key is forcibly converted to its value for retrieval. |
| candidate | No | JSON | Candidate set parameters. For details, see Table 14. |
| tag_reduce_rate | No | Double | Attenuation parameter of the interest tag. A smaller value indicates stronger attenuation, and a larger value indicates weaker attenuation. If the value is 0, no attenuation occurs. |
| tags_mainten_length | No | Integer | Maximum length of an interest tag in each tag system. |
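
Several algorithm_config parameters are marked as optional but become mandatory depending on algorithm_type, filter_history_flag, or rank_type. The Python sketch below encodes those conditions as stated in Table 13. The function name missing_algorithm_config is hypothetical, and the check covers field presence only, not value ranges.

```python
PORTRAIT = "NEARLINE_UPDATE_USER_PORTRAIT"
CANDIDATE_SET = "NEARLINE_UPDATE_USER_CANDIDATE_SET"

# Parameters that Table 13 makes mandatory for a given algorithm_type.
REQUIRED_BY_ALGORITHM = {
    PORTRAIT: [
        "update_context",
        "update_item_hotvalue_flag",
        "filter_history_flag",
        "result_path",
    ],
    CANDIDATE_SET: [
        "filter_history_flag",
        "rank_type",
        "max_candidate_number",
        "recall_type",
        "use_tag_nums",
    ],
}


def missing_algorithm_config(algorithm_type: str, config: dict) -> list:
    """Return the mandatory algorithm_config parameters absent from `config`."""
    # This path is mandatory for every algorithm type.
    required = ["global_features_information_path"]
    required += REQUIRED_BY_ALGORITHM.get(algorithm_type, [])
    # max_history_num is needed only when history filtering is enabled.
    if config.get("filter_history_flag") is True:
        required.append("max_history_num")
    # time_name and rec_day are needed only for time-based ranking.
    if config.get("rank_type") == "TIME":
        required += ["time_name", "rec_day"]
    return [name for name in required if name not in config]
```

An empty result means that every conditionally mandatory parameter is present for the chosen algorithm type.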

Table 14 candidate parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| time_feature | No | String | 10-digit timestamp |
| max_size | Yes | Integer | Maximum length of a candidate set |
| retain_days | No | Integer | Latest N days in which the candidate sets can be retained |

Table 15 region_info parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| region_num | Yes | Integer | Number of pre-partitions. Eight pre-partitions are recommended by default. |
| index_region_num | No | Integer | Number of pre-partitions in an index table. This parameter needs to be set only for the Update User Profile Based on User Data strategy and the Update Item Profile Based on Item Data strategy. |

Response

Table 16 describes the response parameters.

Table 16 Response parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| is_success | Yes | Boolean | Whether the request is successful |
| nearline_uuid | Yes | String | Candidate set ID |
| job_id | Yes | String | Job ID |

Example

  • Example request
    {
    	"job_name": "Nearline-update",
    	"job_description": "",
    	"nearline_platform": {
    		"platform": "DLI",
    		"platform_parameter": {
    			"cluster_name": "dli-1"
    		},
    		"config_load_path": "<OBS path for storing the configuration files>",
    		"computing_resource": ""
    	},
    	"storage": {
    		"user_profile_storage": {
    			"platform": "CloudTable",
    			"platform_parameter": {
    				"cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
    				"cluster_name": "cloudtable-62d2",
    				"table_name": "write-profile-user"
    			}
    		},
    		"item_profile_storage": {
    			"platform": "CloudTable",
    			"platform_parameter": {
    				"cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
    				"cluster_name": "cloudtable-62d2",
    				"table_name": "write-profile-item"
    			}
    		},
    		"filter_set_storage": {
    			"platform": "CloudTable",
    			"platform_parameter": {
    				"cluster_id": "96219587-3bb2-4eed-a8d0-0cda6dc50223",
    				"cluster_name": "cloudtable-62d2",
    				"table_name": "write-profile-filter"
    			}
    		}
    	},
    	"strategy": {
    		"name": "Update user profiles based on behavior data",
    		"algorithm_type": "NEARLINE_UPDATE_USER_PORTRAIT",
    		"strategy_type": "nearline",
    		"parameter": {
    			"data_source_config": {
    				"behavior_type": ["view", "click", "collect", "uncollect", "search_click", "comment", "share", "like", "dislike", "grade", "consume", "use"],
    				"interval": 10
    			},
    			"data_source": {
    				"platform": "DIS",
    				"platform_parameter": {
    					"stream_name": "dis-evan",
    					"starting_offsets": "latest"
    				}
    			},
    			"algorithm_config": {
    				"update_context": true,
    				"update_item_hotvalue_flag": true,
    				"filter_history_flag": true,
    				"max_history_num": 100,
    				"result_path": "<Path for storing the real-time sample data>",
    				"global_features_information_path": "<Path for storing the global configuration tables>",
    				"bad_record_log":"<Path for storing exception data logs>"
    			}
    		}
    	}
    }
  • Example of a successful response
    {
        "is_success": true,
        "job_id": "cdf49df766f2499586685b08212fd03f",
        "nearline_uuid": "61496485f0ba4a77b02b4f66f3c11078"
    }
  • Example of a failed response
    {
        "is_success": false,
        "error_code": "res.1008",
        "error_msg": "The request parameter(job_name) is null."
    }
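
The two response shapes above can be handled with a small helper. This is a sketch only: NearlineJobError and parse_submit_response are hypothetical names, and the field names are taken from the example responses.

```python
import json


class NearlineJobError(Exception):
    """Raised when the response body reports is_success as false."""

    def __init__(self, error_code, error_msg):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg


def parse_submit_response(body: str) -> dict:
    """Return job_id and nearline_uuid on success, or raise NearlineJobError."""
    data = json.loads(body)
    if data.get("is_success") is True:
        return {"job_id": data["job_id"], "nearline_uuid": data["nearline_uuid"]}
    # Failed responses carry error_code and error_msg instead of the IDs.
    raise NearlineJobError(data.get("error_code"), data.get("error_msg"))
```

Callers can catch NearlineJobError and inspect error_code (for example, res.1008) to decide whether to correct the request and resubmit.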

Status Code

For details about status codes, see Status Codes.