Submitting Combined Jobs

Function

This API is used to submit combined jobs for offline computing tasks and generate candidate sets using the selected strategies.

URI

POST /v1/{project_id}/training

Table 1 describes the URI parameters.
Table 1 URI parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID.
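
As a minimal sketch of how the URI above is assembled, the following Python snippet builds the POST request object without sending it. The endpoint host, the X-Auth-Token header, and the helper name build_submit_request are assumptions made for illustration; this page specifies only the path /v1/{project_id}/training.

```python
import json
import urllib.request
from urllib.parse import quote


def build_submit_request(endpoint, project_id, body, token):
    """Build (but do not send) the POST request for a combined job.

    `endpoint` and `token` are placeholders; this page does not define them.
    """
    url = f"{endpoint.rstrip('/')}/v1/{quote(project_id, safe='')}/training"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed authentication header
        },
        method="POST",
    )


req = build_submit_request(
    "https://res.example.com", "my-project-id", {"job_name": "demo"}, "<token>"
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request; that step is left out here because it needs a real endpoint and credentials.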

Request

Table 2 describes the request parameters.
Table 2 Request parameters

Parameter

Mandatory

Type

Description

workspace_id

No

String

Workspace ID. The default value is 0.

job_name

Yes

String

Training job name. The value can contain a maximum of 20 characters. Only digits, letters, underscores (_), and hyphens (-) are allowed.

job_description

No

String

Training job description. The value can contain a maximum of 256 characters.

offline_platform

Yes

List

Offline computing platform. For details, see Table 3.

data_source

Yes

List

Data source. For details, see Table 5.

storage

Yes

List

Storage information. For details, see Table 8.

algorithm_setting

Yes

JSON

Algorithm configuration. For details, see Table 10.

filter_rules

No

JSON

List of filter rules. For details, see Table 12.
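
The constraints in Table 2 can be checked on the client before a job is submitted. The sketch below is one possible way to do that; the helper name check_request_body is hypothetical, and it assumes that "letters" means ASCII letters and that job_name must not be empty.

```python
import re

# Derived from Table 2: digits, letters, underscores, hyphens; at most 20 characters.
# ASCII-only letters and a minimum length of 1 are assumptions.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,20}")
REQUIRED_FIELDS = (
    "job_name",
    "offline_platform",
    "data_source",
    "storage",
    "algorithm_setting",
)


def check_request_body(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = [
        f"missing mandatory field: {name}"
        for name in REQUIRED_FIELDS
        if name not in body
    ]
    job_name = body.get("job_name")
    if job_name is not None and not JOB_NAME_PATTERN.fullmatch(job_name):
        problems.append("job_name must be 1-20 letters, digits, '_' or '-'")
    if len(body.get("job_description", "")) > 256:
        problems.append("job_description exceeds 256 characters")
    return problems


print(check_request_body({"job_name": "yyn-test"}))
```

The call above reports the four mandatory fields that are still missing, while the job name itself passes the check.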

Table 3 offline_platform parameters

Parameter

Mandatory

Type

Description

platform

Yes

String

Platform name. The value can be DLI.

platform_parameter

Yes

JSON

Platform parameter. For details, see Table 4.

computing_resource

No

String

Resource specifications required to run the DLI jobs.

config_load_path

Yes

String

Path to access the configuration items

Table 4 platform_parameter parameters

Parameter

Mandatory

Type

Description

cluster_name

Yes

String

Cluster name. The value can contain a maximum of 64 characters.

Table 5 data_source parameters

Parameter

Mandatory

Type

Description

offline

Yes

List

Offline data source. For details, see Table 6.

Table 6 offline parameters

Parameter

Mandatory

Type

Description

table_type_id

Yes

String

Data template type. The options are:

  • USER_META: User feature list
  • ITEM_META: Item feature list
  • USER_BEHAVIOR: User behavior list
  • GENERAL_FORMAT: General format

For details about the data format, see Offline Data Sources.

data_source_url

Yes

String

Data source path. The value can contain a maximum of 1000 characters.

data_format

Yes

String

Data format. The options are csv, parquet, json, and orc.

data_param

No

JSON

Data parameter. For details, see Table 7. This parameter is mandatory when the data format is csv and optional for other data formats.

start_time

No

String

Start time for collecting the general source data, for example, 2018-01-01.

end_time

No

String

End time for collecting the general source data, for example, 2018-02-01.

Table 7 data_param parameters

Parameter

Mandatory

Type

Description

header

Yes

String

Whether to display the table header. true indicates that the table header is displayed; false indicates that the table header is not displayed.

delimiter

Yes

String

Delimiter. The value can contain a maximum of 10 characters.

quote

Yes

String

Quotation character. The value can contain a maximum of 10 characters.

escape

Yes

String

Escape character. The value can contain a maximum of 10 characters.
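
Tables 6 and 7 together imply that data_param, with all four of its keys, is needed whenever data_format is csv. A minimal sketch of a helper that enforces this when assembling one offline entry follows; the function name offline_source is hypothetical.

```python
def offline_source(table_type_id, url, data_format, data_param=None):
    """Assemble one entry of the offline list described in Tables 6 and 7."""
    if data_format not in {"csv", "parquet", "json", "orc"}:
        raise ValueError(f"unsupported data_format: {data_format}")
    if len(url) > 1000:
        raise ValueError("data_source_url exceeds 1000 characters")
    entry = {
        "table_type_id": table_type_id,
        "data_source_url": url,
        "data_format": data_format,
    }
    if data_format == "csv":
        # data_param is mandatory for csv, and Table 7 marks all four keys mandatory.
        missing = {"header", "delimiter", "quote", "escape"} - set(data_param or {})
        if missing:
            raise ValueError(f"csv data_param is missing: {sorted(missing)}")
    if data_param is not None:
        entry["data_param"] = data_param
    return entry


csv_param = {"header": "false", "delimiter": ",", "quote": "\"", "escape": "\\"}
entry = offline_source("USER_META", "<OBS path>", "csv", csv_param)
```

For the other formats, data_param is optional, so a call such as offline_source("ITEM_META", "<OBS path>", "parquet") returns an entry without it.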

Table 8 storage parameters

Parameter

Mandatory

Type

Description

platform

Yes

String

Platform name. Currently, only CloudTable is supported.

platform_parameter

Yes

JSON

Storage platform parameter. For details, see Table 9.

Table 9 platform_parameter parameters

Parameter

Mandatory

Type

Description

cluster_id

Yes

String

Cluster ID

table_name

Yes

String

Table name. The value can contain a maximum of 64 characters.

cluster_name

No

String

Cluster name

data_version

No

String

Data version. The options are V1 and V2.

region_info

No

JSON

Pre-partition information. Set this parameter only when the data version is V2; it is not needed when the data version is V1. For details, see Table 17.
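
The following sketch shows one way to assemble a storage entry from Tables 8, 9, and 17. It assumes the intended rule is that region_info applies to data version V2 only; the helper name storage_entry is hypothetical.

```python
def storage_entry(cluster_id, table_name, data_version=None, region_num=None):
    """Assemble one CloudTable entry of the storage list (Tables 8, 9, and 17)."""
    if len(table_name) > 64:
        raise ValueError("table_name exceeds 64 characters")
    params = {"cluster_id": cluster_id, "table_name": table_name}
    if data_version is not None:
        if data_version not in ("V1", "V2"):
            raise ValueError("data_version must be V1 or V2")
        params["data_version"] = data_version
    if region_num is not None:
        # Assumed rule: pre-partition information applies to data version V2 only.
        if data_version != "V2":
            raise ValueError("region_info is only meaningful when data_version is V2")
        params["region_info"] = {"region_num": region_num}
    return {"platform": "CloudTable", "platform_parameter": params}


v2_storage = storage_entry("<cluster ID>", "yyn-555", data_version="V2", region_num=8)
```

The value 8 follows the recommendation given for region_num in Table 17.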

Table 10 algorithm_setting parameters

Parameter

Mandatory

Type

Description

start_time

No

Long

Start time of data training, expressed in the form of a timestamp in milliseconds.

end_time

No

Long

End time of data training, expressed in the form of a timestamp in milliseconds.

strategy

Yes

List

Strategy set. For details, see Table 11.
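
Because start_time and end_time in Table 10 are millisecond timestamps, a conversion step is usually needed. The sketch below converts timezone-aware datetimes; the helper name to_millis and the choice of UTC+8 are assumptions for illustration. With that time zone, the results match the values used in the example request on this page.

```python
from datetime import datetime, timedelta, timezone


def to_millis(moment):
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return int(moment.timestamp() * 1000)


utc_plus_8 = timezone(timedelta(hours=8))  # assumed time zone for illustration
start_time = to_millis(datetime(2018, 12, 1, tzinfo=utc_plus_8))
end_time = to_millis(datetime(2018, 12, 5, tzinfo=utc_plus_8))
print(start_time, end_time)  # 1543593600000 1543939200000
```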

Table 11 strategy parameters

Parameter

Mandatory

Type

Description

strategy_type

Yes

String

Strategy type. The options are:

  • Retrieval strategy (recall in the example request)
  • Ranking strategy (sorting in the example request)

name

Yes

String

Strategy alias. The value can contain a maximum of 60 characters.

algorithm_type

Yes

String

Algorithm type

parameter

Yes

JSON

Algorithm parameter (JSON format)

NOTE:

This API is used to submit a combined job. Parameters vary according to the selected strategies.

  • Retrieval strategy

    For details about the fields in parameter, see the retrieval strategy parameters in Parameters for Supported Strategies.

  • Ranking strategy

    The parameter object contains the following fields:

    • spec_id: ID of the resource specification selected by the training job. The parameter type is Long.
    • run_path: Path for saving the model and log files. The parameter type is String.
    • training_data_path: OBS path for saving the training data. The parameter type is String.
    • test_data_path: OBS path for saving the test data. The parameter type is String.
    • For details about the ranking strategy parameters, see Parameters for Supported Strategies.
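
The ranking strategy fields listed in the note above can be assembled as in the following sketch. The helper name ranking_parameter is hypothetical, and the algorithm_parameters key is taken from the FFM entry in the example request on this page rather than from a formal definition.

```python
def ranking_parameter(spec_id, run_path, training_data_path, test_data_path,
                      algorithm_parameters):
    """Assemble the parameter object of a ranking strategy."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(spec_id, bool) or not isinstance(spec_id, int):
        raise TypeError("spec_id must be an integer (Long)")
    paths = {
        "run_path": run_path,
        "training_data_path": training_data_path,
        "test_data_path": test_data_path,
    }
    for key, value in paths.items():
        if not isinstance(value, str) or not value:
            raise TypeError(f"{key} must be a non-empty string")
    # The algorithm_parameters key follows the FFM example request on this page.
    return {"algorithm_parameters": algorithm_parameters, "spec_id": spec_id, **paths}


strategy = {
    "strategy_type": "sorting",
    "name": "Field-aware factorization machine",
    "algorithm_type": "FFM",
    "parameter": ranking_parameter(
        1, "<run path>", "<training data path>", "<test data path>",
        {"max_iterations": 50},
    ),
}
```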
Table 12 filter_rules parameters

Parameter

Mandatory

Type

Description

behavior_rules

No

List

Filter rule configuration for user behaviors. For details, see Table 13.

blacklist

No

String

Blacklisting rule configuration

whitelist

No

String

Whitelisting rule configuration

etl_uuid

No

String

UUID generated by extracting user and item features in Feature Engineering, used for configuring attribute filter rules.

Table 13 behavior_rules parameters

Parameter

Mandatory

Type

Description

behavior_type

Yes

String

Behavior types:

  • view indicates that an item/content is exposed to users.
  • click indicates that a user clicks an item or content.
  • collect indicates that a user adds an item or content to favorites.
  • uncollect indicates that a user removes an item or content from favorites.
  • search_click indicates that a user clicks an item in the search results.
  • comment indicates that a user makes comments on an item or content.
  • share indicates that a user shares an item/content with others.
  • like indicates that a user gives an item/content a thumb-up.
  • dislike indicates that a user gives an item/content a thumb-down.
  • grade indicates that a user rates an item/content.
  • consume indicates that a user buys an item (primarily refers to commodities).
  • use indicates that a user watches videos/listens to a kind of music/reads something (primarily refers to content)...

interval

Yes

Integer

Elapsed time (days). The value ranges from 1 to 10,000.

frequency

Yes

Integer

Frequency. The value ranges from 1 to 10,000.
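
The value ranges in Table 13 lend themselves to a simple client-side check. The sketch below validates one behavior_rules entry; the helper name check_behavior_rule is hypothetical, and the set of behavior types is copied from the list in the table.

```python
BEHAVIOR_TYPES = frozenset({
    "view", "click", "collect", "uncollect", "search_click", "comment",
    "share", "like", "dislike", "grade", "consume", "use",
})


def check_behavior_rule(rule):
    """Raise ValueError if a behavior_rules entry violates Table 13."""
    if rule.get("behavior_type") not in BEHAVIOR_TYPES:
        raise ValueError(f"unknown behavior_type: {rule.get('behavior_type')!r}")
    for key in ("interval", "frequency"):
        value = rule.get(key)
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be an integer")
        if not 1 <= value <= 10000:
            raise ValueError(f"{key} must be between 1 and 10,000")
    return rule


rule = check_behavior_rule({"behavior_type": "collect", "interval": 7, "frequency": 5})
```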

Response

Table 14 describes the response parameters.
Table 14 Response parameters

Parameter

Mandatory

Type

Description

is_success

Yes

Boolean

Whether the request is successful

strategies

Yes

List

Returned strategy result. For details, see Table 15.

job_id

Yes

String

Job ID

filter_uuid

Yes

String

UUID generated by using filter rules

Table 15 strategies parameters

Parameter

Mandatory

Type

Description

strategy_type

Yes

String

Strategy type. The options are:

  • Retrieval strategy (recall in the example response)
  • Ranking strategy (sorting in the example response)

name

Yes

String

Strategy alias

algorithm_type

Yes

String

Algorithm type

parameter

Yes

JSON

Algorithm parameter. For details, see Parameters for Supported Strategies.

candidate_set

Yes

List

Set of candidates. For details, see Table 16.

Table 16 candidate_set parameters

Parameter

Mandatory

Type

Description

uuid

Yes

String

Candidate set ID

description

Yes

String

Candidate set description

Table 17 region_info parameters

Parameter

Mandatory

Type

Description

region_num

Yes

Integer

Number of pre-partitions. Eight pre-partitions are recommended.

index_region_num

No

Integer

Number of pre-partitions in an index table. This parameter is required only for the Initial User Profile-Item Profile-Standard Wide Table Generation operator in a feature engineering project. Other offline operators do not generate an index table, so they do not need this parameter.

Example

  • Example request
    {
      "job_name": "yyn-test",
      "job_description": "yyn-test",
      "data_source": [
        {
          "offline": [
            {
              "table_type_id": "USER_META",
              "data_format": "csv",
              "data_param": {
                "header": "false",
                "delimiter": ",",
                "quote": "\"",
                "escape": "\\"
              },
              "data_source_url": "<OBS path for storing the data sources>"
            },
            {
              "table_type_id": "ITEM_META",
              "data_format": "csv",
              "data_param": {
                "header": "false",
                "delimiter": ",",
                "quote": "\"",
                "escape": "\\"
              },
              "data_source_url": "<OBS path for storing the data sources>"
            },
            {
              "table_type_id": "USER_BEHAVIOR",
              "data_format": "csv",
              "data_param": {
                "header": "false",
                "delimiter": ",",
                "quote": "\"",
                "escape": "\\"
              },
              "data_source_url": "<OBS path for storing the data sources>"
            }
          ]
        }
      ],
      "offline_platform": [
        {
          "platform": "DLI",
          "platform_parameter": {
            "cluster_name": "res_one"
          },
          "config_load_path": "<Path for loading configurations>"
        }
      ],
      "storage": [
        {
          "platform": "CloudTable",
          "platform_parameter": {
            "cluster_id": "cca518b4-a9fb-4dbf-80bb-d6838cbdcc87",
            "cluster_name": "cloudtable-ccb1-sec",
            "table_name": "yyn-555"
          }
        }
      ],
      "algorithm_setting": {
        "strategy": [
          {
            "name": "Recommendation Based on Specific Behavior Popularity by default",
            "algorithm_type": "SpecificBehavior",
            "strategy_type": "recall",
            "parameter": {
              "data_source_config": {
                "retain_days": 30,
                "behavior_type": "collect",
                "start_time": 1543593600000,
                "end_time": 1543939200000
              },
              "algorithm_config": {},
              "candidate_set_config": {
                "is_recommended_by_category": false
              }
            }
          },
          {
            "name": "ItemCF Recommendation",
            "algorithm_type": "ItemCF",
            "strategy_type": "recall",
            "parameter": {
              "data_source_config": {
                "retain_days": 30,
                "behavior_weights": [
                  {
                    "behavior_type": "view",
                    "weight": 1
                  }
                ]
              },
              "algorithm_config": {
                "similar_metric": "cosine"
              },
              "candidate_set_config": {
                "max_recommended_num": 1000
              }
            }
          },
          {
            "name": "Business Rule - Historical Behavior-based Recommendation",
            "algorithm_type": "HistoryBehaviorMemory",
            "strategy_type": "recall",
            "parameter": {
              "data_source_config": {
                "retain_days": 30
              },
              "algorithm_config": {
                "history_behavior_memories": [
                  {
                    "behavior_type": "view",
                    "least_intension": 1
                  }
                ]
              },
              "candidate_set_config": {}
            }
          },
          {
            "name": "Field-aware factorization machine",
            "strategy_type": "sorting",
            "algorithm_type": "FFM",
            "parameter": {
              "algorithm_parameters": {
                "max_iterations": 50,
                "early_stop_iterations": 5,
                "fields_feature_size_path": "<OBS path for storing data>",
                "algorithm_specify_parameters": {
                  "latent_vector_length": 10
                },
                "regular_parameters": {
                  "l2_regularization": 0,
                  "regular_loss_compute_mode": "full"
                },
                "initial_parameters": {
                  "initial_method": "normal",
                  "mean_value": 0,
                  "standard_deviation": 0.001
                },
                "optimize_parameters": {
                  "type": "grad",
                  "learning_rate": 0.001
                }
              },
              "algorithm_type": "FFM",
              "spec_id": 1,
              "run_path": "<Root path for storing training results>",
              "training_data_path": "<OBS path for storing the training data>",
              "test_data_path": "<OBS path for storing the test data>"
            }
          }
        ],
        "start_time": 1543593600000,
        "end_time": 1543939200000
      },
      "filter_rules": {
        "behavior_rules": [
          {
            "behavior_type": "collect",
            "interval": 7,
            "frequency": 5
          }
        ],
        "blacklist": "<Path for storing the blacklists>",
        "whitelist": "<Path for storing the whitelists>"
      }
    }
  • Example of a successful response
    {
        "is_success": true,
        "strategies": [
            {
                "name": "Recommendation Based on Specific Behavior Popularity by default",
                "strategy_type": "recall",
                "algorithm_type": "SpecificBehavior",
                "parameter": {
                    "data_source_config": {
                        "retain_days": 30,
                        "behavior_type": "collect",
                        "start_time": 1543593600000,
                        "end_time": 1543939200000
                    },
                    "algorithm_config": {},
                    "candidate_set_config": {
                        "is_recommended_by_category": false
                    }
                },
                "candidate_set": [
                    {
                        "uuid": "bb45ef1d31a7488584724f58d468d9ae",
                        "description": "[Recommendation Based on Specific Behavior Popularity by default] Candidate sets generated by the recommendation algorithms of specific behavior popularity"
                    }
                ],
                "strategy_id": 0
            },
            {
                "name": "ItemCF Recommendation",
                "strategy_type": "recall",
                "algorithm_type": "ItemCF",
                "parameter": {
                    "data_source_config": {
                        "retain_days": 30,
                        "behavior_weights": [
                            {
                                "behavior_type": "view",
                                "weight": 1
                            }
                        ]
                    },
                    "algorithm_config": {
                        "similar_metric": "cosine"
                    },
                    "candidate_set_config": {
                        "max_recommended_num": 1000
                    }
                },
                "candidate_set": [
                    {
                        "uuid": "958d09223b2e4175b2740f8f782cc5fc",
                        "description": "[ItemCF Recommendation] User-item list candidate sets generated by the ItemCF algorithm"
                    }
                ],
                "strategy_id": 1
            },
            {
                "name": "Business Rule - Historical Behavior-based Recommendation",
                "strategy_type": "recall",
                "algorithm_type": "HistoryBehaviorMemory",
                "parameter": {
                    "data_source_config": {
                        "retain_days": 30
                    },
                    "algorithm_config": {
                        "history_behavior_memories": [
                            {
                                "behavior_type": "view",
                                "least_intension": 1
                            }
                        ]
                    },
                    "candidate_set_config": {}
                },
                "candidate_set": [
                    {
                        "uuid": "1b5301f0c7804e28b66eb46c92249ed2",
                        "description": "[Business Rule - Historical Behavior-based Recommendation] User-item list candidate sets generated by CustomRule"
                    }
                ],
                "strategy_id": 2
            },
            {
                "name": "Field-aware factorization machine",
                "strategy_type": "sorting",
                "algorithm_type": "FFM",
                "parameter": {
                    "algorithm_parameters": {
                        "row_features_size": "6",
                        "algorithm_specify_parameters": {
                            "latent_vector_length": 10
                        },
                        "initial_parameters": {
                            "initial_method": "normal",
                            "mean_value": -0.001,
                            "standard_deviation": 0.001
                        },
                        "optimize_parameters": {
                            "type": "grad",
                            "learning_rate": 0.1,
                            "log_loss_reduce_method": "mean"
                        },
                        "regular_parameters": {
                            "l2_loss_weight_lambda": 0.001
                        },
                        "loss_mode": {
                            "l2_loss_mode": "full"
                        },
                        "fields_feature_size_path": "<Data storage path>"
                    },
                    "algorithm_type": "FFM",
                    "spec_id": 1,
                    "name": "Field-aware factorization machine",
                    "run_path": "<Root path for storing the training sets>",
                    "training_data_path": "<OBS path for storing the training data>",
                    "test_data_path": "<OBS path for storing the test data>"
                },
                "candidate_set": [
                    {
                        "uuid": "4aa9f06d24254fedbe462bfbfb879e63",
                        "description": "Field-aware factorization machine"
                    }
                ],
                "strategy_id": 0
            }
        ],
        "filter_uuid": "857578fafa4746dd873722d661725154",
        "job_id": "f171a66489904462bad0a89d9b7483de"
    }
  • Example of a failed response
    {
        "is_success": false,
        "error_code": "res.2013",
        "error_msg": "There dataSource is empty or less than five."
    }
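
The two response shapes above can be handled as in the following sketch, which collects the candidate set UUIDs per strategy and raises an error for a failed response. The helper name candidate_set_ids is hypothetical, and the sample dictionary is a trimmed copy of the successful response shown above.

```python
def candidate_set_ids(response):
    """Return (strategy name, [candidate set UUIDs]) pairs, or raise on failure."""
    if not response.get("is_success"):
        # error_code and error_msg appear in the failed response example.
        raise RuntimeError(f"{response.get('error_code')}: {response.get('error_msg')}")
    return [
        (item["name"], [candidate["uuid"] for candidate in item.get("candidate_set", [])])
        for item in response["strategies"]
    ]


sample = {
    "is_success": True,
    "strategies": [
        {
            "name": "ItemCF Recommendation",
            "candidate_set": [{"uuid": "958d09223b2e4175b2740f8f782cc5fc"}],
        }
    ],
    "job_id": "f171a66489904462bad0a89d9b7483de",
}
pairs = candidate_set_ids(sample)
```

A list of pairs is used instead of a dictionary so that two strategies sharing the same alias would not overwrite each other.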

Status Code

For details about status codes, see Status Codes.