
Submitting Feature Engineering Jobs

Function

This API is used to submit feature engineering jobs, including data preprocessing, feature extraction, and the generation of ranking training samples.

URI

POST /v1/{project_id}/etl-job

Table 1 describes the URI parameters.

**Table 1** URI parameters
Parameter	Mandatory	Type	Description
project_id	Yes	String	Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID.
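The following Python sketch builds the URI from a project ID and prepares the POST request without sending it. It uses only the standard library. The endpoint host `res.example.com` is a placeholder. Token authentication through an `X-Auth-Token` header is an assumption that this topic does not state, so use the endpoint and authentication method that apply to your account.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with the RES endpoint for your region.
ENDPOINT = "https://res.example.com"


def build_etl_job_request(project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare, but do not send, a POST /v1/{project_id}/etl-job request."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    url = f"{ENDPOINT}/v1/{project_id}/etl-job"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: IAM token authentication via the X-Auth-Token header.
            "X-Auth-Token": token,
        },
        method="POST",
    )


req = build_etl_job_request("my-project-id", "my-token", {"job_name": "ETL-rank_test1"})
print(req.get_method(), req.full_url)
# Sending is left to the caller, for example urllib.request.urlopen(req).
```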

Request

Table 2 describes the request parameters.

**Table 2** Request parameters
Parameter	Mandatory	Type	Description
workspace_id	No	String	Workspace ID. The default value is 0.
job_name	Yes	String	Job name. The value can contain a maximum of 20 characters.
job_description	No	String	Job description. The value can contain a maximum of 256 characters.
algorithm_type	Yes	String	Algorithm type. The value can be INITIAL_PROFILES_GENERATION or BUILD_RANK_UNIFORM_DATA_FROM_JSON.
algorithm_parameters	Yes	JSON	Algorithm parameters. Each algorithm type has its own parameters. For INITIAL_PROFILES_GENERATION, see Table 8. For BUILD_RANK_UNIFORM_DATA_FROM_JSON, see Table 9.
data_source	Yes	List	Algorithm data sources. For details, see Table 5. For INITIAL_PROFILES_GENERATION, use the general data templates as the data source. For BUILD_RANK_UNIFORM_DATA_FROM_JSON, use general format data as the data source.
storage	Yes	JSON	Storage platform. For details, see Table 6.
offline_platform	Yes	JSON	Offline computing platform. For details, see Table 3.
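The following Python sketch is a minimal client-side pre-check of the Table 2 rules. It reports missing mandatory fields, over-length names, and unsupported algorithm types before a request is sent. The function name `check_request_body` is hypothetical. The service performs its own validation, which may be stricter.

```python
from typing import List

# Mandatory top-level fields from Table 2.
MANDATORY_FIELDS = ("job_name", "algorithm_type", "algorithm_parameters",
                    "data_source", "storage", "offline_platform")
ALGORITHM_TYPES = {"INITIAL_PROFILES_GENERATION", "BUILD_RANK_UNIFORM_DATA_FROM_JSON"}


def check_request_body(body: dict) -> List[str]:
    """Return the problems found in a request body; an empty list means none were found."""
    problems = [f"missing mandatory field: {name}"
                for name in MANDATORY_FIELDS if name not in body]
    if len(body.get("job_name", "")) > 20:
        problems.append("job_name exceeds 20 characters")
    if len(body.get("job_description", "")) > 256:
        problems.append("job_description exceeds 256 characters")
    algorithm_type = body.get("algorithm_type")
    if algorithm_type is not None and algorithm_type not in ALGORITHM_TYPES:
        problems.append(f"unsupported algorithm_type: {algorithm_type}")
    if "data_source" in body and not isinstance(body["data_source"], list):
        problems.append("data_source must be a list")
    return problems


body = {
    "job_name": "ETL-rank_test1",
    "algorithm_type": "BUILD_RANK_UNIFORM_DATA_FROM_JSON",
    "algorithm_parameters": {},
    "data_source": [],
    "storage": {},
    "offline_platform": {},
}
print(check_request_body(body))
```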

**Table 3** **offline_platform** parameters
Parameter	Mandatory	Type	Description
platform	Yes	String	Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported.
platform_parameter	Yes	JSON	Platform parameter. For details, see Table 4.
computing_resource	No	String	Resource specifications required to run the DLI jobs.
config_load_path	Yes	String	Path from which the configuration sources are read.

**Table 4** **platform_parameter** parameters
Parameter	Mandatory	Type	Description
cluster_name	Yes	String	Cluster name
cluster_id	No	String	Cluster ID

**Table 5** **data_source** parameters
Parameter	Mandatory	Type	Description
table_type_id	Yes	String	Data table type. General data templates: USER_META (user feature list), ITEM_META (item feature list), and USER_BEHAVIOR (user behavior list). For details about the data format, see Offline Data Sources. General format: GENERAL_FORMAT.
data_source_url	Yes	String	Data source path. The value can contain a maximum of 1000 characters.
data_format	Yes	String	Input data format. The value can be csv, parquet, json, or orc.
data_param	No	JSON	Data parameter. For details, see Table 7. This parameter is mandatory when the data format is csv and optional for other data formats.
start_time	No	String	Start time for collecting the source data. This parameter is mandatory when the data format is json and optional for other data formats.
end_time	No	String	End time for collecting the source data. This parameter is mandatory when the data format is json and optional for other data formats.
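The conditional rules in Table 5 are easy to miss: data_param is required for csv, and start_time and end_time are required for json. The following hedged Python sketch checks one entry against these rules. `check_data_source` is a hypothetical helper. It treats a field as present whenever its key exists, because this topic does not say whether empty strings are accepted.

```python
from typing import List

DATA_FORMATS = {"csv", "parquet", "json", "orc"}


def check_data_source(entry: dict) -> List[str]:
    """Return the problems found in one data_source entry (Table 5 rules)."""
    problems = [f"missing mandatory field: {name}"
                for name in ("table_type_id", "data_source_url", "data_format")
                if name not in entry]
    if len(entry.get("data_source_url", "")) > 1000:
        problems.append("data_source_url exceeds 1000 characters")
    fmt = entry.get("data_format")
    if fmt is not None and fmt not in DATA_FORMATS:
        problems.append(f"unsupported data_format: {fmt}")
    # Conditionally mandatory fields: a key counts as present if it exists at all.
    if fmt == "csv" and "data_param" not in entry:
        problems.append("data_param is mandatory for csv")
    if fmt == "json":
        for name in ("start_time", "end_time"):
            if name not in entry:
                problems.append(f"{name} is mandatory for json")
    return problems


print(check_data_source({"table_type_id": "USER_META",
                         "data_source_url": "path/to/data", "data_format": "csv"}))
```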

**Table 6** **storage** parameters
Parameter	Mandatory	Type	Description
user_profiles_table	No	JSON	User attribute storage table. For details, see Table 8. This parameter is mandatory when algorithm_type is set to INITIAL_PROFILES_GENERATION.
item_profiles_table	No	JSON	Item attribute storage table. For details, see Table 8. This parameter is mandatory when algorithm_type is set to INITIAL_PROFILES_GENERATION.

**Table 7** **data_param** parameters
Parameter	Mandatory	Type	Description
header	Yes	Boolean	Whether the first row of the CSV data is a table header.
delimiter	Yes	String	Delimiter. The value can contain a maximum of 10 characters.
quote	Yes	String	Quotation character. The value can contain a maximum of 10 characters.
escape	Yes	String	Escape character. The value can contain a maximum of 10 characters.
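To preview how a sample file splits into columns with a given data_param, the following sketch uses Python's `csv` module as a local stand-in for the DLI reader. It assumes that `header` means the first row holds column names. It works only with single-character delimiter, quote, and escape values, because `csv.reader` requires them. How RES itself parses multi-character values is not covered here.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_csv_with_data_param(text: str, data_param: dict) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Split CSV text into (header, rows) using the Table 7 settings."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=data_param["delimiter"],  # csv.reader requires a single character
        quotechar=data_param["quote"],
        escapechar=data_param["escape"],
    )
    rows = list(reader)
    # Assumption: header=True means the first row holds column names.
    if data_param["header"] and rows:
        return rows[0], rows[1:]
    return None, rows


sample = 'user_id|name\n1|"Li, Wei"\n2|a\\|b\n'
header, rows = read_csv_with_data_param(
    sample, {"header": True, "delimiter": "|", "quote": '"', "escape": "\\"})
print(header, rows)
```

In the sample, the quoted field keeps its embedded comma, and the escaped `\|` is read as a literal `|` rather than as a delimiter.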

**Table 8** **algorithm_parameters** parameters (**INITIAL_PROFILES_GENERATION** operator)
Parameter	Mandatory	Type	Description
result_path	Yes	String	Path or folder that stores all output data (user and item attributes, feature maps, field features, training sets, and test sets).
global_features_information_path	Yes	String	Path of the global feature file (JSON), which contains the feature names, feature types, and feature value types. For details about the global feature file, see Viewing Global Feature File Configurations.
writer_parameters	No	JSON	Advanced settings. For details, see Table 10.

**Table 9** **algorithm_parameters** parameters (the **BUILD_RANK_UNIFORM_DATA_FROM_JSON** operator)
Parameter	Mandatory	Type	Description
result_path	Yes	String	Path or folder that stores all output data (user and item attributes, feature maps, field features, training sets, and test sets).
global_features_information_path	Yes	String	Path of the global feature file (JSON), which contains the feature names, feature types, and feature value types. For details about the global feature file, see Viewing Global Feature File Configurations.
rank_etl_type	Yes	Enum	Operator type for processing ranking data. Each ranking algorithm requires specific data processing, so select the type that matches the ranking algorithm you use. The LR, FM, FFM, DeepFM, and PIN algorithms can share the same data processing results.
rank_etl_parameters	Yes	JSON	Data preprocessing parameter of the ranking algorithm. For details, see Table 11.

**Table 10** **writer_parameters** parameters
Parameter	Mandatory	Type	Description
save_mode	No	String	Mode for handling existing wide-table data in the result path. The value can be New, Append, or Overwrite. New: no existing data is retained. Append: all existing data is retained. Overwrite: existing data of the same date is overwritten, and data of other dates is retained.
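The three save_mode values can be modeled on data keyed by date. The following sketch is a hypothetical in-memory model of the semantics in Table 10, not the service implementation. The date-keyed dictionary layout is an assumption.

```python
from typing import Dict, List

# Hypothetical layout: wide-table rows grouped by date partition.
Partitions = Dict[str, List[str]]


def apply_save_mode(existing: Partitions, new: Partitions, save_mode: str) -> Partitions:
    """Return the data left in the result path after writing `new` with `save_mode`."""
    if save_mode == "New":
        # No existing data is retained.
        return {date: list(rows) for date, rows in new.items()}
    result = {date: list(rows) for date, rows in existing.items()}
    if save_mode == "Append":
        # All existing data is retained; new rows are added to it.
        for date, rows in new.items():
            result.setdefault(date, []).extend(rows)
        return result
    if save_mode == "Overwrite":
        # Dates present in the new data replace existing data; other dates are kept.
        result.update({date: list(rows) for date, rows in new.items()})
        return result
    raise ValueError(f"unsupported save_mode: {save_mode}")


existing = {"20190301": ["a"], "20190302": ["b"]}
new = {"20190302": ["c"], "20190303": ["d"]}
for mode in ("New", "Append", "Overwrite"):
    print(mode, apply_save_mode(existing, new, mode))
```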

**Table 11** **rank_etl_parameters** parameters (LR, FM, FFM, DeepFM, and PIN)
Parameter	Mandatory	Type	Description
divide_by_time_or_rate	Yes	String	Whether the training set and the test set are split by time or by rate. The value can be TIME or RATE.
training_data_start_time	No	Long	Start time of the training data. This parameter is mandatory when divide_by_time_or_rate is set to TIME. The value must be less than the maximum time in the behavior data and less than the value of training_data_end_time. For example, 1541987933.
training_data_end_time	No	Long	End time of the training data. This parameter is mandatory when divide_by_time_or_rate is set to TIME. The value must be less than the maximum time in the behavior data and greater than the value of training_data_start_time. For example, 1541987933.
test_data_start_time	No	Long	Start time of the test data. This parameter is mandatory when divide_by_time_or_rate is set to TIME. The value must be less than the maximum time in the behavior data and less than the value of test_data_end_time. For example, 1541987933.
test_data_end_time	No	Long	End time of the test data. This parameter is mandatory when divide_by_time_or_rate is set to TIME. The value must be less than the maximum time in the behavior data and greater than the value of test_data_start_time. For example, 1541987933.
training_data_rate	No	Double	Proportion of the input data used as training data. This parameter is mandatory when divide_by_time_or_rate is set to RATE. The value ranges from 0 to 1.
test_data_rate	No	Double	Proportion of the input data used as test data. This parameter is mandatory when divide_by_time_or_rate is set to RATE. The value ranges from 0 to 1.
user_features	Yes	JSONArray	User features extracted from the global feature file, which are processed for ranking model training. Each feature must be defined in the User Attribute Configuration Table. For the fields of each feature, see Table 12. Example: [{ "feature_name": "age", "feature_type": "BASIC_INFO", "feature_value_type": "numerical", "feature_process_parameters": { "discrete_method": "equal_distance_discrete", "lower_limit": 0.0, "upper_limit": 120.0, "distance": 20 } }, { "feature_name": "user_tag", "feature_type": "TAGS", "feature_value_type": "map", "feature_process_parameters": { "value_preserve_number": 4 } }]
item_features	Yes	JSONArray	Item features extracted from the global feature file, which are processed for ranking model training. Each feature must be defined in the Item Attribute Configuration List. For the fields of each feature, see Table 12. Example: [{ "feature_name": "product_name", "feature_type": "BASIC_INFO", "feature_value_type": "string", "feature_process_parameters": {} }, { "feature_name": "categories", "feature_type": "BASIC_INFO", "feature_value_type": "strArray", "feature_process_parameters": { "value_preserve_number": 3 } }]
positive_behaviors	Yes	List[String]	Positive behaviors, which are converted into positive samples in the ranking data. Each value must match an actionType value in the User Operation Behavior Table. For example, ["click", "collect", "purchase", "share"].
negative_behaviors	Yes	List[String]	Negative behaviors, which are converted into negative samples in the ranking data. Each value must match an actionType value in the User Operation Behavior Table. For example, ["view", "dislike"].
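You can check the split rules and behavior lists in Table 11 before you submit a job. The following Python sketch validates the TIME and RATE constraints. It also shows one way behaviors could map to sample labels. The following are assumptions made for illustration: the helper names, the inclusive 0 to 1 rate range, the 1/0 label values, and skipping behaviors that appear in neither list.

```python
from typing import List, Optional

TIME_FIELDS = ("training_data_start_time", "training_data_end_time",
               "test_data_start_time", "test_data_end_time")
RATE_FIELDS = ("training_data_rate", "test_data_rate")


def check_split(params: dict, max_behavior_time: int) -> List[str]:
    """Check the Table 11 split rules; max_behavior_time is the latest time in the behavior data."""
    mode = params.get("divide_by_time_or_rate")
    if mode == "TIME":
        missing = [f for f in TIME_FIELDS if f not in params]
        if missing:
            return [f"{f} is mandatory for TIME" for f in missing]
        # int() also accepts string-encoded timestamps such as "1517414400000".
        t = {f: int(params[f]) for f in TIME_FIELDS}
        problems = []
        for prefix in ("training_data", "test_data"):
            start, end = t[prefix + "_start_time"], t[prefix + "_end_time"]
            if start >= end:
                problems.append(f"{prefix}_start_time must be less than {prefix}_end_time")
            if end >= max_behavior_time:
                problems.append(f"{prefix}_end_time must be less than the maximum behavior time")
        return problems
    if mode == "RATE":
        problems = []
        for f in RATE_FIELDS:
            if f not in params:
                problems.append(f"{f} is mandatory for RATE")
            elif not 0.0 <= float(params[f]) <= 1.0:  # assumption: inclusive range
                problems.append(f"{f} must be between 0 and 1")
        return problems
    return ["divide_by_time_or_rate must be TIME or RATE"]


def label_behavior(action_type: str, positive: List[str], negative: List[str]) -> Optional[int]:
    """Hypothetical labeling: 1 for a positive sample, 0 for a negative one, None if unused."""
    if action_type in positive:
        return 1
    if action_type in negative:
        return 0
    return None


print(check_split({"divide_by_time_or_rate": "RATE",
                   "training_data_rate": "0.8", "test_data_rate": "0.2"}, 0))
print(label_behavior("dislike", ["consume"], ["uncollect", "dislike"]))
```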

**Table 12** Features and their processing modes
Parameter	Mandatory	Type	Description
feature_name	Yes	String	Feature name.
feature_type	Yes	String	Feature type. User features: BASIC_INFO, TAGS, or CONTEXT. Item features: BASIC_INFO or TAGS.
feature_value_type	Yes	String	Feature value type. The options are as follows: string (single-value enumeration): each value is processed as a character string. Most feature values are of this type. numerical (single-value number): numerical type. Feature values of this type generally need to be discretized to reduce feature dimensions. strArray (multi-value enumeration): each feature value is a variable-length string array, for example, product categories or user interests. The ranking preprocessing operator normalizes all values to a uniform length for subsequent processing. map (KV number): Map[String,Double] type. Each feature value is a variable-length set of key-value pairs, for example, a user profile or an item profile. The ranking preprocessing operator normalizes all values to a uniform length for subsequent processing.
feature_process_parameters	Yes	JSON	Processing parameters for the feature. Each feature value type has its own processing method, whose parameters are provided by the user (see Table 13 for numerical features, Table 15 for strArray features, and Table 16 for map features). Example: { "discrete_method": "equal_distance_discrete", "lower_limit": 0.0, "upper_limit": 120.0, "distance": 20 }

**Table 13** Discrete methods and parameters
discrete_method	Parameter	Mandatory	Type	Description
equal_distance_discrete	lower_limit	No	Double	Lower limit of normal feature values. A feature value less than this value is regarded as abnormal. You can set this parameter based on business experience. If it is not specified, the minimum feature value in the data is used. The value range is [Double.MinValue, Double.MaxValue), and the value must be less than upper_limit.
equal_distance_discrete	upper_limit	No	Double	Upper limit of normal feature values. A feature value greater than this value is regarded as abnormal. You can set this parameter based on business experience. If it is not specified, the maximum feature value in the data is used. The value range is (Double.MinValue, Double.MaxValue], and the value must be greater than lower_limit.
equal_distance_discrete	distance	Yes	Double	Segment width. The feature range is divided into segments of this width, and each segment corresponds to a discrete value. The value range is (0, Double.MaxValue).
equal_frequency_discrete	lower_limit	No	Double	Lower limit of normal feature values. A feature value less than this value is regarded as abnormal. You can set this parameter based on business experience. If it is not specified, the minimum feature value in the data is used. The value range is [Double.MinValue, Double.MaxValue), and the value must be less than upper_limit.
equal_frequency_discrete	upper_limit	No	Double	Upper limit of normal feature values. A feature value greater than this value is regarded as abnormal. You can set this parameter based on business experience. If it is not specified, the maximum feature value in the data is used. The value range is (Double.MinValue, Double.MaxValue], and the value must be greater than lower_limit.
equal_frequency_discrete	frequency	Yes	Int	The feature values are sorted in ascending order and divided into segments that contain equal numbers of values. Each segment corresponds to a discrete value. The value range is (0, Int.MaxValue).
user_define_discrete	period_list	Yes	JSONArray	List of user-defined periods. For the fields of each period, see Table 14. Each period has a minimum value, a maximum value, and a name. A feature value that falls within a period takes that period as its discrete value. A feature value outside all user-defined periods is treated as abnormal. Each period is a left-closed, right-open interval [lower_limit, upper_limit), that is, it includes its minimum value but not its maximum value. Periods must not overlap. Example: [ { "period_name": "young", "lower_limit": 0.0, "upper_limit": 18.0 }, { "period_name": "mid", "lower_limit": 18.0, "upper_limit": 60.0 }, { "period_name": "old", "lower_limit": 60.0, "upper_limit": 120.0 } ]
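The following Python sketch implements the three discrete methods. It is an illustrative reimplementation, not the RES operator, and rests on several assumptions. Segment indexes start at 0. A value equal to upper_limit goes into the last equal-distance segment. lower_limit and upper_limit are passed explicitly; the table says the data minimum and maximum apply when they are omitted. The equal-frequency helper takes a segment count directly, because this topic does not explain how frequency maps to that count.

```python
import bisect
import math
from typing import List, Optional


def equal_distance_bucket(x: float, lower: float, upper: float, distance: float) -> Optional[int]:
    """Segment index of x, or None when x is outside [lower, upper] (abnormal)."""
    if x < lower or x > upper:
        return None
    last = max(math.ceil((upper - lower) / distance) - 1, 0)
    # Assumption: x == upper is placed in the last segment rather than a new one.
    return min(int((x - lower) // distance), last)


def equal_frequency_edges(values: List[float], segments: int) -> List[float]:
    """Cut points that split the sorted values into `segments` groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // segments] for i in range(1, segments)]


def equal_frequency_bucket(x: float, edges: List[float]) -> int:
    """Segment index of x given the cut points from equal_frequency_edges."""
    return bisect.bisect_right(edges, x)


def user_defined_bucket(x: float, periods: List[dict]) -> Optional[str]:
    """Name of the period with lower_limit <= x < upper_limit, or None (abnormal)."""
    for period in periods:
        if period["lower_limit"] <= x < period["upper_limit"]:
            return period["period_name"]
    return None


periods = [
    {"period_name": "young", "lower_limit": 0.0, "upper_limit": 18.0},
    {"period_name": "mid", "lower_limit": 18.0, "upper_limit": 60.0},
    {"period_name": "old", "lower_limit": 60.0, "upper_limit": 120.0},
]
print(equal_distance_bucket(35, 0.0, 120.0, 20))
print(user_defined_bucket(18.0, periods))
```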

**Table 14** Custom discrete parameters
Parameter	Mandatory	Type	Description
lower_limit	Yes	Double	Minimum value of the period. The value ranges from Double.MinValue to Double.MaxValue and must be less than upper_limit.
upper_limit	Yes	Double	Maximum value of the period. The value ranges from Double.MinValue to Double.MaxValue and must be greater than lower_limit.
period_name	Yes	String	Name of the period.
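Periods are left-closed, right-open intervals that must not overlap, so you can check a period_list locally. The following is a minimal Python sketch; the helper name `check_period_list` is hypothetical.

```python
from typing import List


def check_period_list(periods: List[dict]) -> List[str]:
    """Check Table 14 limits and that no two [lower_limit, upper_limit) periods overlap."""
    problems = []
    for p in periods:
        if not p["lower_limit"] < p["upper_limit"]:
            problems.append(f"{p['period_name']}: lower_limit must be less than upper_limit")
    # After sorting by lower_limit, any overlap shows up between some adjacent pair.
    # Touching periods such as [0, 18) and [18, 60) do not overlap.
    ordered = sorted(periods, key=lambda p: p["lower_limit"])
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["lower_limit"] < prev["upper_limit"]:
            problems.append(f"{prev['period_name']} and {cur['period_name']} overlap")
    return problems


print(check_period_list([
    {"period_name": "young", "lower_limit": 0.0, "upper_limit": 20.0},
    {"period_name": "mid", "lower_limit": 18.0, "upper_limit": 60.0},
]))
```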

**Table 15** **strArray** parameters
Parameter	Mandatory	Type	Description
value_preserve_number	No	Int	Number of strArray feature values to preserve. If a feature has more values than this number, the extra values are discarded. If it has fewer, all values are retained. If this parameter is not specified, the maximum number of values of this strArray feature in the data is used. The value ranges from 1 to 100.

**Table 16** KV number parameters
Parameter	Mandatory	Type	Description
value_preserve_number	No	Int	Number of KV number feature values to preserve. If a feature has more key-value pairs than this number, the extra pairs are discarded. If it has fewer, all pairs are retained. If this parameter is not specified, the maximum number of key-value pairs of this feature in the data is used. The value ranges from 1 to 100.
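Tables 15 and 16 share the same truncation rule. The following hedged Python sketch applies it. The helper names are hypothetical. For map features, which key-value pairs are kept is an assumption (here, the first ones in insertion order), because the tables only say that extra values are deleted.

```python
from typing import Dict, Iterable, List, Optional, Sized


def resolve_preserve_number(value_preserve_number: Optional[int], all_values: Iterable[Sized]) -> int:
    """Configured number, or the largest value count in the data when it is not set."""
    if value_preserve_number is None:
        return max((len(v) for v in all_values), default=0)
    if not 1 <= value_preserve_number <= 100:
        raise ValueError("value_preserve_number must be between 1 and 100")
    return value_preserve_number


def preserve_str_array(values: List[str], n: int) -> List[str]:
    """Drop values beyond the first n; shorter arrays are kept whole."""
    return values[:n]


def preserve_kv(values: Dict[str, float], n: int) -> Dict[str, float]:
    """Assumption: keep the first n pairs in insertion order (the tables do not say which are dropped)."""
    return dict(list(values.items())[:n])


hobbies = [["reading"], ["music", "travel", "chess", "golf"]]
n = resolve_preserve_number(3, hobbies)
print([preserve_str_array(h, n) for h in hobbies])
```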

Response

Table 17 describes the response parameters.

**Table 17** Response parameters
Parameter	Type	Description
job_name	String	Job name
job_id	String	Job ID
is_success	Boolean	Whether the request is successful
error_message	String	Error message returned when the request fails. This parameter is not returned when the request succeeds.
error_code	String	Error code returned when the request fails. This parameter is not returned when the request succeeds.
create_time	Long	Time when a job is created
etl_uuid	String	Candidate set ID
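A client typically branches on is_success. The following Python sketch parses either response shape; the `EtlJobResult` type is hypothetical. Table 17 lists error_message, but the failed-response example in this topic returns error_msg, so the sketch reads whichever key is present.

```python
import json
from typing import NamedTuple, Optional


class EtlJobResult(NamedTuple):
    job_id: Optional[str]
    etl_uuid: Optional[str]
    error_code: Optional[str]
    error_message: Optional[str]


def parse_etl_job_response(text: str) -> EtlJobResult:
    """Extract the job identifiers on success, or the error details on failure."""
    body = json.loads(text)
    if body.get("is_success"):
        return EtlJobResult(body.get("job_id"), body.get("etl_uuid"), None, None)
    # Table 17 names the field error_message; the failed-response example uses error_msg.
    message = body.get("error_message", body.get("error_msg"))
    return EtlJobResult(None, None, body.get("error_code"), message)


print(parse_etl_job_response('{"is_success": false, "error_code": "res.2006", "error_msg": "..."}'))
```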

Example

Example request

{
  "job_name": "ETL-rank_test1",
  "job_description": "hhx test",
  "algorithm_type": "BUILD_RANK_UNIFORM_DATA_FROM_JSON",
  "data_source": [
    {
      "table_type_id": "GENERAL_FORMAT",
      "data_format": "json",
      "data_source_url": "<Path for storing the data sources>",

      "start_time": ""
    }
  ],
  "algorithm_parameters": {
    "result_path": "<Path for storing all output data>",
    "global_features_information_path": "<Path for storing the global feature files>",
    "rank_etl_type": "LR",			
    "rank_etl_parameters": {
      "divide_by_time_or_rate": "RATE",
      "training_data_start_time": "1552117770165",
      "training_data_end_time": "1517414400000",
      "test_data_start_time": "1517414400000",
      "test_data_end_time": "1519217998000",
      "training_data_rate": "0.8",
      "test_data_rate": "0.2",
      "user_features": [
        {
          "feature_name": "provinceId",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "no_discrete"
          }
        },
        {
          "feature_name": "cityId",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "equal_distance_discrete",
            "lower_limit": 0,
            "upper_limit": 10000,
            "distance": 1000
          }
        },
        {
          "feature_name": "districtId",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "no_discrete"
          }
        },
        {
          "feature_name": "payment_type",
          "feature_type": "CONTEXT",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "no_discrete"
          }
        },
        {
          "feature_name": "payment_method",
          "feature_type": "CONTEXT",
          "feature_value_type": "string",
          "feature_process_parameters": {}
        },
        {
          "feature_name": "payment_channel",
          "feature_type": "CONTEXT",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "no_discrete"
          }
        },
        {
          "feature_name": "salary",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "user_define_discrete",
            "period_list": [
              {
                "period_name": "low",
                "lower_limit": 0,
                "upper_limit": 5000
              },
              {
                "period_name": "mid",
                "lower_limit": 5000,
                "upper_limit": 30000
              },
              {
                "period_name": "high",
                "lower_limit": 30000,
                "upper_limit": 100000
              }
            ]
          }
        },
        {
          "feature_name": "user_tags",
          "feature_type": "TAGS",
          "feature_value_type": "map",
          "feature_process_parameters": {
            "process_method": "map_format",
            "value_preserve_number": 4
          }
        },
        {
          "feature_name": "hobbies",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "strArray",
          "feature_process_parameters": {
            "process_method": "string_array_format",
            "value_preserve_number": 3
          }
        }
      ],
      "item_features": [
        {
          "feature_name": "product_name",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "string",
          "feature_process_parameters": {}
        },
        {
          "feature_name": "order_price",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "numerical",
          "feature_process_parameters": {
            "discrete_method": "equal_frequency_discrete",
            "frequency": 10
          }
        },
        {
          "feature_name": "weight",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "string",
          "feature_process_parameters": {}
        },
        {
          "feature_name": "volume",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "string",
          "feature_process_parameters": {}
        },
        {
          "feature_name": "categories",
          "feature_type": "BASIC_INFO",
          "feature_value_type": "strArray",
          "feature_process_parameters": {
            "process_method": "string_array_format",
            "value_preserve_number": 3
          }
        },
        {
          "feature_name": "item_tags",
          "feature_type": "TAGS",
          "feature_value_type": "map",
          "feature_process_parameters": {
            "process_method": "map_format",
            "value_preserve_number": 3
          }
        }
      ],
      "positive_behaviors": [
        "consume"
      ],
      "negative_behaviors": [
        "uncollect",
        "dislike"
      ]
    }
  },
  "offline_platform": {
    "platform": "DLI",
    "platform_parameter": {
      "cluster_name": "res_two"
    },
    "config_load_path": "<Path for storing the configuration sources>"
  },
  "storage": {}
}

Example of a successful response

{
    "is_success": true,
    "job_id": "d832b07540594ea980c140fea5a10849",
    "job_name": "gggggggggggggggggg",
    "create_time": "1543891781990",
    "etl_uuid": "a53a685c52f4476f833d256620b6fc80"
}

Example of a failed response

{
    "is_success": false,
    "error_code": "res.2006",
    "error_msg": "The datasourceUrl(<Path for storing the data sources>) is not match Bucket structure."
}

Status Code

For details about status codes, see Status Codes.
