APIs for Submitting Data Quality Jobs

Function

This API is used to submit data quality jobs and perform offline computing tasks.

URI

POST /v1/{project_id}/data-quality

Table 1 describes the URI parameters.

Table 1 URI parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID. |
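
As a minimal sketch of how the URI is assembled, the snippet below substitutes project_id into the path template. The endpoint host and the helper name build_submit_url are hypothetical placeholders; they do not come from this reference.

```python
from urllib.parse import quote

# Path template taken from the URI section of this reference.
PATH_TEMPLATE = "/v1/{project_id}/data-quality"


def build_submit_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for the POST request.

    `endpoint` (for example "https://res.example.com") is a placeholder;
    substitute the real service endpoint for your region.
    """
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so it is safe as a single path segment.
    path = PATH_TEMPLATE.format(project_id=quote(project_id, safe=""))
    return endpoint.rstrip("/") + path
```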

Request

Table 2 describes the request parameters.

Table 2 Request parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workspace_id | No | String | Workspace ID. The default value is 0. |
| job_name | Yes | String | Training job name. The value can contain a maximum of 20 characters and must start with DataQuality-. |
| job_description | No | String | Training job description. The value can contain a maximum of 256 characters. |
| algorithm_type | Yes | String | Algorithm type. Valid value: DATA_QUALITY_INSPECTION |
| algorithm_parameters | Yes | JSON | Algorithm parameters. Each algorithm type has its own parameters. For DATA_QUALITY_INSPECTION, see Table 7. |
| data_source | Yes | List | Algorithm data source. For DATA_QUALITY_INSPECTION, select the general template data as the data source. For details, see Table 5. |
| offline_platform | Yes | JSON | Offline computing platform. For details, see Table 3. |
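
The naming constraints in Table 2 can be checked on the client before a request is sent. The following is a minimal sketch; the helper name check_job_fields is hypothetical, and it assumes the 20-character limit applies to the whole name, including the DataQuality- prefix.

```python
from typing import List

JOB_NAME_PREFIX = "DataQuality-"
JOB_NAME_MAX = 20          # assumed to count the prefix as well
JOB_DESCRIPTION_MAX = 256


def check_job_fields(job_name: str, job_description: str = "") -> List[str]:
    """Return a list of constraint violations (empty when both fields are valid)."""
    problems = []
    if not job_name.startswith(JOB_NAME_PREFIX):
        problems.append("job_name must start with " + JOB_NAME_PREFIX)
    if len(job_name) > JOB_NAME_MAX:
        problems.append("job_name exceeds %d characters" % JOB_NAME_MAX)
    if len(job_description) > JOB_DESCRIPTION_MAX:
        problems.append("job_description exceeds %d characters" % JOB_DESCRIPTION_MAX)
    return problems
```

For example, check_job_fields("DataQuality-ll", "hhx test") returns an empty list, while a name without the prefix yields one violation.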

Table 3 offline_platform parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 4. |
| computing_resource | No | String | Resource specifications required to run the DLI job. |
| config_load_path | Yes | String | Path from which the configuration sources are read. |

Table 4 platform_parameter parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_name | Yes | String | Cluster name |
| cluster_id | No | String | Cluster ID |
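
Table 3 and Table 4 together describe a nested object. The sketch below shows one way to assemble it while leaving optional fields out when they are not supplied; the helper name build_offline_platform is hypothetical.

```python
from typing import Optional


def build_offline_platform(
    cluster_name: str,
    config_load_path: str,
    cluster_id: Optional[str] = None,
    computing_resource: Optional[str] = None,
) -> dict:
    """Assemble the offline_platform object from Table 3 and Table 4."""
    if not cluster_name or not config_load_path:
        raise ValueError("cluster_name and config_load_path are mandatory")

    platform_parameter = {"cluster_name": cluster_name}
    if cluster_id is not None:
        platform_parameter["cluster_id"] = cluster_id

    offline_platform = {
        "platform": "DLI",  # the only platform the reference lists as supported
        "platform_parameter": platform_parameter,
        "config_load_path": config_load_path,
    }
    if computing_resource is not None:
        offline_platform["computing_resource"] = computing_resource
    return offline_platform
```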

Table 5 data_source parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| table_type_id | Yes | String | General data templates: USER_META (User Attribute Table), ITEM_META (Item Attribute List), USER_BEHAVIOR (User Operation Behavior Table). For details about the data format, see Offline Data Sources. General format: GENERAL_FORMAT. |
| data_source_url | Yes | String | Data source path. The value can contain a maximum of 1000 characters. |
| data_format | Yes | String | Input data format. The value can be csv, parquet, json, or orc. |
| data_param | No | JSON | Data parameter. For details, see Table 6. This parameter is mandatory when the data format is csv and optional for other data formats. |

Table 6 data_param parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| header | Yes | Boolean | Whether to display the table header |
| delimiter | Yes | String | Delimiter. The value can contain a maximum of 10 characters. |
| quote | Yes | String | Quotation character. The value can contain a maximum of 10 characters. |
| escape | Yes | String | Escape character. The value can contain a maximum of 10 characters. |
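
The rule that data_param is mandatory only for csv input is easy to overlook. The following sketch builds one data_source entry and enforces the constraints stated in Table 5 and Table 6; the helper name build_data_source is hypothetical, and the checks are client-side conveniences rather than the service's own validation.

```python
from typing import Optional

ALLOWED_FORMATS = {"csv", "parquet", "json", "orc"}
REQUIRED_DATA_PARAM_KEYS = {"header", "delimiter", "quote", "escape"}
DATA_SOURCE_URL_MAX = 1000


def build_data_source(
    table_type_id: str,
    data_source_url: str,
    data_format: str,
    data_param: Optional[dict] = None,
) -> dict:
    """Build one element of the data_source list."""
    if data_format not in ALLOWED_FORMATS:
        raise ValueError("unsupported data_format: " + data_format)
    if len(data_source_url) > DATA_SOURCE_URL_MAX:
        raise ValueError("data_source_url exceeds 1000 characters")
    if data_format == "csv":
        # data_param is mandatory for csv and optional for other formats.
        if data_param is None:
            raise ValueError("data_param is mandatory when data_format is csv")
        missing = REQUIRED_DATA_PARAM_KEYS - set(data_param)
        if missing:
            raise ValueError("data_param is missing: " + ", ".join(sorted(missing)))

    entry = {
        "table_type_id": table_type_id,
        "data_format": data_format,
        "data_source_url": data_source_url,
    }
    if data_param is not None:
        entry["data_param"] = data_param
    return entry
```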

Table 7 algorithm_parameters parameters (the DATA_QUALITY_INSPECTION operator)

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| result_path | Yes | String | Path to the folder that houses all output data (error data and information) |
| global_features_information_path | Yes | String | Global feature file (JSON) that contains the feature names, feature types, and feature value types. For details about the global feature file, see Viewing Global Feature File Configurations. |
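
Before submitting, it can help to confirm that every parameter marked mandatory in Table 2 and Table 7 is present in the request body. The sketch below is one possible client-side check; the helper name missing_fields is hypothetical.

```python
from typing import List

# Top-level parameters marked "Yes" in Table 2.
MANDATORY_FIELDS = (
    "job_name",
    "algorithm_type",
    "algorithm_parameters",
    "data_source",
    "offline_platform",
)
# Parameters marked "Yes" in Table 7.
MANDATORY_ALGORITHM_PARAMETERS = (
    "result_path",
    "global_features_information_path",
)


def missing_fields(body: dict) -> List[str]:
    """Return the names of mandatory parameters absent from the request body."""
    missing = [name for name in MANDATORY_FIELDS if name not in body]
    params = body.get("algorithm_parameters", {})
    missing += [
        "algorithm_parameters." + name
        for name in MANDATORY_ALGORITHM_PARAMETERS
        if name not in params
    ]
    return missing
```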

Response

Table 8 describes the response parameters.

Table 8 Response parameters

| Parameter | Type | Description |
|---|---|---|
| job_name | String | Job name |
| job_id | String | Job ID |
| is_success | Boolean | Whether the request is successful |
| error_message | String | Error message that indicates a request has failed. This parameter is unavailable when a request is successful. |
| error_code | String | Error code that indicates a request has failed. This parameter is unavailable when a request is successful. |
| create_time | Long | Time when a job is created |
| etl_uuid | String | Candidate set ID |
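
A caller typically branches on is_success and reads the error fields only on failure. The sketch below illustrates that pattern with a hypothetical helper named summarize_response. Because Table 8 lists error_message while the failed-response example in this reference uses error_msg, the sketch accepts either key rather than assuming which one the service returns.

```python
import json


def summarize_response(body: str) -> str:
    """Turn a raw JSON response body into a one-line summary."""
    data = json.loads(body)
    if data.get("is_success"):
        return "job " + str(data.get("job_id")) + " submitted"
    # The error fields are present only when the request has failed.
    message = data.get("error_message") or data.get("error_msg") or "unknown error"
    return "failed [" + str(data.get("error_code")) + "]: " + message
```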

Example

  • Example request
    {
    	"job_name": "DataQuality-ll",
    	"job_description": "hhx test",
    	"algorithm_type": "DATA_QUALITY_INSPECTION",
    	"algorithm_parameters": {
    		"result_path": "<Path for storing the output data>",
    		"global_features_information_path": "<Path for storing the global feature files>"
    	},
    	"offline_platform": {
    		"platform": "DLI",
    		"platform_parameter": {
    			"cluster_name": "res_cluster"
    		},
    		"config_load_path": "<Path for storing the configuration sources>",
    		"computing_resource": ""
    	},
    	"data_source": [{
    		"table_type_id": "USER_META",
    		"data_format": "csv",
    		"data_source_url": "<Path for storing the data sources>",
    		"data_param": {
    			"header": "false",
    			"delimiter": ",",
    			"quote": "\"",
    			"escape": "\\"
    		}
    	}, {
    		"table_type_id": "USER_META_CONF",
    		"data_format": "csv",
    		"data_source_url": "<Path for storing the data sources>",
    		"data_param": {
    			"header": "true",
    			"delimiter": ",",
    			"quote": "\"",
    			"escape": "\\"
    		}
    	}, {
    		"table_type_id": "ITEM_META",
    		"data_format": "csv",
    		"data_source_url": "<Path for storing the data sources>",
    		"data_param": {
    			"header": "false",
    			"delimiter": ",",
    			"quote": "\"",
    			"escape": "\\"
    		}
    	}, {
    		"table_type_id": "ITEM_META_CONF",
    		"data_format": "csv",
    		"data_source_url": "<Path for storing the data sources>",
    		"data_param": {
    			"header": "true",
    			"delimiter": ",",
    			"quote": "\"",
    			"escape": "\\"
    		}
    	}, {
    		"table_type_id": "USER_BEHAVIOR",
    		"data_format": "csv",
    		"data_source_url": "<Path for storing the data sources>",
    		"data_param": {
    			"header": "false",
    			"delimiter": ",",
    			"quote": "\"",
    			"escape": "\\"
    		}
    	}]
    }
  • Example of a successful response
    {
        "is_success": true,
        "job_id": "59c3a237731b4ebfbf561d765b04def7",
        "filter_uuid": "5efc448313fb4dbf95e1e6cc307b92d6"
    }
  • Example of a failed response
    {
        "is_success": false,
        "error_code": "res.2006",
        "error_msg": "The datasourceUrl(<Path for storing the data sources>) is not match Bucket structure."
    }
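
To relate the example request to an actual HTTP call, the sketch below prepares a POST request with Python's standard library without sending it. The endpoint, the token value, the X-Auth-Token header name, and the helper name make_submit_request are all assumptions for illustration; this reference does not specify how requests are authenticated.

```python
import json
import urllib.request


def make_submit_request(endpoint: str, project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for a data quality job."""
    url = endpoint.rstrip("/") + "/v1/" + project_id + "/data-quality"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed header name, not from this reference
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the response body can then be decoded as JSON and checked for is_success.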

Status Code

For details about status codes, see Status Codes.