API for Submitting Data Quality Jobs
Function
This API is used to submit data quality jobs, which run as offline computing tasks.
URI
POST /v1/{project_id}/data-quality
Table 1 describes the URI parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID, which is used for resource isolation. For details about how to obtain the project ID, see Obtaining a Project ID. |
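As a quick illustration (a hedged sketch, not part of the service documentation), the request path can be assembled from the project ID like this; the project ID value is a made-up placeholder:

```python
# Minimal sketch: assembling the submission path from a project ID.
def build_uri(project_id: str) -> str:
    """Return the POST path for submitting a data quality job."""
    return f"/v1/{project_id}/data-quality"

print(build_uri("0a1b2c3d4e5f"))  # → /v1/0a1b2c3d4e5f/data-quality
```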
Request
Table 2 describes the details about request parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workspace_id | No | String | Workspace ID. The default value is 0. |
| job_name | Yes | String | Job name. The value can contain a maximum of 20 characters and must start with DataQuality-. |
| job_description | No | String | Job description. The value can contain a maximum of 256 characters. |
| algorithm_type | Yes | String | Algorithm type, for example, DATA_QUALITY_INSPECTION (see the example request). |
| algorithm_parameters | Yes | JSON | Algorithm parameters. Each algorithm type has its own parameters. For details, see Table 7. |
| data_source | Yes | List | Algorithm data sources. For details, see Table 5. |
| offline_platform | Yes | JSON | Offline computing platform. For details, see Table 3. |
Table 3 describes the offline_platform parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| platform | Yes | String | Platform name. The value can contain a maximum of 64 characters. Currently, only DLI is supported. |
| platform_parameter | Yes | JSON | Platform parameter. For details, see Table 4. |
| computing_resource | No | String | Resource specifications required for the normal running of the DLI jobs. |
| config_load_path | Yes | String | Path from which the configuration sources are read. |
Table 4 describes the parameters in platform_parameter.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_name | Yes | String | Cluster name |
| cluster_id | No | String | Cluster ID |
Table 5 describes the parameters in data_source.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| table_type_id | Yes | String | General data template type, for example, USER_META or ITEM_META (see the example request). For details about the data format, see Offline Data Sources. |
| data_source_url | Yes | String | Data source path. The value can contain a maximum of 1000 characters. |
| data_format | Yes | String | Input data format. The value can be csv, parquet, json, or orc. |
| data_param | No | JSON | Data parameter. For details, see Table 6. This parameter is mandatory when the data format is csv and optional for other data formats. |
Table 6 describes the parameters in data_param.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| header | Yes | Boolean | Whether the data contains a table header row |
| delimiter | Yes | String | Delimiter. The value can contain a maximum of 10 characters. |
| quote | Yes | String | Quotation character. The value can contain a maximum of 10 characters. |
| escape | Yes | String | Escape character. The value can contain a maximum of 10 characters. |
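For intuition only, the data_param fields map naturally onto the options of Python's standard csv module. This sketch illustrates the semantics described above; it is not part of the API (note that the API's example request transmits header as the string "false"/"true", while this sketch uses a real boolean):

```python
import csv
import io

# Example data_param values, expressed as native Python types.
data_param = {"header": True, "delimiter": ",", "quote": "\"", "escape": "\\"}

sample = 'id,name\n1,Alice\n2,"Bob, Jr."\n'  # delimiter inside a quoted field

reader = csv.reader(
    io.StringIO(sample),
    delimiter=data_param["delimiter"],   # field separator
    quotechar=data_param["quote"],       # quotation character
    escapechar=data_param["escape"],     # escape character
)
rows = list(reader)
if data_param["header"]:
    rows = rows[1:]                      # header=true: first row is column names

print(rows)  # → [['1', 'Alice'], ['2', 'Bob, Jr.']]
```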
Table 7 describes the parameters in algorithm_parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| result_path | Yes | String | Path to the folder that houses all output data (error data and information) |
| global_features_information_path | Yes | String | Global feature file (JSON) that contains the feature names, feature types, and feature value types. For details about the global feature file, see Viewing Global Feature File Configurations. |
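Putting the request tables together, the following hedged sketch assembles a submission payload. The endpoint host, token header, and helper name are assumptions for illustration, and the angle-bracket paths are placeholders to be replaced with real storage paths:

```python
import json

ENDPOINT = "https://dataquality.example.com"  # assumption: your service endpoint

def build_job_request(project_id: str, token: str) -> tuple:
    """Assemble URL, headers, and body for a data quality job (sketch)."""
    url = f"{ENDPOINT}/v1/{project_id}/data-quality"
    headers = {"Content-Type": "application/json", "X-Auth-Token": token}
    body = {
        "job_name": "DataQuality-demo",              # must start with DataQuality-
        "algorithm_type": "DATA_QUALITY_INSPECTION",
        "algorithm_parameters": {
            "result_path": "<Path for storing the output data>",
            "global_features_information_path": "<Path for storing the global feature files>",
        },
        "offline_platform": {
            "platform": "DLI",                       # only DLI is supported
            "platform_parameter": {"cluster_name": "res_cluster"},
            "config_load_path": "<Path for storing the configuration sources>",
        },
        "data_source": [{
            "table_type_id": "USER_META",
            "data_format": "csv",
            "data_source_url": "<Path for storing the data sources>",
            "data_param": {"header": "false", "delimiter": ",",
                           "quote": "\"", "escape": "\\"},
        }],
    }
    return url, headers, json.dumps(body)

# The request would then be sent with any HTTP client, for example:
# requests.post(url, headers=headers, data=body)
```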
Response
Table 8 describes the response parameters.
| Parameter | Type | Description |
|---|---|---|
| job_name | String | Job name |
| job_id | String | Job ID |
| is_success | Boolean | Whether the request is successful |
| error_message | String | Error message indicating why a request failed. This parameter is not returned when a request is successful. |
| error_code | String | Error code indicating why a request failed. This parameter is not returned when a request is successful. |
| create_time | Long | Time when a job is created |
| etl_uuid | String | Candidate set ID |
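A caller can inspect the response as in this hedged sketch. Note that the table above documents error_message while the example failed response uses the key error_msg, so the sketch accepts both spellings:

```python
import json

def summarize_response(text: str) -> str:
    """Turn a submission response into a one-line summary (sketch)."""
    resp = json.loads(text)
    if resp.get("is_success"):
        return f"submitted: job_id={resp['job_id']}"
    # the table documents error_message; the example response uses error_msg
    msg = resp.get("error_message") or resp.get("error_msg")
    return f"failed: {resp.get('error_code')}: {msg}"

ok = '{"is_success": true, "job_id": "59c3", "filter_uuid": "5efc"}'
print(summarize_response(ok))  # → submitted: job_id=59c3
```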
Example
- Example request
{ "job_name": "DataQuality-ll", "job_description": "hhx test", "algorithm_type": "DATA_QUALITY_INSPECTION", "algorithm_parameters": { "result_path": "<Path for storing the output data>", "global_features_information_path": "<Path for storing the global feature files>" }, "offline_platform": { "platform": "DLI", "platform_parameter": { "cluster_name": "res_cluster" }, "config_load_path": "<Path for storing the configuration sources>", "computing_resource": "" }, "data_source": [{ "table_type_id": "USER_META", "data_format": "csv", "data_source_url": "<Path for storing the data sources>", "data_param": { "header": "false", "delimiter": ",", "quote": "\"", "escape": "\\" } }, { "table_type_id": "USER_META_CONF", "data_format": "csv", "data_source_url": "<Path for storing the data sources>", "data_param": { "header": "true", "delimiter": ",", "quote": "\"", "escape": "\\" } }, { "table_type_id": "ITEM_META", "data_format": "csv", "data_source_url": "<Path for storing the data sources>", "data_param": { "header": "false", "delimiter": ",", "quote": "\"", "escape": "\\" } }, { "table_type_id": "ITEM_META_CONF", "data_format": "csv", "data_source_url": "<Path for storing the data sources>", "data_param": { "header": "true", "delimiter": ",", "quote": "\"", "escape": "\\" } }, { "table_type_id": "USER_BEHAVIOR", "data_format": "csv", "data_source_url": "<Path for storing the data sources>", "data_param": { "header": "false", "delimiter": ",", "quote": "\"", "escape": "\\" } }] }
- Example of a successful response
{ "is_success": true, "job_id": "59c3a237731b4ebfbf561d765b04def7", "filter_uuid": "5efc448313fb4dbf95e1e6cc307b92d6" } - Example of a failed response
{ "is_success": false, "error_code": "res.2006", "error_msg": "The datasourceUrl(<Path for storing the data sources>) is not match Bucket structure." }
Status Code
For details about status codes, see Status Codes.