Updated on 2022-09-23 GMT+08:00

Creating Rule Templates

DataArts Quality can monitor offline data, in which quality rules play a vital role. There are 26 built-in rule templates, such as database-level, table-level, field-level, and cross-field rule templates.

Table 1 System built-in rule templates

Rule Type

Dimension

Template

Description

Database-level

Integrity

Database null value scan

Calculates the number of rows in all the tables of a database in which all fields have a null value.

Table-level

Accuracy

Table rows

Calculates the number of rows in a data table.

Integrity

Data table null value scan

Calculates the number of rows in a table in which all fields have a null value.

Field-level

Uniqueness

Field with a unique value

Calculates the number of rows in a data table in which a specified field has a unique value.

Field with duplicate values

Calculates the number of rows in a data table in which a specified field has duplicate values.

Unique combination of multiple fields

Checks whether the combination of multiple fields in a DWS table is unique. A maximum of 10 fields can be combined.

Integrity

Field with a null value

Calculates the number of rows in a data table in which a specified field has a null value.

Accuracy

Average field value

Calculates the average value of a specified field in a data table.

Total field values

Calculates the total values of a specified field in a data table.

Maximum field value

Calculates the maximum value of a specified field in a data table.

Minimum field value

Calculates the minimum value of a specified field in a data table.

Field length verification

Checks whether the length of a field in the DWS table is within the allowed range.

Field value range verification

Checks whether the value of a field in the DWS table is within the allowed range.

Field time verification

Checks whether the time of a field in the DWS table is within the allowed range.

Currently, only fields of the date and timestamp types are supported. Fields of the time type is not supported.

Effectiveness

ID card verification

Verifies the validity of a specified field in a data table based on built-in regular expression rules.

Mailbox verification

Verifies the validity of a specified field in a data table based on built-in regular expression rules.

Regular expression verification

Verifies the validity of a specified field in a data table based on a custom regular expression.

IP address verification

Verifies the validity of a specified field in a data table based on built-in regular expression rules.

Phone number format verification

Verifies the validity of a specified field in a data table based on built-in regular expression rules.

Postal code format verification

Verifies the validity of a specified field in a data table based on built-in regular expression rules.

Date format verification

Verifies the validity of a specified field in a data table based on built-in regular expression rules.

Validity verification

Verifies the validity of a specified field in a data table based on a custom regular expression.

Enumerated value verification

Verifies the validity of a specified field in a data table based on a custom enumerated value.

Ignoring of null values in enumerated value verification

Verifies the validity of a specified field in a data table based on a custom enumerated value. Null values are counted in valid rows.

Cross-field level

Consistency

Field consistency verification

Checks consistency between different fields from the same data source.

Accuracy

Cross-field time verification

Checks whether the time relationship between a specified field in the data table and the reference field meets the expectation.

Currently, only fields of the date and timestamp types are supported. Fields of the time type is not supported.

If the built-in rule templates do not meet your requirements, you can create rules in either of the following ways:
  • Custom template: Choose Quality Monitoring > Rule Templates and click Create. The created rule template will be automatically classified into the corresponding rule type and displayed as a custom template. A quality job using a custom template does not support exceptional data output and quality scoring.
  • Custom rule: When creating a quality job, set Rule Type to Custom rule and enter an SQL statement to define how to monitor the quality of data objects.

This section describes how to create a rule using a custom template. For details about how to create a custom rule, see Creating Quality Jobs.

  1. Choose Quality Monitoring > Rule Template, and click Create on the page displayed.

    Figure 1 Rule Template page

  2. In the dialog box displayed, enter the rule template name, select the rule matching dimension, define the SQL template, and describe the output result.

    • Dimension: You can complete single-column, cross-column, cross-row, and cross-table analysis from six dimensions: completeness, validity, timeliness, consistency, accuracy, and uniqueness. When customizing a quality rule, select a dimension for rule matching.
    • Directory: Select the directory where the rule is located.
    • Define Relationship: Enter SQL statements to search for data.

      For example, to count the number of rows in a table, enter select count(${Column1}) from ${Schema_Table1}. The value of ${Column1} is generated by clicking Add Field Parameter, and the value of ${Schema_Table1} is generated by clicking Add Database/Table Parameter.

    • Output Description describes each column in the SQL result. Column descriptions are separated by commas (,).
      For example, if the relationship is set to select max (${Column1}), min(${Column2}) from ${Schema_Table1}, the output result is Maximum value,Minimum value. The result description must correspond to the output result sequence defined by the relationship.
      Figure 2 Configuring a rule template
    • Abnormal Table Template: You need to enter a complete SQL statement to specify the abnormal data to be exported.

  3. After you click Yes, the system publishes the rule template by default. The default version is V1.0.

Managing a Rule Template

A published version of a custom rule template cannot be directly modified. If you want to modify a rule template, you can publish a new version. In addition, you can suspend the historical version and migrate jobs associated with the historical version to the new version.

  1. On the DataArts Quality homepage, select Rule Template from the left navigation bar. Locate the target rule template in the displayed list and click Publish in the Operation column.

    Figure 3 Publishing a rule template

  2. Dimensions and output description can be modified, and relationships can be redefined.
  3. Click Publish. In the displayed dialog box, set the version.

    Figure 4 Publishing a new version

  4. After the rule template is submitted for publishing, you can click View Publish History in the Operation column. You can view the publish history, change the version, and suspend the version.

    Figure 5 Publish History page

  5. To suspend a historical version, click Suspend on the right of the historical version.

    • If the version is not associated with any job, click OK to suspend it.
    • If the version has associated jobs, select a new version, associate the jobs with the new version, and click OK.
      Figure 6 Migrating and suspending a version

  6. On the Version Comparison tab page, you can compare the versions to see their differences.

    Figure 7 Version comparison

Exporting Rule Templates

To export custom rule templates, perform the following steps (you can export a maximum of 200 rule templates at a time):

  1. In the left navigation pane, choose Quality Monitoring > Rule Templates, and select the templates to export in the right pane.
  2. Click Export. The Export Rule Template dialog box is displayed.
  3. Click Export to switch to the Export Records tab.
  4. In the list of exported files, locate an exported template and click Download in the Operation column to download the Excel file of the rule template to the local PC.

Importing Rule Templates

You can import a file containing a maximum of 4 MB data.

  1. In the left navigation pane, choose Quality Monitoring > Rule Templates. In the right pane, click Import.

  2. On the Import Configurations tab page, set Duplicate Name Policy.

    • Terminate: If template names repeat, all templates will fail to be imported.
    • Skip: If template names repeat, the templates will still be imported.

  3. Click Upload and select the prepared data file.

    Edit the data file using either of the following methods:

    • (Recommended) Click Export to export data and import the data to the system directly or import it after modification.
    • Click Download Template, fill in the template with the data to import, and import the data to the system.

  4. Configure the mapped resource for the catalog and select the directory where rule template has been imported.

  5. Click Import to import the Excel template to the system.
  6. Click the Import Records tab to view the import records.