Updated on 2025-10-14 GMT+08:00

Basic Functions of DataCheck

Statistical Value Check

  • Data check is supported when the source database is DWS, MySQL, PostgreSQL, or BigQuery and the destination database is DWS.
  • Check fields of common types, such as numeric, time, and character.
  • Support three check levels: high, middle, and low.
  • Check schemas, table names, and column names.
  • Specify the check scope of records. By default, all records are checked.
  • Support various check methods, including COUNT(*), MAX, MIN, SUM, and sampled detail checks.
  • Output the check result and related check details.

Table 1 Data check levels

Check Level | Description | Syntax
Low | Quantity check | Number of records: COUNT(*)
Middle | Quantity check; numeric type check | Number of records: COUNT(*); value check: MAX, MIN, and SUM
High | Quantity check; numeric type check; date type check; character type check | Number of records: COUNT(*); value check: MAX, MIN, and SUM; date check: MAX and MIN; character check: ORDER BY with LIMIT 1000, which reads the sorted rows and checks whether the content is the same at both ends
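
The check levels in Table 1 can be illustrated with a minimal Python sketch. It is not the DataCheck implementation: the names LEVEL_CHECKS, build_query, sample_query, and check_table are hypothetical, and two in-memory SQLite databases stand in for the source and destination databases. It only shows how each level adds aggregates on top of COUNT(*).

```python
import sqlite3

# Hypothetical mapping: which aggregates each level runs per column category.
LEVEL_CHECKS = {
    "low": {},
    "middle": {"numeric": ["MAX", "MIN", "SUM"]},
    "high": {"numeric": ["MAX", "MIN", "SUM"], "date": ["MAX", "MIN"]},
}

def build_query(table, columns, level):
    """columns maps column name -> 'numeric', 'date', or 'character'."""
    exprs = ["COUNT(*)"]  # every level checks the number of records
    for name, category in columns.items():
        for func in LEVEL_CHECKS[level].get(category, []):
            exprs.append(f"{func}({name})")
    return f"SELECT {', '.join(exprs)} FROM {table}"

def sample_query(table, column, limit=1000):
    """Ordered sample used for the character check at the high level."""
    return f"SELECT {column} FROM {table} ORDER BY {column} LIMIT {limit}"

def check_table(src, dst, table, columns, level):
    """Return True if both connections give identical results at this level."""
    query = build_query(table, columns, level)
    ok = src.execute(query).fetchone() == dst.execute(query).fetchone()
    if level == "high":
        for name, category in columns.items():
            if category == "character":
                q = sample_query(table, name)
                ok = ok and src.execute(q).fetchall() == dst.execute(q).fetchall()
    return ok

def make_db(rows):
    """Create an in-memory stand-in database with one small table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, created TEXT, note TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn

rows = [(1, 10.5, "2025-01-01", "a"), (2, 20.0, "2025-02-01", "b")]
src = make_db(rows)
# Same counts and aggregates as the source, but one text value differs.
dst = make_db([rows[0], (2, 20.0, "2025-02-01", "B")])
columns = {"id": "numeric", "amount": "numeric", "created": "date", "note": "character"}

low_ok = check_table(src, dst, "orders", columns, "low")
middle_ok = check_table(src, dst, "orders", columns, "middle")
high_ok = check_table(src, dst, "orders", columns, "high")
```

In this example only the high level notices the difference, because the count and the numeric and date aggregates are identical at both ends and only the character sample differs.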

Metadata Check

  • Table definition verification is supported when the source database is a DWS, MySQL, PostgreSQL, or BigQuery database and the destination database is DWS.
  • Four verification types are supported: character, integer, decimal, and time (including date).
  • If the column names and types match between the source and destination tables, the check passes.
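
The matching rule above can be sketched as follows. This is a hedged illustration only: the TYPE_CATEGORIES mapping and the functions categorize and check_metadata are hypothetical, and the actual type mapping used by DataCheck for each source database is not stated here. The sketch assumes that two columns match when they share a name and fall into the same one of the four categories.

```python
# Hypothetical mapping of type names to the four verification categories.
TYPE_CATEGORIES = {
    "char": "character", "varchar": "character", "text": "character",
    "smallint": "integer", "int": "integer", "integer": "integer", "bigint": "integer",
    "decimal": "decimal", "numeric": "decimal", "float": "decimal", "double": "decimal",
    "date": "time", "time": "time", "timestamp": "time", "datetime": "time",
}

def categorize(type_name):
    """Strip any length/precision suffix and look up the category (None if unknown)."""
    base = type_name.lower().split("(")[0].strip()
    return TYPE_CATEGORIES.get(base)

def check_metadata(src_cols, dst_cols):
    """Compare two {column name: type name} dicts; an empty list means the check passes."""
    problems = []
    for name in sorted(set(src_cols) | set(dst_cols)):
        if name not in dst_cols:
            problems.append((name, "missing in destination"))
        elif name not in src_cols:
            problems.append((name, "missing in source"))
        else:
            src_cat = categorize(src_cols[name])
            dst_cat = categorize(dst_cols[name])
            # Unknown types are reported rather than silently accepted.
            if src_cat is None or dst_cat is None or src_cat != dst_cat:
                problems.append((name, "type mismatch"))
    return problems

passed = check_metadata(
    {"id": "INT", "name": "VARCHAR(20)"},
    {"id": "bigint", "name": "text"},
)
failed = check_metadata(
    {"id": "int", "d": "date"},
    {"id": "varchar(10)"},
)
```

Here passed is an empty list, while failed reports the missing column d and the category mismatch on id.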

Precise Data Comparison

  • Precise data comparison is supported when the source database is a DWS, MySQL, PostgreSQL, or BigQuery database and the destination database is DWS.
  • This function compares data column by column between the source and destination tables, matching records by primary key or by a specified column that uniquely identifies each record. It outputs the comparison results, including records that exist at only one end and column values that differ between the source database and the destination DWS database.
  • You are advised to perform precise data comparison during off-peak hours, as the operation consumes executor resources and adds load to the databases at both ends.
  • For precise comparison, specify the number of records queried per batch (1000 by default) and the maximum number of differences to report (100 by default). When the number of differences reaches this threshold, the comparison stops.
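
The batching and early-stop behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names batches and precise_compare and the parameters batch_size and max_diffs are hypothetical, and in-memory dictionaries keyed by primary key stand in for batched queries against the two databases.

```python
from itertools import islice

def batches(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while chunk := list(islice(it, size)):
        yield chunk

def precise_compare(src_rows, dst_rows, batch_size=1000, max_diffs=100):
    """src_rows/dst_rows map primary key -> {column: value}.

    Returns (diffs, stopped_early). Comparison stops once the number of
    differences reaches max_diffs.
    """
    diffs = []
    all_keys = sorted(set(src_rows) | set(dst_rows))
    for chunk in batches(all_keys, batch_size):
        for key in chunk:
            if key not in dst_rows:
                diffs.append((key, "extra in source", None))
            elif key not in src_rows:
                diffs.append((key, "extra in destination", None))
            else:
                # Record (source value, destination value) for each differing column.
                changed = {
                    col: (val, dst_rows[key].get(col))
                    for col, val in src_rows[key].items()
                    if dst_rows[key].get(col) != val
                }
                if changed:
                    diffs.append((key, "different values", changed))
            if len(diffs) >= max_diffs:
                return diffs, True
    return diffs, False

src = {1: {"v": "a"}, 2: {"v": "b"}, 3: {"v": "c"}}
dst = {1: {"v": "a"}, 2: {"v": "x"}, 4: {"v": "d"}}
full_diffs, full_stopped = precise_compare(src, dst)
capped_diffs, capped_stopped = precise_compare(src, dst, max_diffs=2)
```

A lower difference threshold ends the run sooner, which is one way to limit the load that a comparison places on both databases.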