Updated on 2024-08-30 GMT+08:00

How Do I View the Number of Table Rows and Database Size?

During data governance, you often need to know how many rows a table contains or how large a database is. You can obtain the row count of a table by running an SQL command or a data quality job, and you can view the database size in the data catalog component. The following sections describe each method.

Counting the Number of Rows in a Data Table

For different types of data sources, DataArts Studio provides multiple methods to view the number of rows in a table.

  • For data sources such as DWS, DLI, RDS, MRS Presto, MRS Hive, and MRS Spark, you can run an SQL script of the corresponding type in the data development component to view the number of table rows.
    select count(*) from tablename
  • For data sources such as DWS, DLI, RDS, MRS Hive, MRS Spark, and Oracle, you can run quality jobs in the data quality component to view the number of table rows.

For other data sources, check the number of table rows on the data source side by following that data source's own documentation.
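The SQL approach above can be extended to several tables in one query. The following is a minimal sketch, not a DataArts Studio feature: it uses Python's built-in sqlite3 module as a local stand-in for a real data source, and the table names (orders, customers) and the helper functions are hypothetical. On DWS, Hive, or Spark you would run only the generated SELECT ... UNION ALL statement in an SQL script of the corresponding type.

```python
import sqlite3


def quote_identifier(name):
    # Table names cannot be bound as query parameters, so quote them explicitly.
    return '"' + name.replace('"', '""') + '"'


def count_rows(conn, table_names):
    """Return {table_name: row_count} using a single UNION ALL query."""
    if not table_names:
        return {}
    # One "select count(*)" per table; the table name is returned as a label.
    selects = [
        "SELECT ? AS table_name, COUNT(*) AS row_count FROM " + quote_identifier(name)
        for name in table_names
    ]
    query = " UNION ALL ".join(selects)
    return dict(conn.execute(query, list(table_names)).fetchall())


# Hypothetical demo data in an in-memory SQLite database (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.execute("CREATE TABLE customers (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO customers VALUES (1)")

row_counts = count_rows(conn, ["orders", "customers"])
print(row_counts)
```

Combining the per-table counts with UNION ALL keeps the result in one result set, which is convenient when you only need a quick check and do not want to configure a quality job.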

The following procedure uses a DataArts Studio data quality job to obtain the number of rows in a table. A single job can collect row counts for multiple tables in the same database at the same time.

  1. On the DataArts Studio console, locate a workspace and click DataArts Quality.
  2. Click Quality Job. The quality job list is displayed.
  3. Click Create. The quality job basic configuration page is displayed, as shown in the following figure.

    • Job Name: Enter a name for the job. This example uses CountingRows.
    • Directory: Select the directory where the job is stored.
    • Job Level: Retain the default value.
    Figure 1 Basic Configuration

  4. Click Next to go to the Define Rule page. Click the Open icon of the subjob. The subjob configuration page is displayed.

    Figure 2 Going to the subjob configuration page

  5. On the subjob configuration page, configure the rule information.

    • Basic Information: This parameter is optional. Retain the default value.
    • Object
      • Rule Type: Select Threshold Rule.
      • Data Connection: Select the data source connection created in the management center.
      • Data Object: Select the data table whose statistics are to be collected.
      • Retain the default values for other parameters.
    • Template
      • Template Name: Select Table Rows (DWS, HIVE, SparkSQL, ORACLE).
      • Retain the default values for other parameters.
    • Scanning Scope: Select All.
    • Alarm Condition: This parameter is optional. Retain the default value.
    Figure 3 Subjob Configuration

  6. Click Next to go to the Set Alarm Parameters page.

    Set Alarm Condition to Sub-rule Alarm Condition. The expression can be customized. In this example, the expression can be set to ${1}<=0, indicating that an alarm is triggered when the total number of rows is less than or equal to 0.
    Figure 4 Alarm Triggering Condition

  7. Click Next to configure notifications.

    If the notification status is enabled, select a notification type and a topic. There are two notification types: Trigger Alarm and Running Success. Select the type that fits your service scenario.

  8. Click Next to configure scheduling.

    There are two scheduling modes: One-time scheduling and Periodic scheduling. Select One-time scheduling for one-time statistics.

  9. Click Submit. The quality job list page is displayed.

    Figure 5 Quality job list

  10. In the Operation column of the CountingRows job, click Run to generate the instance corresponding to the job.
  11. Click O&M Management to go to the job instance list page and find the corresponding job instance. After the instance finishes running, click Result & Log. The Running Result tab page shows the result of the quality job, that is, the total number of rows in the selected table.

    Figure 6 Viewing the Total Number of Rows in a Table
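The alarm expression from step 6 can be illustrated with a small evaluator. This is only a sketch of the idea, not the DataArts Quality implementation: it assumes that ${1} stands for the first output of the rule (here, the table row count) and it handles only a single comparison against a number. The function name and the supported operator set are hypothetical.

```python
import operator
import re

# Comparison operators accepted by this illustrative evaluator (assumption).
_OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

# Matches expressions of the form ${n} <op> <number>, for example ${1}<=0.
_PATTERN = re.compile(
    r"^\s*\$\{(\d+)\}\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$"
)


def alarm_triggered(expression, rule_outputs):
    """Return True if the alarm expression holds for the given rule outputs.

    rule_outputs maps a placeholder index to its value, e.g. {1: row_count}.
    """
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError("unsupported alarm expression: " + expression)
    index, op, threshold = match.groups()
    value = rule_outputs[int(index)]
    return _OPERATORS[op](value, float(threshold))


# An empty table triggers the alarm; a populated table does not.
empty_table_alarm = alarm_triggered("${1}<=0", {1: 0})
populated_table_alarm = alarm_triggered("${1}<=0", {1: 250})
print(empty_table_alarm, populated_table_alarm)
```

With ${1}<=0, the alarm acts as an emptiness check: the job still reports the row count in its running result, and the alarm fires only when the table has no rows.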

Viewing the Database Size

You can directly view the database size in the data catalog component.

  1. On the DataArts Studio console, locate a workspace and click DataArts Catalog.
  2. On the Asset Overview tab page of the Overview page, click the number of databases under Technical Assets to view the number and size of tables in each database.

    Figure 7 Viewing technical assets
    Figure 8 Viewing the data volume