Updated on 2024-10-23 GMT+08:00

Creating Fact Tables

A fact table for a business process can provide a wealth of information about specific business processes. After a fact table is created, the public affair details are accumulated to facilitate data extraction.

Creating and Publishing a Fact Table

  1. On the DataArts Studio console, locate a workspace and click DataArts Architecture.
  2. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Fact Tables tab.
  3. Select a subject from the subject tree on the left and click Create, or click Create a directly.
  4. On the Create Fact Table page, perform the following operations:
    1. Set the parameters in the Basic Settings area.
      Figure 1 Basic Settings area

      Table 1 Parameters in the Basic Settings area

      Parameter

      Description

      *Subject

      Select a subject (business domain group > business domain > business object) where you can place the fact table.

      *Table Name

      Newline characters and the following characters are not allowed: \ < > % " ' ;

      *Table English Name

      It must start with fact_. Only letters, digits, and underscores (_) are allowed.

      *Data Connection Type

      Select a data connection type from the drop-down list box.

      *Data Connection Name

      Select a data connection from the drop-down list box. It is recommended that the same data connection be used for dimension modeling.

      *Database

      Select a database from the drop-down list box.

      Queue

      DLI queue. This parameter is displayed only for DLI data connections.

      Schema

      DWS or POSTGRESQL mode. This parameter is displayed only for DWS and POSTGRESQL data connections.

      Table Type

      DLI models support the following table types:
      • MANAGED: Data is stored in a DLI table.
      • EXTERNAL: Data is stored in an OBS table. When Table Type is set to EXTERNAL, you must set OBS Path. The OBS path format is /bucket_name/filepath.

      DWS models support the following table types:

      • DWS_ROW: row-store table. Tables are stored to disk partitions by row.
      • DWS_COLUMN: column-store table. Tables are stored to disk partitions by column.
      • DWS_VIEW: view-store table. Tables are stored to disk partitions by view.

      The MRS Hive model supports HIVE_TABLE and HIVE_EXTERNAL_TABLE.

      The MRS Spark model supports HUDI_COW and HUDI_MOR.

      The PostgreSQL model supports only POSTGRESQL_TABLE.

      The MRS_CLICKHOUSE model supports only CLICKHOUSE_TABLE.

      The Oracle model supports only ORACLE_TABLE.

      The MySQL model supports only MYSQL_TABLE.

      Compression Level

      This parameter is available when the data connection type is DWS.

      The following compression levels are available for different table types:

      • DWS_ROW: NO and YES
      • DWS_COLUMN: NO, LOW, MIDDLE, and HIGH.
      • DWS_VIEW: The compression level is not supported.

      Distributed By

      This parameter is displayed only for DWS data connections. You must add a table field before selecting a table field from the drop-down list as a Distributed By field. Multiple table fields can be selected.

      Currently, only REPLICATION and HASH are supported.

      • REPLICATION: A full table is stored on each DN. The advantage of this option is that each DN has all the data of a table. During the join operation, data redistribution can be avoided, reducing network overhead. The disadvantage is that each DN retains the complete data of a table, resulting in data redundancy. Generally, this option is recommended for small dimension tables.
      • HASH: If you select this option, you must specify a distribution key for the user table. When a record is inserted, the system performs hash computing based on values in the distribute keys and then stores data on the corresponding DN. In a Hash table, I/O resources on each node can be used during data read/write, which improves the read/write speed of a table. Generally, this option is recommended for large dimension tables (a large dimension table contains over 1 million records).

      PreCombineField

      This parameter is available only for Spark data connections.

      Path

      This parameter is available only when the data source is MRS Hive and table type is HIVE_EXTERNAL_TABLE.

      The path can contain only letters, digits, slashes (/), periods (.), hyphens (-), underscores (_), and colons (:).

      *Owner

      You can enter an owner name or select an existing owner.

      Advanced Settings

      Set custom items to describe the table. The custom items can be viewed in the table details.

      For example, if you want to identify the source of the table, you can add item source and set its value to the table source information. Then you can view the table source information in the table details.

      *Description

      A description of the fact table. It allows 1 to 600 characters.

    2. On the Field Settings page, click Create and select Dimension or Measure to add a dimension or measure field.
      • If you select Dimension, the Select Dimension page is displayed. Select a dimension (from the public workspace or current workspace), select a model for dimensional modeling, select one or more created dimension tables, and click OK. The dimension tables and their attribute values are added to the list.
      • If you select Measure, set required parameters to add a measure field.
      For details about the field parameters, see Table 2. After adding a field, you can click or to move the field up or down.
      Figure 2 Adding a dimension or measure field
      Table 2 Field parameters

      Parameter

      Description

      Type

      Two types are available: Measure and Dimension.

      Name

      Newline characters and the following characters are not allowed: \ < > % " ' ;

      The added dimension tables and their attribute values are automatically displayed for the dimension attribute field. Generally, you do not need to modify them.

      Field Code

      Table codes must start with letters. Only letters, numbers, and underscores (_) are allowed.

      Data Type

      Data type of the created dimension

      Primary Key

      If this parameter is selected, the field is a primary key.

      NOTE:

      If an MRS Spark connection is used to connect to MRS Hudi data sources, data can be written to the database only if fields have primary keys. Otherwise, table synchronization fails.

      Partition

      If this parameter is selected, the field is a partition field.

      Not Null

      Whether the parameter value can be left empty.

      Associate Standard

      If you have created data standards, click to select one to associate with the field. If Create Data Quality Jobs is selected for Model Design Process on the Function Settings tab page in Configuration Center and a field is associated with a data standard, a quality job is automatically generated after a table is published. A quality rule is generated for each field associated with the data standard. The quality of the field is monitored based on the data standard. You can access the Quality Job page of DataArts Quality to view the job details.

      Alternatively, click Import from Data Standard and select a data standard to associate with the field.

      If no data standard is available, create one. See Creating Data Standards for details.

      Security Level

      You can click to add a security level for the logical entity attribute.

      If you cannot find the security level you want, click go to to go to the DataArts Security console and create a security level.

      You can disable this function on the Models tab page on the Configuration Center page.

      Associate Dimension

      Only dimension fields need to be associated with dimensions.

      Name of the associated dimension and field. Click to replace the associated dimension.

      If the public workspace is enabled, you can select the public workspace dimension.

      Role

      Roles need to be assigned to dimension fields which are added for multiple times. This is not required for measure fields.

      If a field of a dimension is added multiple times, set different roles to distinguish the dimensions.

      Description

      A description of the dimension.

      Audit Status

      Whether to audit the data standard

      Click Audit Data Standard to audit data standards.

      Operation

      Related operations

    3. On the Mapping Settings tab page, click Create Mapping and set mapping parameters.
      Figure 3 Configuring mapping parameters

      Table 3 Parameters of mappings

      Parameter

      Description

      *Mapping

      Only letters, numbers, and underscores (_) are allowed.

      *Model

      Select a created relationship model from the drop-down list box. If no relationship model has been created, create one. See ER Modeling.

      *Table

      Select a table from which data is obtained. If data is obtained from multiple tables, click next to the table name to set the JOIN condition between the table and other tables.

      1. Select a JOIN mode. The JOIN mode includes left JOIN, right JOIN, inner JOIN, and outer JOIN from left to right.
      2. Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and joined table. Click or to add or delete a JOIN condition. The relationship between JOIN conditions is AND.
      3. Click OK.
      4. If you want to delete a joined table after setting the JOIN condition, click next to the table name.
      Figure 4 Join Condition dialog box

      Field Mapping

      Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.

  5. Click Publish. In the displayed dialog box, select a reviewer and click OK to submit a request for publishing the fact table.

    In enterprise mode, you can choose to publish the table to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the fact table cannot be published.

    If you have been added as a reviewer, you can select Auto-review and click OK. After the request is approved, the status changes to Published.

    If you select multiple reviewers, the status changes to Published only after all reviewers have approved the publishing request. If any reviewer rejects the request, the status is Rejected.

  6. Wait for the reviewer to approve the fact table.

    After the fact table is approved, it is automatically created in the database.

  7. Go back to the fact table list and locate the table just published. View its synchronization status in the Sync Status column. You can switch between the production environment and development environment to view the synchronization result.
    • If the synchronization is successful, the fact table is successfully published and created in the database.
    • If the synchronization failed, choose More > View History in the row where the fact table is located. On the page displayed, click the History tab to view logs. Troubleshoot the problem based on the logs. After the error is rectified, choose More > Synchronize above the fact table list to issue the synchronization command again. If the problem persists, contact technical support personnel.

      In enterprise mode, you can choose to synchronize the table to the production or development environment. By default, it is synchronized to the production environment. If you do not choose an environment, the fact table cannot be synchronized.

Managing a Fact Table

After a fact table is created, you can access the Fact Tables page of Dimensional Modeling in DataArts Architecture. On the page displayed, you can edit, publish, suspend, and delete the fact table, as well as view the publish logs.

Figure 5 Fact table management

  • Editing a fact table
    1. In the fact table list, select a fact table and click Edit to the right of it. The page for editing the fact table is displayed.
    2. Edit the table as required.
    3. Click Save to save the settings, or click Publish to publish the settings.

      In enterprise mode, you can choose to publish the table to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the fact table cannot be published.

  • Publishing a fact table
    1. In the fact table list, select a fact table and click Publish. The dialog box for publishing the fact table is displayed.
    2. Select a reviewer from the drop-down list.

      In enterprise mode, you can choose to publish the table to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the fact table cannot be published.

    3. Click OK.
  • Viewing the publish history
    1. Select a fact table in the list and choose More > View History on the right.
    2. On the page displayed, you can view the publish history and version comparison information of the fact table.

      If the publish log includes error logs, the publishing has failed. You can click Resynchronize to synchronize the table to other DataArts Studio modules.

  • Associating a fact table with a quality rule
    1. Select a fact table in the fact table list and click Associate Rule above the list.
    2. In the Associate Quality Rule dialog box, you can add rules to the fields in the fact table in batches and associate the rules with the fields.
    3. Click OK.
  • Previewing an SQL statement
    1. Select a fact table in the list and choose More > Preview SQL on the right.
    2. On the page displayed, you can view or copy the SQL statement.
  • Create a conversion
    1. DWS_COLUMN: NO, LOW, MIDDLE, and HIGH.
    2. For details about how to create a derivative metric, see Creating and Publishing a Derivative Metric.
  • Suspending a fact table
    1. In the fact table list, select a fact table and click Suspend. The dialog box for suspending a fact table is displayed.
    2. Select a reviewer from the drop-down list box.
    3. Click OK.
      • You can suspend or delete a fact table only when it is not referenced. For example, a fact table can be deleted only when it is not used by atomic metrics.
  • Deleting a fact table

    If you no longer need a fact table, you can delete it. Fact tables in publishing review, published, or suspension review state cannot be deleted.

    1. In the fact table list, select a fact table and choose More > Delete above the list.
    2. In the dialog box displayed, click Yes.

Associating a Fact Table with a Quality Rule

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane.
  2. Click the Fact Tables tab.
  3. In the fact table list, click the name of the target fact table. Click Associate Rule.
    Figure 6 Associating a fact table with a quality rule
  4. On the page displayed, set the parameters. After the configuration is complete, click OK.
    • Update Existing Rule: If this option is selected, the newly added rule will overwrite the old rule.
    • Table Field: This parameter applies to all fields by default. You can enter a regular expression to filter fields as required.
    • WHERE Clause: This parameter can be used to filter fields.
    • Generate Anomaly Data: If this option is enabled, anomaly data is stored in the specified database based on the configured parameters.
    • Database/Schema: database or schema that stores anomaly data. This parameter is displayed when Generate Anomaly Data is enabled.
    • Table Prefix: prefix of the table that stores anomaly data. This parameter is displayed when Generate Anomaly Data is enabled.
    • Table Suffix: suffix of the table that stores anomaly data. This parameter is displayed when Generate Anomaly Data is enabled.
    • Add Rule: You can click Add Rule to add a rule. For example, add a rule named Unique value, select the rule, click OK, enter an alarm condition expression in the Alarm Condition text box, add other rules in the same way, and click OK.
    • An alarm condition expression consists of alarm parameters and logical operators. When a quality job is running, the system calculates the result of the alarm condition expression and determines whether to trigger the alarm based on the result of the expression. If the expression result is true, the alarm will be triggered. Otherwise, no quality alarm will be triggered. In the Associate Quality Rule dialog box, the alarm parameters of each quality rule are displayed as buttons.
    Figure 7 Associating a fact table with a quality rule

Creating a Field in the Fact Table

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane.
  2. Click the Fact Tables tab.
  3. In the fact table list, click the name of the target table and click Edit in the Operation column.
  4. Click Create in the Table Fields area, select a new field type from the drop-down list, and set the related parameters.
    Figure 8 Creating a field
  5. After the configuration is complete, click OK.

Associating a Fact Table Field with a Data Standard

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane.
  2. Click the Fact Tables tab.
  3. Click the name of the target fact table in the list.
  4. In the table field list on the details page of the fact table, search for the target field, click corresponding to the field to configure the association between the field and the data standard. For details on the sources of data standards, see Creating a Data Standard.
    Figure 9 Associating a fact table field with a data standard
  5. After the configuration is complete, click OK. If a public workspace is available, you need to manually set the data standard source to the public workspace or the current workspace when selecting a data standard in a common workspace. When Public workspace is enabled, the data standards of the public workspace can be referenced in common workspaces.
    Figure 10 Associating a data standard

Associating a Fact Table Field with a Quality Rule

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane.
  2. Click the Fact Tables tab.
  3. In the fact table list, click the name of the target fact table.
  4. In the table field list on the fact table details page, locate the target field and click to associate the field with a quality rule.
    Figure 11 Associating a fact table field with a quality rule
  5. After the configuration is complete, click OK.
    Figure 12 Adding a rule

Associating Fact Table Fields with a Quality Rule in Batches

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane.
  2. Click the Fact Tables tab.
  3. In the fact table list, click the name of the target fact table.
  4. In the table field list on the fact table details page, select the target table fields and click Associate Rule.
    Figure 13 Associating fact table fields with a quality rule
  5. On the page displayed, add a rule and set the rule parameters.
    Figure 14 Adding a rule
  6. After the configuration is complete, click OK.

Importing a Fact Table by Reversing a Database

By reversing databases, you can import one or more created database tables from other data sources into a fact table directory to turn them into fact tables.

  1. On the DataArts Architecture console, choose Models > Dimensional Modeling in the navigation pane on the left.
  2. Above the fact table list, click Reverse Database.
  3. In the displayed dialog box, set required parameters and click OK.

    Table 4 Parameters for reversing the database

    Parameter

    Description

    *Subject

    Select a subject from the drop-down list.

    *Data Connection Type

    The data connection types supported by the reverse database are displayed in the drop-down list box. Select the required data connection type.

    *Data Connection Name

    Select a data connection.

    If you want to reverse a database from other data sources to a fact table directory, you must create a data connection in Management Center to connect to the data source. For details on how to create data connections, see Configuring DataArts Studio Data Connection Parameters.

    *Database

    Select a database.

    *Schema

    Select a value from the drop-down list box. This parameter is available only for DWS and PostgreSQL tables.

    Queue

    DLI queue. This parameter is available only when Data Connection Type is set to DLI.

    Update Table

    When Yes is selected, if the name of the reversed table is the same as that of an existing fact table, the existing fact table is updated.

    Name Source

    Source of the table name or field name after the reverse. The value can be Description or Name. If no description is specified for a table or field, the name is used.

    • Description
    • Name

    *Data Table

    You can select All or Partial.

    Figure 15 Reverse Database dialog box

  4. You can view the result on the Last Reverse tab page. If the reverse operation is successful, click Close. If the reverse operation fails, you can view the failure cause. After the fault is rectified, select the table again and click Reverse to retry.

    Figure 16 Last Reverse tab page