Updated on 2024-04-03 GMT+08:00

Creating Dimensions

A dimension is the perspective to observe and analyze business data and assist in data aggregation, drilling, slicing, and analysis, and used as a GROUP BY condition in SQL statements. Most dimensions have hierarchical structures, such as geographic dimensions (including countries, regions, provinces/states, and cities) and time dimensions (including annually, quarterly, and monthly dimensions).

Impact on the System

After a dimension is published and approved, the system automatically creates a dimension table corresponding to the dimension. The name and code of the dimension table are the same as those of the dimension.

Creating and Publishing a Dimension

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Architecture.
    Figure 1 DataArts Architecture

  2. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  3. Select an object from the subject directory on the left and click Create.

    Before creating a dimension, ensure that a subject is available. For details on how to add a subject, see Designing Subjects.

  4. On the page displayed, set the parameters.

    Set the basic settings and physicalization settings as described below.

    Figure 2 Dimension parameters

    Table 1 Parameters in the Basic Settings area

    Parameter

    Description

    *Subject

    Select a subject from the drop-down list box.

    *Dimension Name

    It must start with letters. Only letters, digits, and the following special characters are allowed: ()-_

    *Dimension English Name

    Only letters, numbers, and underscores (_) are allowed. Dimension codes must start with letters.

    *Type

    • Basic: a dimension that does not have a hierarchical structure.
    • Lookup Table: a dimension created based on a lookup table. The field information and data of the dimension are the same as those of the lookup table, indicating that the content is an enumerable dimension.
    • Hierarchy: a dimension with a hierarchical structure between attributes.

    Advanced Settings

    Set custom items to describe the table. The custom items can be viewed in the table details.

    For example, if you want to identify the source of the table, you can add item source and set its value to the table source information. Then you can view the table source information in the table details.

    *Owner

    You can enter an owner name or select an existing owner.

    *Description

    A description of the dimension to create. It allows 1 to 600 characters.

    Table 2 Parameters in the Physicalization Settings area

    Parameter

    Description

    *Data Connection Type

    Select a data connection type from the drop-down list box.

    *Data Connection Name

    The name of the data connection. Select the required data connection.

    If no data connection is available, access Management Center to create one. For details, see Managing Data Connections.

    *Database

    The name of the database. Select a database from the drop-down list box. If no database is available, access DataArts Factory to create one. For details, see Creating a Database.

    Queue

    DLI queue. This parameter is displayed only for DLI data connections.

    Schema

    DWS or POSTGRESQL mode. This parameter is displayed only for DWS and POSTGRESQL data connections.

    Table Type

    DLI models support the following table types:
    • MANAGED: Data is stored in a DLI table.
    • EXTERNAL: Data is stored in an OBS table. When Table Type is set to EXTERNAL, you must set OBS Path. The OBS path format is /bucket_name/filepath.

    DWS models support the following table types:

    • DWS_ROW: row-store table. Tables are stored to disk partitions by row.
    • DWS_COLUMN: column-store table. Tables are stored to disk partitions by column.
    • DWS_VIEW: view-store table. Tables are stored to disk partitions by view.

    The MRS Hive model supports HIVE_TABLE and HIVE_EXTERNAL_TABLE.

    The MRS Spark model supports HUDI_COW and HUDI_MOR.

    The PostgreSQL model supports only POSTGRESQL_TABLE.

    The MRS ClickHouse model supports only CLICKHOUSE_TABLE.

    The Oracle model supports only ORACLE_TABLE.

    The MySQL model supports only MYSQL_TABLE.

    Compression Level

    This parameter is available when the data connection type is DWS.

    The following compression levels are available for different table types:

    • DWS_ROW: NO and YES
    • DWS_COLUMN: NO, LOW, MIDDLE, and HIGH.
    • DWS_VIEW: The compression level is not supported.

    Distributed By

    This parameter is displayed only for DWS data connections. You can select multiple fields.

    • REPLICATION: A full table is stored on each DN. The advantage of this option is that each DN has all the data of a table. During the join operation, data redistribution can be avoided, reducing network overhead. The disadvantage is that each DN retains the complete data of a table, resulting in data redundancy. Generally, this option is recommended for small dimension tables.
    • HASH: If you select this option, you must specify a distribution key for the user table. When a record is inserted, the system performs hash computing based on values in the distribute keys and then stores data on the corresponding DN. In a Hash table, I/O resources on each node can be used during data read/write, which improves the read/write speed of a table. Generally, this option is recommended for large dimension tables (a large dimension table contains over 1 million records).

    PreCombineField

    This parameter is available only for Spark data connections.

    Path

    This parameter is available only when the data source is MRS Hive and table type is HIVE_EXTERNAL_TABLE.

    The path can contain only letters, digits, slashes (/), periods (.), hyphens (-), underscores (_), and colons (:).

    Add dimension fields in the Attribute Settings area. You can click Add to add multiple dimension fields.

    Figure 3 Field configuration

    Table 3 Parameters in the Attribute Settings area

    Parameter

    Description

    Name

    It must start with letters. Only letters, digits, and the following special characters are allowed: ()-_

    Field Code

    Field codes must start with letters. Only letters, numbers, and underscores (_) are allowed.

    Data Type

    Type of data defined based on the original data.

    Data Standard

    Click to select a data standard to be associated with the field. If Create Data Quality Jobs is selected for Model Design Process on the Function Settings tab page in Configuration Center and a field is associated with a data standard, a quality job is automatically generated after a dimension is published. A quality rule is generated for each field associated with the data standard. The quality of the field is monitored based on the data standard. You can access the Quality Job page of DataArts Quality to view the job details.

    If no data standard is available, create one. See Creating Data Standards for details.

    Security Level

    You can click to add a security level for the logical entity attribute.

    If you cannot find the security level you want, click go to to go to the DataArts Security console and create a security level.

    You can disable this function on the Models tab page on the Configuration Center page.

    Surrogate Key

    Select a field as the surrogate key based on project requirements. By default, the first dimension attribute is the surrogate key.

    Primary Key

    Select a field as the primary key based on project requirements.

    NOTE:

    If an MRS Spark connection is used to connect to MRS Hudi data sources, data can be written to the database only if fields have primary keys. Otherwise, table synchronization fails.

    Partition

    Whether to be set as a partition field.

    Not Null

    Whether the parameter value can be left empty.

    Description

    A description of the dimension field you add.

    Audit Status

    Whether to audit the data standard

    Operation

    Related operations

    On the Mapping Settings tab page, click Create to create a mapping between dimensions and physical tables. Set the parameters.

    Figure 4 Mapping settings

    Table 4 Parameters of mappings

    Parameter

    Description

    *Mapping

    Only letters, numbers, and underscores (_) are allowed.

    *Model

    Select a created relationship model from the drop-down list box. If no relationship model has been created, create one. See Designing Physical Models.

    *Table

    Select a table from which data is obtained. If data is obtained from multiple tables, click next to the table name to set the JOIN condition between the table and other tables.

    1. Select a JOIN mode. The JOIN mode includes left JOIN, right JOIN, inner JOIN, and outer JOIN from left to right.
    2. Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and joined table. Click or to add or delete a JOIN condition. The relationship between JOIN conditions is AND.
    3. Click OK.
    4. If you want to delete a joined table after setting the JOIN condition, click next to the table name.
    Figure 5 Join Condition dialog box

    Field Mapping

    Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.

    In the upper right corner of the Mappings area, click to delete a mapping or click to collapse the mapping area.

  5. Click Publish.

    You can choose to publish the dimension to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the dimension cannot be published.

  6. In the dialog box displayed, select a reviewer and click OK.
  7. Repeat 3 to 6 to create and publish other dimensions.
  8. All dimensions must be approved by reviewers.

    After the application is approved, the system automatically creates a dimension table corresponding to the dimension. The name and code of the dimension table are the same as those of the dimension. On the Dimensional Modeling page, click the Dimension Tables tab to view the created dimension table.

    In the dimension table list, you can view the synchronization status of the dimension table in the Sync Status column.

    Figure 6 Sync Status of the dimension table
    • If the synchronization is successful, the dimension is successfully published and the dimension table is successfully created in the database.
    • If the synchronization failed, click View History in the row where the dimension table is located. On the page displayed, click the History tab to view logs. Troubleshoot the problem based on the logs. After the error is rectified, go back to the dimension table list and click Synchronize above the dimension table list to issue the synchronization command again. If the problem persists, contact technical support personnel.

      You can choose to synchronize the table to the production or development environment. By default, it is synchronized to the production environment. If you do not choose an environment, the table cannot be synchronized.

Editing a Dimension

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, select the target dimension and click Edit in the Operation column.

    Figure 7 Editing a dimension

  3. Edit the dimension information based on service requirements. For details about how to set parameters, see Dimension parameters.
  4. Click Save. Alternatively, click Publish to publish the edited dimension.

    You can choose to publish the dimension to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the dimension cannot be published.

Publishing a Dimension

If a dimension is created but not published, perform the following steps to publish the dimension:

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, find the target dimension and click Publish in the Operation column.

    Figure 8 Publishing a dimension

  3. In the dialog box displayed, select a reviewer and click OK.

    You can choose to publish the dimension to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the dimension cannot be published.

You can also perform the following steps to publish multiple dimensions:

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. Select the dimensions you want to publish and click Publish above the dimension list.

    Figure 9 Publishing multiple dimensions

  3. In the displayed dialog box, select a reviewer, set Job Scheduling Time, and click OK.

    You can choose to publish the dimensions to the production or development environment. By default, they are published to the production environment. If you do not choose an environment, the dimensions cannot be published.

    Job Scheduling Time refers to the scheduling time for automatic quality job creation after the dimension is published.
    Figure 10 Publishing multiple dimensions

Suspending a Dimension

To suspend a published dimension, perform the following steps:

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, find the target dimension and choose More > Suspend in the Operation column.

    Figure 11 Suspending a dimension

  3. In the dialog box displayed, select a reviewer and click OK. The dimension is suspended after the reviewer approves it.

Deleting a Dimension

If a dimension is no longer needed, you can delete it. However, if the dimension has been published, you must suspend the dimension before deleting it. For details, see Suspending a Dimension.

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, find the target dimension and choose More > Delete above the list.

    Figure 12 Deleting a dimension

  3. In the Delete Dimension dialog box, confirm the information and click Yes.

    If you select Delete physical tables in the dialog box, the physical tables in the database are also deleted when you delete the dimension.

Importing a Dimension by Reversing a Database

By reversing databases, you can import one or more created database tables from other data sources into a dimension directory to turn them into dimensions.

  1. On the DataArts Architecture console, choose Models > Dimensional Modeling in the navigation pane on the left.
  2. Above the dimension list, click Reverse Database.
  3. In the displayed dialog box, set required parameters and click OK.

    Table 5 Parameters for reversing the database

    Parameter

    Description

    *Subject

    Select a subject from the drop-down list.

    *Data Connection Type

    The data connection types supported by the reverse database are displayed in the drop-down list box. Select the required data connection type.

    *Data Connection Name

    Select a data connection.

    If you want to reverse a database from other data sources to a dimension directory, you must create a data connection in Management Center to connect to the data source. For details on how to create data connections, see Managing Data Connections.

    *Database

    Select a database.

    *Schema

    Select a value from the drop-down list box. This parameter is available only for DWS and PostgreSQL tables.

    Queue

    DLI queue. This parameter is available only when Data Connection Type is set to DLI.

    Update Table

    When Yes is selected, if the name of the reversed table is the same as that of an existing table in the dimension, the existing dimension is updated.

    *Data Table

    You can select All or Partial.

    Figure 13 Reverse Database dialog box

  4. You can view the result on the Last Reverse tab page. If the reverse operation is successful, click Close. If the reverse operation fails, you can view the failure cause. After the fault is rectified, select the table again and click Reverse to retry.

    Figure 14 Last Reverse tab page