Updated on 2022-09-23 GMT+08:00

Creating Dimensions

A dimension is the perspective to observe and analyze business data and assist in data aggregation, drilling, slicing, and analysis, and used as a GROUP BY condition in SQL statements. Most dimensions have hierarchical structures, such as geographic dimensions (including countries, regions, provinces/states, and cities) and time dimensions (including annually, quarterly, and monthly dimensions). Creating a dimension is a way to standardize the existence and uniqueness of business entities (also called primary data) from the top down.

Impact on the System

After a dimension is published and approved, the system automatically creates a dimension table corresponding to the dimension. The name and code of the dimension table are the same as those of the dimension.

Creating and Publishing a Dimension

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Architecture.
    Figure 1 DataArts Architecture
  2. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  3. Select an object from the subject directory on the left and click Create.

    Before creating a dimension, ensure that a subject is available. For details on how to add a subject, see Designing Subjects.

  4. On the page displayed, set the parameters.

    Set the basic settings and physicalization settings as described below.

    Figure 2 Dimension parameters
    Table 1 Parameters in the Basic Settings area

    Parameter

    Description

    Subject

    Select a subject from the drop-down list box.

    Name

    It must start with letters. Only letters, digits, and the following special characters are allowed: ()-_

    Dimension Code

    Only letters, numbers, and underscores (_) are allowed. Dimension codes must start with letters.

    Type

    • Basic: a dimension that does not have a hierarchical structure.
    • Lookup Table: a dimension created based on a lookup table. The field information and data of the dimension are the same as those of the lookup table, indicating that the content is an enumerable dimension.
    • Hierarchy: a dimension with a hierarchical structure between attributes.

    Owner

    You can enter an owner name or select an existing owner.

    Description

    A description of the dimension to create. Up to 600 characters are supported.

    Table 2 Parameters in the Physicalization Settings area

    Parameter

    Description

    Data Connection Type

    Select a data connection type from the drop-down list box.

    Data Connection

    The name of the data connection. Select the required data connection.

    If no data connection is available, access Management Center to create one. For details, see Creating Data Connections.

    Database

    The name of the database. Select a database from the drop-down list box. If no database is available, access DataArts Factory to create one. For details, see Creating a Database.

    Queue

    DLI queue. This parameter is displayed only for DLI data connections.

    Schema

    DWS or POSTGRESQL mode. This parameter is displayed only for DWS and POSTGRESQL data connections.

    Table Type

    DWS table type. Available values include:

    • DWS_ROW: Tables are stored to disk partitions by row.
    • DWS_COLUMN: Tables are stored to disk partitions by column.

    The MRS_HIVE model supports only HIVE_TABLE.

    Distributed By

    This parameter is displayed only for DWS data connections. You can select multiple fields.

    • REPLICATION: A full table is stored on each DN. The advantage of this option is that each DN has all the data of a table. During the join operation, data redistribution can be avoided, reducing network overhead. The disadvantage is that each DN retains the complete data of a table, resulting in data redundancy. Generally, this option is recommended for small dimension tables.
    • HASH: If you select this option, you must specify a distribution key for the user table. When a record is inserted, the system performs hash computing based on values in the distribute keys and then stores data on the corresponding DN. In a Hash table, I/O resources on each node can be used during data read/write, which improves the read/write speed of a table. Generally, this option is recommended for large dimension tables (a large dimension table contains over 1 million records).

    Add dimension fields in the Attribute Settings area. You can click Add to add multiple dimension fields.

    Figure 3 Field configuration
    Table 3 Parameters in the Attribute Settings area

    Parameter

    Description

    Name

    It must start with letters. Only letters, digits, and the following special characters are allowed: ()-_

    Code

    Field codes must start with letters. Only letters, numbers, and underscores (_) are allowed.

    Data Standard

    Click to select a data standard to be associated with the field. If Create Data Quality Jobs is selected for Model Design Process on the Function Settings tab page in Configuration Center and a field is associated with a data standard, a quality job is automatically generated after a dimension is published. A quality rule is generated for each field associated with the data standard. The quality of the field is monitored based on the data standard. You can access the Quality Job page of DataArts Quality to view the job details.

    If no data standard is available, create one. See Creating Data Standards for details.

    Data Type

    Type of data defined based on the original data.

    Surrogate Key

    Select a field as the surrogate key based on project requirements. By default, the first dimension attribute is the surrogate key.

    Primary Key

    Select a field as the primary key based on project requirements.

    Partition

    Whether to be set as a partition field.

    Not Null

    Whether the parameter value can be left empty.

    Description

    A description of the dimension field you add.

    On the Mapping Settings tab page, click Create to create the mapping between dimensions and fact tables. Set the parameters.

    Figure 4 Mapping settings
    Table 4 Parameters of mappings

    Parameter

    Description

    Mapping

    Only letters, numbers, and underscores (_) are allowed.

    Model

    Select a created relationship model from the drop-down list box. If no relationship model has been created, create one. See Designing Physical Models.

    Table

    Select a table from which data is obtained. If data is obtained from multiple tables, click next to the table name to set the JOIN condition between the table and other tables.

    1. Select a JOIN mode. The JOIN mode includes left JOIN, right JOIN, inner JOIN, and outer JOIN from left to right.
    2. Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and joined table. Click or to add or delete a JOIN condition. The relationship between JOIN conditions is AND.
    3. Click OK.
    4. If you want to delete a joined table after setting the JOIN condition, click next to the table name.
    Figure 5 Join Condition dialog box

    Field Mapping

    Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.

    In the upper right corner of the Mappings area, click to delete a mapping or click to collapse the mapping area.

  5. Click Publish.
  6. In the dialog box displayed, select a reviewer and click OK.
  7. Repeat 3 to 6 to create and publish other dimensions.
  8. All dimensions must be approved by reviewers.

    After the application is approved, the system automatically creates a dimension table corresponding to the dimension. The name and code of the dimension table are the same as those of the dimension. On the Dimensional Modeling page, click the Dimension Tables tab to view the created dimension table.

    In the dimension table list, you can view the synchronization status of the dimension table in the Sync Status column.

    Figure 6 Sync Status of the dimension table
    • If the synchronization is successful, the dimension is successfully published and the dimension table is successfully created in the database.
    • If the synchronization failed, click View History in the row where the dimension table is located. On the page displayed, click the History tab to view logs. Troubleshoot the problem based on the logs. After the error is rectified, go back to the dimension table list and click Synchronize above the dimension table list to issue the synchronization command again. If the problem persists, contact technical support personnel.

Editing a Dimension

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, select the target dimension and click Edit in the Operation column.

    Figure 7 Editing a dimension

  3. Edit the dimension information based on service requirements. For details about how to set parameters, see Dimension parameters.
  4. Click Save. Alternatively, click Publish to publish the edited dimension.

Publishing a Dimension

If a dimension is created but not published, perform the following steps to publish the dimension:

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, find the target dimension and click Publish in the Operation column.

    Figure 8 Publishing a dimension

  3. In the dialog box displayed, select a reviewer and click OK.

You can also perform the following steps to publish multiple dimensions:

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. Select the dimensions you want to publish and click Publish above the dimension list.

    Figure 9 Publishing multiple dimensions

  3. In the displayed dialog box, select a reviewer, set Job Scheduling Time, and click OK.

    Job Scheduling Time refers to the scheduling time for automatic quality job creation after the dimension is published.
    Figure 10 Publishing multiple dimensions

Suspending a Dimension

To suspend a published dimension, perform the following steps:

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, find the target dimension and choose More > Suspend in the Operation column.

    Figure 11 Suspending a dimension

  3. In the dialog box displayed, select a reviewer and click OK. The dimension is suspended after the reviewer approves it.

Deleting a Dimension

If a dimension is no longer needed, you can delete it. However, if the dimension has been published, you must suspend the dimension before deleting it. For details, see Suspending a Dimension.

  1. On the DataArts Architecture page, choose Models > Dimensional Modeling in the left navigation pane. On the displayed page, click the Dimensions tab.
  2. In the dimension list, find the target dimension and choose More > Delete above the list.

    Figure 12 Deleting a dimension

  3. In the Delete Dimension dialog box, confirm the information and click Yes.

    If you select Delete physical tables in the dialog box, the physical tables in the database are also deleted when you delete the dimension.