Creating Dimensions
Dimensional modeling involves dimensions, dimension tables, and fact tables.
A dimension is the perspective to observe and analyze business data and assist in data aggregation, drilling, slicing, and analysis, and used as a GROUP BY condition in SQL statements. Most dimensions have hierarchical structures, such as geographic dimensions (including countries, regions, provinces/states, and cities) and time dimensions (including annually, quarterly, and monthly dimensions).
Impact on the System
After a dimension is published and approved, the system automatically creates a dimension table corresponding to the dimension. The name and code of the dimension table are the same as those of the dimension.
Creating and Publishing a Dimension
- On the DataArts Studio console, locate a workspace and click DataArts Architecture.
- On the DataArts Architecture console, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- Select an object in the subject directory on the left and click Create.
Before creating a dimension, ensure that a subject is available. For details on how to add a subject, see Designing Subjects.
- On the page displayed, set the parameters.
Set the basic settings and physicalization settings as described below.
Figure 1 Dimension parameters
Table 1 Parameters in the Basic Settings area Parameter
Description
*Subject
Select a subject from the drop-down list box.
*Dimension Name
Newline characters and the following characters are not allowed: \ < > % " ' ;
*Dimension English Name
It can contain only letters, digits, and underscores (_), and must start with dim_.
*Type
- Basic: a dimension that does not have a hierarchical structure.
- Lookup Table: a dimension created based on a lookup table. The field information and data of the dimension are the same as those of the lookup table, indicating that the content is an enumerable dimension.
- Hierarchy: a dimension with a hierarchical structure between attributes.
Advanced Settings
Set custom items to describe the table. The custom items can be viewed in the table details.
For example, if you want to identify the source of the table, you can add item source and set its value to the table source information. Then you can view the table source information in the table details.
*Owner
You can enter an owner name or select an existing owner.
*Description
A description of the dimension to create. It allows 1 to 600 characters.
Table 2 Parameters in the Physicalization Settings area Parameter
Description
*Data Connection Type
Select a data connection type from the drop-down list box.
*Data Connection Name
The name of the data connection. Select the required data connection.
If no data connection is available, access Management Center to create one. For details, see Configuring DataArts Studio Data Connection Parameters.
*Database
The name of the database. Select a database from the drop-down list box. If no database is available, access DataArts Factory to create one. For details, see Creating a Database.
Queue
DLI queue. This parameter is displayed only for DLI data connections.
Schema
DWS or POSTGRESQL mode. This parameter is displayed only for DWS and POSTGRESQL data connections.
Table Type
DLI models support the following table types:- MANAGED: Data is stored in a DLI table.
- EXTERNAL: Data is stored in an OBS table. When Table Type is set to EXTERNAL, you must set OBS Path. The OBS path format is /bucket_name/filepath.
DWS models support the following table types:
- DWS_ROW: row-store table. Tables are stored to disk partitions by row.
- DWS_COLUMN: column-store table. Tables are stored to disk partitions by column.
- DWS_VIEW: view-store table. Tables are stored to disk partitions by view.
The MRS Hive model supports HIVE_TABLE and HIVE_EXTERNAL_TABLE.
The MRS Spark model supports HUDI_COW and HUDI_MOR.
The PostgreSQL model supports only POSTGRESQL_TABLE.
The MRS ClickHouse model supports only CLICKHOUSE_TABLE.
The Oracle model supports only ORACLE_TABLE.
The MySQL model supports only MYSQL_TABLE.
Compression Level
This parameter is available when the data connection type is DWS.
The following compression levels are available for different table types:
- DWS_ROW: NO and YES
- DWS_COLUMN: NO, LOW, MIDDLE, and HIGH.
- DWS_VIEW: The compression level is not supported.
Distributed By
This parameter is displayed only for DWS data connections. You can select multiple fields.
- REPLICATION: A full table is stored on each DN. The advantage of this option is that each DN has all the data of a table. During the join operation, data redistribution can be avoided, reducing network overhead. The disadvantage is that each DN retains the complete data of a table, resulting in data redundancy. Generally, this option is recommended for small dimension tables.
- HASH: If you select this option, you must specify a distribution key for the user table. When a record is inserted, the system performs hash computing based on values in the distribute keys and then stores data on the corresponding DN. In a Hash table, I/O resources on each node can be used during data read/write, which improves the read/write speed of a table. Generally, this option is recommended for large dimension tables (a large dimension table contains over 1 million records).
PreCombineField
This parameter is available only for Spark data connections.
Path
This parameter is available only when the data source is MRS Hive and table type is HIVE_EXTERNAL_TABLE.
The path can contain only letters, digits, slashes (/), periods (.), hyphens (-), underscores (_), and colons (:).
Add dimension fields in the Attribute Settings area. You can click Add to add multiple dimension fields.
Figure 2 Field configuration
Table 3 Parameters in the Attribute Settings area Parameter
Description
Name
Newline characters and the following characters are not allowed: \ < > % " ' ;
Field Code
Field codes must start with letters. Only letters, numbers, and underscores (_) are allowed.
Data Type
Type of data defined based on the original data.
Data Standard
Click to select a data standard to be associated with the field. If Create Data Quality Jobs is selected for Model Design Process on the Function Settings tab page in Configuration Center and a field is associated with a data standard, a quality job is automatically generated after a dimension is published. A quality rule is generated for each field associated with the data standard. The quality of the field is monitored based on the data standard. You can access the Quality Job page of DataArts Quality to view the job details.
If no data standard is available, create one. See Creating Data Standards for details.
Security Level
You can click to add a security level for the logical entity attribute.
If you cannot find the security level you want, click go to to go to the DataArts Security console and create a security level.
You can disable this function on the Models tab page on the Configuration Center page.
Surrogate Key
Select a field as the surrogate key based on project requirements. By default, the first dimension attribute is the surrogate key.
Primary Key
Select a field as the primary key based on project requirements.
NOTE:If an MRS Spark connection is used to connect to MRS Hudi data sources, data can be written to the database only if fields have primary keys. Otherwise, table synchronization fails.
Partition
Whether to be set as a partition field.
Not Null
Whether the parameter value can be left empty.
Description
A description of the dimension field you add.
Audit Status
Whether to audit the data standard
Operation
Related operations
On the Mapping Settings tab page, click Create to create a mapping between dimensions and physical tables. Set the parameters.
Figure 3 Mapping settings
Table 4 Parameters of mappings Parameter
Description
*Mapping
Only letters, numbers, and underscores (_) are allowed.
*Model
Select a created relationship model from the drop-down list box. If no relationship model has been created, create one. See ER Modeling.
*Table
Select a table from which data is obtained. If data is obtained from multiple tables, click next to the table name to set the JOIN condition between the table and other tables.
- Select a JOIN mode. The JOIN mode includes left JOIN, right JOIN, inner JOIN, and outer JOIN from left to right.
- Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and joined table. Click or to add or delete a JOIN condition. The relationship between JOIN conditions is AND.
- Click OK.
- If you want to delete a joined table after setting the JOIN condition, click next to the table name.
Figure 4 Join Condition dialog box
Field Mapping
Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.
In the upper right corner of the Mappings area, click to delete a mapping or click to collapse the mapping area.
- Click Publish.
In enterprise mode, you can choose to publish the table to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the dimension cannot be published.
- In the displayed dialog box, select a reviewer and click OK to submit a request for publishing the dimension.
If you have been added as a reviewer, you can select Auto-review and click OK. After the request is approved, the status changes to Published.
If you select multiple reviewers, the status changes to Published only after all reviewers have approved the publishing request. If any reviewer rejects the request, the status is Rejected.
- Repeat 3 to 6 to create other dimensions.
- All dimensions need reviewing.
After the application is approved, the system automatically creates a dimension table corresponding to the dimension. The name and code of the dimension table are the same as those of the dimension. On the Dimensional Modeling page, click the Dimension Tables tab to view the created dimension table.
In the dimension table list, you can view the synchronization status of the dimension table in the Sync Status column.
Figure 5 Sync Status of the dimension table
- If the synchronization is successful, the dimension is successfully published and the dimension table is successfully created in the database.
- If the synchronization failed, click View History in the row where the dimension table is located. On the page displayed, click the History tab to view logs. Troubleshoot the problem based on the logs. After the error is rectified, go back to the dimension table list and click Synchronize above the dimension table list to issue the synchronization command again. If the problem persists, contact technical support personnel.
In enterprise mode, you can choose to synchronize the table to the production or development environment. By default, it is synchronized to the production environment. If you do not choose an environment, the table cannot be synchronized.
Editing a Dimension
- On the DataArts Architecture page, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- Locate a dimension and click Edit in the Operation column.
- Edit the dimension information based on service requirements. For details about how to set parameters, see Dimension parameters.
- Click Save. Alternatively, click Publish to publish the edited dimension.
In enterprise mode, you can choose to publish the table to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the dimension cannot be published.
Publishing a Dimension
If a dimension is created but not published, perform the following steps to publish the dimension:
- On the DataArts Architecture page, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- In the dimension list, find the target dimension and click Publish in the Operation column.
- In the displayed dialog box, select a reviewer and click OK. After the request is approved, the dimension is published.
In enterprise mode, you can choose to publish the table to the production or development environment. By default, it is published to the production environment. If you do not choose an environment, the dimension cannot be published.
You can also perform the following steps to publish multiple dimensions:
- On the DataArts Architecture page, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- Select the dimensions you want to publish and click Publish above the dimension list.
- In the displayed dialog box, select a reviewer, set the job scheduling time, and click OK.
In enterprise mode, you can choose to publish the table to the production or development environment. By default, they are published to the production environment. If you do not choose an environment, the dimensions cannot be published.
DataArts Quality Job Scheduling Time refers to the scheduling time for automatic quality job creation after the dimension is published.Figure 6 Publishing multiple dimensions
Suspending a Dimension
To suspend a published dimension, perform the following steps:
- On the DataArts Architecture page, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- In the dimension list, find the target dimension and choose in the Operation column.
- In the displayed dialog box, select a reviewer and click OK. After the request is approved, the dimension is suspended.
You can also perform the following steps to publish multiple dimensions:
- On the DataArts Architecture page, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- Select dimensions, click More above the dimension list, and select Suspend.
- In the displayed dialog box, select a reviewer and click OK. After the request is approved, the dimension is suspended.
Deleting a Dimension
If a dimension is no longer needed, you can delete it. However, if the dimension has been published, you must suspend the dimension before deleting it. For details, see Suspending a Dimension.
- On the DataArts Architecture page, choose Dimensions tab. in the left navigation pane. On the displayed page, click the
- In the dimension list, find the target dimension and choose above the list.
- In the Delete Dimension dialog box, confirm the information and click Yes.
If you select Delete physical tables in the dialog box, the physical tables in the database are also deleted when you delete the dimension.
Importing a Dimension by Reversing a Database
By reversing databases, you can import one or more created database tables from other data sources into a dimension directory to turn them into dimensions.
- On the DataArts Architecture console, choose in the navigation pane on the left.
- Above the dimension list, click Reverse Database.
- In the displayed dialog box, set required parameters and click OK.
Table 5 Parameters for reversing the database Parameter
Description
*Subject
Select a subject from the drop-down list.
*Data Connection Type
The data connection types supported by the reverse database are displayed in the drop-down list box. Select the required data connection type.
*Data Connection Name
Select a data connection.
If you want to reverse a database from other data sources to a dimension directory, you must create a data connection in Management Center to connect to the data source. For details on how to create data connections, see Configuring DataArts Studio Data Connection Parameters.
*Database
Select a database.
*Schema
Select a value from the drop-down list box. This parameter is available only for DWS and PostgreSQL tables.
Queue
DLI queue. This parameter is available only when Data Connection Type is set to DLI.
Update Table
When Yes is selected, if the name of the reversed table is the same as that of an existing table in the dimension, the existing dimension is updated.
Name Source
Source of the table name or field name after the reverse. The value can be Description or Name. If no description is specified for a table or field, the name is used.
- Description
- Name
*Data Table
You can select All or Partial.
Figure 7 Reverse Database dialog box
- You can view the result on the Last Reverse tab page. If the reverse operation is successful, click Close. If the reverse operation fails, you can view the failure cause. After the fault is rectified, select the table again and click Reverse to retry.
Figure 8 Last Reverse tab page
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot