Designing Physical Models
A physical model describes how the elements of a logical model, such as entities, attributes, attribute constraints, and relationships, are converted by certain rules and methods into a table relationship diagram that database software can identify.
On the ER Modeling page, you can create an SDI and a DWI layer. The models are implemented through physical modeling. In addition to converting a logical model to a physical model, you can directly create a physical model.
The following parts are included in this topic:
Considerations in Physical Model Design
- Physical models must ensure that the required functions are available and their performance is as good as expected.
- Physical models must ensure data consistency and quality.
- Physical models should require few or no changes when new services or functions are added.
Creating a Physical Model
- On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Architecture.
Figure 1 DataArts Architecture
- On the DataArts Architecture page, choose ER Modeling in the left navigation pane.
- On the ER Modeling page, if no ER model has been created, the system displays a dialog box asking you to create one. If you have already created ER models, click the create icon to create another model.
Figure 2 Creating a hierarchical governance model
Figure 3 ER Modeling page
- In the dialog box displayed, set the parameters and click OK.
Figure 4 Creating a model
Table 1 Parameters for creating a physical model
Parameter
Description
Name
Only letters, numbers, and underscores (_) are allowed.
Data Connection Type
Select a data connection type from the drop-down list box.
Data Warehouse Layer
Select SDI or DWI.
- SDI stands for Source Data Integration and is the source data layer. SDI is a simple implementation of source system data.
- DWI stands for Data Warehouse Integration, also called the data consolidation layer. DWI integrates and cleans data from multiple source systems, and implements entity relationship modeling based on the three normal forms.
Description
A description of the ER model. Up to 600 characters are supported.
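The rules in Table 1 can be checked before the dialog box is filled in. The following is a minimal sketch, assuming that "letters" means ASCII letters. The PhysicalModelRequest class and validate_model function are hypothetical names used only for illustration and are not part of any DataArts Studio API.

```python
import re
from dataclasses import dataclass

# Rule from Table 1: only letters, numbers, and underscores are allowed.
# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
LAYERS = {"SDI", "DWI"}
MAX_DESCRIPTION_LENGTH = 600


@dataclass
class PhysicalModelRequest:
    """Hypothetical container for the dialog box values."""
    name: str
    connection_type: str
    layer: str
    description: str = ""


def validate_model(request: PhysicalModelRequest) -> list[str]:
    """Return a list of problems; an empty list means the values pass."""
    problems = []
    if not NAME_PATTERN.fullmatch(request.name):
        problems.append("name: only letters, numbers, and underscores are allowed")
    if request.layer not in LAYERS:
        problems.append("layer: must be SDI or DWI")
    if len(request.description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description: up to 600 characters are supported")
    return problems
```

For example, validate_model(PhysicalModelRequest("sdi_sales_01", "DWS", "SDI")) returns an empty list, while a name that contains a hyphen is reported as a problem.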
Creating and Publishing a Table
After creating a DLI, PostgreSQL, DWS, or MRS Hive ER model, you can create a business table in the model.
- On the DataArts Architecture page, choose ER Modeling in the left navigation pane.
- Select the physical model for which you want to create a table, click the physical model to access the model management page, and click Create.
Figure 5 Entry for creating a table
- On the Create Table page, set the parameters as required.
- Set the basic parameters.
Figure 6 Basic Settings tab page
Table 2 Parameters on the Basic Settings tab page
Parameter
Description
Subject
Select a subject from the drop-down list box.
Name
The name of the table to create. Table names must start with a letter. Only letters, numbers, and the following special characters are allowed: ()-_
Table Code
The code of the table to create. Table codes cannot start with numbers. Only letters, numbers, and the following special characters are allowed: _${}
Data Connection Type
N/A
Data Connection
The name of the data connection. Select the required data connection. You are advised to use the same data connection for an ER model.
If no data connection is available, access Management Center to create one. For details, see Creating Data Connections.
Database
The name of the database. Select a database from the drop-down list box.
Queue
DLI queue. This parameter is available only for DLI tables.
Schema
Schema of DWS or PostgreSQL. This parameter is available only for DWS and PostgreSQL tables.
Table Type
DLI models support the following table types:
- MANAGED: Data is stored in a DLI table.
- EXTERNAL: Data is stored in an OBS table. When Table Type is set to EXTERNAL, you must set OBS Path. The OBS path format is /bucket_name/filepath.
DWS models support the following table types:
- DWS_ROW: Tables are stored to disk partitions by row.
- DWS_COLUMN: Tables are stored to disk partitions by column.
- DWS_VIEW: Tables are stored to disk partitions by view.
The MRS_HIVE model supports only HIVE_TABLE.
Data Format
This parameter is available only for DLI tables. DLI models support the following data formats:
- Parquet: DLI can read non-compressed data or Parquet data that is compressed using Snappy or GZIP.
- CSV: DLI can read non-compressed data or CSV data that is compressed using GZIP.
- ORC: DLI can read non-compressed data or ORC data that is compressed using Snappy.
- JSON: DLI can read non-compressed data or JSON data that is compressed using GZIP.
- Carbon: DLI can read non-compressed Carbon data.
- Avro: DLI can read non-compressed Avro data.
Advanced Settings
Set custom items to describe the table. The custom items can be viewed in the table details.
For example, to identify the source of the table, you can add an item named source and set its value to the table source information. You can then view the table source information in the table details.
Tag
Tags are custom identifiers that help you classify and search for data assets. After adding a tag, you can search for related data assets in the DataArts Catalog module with ease.
Click the add icon. In the dialog box displayed, select one or more existing tags, or enter a new tag name and press Enter. Then click OK. You can also go to the Tags page of the DataArts Catalog module to add a tag. Then, return to this page and select the newly added tag from the drop-down list box. For details, see Tags.
Owner
You can enter an owner name or select an existing owner.
Description
A description of the table. It allows 1 to 600 characters.
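The naming and path rules in Table 2 can be expressed as simple pattern checks. The following is a minimal sketch, assuming that "letters" means ASCII letters and that an OBS path needs at least one character after the bucket name. The check_basic_settings function is a hypothetical helper, not a DataArts Studio API.

```python
import re

# Table name: starts with a letter; letters, numbers, and ()-_ are allowed.
TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9()\-_]*")
# Table code: cannot start with a number; letters, numbers, and _${} are allowed.
TABLE_CODE = re.compile(r"[A-Za-z_${}][A-Za-z0-9_${}]*")
# OBS path format from Table 2: /bucket_name/filepath
OBS_PATH = re.compile(r"/[^/]+/.+")


def check_basic_settings(name, code, table_type, obs_path=None):
    """Return a list of problems found in the basic settings of a table."""
    problems = []
    if not TABLE_NAME.fullmatch(name):
        problems.append("invalid table name")
    if not TABLE_CODE.fullmatch(code):
        problems.append("invalid table code")
    # OBS Path is mandatory only for EXTERNAL tables.
    if table_type == "EXTERNAL" and not (obs_path and OBS_PATH.fullmatch(obs_path)):
        problems.append("EXTERNAL tables require an OBS path such as /bucket_name/filepath")
    return problems
```

A table code such as dwd_trip_${env} passes the check because $, {, and } are allowed in codes, whereas the same string would be rejected as a table name.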
- On the Table Fields tab page, click Add to add the required fields.
Figure 7 Adding required table fields
Table 3 Parameters on the Table Fields tab page
Parameter
Description
Name
The name of the field. Field names must start with a letter. Only letters, digits, and the following special characters are allowed: ()-_
Code
Only letters, numbers, and underscores (_) are allowed. A field code must start with a letter.
Data Type
Field data type. If the required data type does not exist, you can add one. See Data Types.
Data Standard
If you have created data standards, click to select one to associate with the field. If Create Data Quality Jobs is selected for Model Design Process on the Function Settings tab page in Configuration Center and a field is associated with a data standard, a quality job is automatically generated after a table is published. A quality rule is generated for each field associated with the data standard. The quality of the field is monitored based on the data standard. You can access the Quality Job page of DataArts Quality to view the job details.
If no data standard is available, create one. See Creating Data Standards for details.
Primary Key
If this parameter is selected, the field is a primary key.
Partition
If this parameter is selected, the field is a partition field.
Not Null
Whether the field value can be left empty. If this parameter is selected, the field cannot be null.
Tag
Click to add a tag.
- In the dialog box displayed, select one or more existing tags. If no tag has been added, you can go to the Tags page of the DataArts Catalog module to add a tag. For details, see Tags.
- In the dialog box displayed, enter a new tag name and press Enter. Tag names can contain letters, numbers, and underscores (_), but cannot start with underscores (_).
Description
A description of the field to add.
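To illustrate how the field settings in Table 3 relate to a table definition, the following minimal sketch renders a generic CREATE TABLE statement from a list of fields. It is an assumption-laden illustration, not the DDL that DataArts Studio generates: the TableField class and render_create_table function are hypothetical, and partition fields are left out of the statement because partition syntax differs between DLI, DWS, and Hive.

```python
import re
from dataclasses import dataclass

# Field code rule from Table 3: letters, numbers, and underscores; starts with a letter.
FIELD_CODE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


@dataclass
class TableField:
    """Hypothetical mirror of one row on the Table Fields tab page."""
    code: str
    data_type: str
    primary_key: bool = False
    partition: bool = False  # recorded only; not rendered in this sketch
    not_null: bool = False


def render_create_table(table_code: str, fields: list[TableField]) -> str:
    """Render a generic CREATE TABLE statement from the field settings."""
    for item in fields:
        if not FIELD_CODE.fullmatch(item.code):
            raise ValueError(f"invalid field code: {item.code!r}")
    lines = []
    for item in fields:
        column = f"  {item.code} {item.data_type}"
        if item.not_null:
            column += " NOT NULL"
        lines.append(column)
    keys = [item.code for item in fields if item.primary_key]
    if keys:
        lines.append(f"  PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table_code} (\n" + ",\n".join(lines) + "\n)"
```

Validating every field code before rendering keeps an invalid code from producing a partially built statement.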
- (Optional) On the Relationships tab page, click Add to create a relationship.
A relationship is the association between a parent table and a child table (also called a primary table and a secondary table). It describes how one table is associated with another, or how the behavior of one table affects another. Relationships between tables in a data model are particularly important and must be accurately defined. Otherwise, the data model cannot accurately describe the actual business rules, and data consistency is compromised.
For example, if the student ID attribute of a score table is the primary key for a student table, the relationship between the two tables designed according to the third normal form (3NF) is as follows:
- Child table: score table
- Child table field FK: student ID
- Child to parent:
- Parent table: student table
- Parent table field PK: student ID
- Parent to child:
Figure 8 (Optional) Adding a relationship
Table 4 Parameters on the Relationships tab page
Parameter
Description
Name
Name of the relationship
Child Table
Select a table from the drop-down list box. Click to set the current table as a child table.
For example, if the student ID attribute of a score table is the primary key for a student table, the child table is the score table, and the corresponding parent table is the student table.
Child Table Field FK
Foreign key of the child table. The selected field of the child table must be a foreign key that references the primary key of the parent table.
For example, if the student ID attribute of a score table is the primary key for a student table, the child table field FK is the student ID in the score table.
Child to Parent
Select one of the following cardinalities:
- Exactly one: each record in the child table corresponds to only one record in the parent table.
- Zero or one: each record in the child table corresponds to at most one record in the parent table.
- Many: one record in the child table corresponds to multiple records in the parent table.
- One or more: each record in the child table corresponds to at least one record in the parent table.
Parent to Child
Select one of the following cardinalities:
- Exactly one: each record in the parent table corresponds to only one record in the child table.
- Zero or one: each record in the parent table corresponds to at most one record in the child table.
- Many: one record in the parent table corresponds to multiple records in the child table.
- One or more: each record in the parent table corresponds to at least one record in the child table.
Parent Table
Select the parent table corresponding to the selected child table.
For example, if the student ID attribute of a score table is the primary key for a student table, the parent table is the student table, and the corresponding child table is the score table.
Parent Table Field PK
Primary key of the parent table. The selected field must be the primary key of the parent table.
For example, if the student ID attribute of a score table is the primary key for a student table, the parent table field PK is the student ID in the student table.
Role
You can customize a role name to identify the relationship.
Operation
Click the delete icon to delete a relationship, or click the edit icon to edit it.
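The student and score example used throughout Table 4 can be tried out locally. The following minimal sketch uses SQLite from the Python standard library as a stand-in database (an assumption; it is not one of the data sources supported by the ER model), with hypothetical column names. It shows the score table as the child table whose student_id foreign key references the primary key of the student table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: student. Parent table field PK: student_id.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
# Child table: score. Child table field FK: student_id.
conn.execute(
    "CREATE TABLE score ("
    " score_id INTEGER PRIMARY KEY,"
    " student_id INTEGER NOT NULL REFERENCES student(student_id),"
    " points INTEGER)"
)

conn.execute("INSERT INTO student VALUES (1, 'Ada')")
# Parent to child: one student can have multiple scores.
conn.execute("INSERT INTO score VALUES (10, 1, 95)")
conn.execute("INSERT INTO score VALUES (11, 1, 88)")


def insert_orphan_score(connection):
    """Try to insert a score for a student that does not exist."""
    try:
        connection.execute("INSERT INTO score VALUES (12, 99, 70)")
        return True
    except sqlite3.IntegrityError:
        # Child to parent: every score must point to an existing student.
        return False
```

Calling insert_orphan_score(conn) returns False because the foreign key rejects a score whose student does not exist, which is the kind of consistency rule that an accurately defined relationship documents.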
- (Optional) On the Mappings tab page, click Create to create a mapping and design a data source based on the created mapping.
- If the table fields come from different ER models, you must create multiple mappings.
Currently, table data can be obtained from ER models of different connection types. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.
For example, if the data of the first five fields and the last five fields in the current table comes from two different models, create the following mappings:
- map1: Create a table named table01 from ER model A. In the Field Mapping area, set the source fields of the first to fifth fields to the corresponding fields with the same meaning in table01. The last five fields do not need to be set.
- map2: Create a table named table02 from ER model B. In the Field Mapping area, set the source fields of the sixth to tenth fields to the corresponding fields with the same meaning in table02. The first five fields do not need to be set.
- If the field data in a table comes from multiple tables in the same ER model, you need to create only one mapping.
In the source table of the mapping, you can set JOIN conditions for multiple tables, and then set source fields for the fields in the table. The selected source fields must have the same meanings as the fields in the table.
For example, suppose all fields in the current table come from ER model d1: the first, second, and third fields come from the vendor, payment_type, and rate tables, respectively, and the other fields come from the dwd_taxi_trip_data table.
You can create a mapping, as shown in Figure 9. Join the dwd_taxi_trip_data table with the vendor, payment_type, and rate tables, and set the source fields in sequence in the field mapping.
For details on the parameters for creating a mapping, see Table 5.
Table 5 Parameters of mappings
Parameter
Description
Mapping
Only letters, numbers, and underscores (_) are allowed.
Model
Select a created relationship model from the drop-down list box. If no relationship model has been created, create one. See Designing Physical Models.
Table
Select a table from which data is obtained. If data is obtained from multiple tables, click the icon next to the table name to set the JOIN condition between the table and other tables.
- Select a JOIN mode. The available JOIN modes, from left to right, are left JOIN, right JOIN, inner JOIN, and outer JOIN.
- Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and the joined table. Click the add or delete icon to add or delete a JOIN condition. Multiple JOIN conditions are combined with AND.
- Click OK.
- If you want to delete a joined table after setting the JOIN condition, click the delete icon next to the table name.
Figure 10 Join Condition dialog box
Field Mapping
Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.
In the upper right corner of the Mappings area, click the delete icon to delete a mapping or click the collapse icon to collapse the mapping area.
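The multi-mapping case described earlier (map1 supplying the first five fields from table01 in ER model A, and map2 supplying the last five fields from table02 in ER model B) can be sketched as follows. The Mapping class, the resolve_sources function, and the field names are hypothetical. Treating a field that is mapped twice as an error is a choice made for this sketch, not documented product behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Hypothetical mirror of one mapping on the Mappings tab page."""
    name: str
    model: str
    table: str
    # Target field -> source field. Fields from other mappings are simply left out.
    field_sources: dict[str, str] = field(default_factory=dict)


def resolve_sources(target_fields, mappings):
    """Combine all mappings and report target fields that no mapping covers."""
    resolved = {}
    for mapping in mappings:
        for target, source in mapping.field_sources.items():
            if target in resolved:
                raise ValueError(f"{target} is mapped more than once")
            resolved[target] = f"{mapping.model}.{mapping.table}.{source}"
    unmapped = [name for name in target_fields if name not in resolved]
    return resolved, unmapped


targets = [f"f{i}" for i in range(1, 11)]
map1 = Mapping("map1", "model_a", "table01", {f"f{i}": f"f{i}" for i in range(1, 6)})
map2 = Mapping("map2", "model_b", "table02", {f"f{i}": f"f{i}" for i in range(6, 11)})
resolved, unmapped = resolve_sources(targets, [map1, map2])
```

With both mappings in place, every target field resolves to exactly one source and unmapped is empty. With map1 alone, fields f6 to f10 are reported as unmapped.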
- (Optional) If the type of the new table is DWS_VIEW, click Create to create a view.
Figure 11 Creating a view
Table 6 Parameters
Parameter
Description
Mapping
Only letters, numbers, and underscores (_) are allowed.
Table
Select a table from which data is obtained. If data is obtained from multiple tables, click the icon next to the table name to set the JOIN condition between the table and other tables.
- Select a JOIN mode. The available JOIN modes, from left to right, are left JOIN, right JOIN, inner JOIN, and outer JOIN.
- Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and the joined table. Click the add or delete icon to add or delete a JOIN condition. Multiple JOIN conditions are combined with AND.
- Click OK.
- If you want to delete a joined table after setting the JOIN condition, click the delete icon next to the table name.
Figure 12 Join Settings dialog box
Field Mapping
Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.
In the upper right corner of the Mappings area, click the delete icon to delete a mapping or click the collapse icon to collapse the mapping area.
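The JOIN settings used for mappings and views correspond to ordinary SQL joins. The following minimal sketch builds a view over the taxi example tables (dwd_taxi_trip_data, vendor, payment_type, and rate) using SQLite from the Python standard library as a stand-in for DWS. The column names, sample rows, and the trip_summary view name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
CREATE TABLE payment_type (payment_type_id INTEGER PRIMARY KEY, payment_type_name TEXT);
CREATE TABLE rate (rate_id INTEGER PRIMARY KEY, rate_name TEXT);
CREATE TABLE dwd_taxi_trip_data (
    trip_id INTEGER PRIMARY KEY,
    vendor_id INTEGER,
    payment_type_id INTEGER,
    rate_id INTEGER,
    fare REAL
);

INSERT INTO vendor VALUES (1, 'Vendor A');
INSERT INTO payment_type VALUES (1, 'Card');
INSERT INTO rate VALUES (1, 'Standard');
INSERT INTO dwd_taxi_trip_data VALUES (100, 1, 1, 1, 12.5);
-- This trip refers to a payment type that has no row in payment_type.
INSERT INTO dwd_taxi_trip_data VALUES (101, 1, 2, 1, 8.0);

-- The first three view fields come from the joined tables,
-- the remaining fields from the main source table.
CREATE VIEW trip_summary AS
SELECT v.vendor_name, p.payment_type_name, r.rate_name, t.trip_id, t.fare
FROM dwd_taxi_trip_data AS t
LEFT JOIN vendor AS v ON t.vendor_id = v.vendor_id
LEFT JOIN payment_type AS p ON t.payment_type_id = p.payment_type_id
LEFT JOIN rate AS r ON t.rate_id = r.rate_id;
""")

rows = conn.execute("SELECT * FROM trip_summary ORDER BY trip_id").fetchall()
```

Because a left JOIN keeps every row of the source table, trip 101 still appears in the view with an empty payment type name. An inner JOIN would drop that row instead, which is why the JOIN mode should be chosen deliberately.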
- Click Publish, select a reviewer, and click Submit.
- Wait for the reviewer to approve the application. After the application is approved, return to the ER Modeling page to view the table status and synchronization status.
Publishing is an asynchronous operation. You can click the refresh icon to refresh the status. After the table publishing application is approved, the system performs operations such as creating tables and synchronizing technical assets and business assets based on the Model Design Process settings on the Function Settings tab page in Configuration Center. The synchronization status is displayed in the Sync Status column of the table on the Information Architecture page.
- If the synchronization is successful, the table is successfully published. Move the cursor to the icon in the Sync Status column. If a message indicating "creation succeeded" is displayed, the table has been successfully created in the corresponding data source.
- If one or more items fail to be synchronized, you can refresh the status. If the fault persists, open the table's history and click the Publish Log tab to view logs. Troubleshoot the problem based on the logs. After the error is rectified, click Resynchronize on the History tab page to issue the synchronization command again. If the synchronization still fails, contact technical support for assistance.
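Because publishing is asynchronous, checking the result is a matter of refreshing until the status settles. The following minimal sketch shows that pattern as a polling loop. The fetch_status callable and the status strings (RUNNING, SUCCESS) are hypothetical placeholders supplied by the caller, not a DataArts Studio API.

```python
import time


def wait_for_sync(fetch_status, attempts=10, delay_seconds=0.0):
    """Poll fetch_status() until it stops returning 'RUNNING' or attempts run out.

    fetch_status is a hypothetical callable that returns the current
    synchronization status as a string.
    """
    status = "RUNNING"
    for _ in range(attempts):
        status = fetch_status()
        if status != "RUNNING":
            return status
        # Wait before the next refresh so the check does not spin.
        time.sleep(delay_seconds)
    return status
```

Limiting the number of attempts mirrors the guidance above: if the status does not settle after several refreshes, stop polling and check the publish logs instead.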