Updated on 2022-09-23 GMT+08:00

Designing Physical Models

A physical model describes how the elements of a logical model, such as entities, attributes, attribute constraints, and relationships, are converted by specific rules and methods into a table relationship diagram that database software can recognize.

On the ER Modeling page, you can create an SDI and a DWI layer. The models are implemented through physical modeling. In addition to converting a logical model to a physical model, you can directly create a physical model.

Considerations in Physical Model Design

  • Physical models must ensure that the required functions are available and that performance meets expectations.
  • Physical models must ensure data consistency and quality.
  • Physical models should require few or no changes when new services or functions are added.

Creating a Physical Model

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Architecture.
    Figure 1 DataArts Architecture
  2. On the DataArts Architecture page, choose Models > ER Modeling in the left navigation pane.
  3. On the ER Modeling page, if no ER model has been created, the system displays a dialog box asking you to create one. If you have created ER models before, click the icon for creating a model.
    Figure 2 Creating a hierarchical governance model
    Figure 3 ER Modeling page
  4. In the dialog box displayed, set the parameters and click OK.
    Figure 4 Creating a model
    Table 1 Parameters for creating a physical model

    | Parameter | Description |
    |---|---|
    | Name | Only letters, numbers, and underscores (_) are allowed. |
    | Data Connection Type | Select a data connection type from the drop-down list box. |
    | Data Warehouse Layer | Select SDI or DWI. SDI stands for Source Data Integration and is the source data layer. SDI is a simple implementation of source system data. DWI stands for Data Warehouse Integration, also called the data consolidation layer. DWI integrates and cleans data from multiple source systems, and implements entity relationship modeling based on the three normal forms. |
    | Description | A description of the ER model. Up to 600 characters are supported. |
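
The constraints in Table 1 can be checked before the dialog box is filled in. The following Python snippet is a minimal sketch only, not part of DataArts Studio: the function name `validate_model_settings` is hypothetical, and it assumes that "letters" means ASCII letters and that the name must not be empty.

```python
import re

# Assumption: ASCII letters only and a non-empty name; Table 1 lists only the allowed characters.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_DESCRIPTION_LENGTH = 600
VALID_LAYERS = {"SDI", "DWI"}


def validate_model_settings(name: str, layer: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means the settings pass these checks."""
    problems = []
    if not MODEL_NAME_PATTERN.fullmatch(name):
        problems.append("Name: only letters, numbers, and underscores (_) are allowed.")
    if layer not in VALID_LAYERS:
        problems.append("Data Warehouse Layer: select SDI or DWI.")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("Description: up to 600 characters are supported.")
    return problems
```

For example, `validate_model_settings("sdi_sales_01", "SDI")` returns an empty list, whereas a name containing a hyphen is reported as a problem.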

Creating and Publishing a Table

After creating a DLI, PostgreSQL, DWS, or MRS Hive ER model, you can create a business table in the model.

  1. On the DataArts Architecture page, choose Models > ER Modeling in the left navigation pane.
  2. Click the physical model in which you want to create a table to access the model management page, and then click Create.

    Figure 5 Entry for creating a table

  3. On the Create Table page, set the parameters as required.

    1. Set the basic parameters.
      Figure 6 Basic Settings tab page
      Table 2 Parameters on the Basic Settings tab page

      | Parameter | Description |
      |---|---|
      | Subject | Select a subject from the drop-down list box. |
      | Name | The name of the table to create. Table names must start with a letter. Only letters, numbers, and the following special characters are allowed: ()-_ |
      | Table Code | The code of the table to create. Table codes cannot start with a number. Only letters, numbers, and the following special characters are allowed: _${} |
      | Data Connection Type | N/A |
      | Data Connection | The name of the data connection. Select the required data connection. You are advised to use the same data connection throughout an ER model. If no data connection is available, access Management Center to create one. For details, see Creating Data Connections. |
      | Database | The name of the database. Select a database from the drop-down list box. |
      | Queue | DLI queue. This parameter is available only for DLI tables. |
      | Schema | Schema of DWS or PostgreSQL. This parameter is available only for DWS and PostgreSQL tables. |
      | Table Type | DLI models support the following table types: MANAGED (data is stored in a DLI table) and EXTERNAL (data is stored in an OBS table; when Table Type is set to EXTERNAL, you must set OBS Path, and the OBS path format is /bucket_name/filepath). DWS models support the following table types: DWS_ROW (tables are stored on disk by row), DWS_COLUMN (tables are stored on disk by column), and DWS_VIEW (the table is created as a view). The MRS_HIVE model supports only HIVE_TABLE. |
      | Data Format | This parameter is available only for DLI tables. DLI models support the following data formats: Parquet (DLI can read non-compressed data or Parquet data compressed using Snappy or GZIP); CSV (non-compressed data or CSV data compressed using GZIP); ORC (non-compressed data or ORC data compressed using Snappy); JSON (non-compressed data or JSON data compressed using GZIP); Carbon (non-compressed Carbon data); Avro (non-compressed Avro data). |
      | Advanced Settings | Set custom items to describe the table. The custom items can be viewed in the table details. For example, to identify the source of the table, add an item named source and set its value to the table source information. You can then view the table source information in the table details. |
      | Tag | Tags are custom identifiers that help you classify and search for data assets. After adding a tag, you can easily search for related data assets in the DataArts Catalog module. Click the icon for adding a tag. In the dialog box displayed, select one or more existing tags, or enter a new tag name and press Enter. Then click OK. You can also go to the Tags page of the DataArts Catalog module to add a tag, and then return to this page and select the newly added tag from the drop-down list box. For details, see Tags. |
      | Owner | You can enter an owner name or select an existing owner. |
      | Description | A description of the table. 1 to 600 characters are allowed. |
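
Several parameters in Table 2 apply only to certain model types. The following Python snippet is a rough sketch of those dependencies, written to make the rules easier to see. The `settings` dictionary, its key names, and the function `check_basic_settings` are hypothetical and do not correspond to any DataArts Studio API.

```python
# Table types listed in Table 2 for each model type.
TABLE_TYPES = {
    "DLI": {"MANAGED", "EXTERNAL"},
    "DWS": {"DWS_ROW", "DWS_COLUMN", "DWS_VIEW"},
    "MRS_HIVE": {"HIVE_TABLE"},
}
DLI_DATA_FORMATS = {"Parquet", "CSV", "ORC", "JSON", "Carbon", "Avro"}


def check_basic_settings(settings: dict) -> list[str]:
    """Check a hypothetical settings dict against the rules described in Table 2."""
    problems = []
    conn = settings.get("connection_type")
    table_type = settings.get("table_type")

    # Model types whose table types are not listed in Table 2 are not checked here.
    allowed = TABLE_TYPES.get(conn)
    if allowed is not None and table_type not in allowed:
        problems.append(f"{conn} models support only: {sorted(allowed)}")

    if settings.get("queue") and conn != "DLI":
        problems.append("Queue is available only for DLI tables.")
    if settings.get("schema") and conn not in {"DWS", "POSTGRESQL"}:
        problems.append("Schema is available only for DWS and PostgreSQL tables.")
    if settings.get("data_format"):
        if conn != "DLI":
            problems.append("Data Format is available only for DLI tables.")
        elif settings["data_format"] not in DLI_DATA_FORMATS:
            problems.append("Unsupported DLI data format.")

    if conn == "DLI" and table_type == "EXTERNAL":
        # Format given in Table 2: /bucket_name/filepath
        parts = settings.get("obs_path", "").split("/", 2)
        if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
            problems.append("OBS Path must look like /bucket_name/filepath.")
    return problems
```

A DLI table of type EXTERNAL with `obs_path` set to `/my_bucket/trips/2022` passes these checks, whereas a DWS table that sets a queue or a data format is reported.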

    2. Click Add to add required fields on the Table Fields page.
      Figure 7 Adding required table fields
      Table 3 Parameters on the Table Fields tab page

      | Parameter | Description |
      |---|---|
      | Name | Field names must start with a letter. Only letters, numbers, and the following special characters are allowed: ()-_ |
      | Code | A field code must start with a letter. Only letters, numbers, and underscores (_) are allowed. |
      | Data Type | Field data type. If the required data type does not exist, you can add one. See Data Types. |
      | Data Standard | If you have created data standards, you can select one to associate with the field. If Create Data Quality Jobs is selected for Model Design Process on the Function Settings tab page in Configuration Center and a field is associated with a data standard, a quality job is automatically generated after the table is published. A quality rule is generated for each field associated with a data standard, and the quality of the field is monitored based on that standard. You can access the Quality Job page of DataArts Quality to view the job details. If no data standard is available, create one. See Creating Data Standards for details. |
      | Primary Key | If this parameter is selected, the field is a primary key. |
      | Partition | If this parameter is selected, the field is a partition field. |
      | Not Null | Whether the field value can be left empty. If this parameter is selected, the field cannot be null. |
      | Tag | Click the icon to add a tag. In the dialog box displayed, select one or more existing tags, or enter a new tag name and press Enter. If no tag has been added, you can also go to the Tags page of the DataArts Catalog module to add a tag. For details, see Tags. Tag names can contain letters, numbers, and underscores (_), but cannot start with an underscore (_). |
      | Description | A description of the field to add. |
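
To show how the Primary Key, Partition, and Not Null options relate to a table definition, the following Python snippet renders a generic SQL-style statement from a list of fields. It is an illustration under stated assumptions only: the `Field` class and `render_ddl` function are hypothetical, and the statement that DataArts Studio actually issues for each data source may differ, especially for partitions.

```python
import re
from dataclasses import dataclass

# Field code rule from Table 3: starts with a letter; letters, numbers, underscores.
FIELD_CODE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


@dataclass
class Field:
    code: str
    data_type: str
    primary_key: bool = False
    partition: bool = False
    not_null: bool = False


def render_ddl(table_code: str, fields: list[Field]) -> str:
    """Render a generic, illustrative CREATE TABLE statement."""
    for f in fields:
        if not FIELD_CODE.fullmatch(f.code):
            raise ValueError(f"Invalid field code: {f.code!r}")
    columns = [
        f"  {f.code} {f.data_type}" + (" NOT NULL" if f.not_null else "")
        for f in fields
    ]
    pk = [f.code for f in fields if f.primary_key]
    if pk:
        columns.append(f"  PRIMARY KEY ({', '.join(pk)})")
    ddl = f"CREATE TABLE {table_code} (\n" + ",\n".join(columns) + "\n)"
    partitions = [f.code for f in fields if f.partition]
    if partitions:
        # Partition syntax differs between engines, so this line is only a marker.
        ddl += f"\n-- partition fields: {', '.join(partitions)}"
    return ddl


example_ddl = render_ddl(
    "score",
    [
        Field("student_id", "INT", primary_key=True, not_null=True),
        Field("exam_date", "DATE", partition=True),
    ],
)
```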

    3. (Optional) On the Relationships tab page, click Add to create a relationship.

      A relationship is the association between a parent table and a child table (also called a primary table and a secondary table). It describes how one table is associated with another, or how the behavior of one table affects another. Relationships between tables in a data model are particularly important and must be defined accurately. Otherwise, the data model cannot accurately describe the actual business rules, and data consistency is severely compromised.

      For example, if the student ID attribute of a score table is the primary key of a student table, the relationship between the two tables designed according to the third normal form (3NF) is as follows:
      • Child table: score table
      • Child table field FK: student ID
      • Child to parent:
      • Parent table: student table
      • Parent table field PK: student ID
      • Parent to child:
      Figure 8 (Optional) Adding a relationship
      Table 4 Parameters on the Relationships tab page

      | Parameter | Description |
      |---|---|
      | Name | Name of the relationship. |
      | Child Table | Select a table from the drop-down list box. Click the icon to set the current table as the child table. For example, if the student ID attribute of a score table is the primary key of a student table, the child table is the score table, and the corresponding parent table is the student table. |
      | Child Table Field FK | Foreign key of the child table. The selected field of the child table must be a foreign key that references the parent table. For example, if the student ID attribute of a score table is the primary key of a student table, the child table field FK is the student ID in the score table. |
      | Child to Parent | Four options are available. They indicate that each piece of data in the child table corresponds to (1) only one, (2) at most one, (3) multiple, or (4) at least one piece of data in the parent table. |
      | Parent to Child | Four options are available. They indicate that each piece of data in the parent table corresponds to (1) only one, (2) at most one, (3) multiple, or (4) at least one piece of data in the child table. |
      | Parent Table | Select the parent table corresponding to the selected child table. For example, if the student ID attribute of a score table is the primary key of a student table, the parent table is the student table, and the corresponding child table is the score table. |
      | Parent Table Field PK | Primary key of the parent table. The selected field must be the primary key of the parent table. For example, if the student ID attribute of a score table is the primary key of a student table, the parent table field PK is the student ID in the student table. |
      | Role | You can customize a role name to identify the relationship. |
      | Operation | Click the corresponding icon to delete or edit a relationship. |
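
The student and score example can be reproduced outside DataArts Studio to see what a parent-child relationship enforces. The following Python sketch uses the standard `sqlite3` module with an in-memory database purely for illustration. The column names (`student_id`, `name`, `score_id`, `points`) are assumptions, and SQLite is not one of the data sources covered in this topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent table: student. Parent table field PK: student_id.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
# Child table: score. Child table field FK: student_id.
conn.execute(
    "CREATE TABLE score ("
    " score_id INTEGER PRIMARY KEY,"
    " student_id INTEGER NOT NULL REFERENCES student (student_id),"
    " points INTEGER)"
)

conn.execute("INSERT INTO student VALUES (1, 'Ada')")
# One parent row can correspond to multiple child rows.
conn.execute("INSERT INTO score VALUES (10, 1, 95)")
conn.execute("INSERT INTO score VALUES (11, 1, 88)")


def insert_orphan_score() -> bool:
    """Try to insert a score for a student that does not exist."""
    try:
        conn.execute("INSERT INTO score VALUES (12, 999, 70)")
    except sqlite3.IntegrityError:
        return False  # rejected: every score row must reference an existing student
    return True
```

In this sketch, each score row corresponds to exactly one student row, while a student row can correspond to several score rows. A score row that references a missing student is rejected, which is the kind of consistency that an accurately defined relationship is meant to protect.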

    4. (Optional) On the Mappings tab page, click Create to create a mapping and design a data source based on the created mapping.
      • If the fields of the table come from different relationship models, you must create multiple mappings.

        Currently, table data can be obtained from ER models of different connection types. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set.

        For example, if the data of the first five fields and the last five fields in the current table comes from two different models, create the following mappings:

        • map1: Create a table named table01 from ER model A. In the Field Mapping area, set the source fields of the first to fifth fields to the corresponding fields with the same meaning in table01. The last five fields do not need to be set.
        • map2: Create a table named table02 from ER model B. In the Field Mapping area, set the source fields of the sixth to tenth fields to the corresponding fields with the same meaning in table02. The first five fields do not need to be set.
      • If the field data in a table comes from multiple tables in the same ER model, you can create a single mapping.

        In the source table of the mapping, you can set JOIN conditions for multiple tables, and then set source fields for the fields in the table. The selected source fields must have the same meanings as the fields in the table.

        For example, all fields in the current table come from ER model d1, the first, second, and third fields come from the vendor, payment_type, and rate tables respectively, and other fields come from the dwd_taxi_trip_data table.

        You can create a mapping, as shown in Figure 9. Join the dwd_taxi_trip_data table with the vendor, payment_type, and rate tables, and set the source fields in sequence in the field mapping.
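
The single-mapping case above amounts to joining the main source table with its lookup tables and then picking one source field per target field. The following Python sketch illustrates this with the standard `sqlite3` module and an in-memory database. Only the four table names come from the example; every column name and data value is hypothetical, and the use of left joins is an assumption made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (vendor_id INTEGER, vendor_name TEXT);
CREATE TABLE payment_type (payment_type_id INTEGER, payment_type_name TEXT);
CREATE TABLE rate (rate_code_id INTEGER, rate_name TEXT);
CREATE TABLE dwd_taxi_trip_data (
    vendor_id INTEGER, payment_type_id INTEGER, rate_code_id INTEGER,
    trip_distance REAL, total_amount REAL
);
INSERT INTO vendor VALUES (1, 'Vendor A');
INSERT INTO payment_type VALUES (2, 'Cash');
INSERT INTO rate VALUES (1, 'Standard rate');
INSERT INTO dwd_taxi_trip_data VALUES (1, 2, 1, 3.5, 14.0);
""")

# The first three fields come from the joined tables; the rest come from
# dwd_taxi_trip_data, mirroring the field mapping described in the example.
rows = conn.execute("""
SELECT v.vendor_name, p.payment_type_name, r.rate_name,
       t.trip_distance, t.total_amount
FROM dwd_taxi_trip_data AS t
LEFT JOIN vendor AS v ON t.vendor_id = v.vendor_id
LEFT JOIN payment_type AS p ON t.payment_type_id = p.payment_type_id
LEFT JOIN rate AS r ON t.rate_code_id = r.rate_code_id
""").fetchall()
```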

      For details on the parameters for creating a mapping, see Table 5.

      Figure 9 Configuring a mapping
      Table 5 Parameters of mappings

      | Parameter | Description |
      |---|---|
      | Mapping | The mapping name. Only letters, numbers, and underscores (_) are allowed. |
      | Model | Select a created relationship model from the drop-down list box. If no relationship model has been created, create one. See Designing Physical Models. |
      | Table | Select a table from which data is obtained. If data is obtained from multiple tables, click the icon next to the table name to set the JOIN condition between the table and other tables: (1) Select a JOIN mode. From left to right, the JOIN modes are left JOIN, right JOIN, inner JOIN, and outer JOIN. (2) Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and joined table. Click the corresponding icon to add or delete a JOIN condition. Multiple JOIN conditions are combined with AND. (3) Click OK. (4) To delete a joined table after setting the JOIN condition, click the icon next to the table name. (Figure 10 Join Condition dialog box) |
      | Field Mapping | Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set. |

      In the upper right corner of the Mappings area, click the corresponding icon to delete a mapping or to collapse the mapping area.
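
When a table is fed by several mappings, each mapping sets source fields for only part of the table. The following Python sketch models the map1 and map2 example to show how the mappings together cover all ten fields. The function `merge_mappings`, the field names `f1` to `f10`, and the source column names are hypothetical, and the rule that a field is set in at most one mapping is an assumption of this sketch.

```python
def merge_mappings(target_fields, mappings):
    """Combine per-mapping source fields into one view of the target table.

    mappings has the form {mapping_name: {target_field: "table.source_field"}}.
    Returns (sources, unmapped), where sources maps each target field to
    (mapping_name, source_field) and unmapped lists fields with no source.
    """
    sources = {}
    for mapping_name, field_map in mappings.items():
        for target, source in field_map.items():
            if target not in target_fields:
                raise KeyError(f"{mapping_name}: unknown target field {target!r}")
            if target in sources:
                # Assumption for this sketch: a field has one source only.
                raise ValueError(f"{target!r} is set in more than one mapping")
            sources[target] = (mapping_name, source)
    unmapped = [f for f in target_fields if f not in sources]
    return sources, unmapped


fields = [f"f{i}" for i in range(1, 11)]
# map1 sets only the first five fields, from table01 in ER model A.
map1 = {f"f{i}": f"table01.c{i}" for i in range(1, 6)}
# map2 sets only the last five fields, from table02 in ER model B.
map2 = {f"f{i}": f"table02.c{i}" for i in range(6, 11)}

sources, unmapped = merge_mappings(fields, {"map1": map1, "map2": map2})
```

With both mappings in place, `unmapped` is empty. If map2 were omitted, the sixth to tenth fields would be listed as unmapped.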

    5. (Optional) If the type of the new table is DWS_VIEW, click Create to create a view.
      Figure 11 Creating a view

      Table 6 Parameters

      | Parameter | Description |
      |---|---|
      | Mapping | The mapping name. Only letters, numbers, and underscores (_) are allowed. |
      | Table | Select a table from which data is obtained. If data is obtained from multiple tables, click the icon next to the table name to set the JOIN condition between the table and other tables: (1) Select a JOIN mode. From left to right, the JOIN modes are left JOIN, right JOIN, inner JOIN, and outer JOIN. (2) Set the JOIN condition in the JOIN field. Generally, select the fields with the same meaning in the source table and joined table. Click the corresponding icon to add or delete a JOIN condition. Multiple JOIN conditions are combined with AND. (3) Click OK. (4) To delete a joined table after setting the JOIN condition, click the icon next to the table name. (Figure 12 Join Settings dialog box) |
      | Field Mapping | Select a source field with the same meaning as the current mapping field. If a table field comes from multiple models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping. Other fields do not need to be set. |

      In the upper right corner of the Mappings area, click the corresponding icon to delete a mapping or to collapse the mapping area.
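
The view settings above (a source table, joined tables with a JOIN mode, JOIN conditions combined with AND, and a field mapping) together describe a query. The following Python sketch assembles such a query as text to make the correspondence concrete. It is not the statement that DataArts Studio generates: the function `build_view_sql` and all table and column names are hypothetical, and mapping "outer JOIN" to `FULL OUTER JOIN` is an assumption.

```python
# Assumption: "outer JOIN" is taken to mean a full outer join.
JOIN_KEYWORDS = {
    "left": "LEFT JOIN",
    "right": "RIGHT JOIN",
    "inner": "INNER JOIN",
    "outer": "FULL OUTER JOIN",
}


def build_view_sql(view_name, source_table, joins, field_mapping):
    """Build an illustrative CREATE VIEW statement.

    joins: list of (mode, joined_table, [(source_column, joined_column), ...])
    field_mapping: {view_field: "table.column"}
    """
    select_list = ", ".join(
        f"{source} AS {field}" for field, source in field_mapping.items()
    )
    sql = f"CREATE VIEW {view_name} AS SELECT {select_list} FROM {source_table}"
    for mode, joined_table, conditions in joins:
        # Multiple JOIN conditions for one joined table are combined with AND.
        on_clause = " AND ".join(
            f"{source_table}.{left} = {joined_table}.{right}"
            for left, right in conditions
        )
        sql += f" {JOIN_KEYWORDS[mode]} {joined_table} ON {on_clause}"
    return sql


view_sql = build_view_sql(
    "v_trip",
    "trip",
    [("left", "vendor", [("vendor_id", "vendor_id"), ("region", "region")])],
    {"vendor_name": "vendor.vendor_name", "distance": "trip.distance"},
)
```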

  4. Click Publish, select a reviewer, and click Submit.
  5. Wait for the reviewer to approve the application. After the application is approved, return to the ER Modeling page to view the table status and synchronization status.

    Publishing is an asynchronous operation. You can click the refresh icon to refresh the status. After the table publishing application is approved, the system performs operations such as creating tables and synchronizing technical assets and business assets based on the configurations of Model Design Process on the Function Settings tab page in Configuration Center. The synchronization status is displayed in the Sync Status column of the table on the Information Architecture page.
    • If the synchronization is successful, the table is successfully published. Move the cursor over the icon in the Sync Status column. If a message indicating "creation succeeded" is displayed, the table has been successfully created in the corresponding data source.
    • If one or more items fail to be synchronized, refresh the status. If the failure persists, choose More > View History and click the Publish Log tab to view logs.

      Troubleshoot the problem based on the logs. After the error is rectified, click Resynchronize on the History tab page to issue the synchronization command again. If the synchronization still fails, contact technical support for assistance.
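
Because publishing is asynchronous, the status has to be checked repeatedly until it settles. The following Python sketch shows that refresh-until-done pattern in general terms. It does not call any DataArts Studio API: `fetch_status` stands for whatever mechanism you use to read the synchronization status, and the status strings are hypothetical.

```python
import time
from typing import Callable


def wait_for_sync(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    max_attempts: int = 12,
) -> str:
    """Poll until the status is no longer "pending" or the attempts run out."""
    status = "pending"
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != "pending":
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before refreshing again
    return status  # still pending: check the publish logs at this point
```

If the returned status still indicates a pending or failed synchronization after the last attempt, continue with the log-based troubleshooting described above.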