Updated on 2022-09-23 GMT+08:00

DataArts Architecture Example

DataArts Architecture can be used to create entity-relationship (ER) models and dimensional models to standardize and visualize data development and output data governance methods that can guide development personnel to work with ease.

This section covers the following scenarios:
  • Design a data model for the taxi travel data in an MRS Hive data lake.
  • The original taxi travel data table sdi_taxi_trip_data is stored in the demo_sdi_db database.
  • The following table lists the data fields in the original data table sdi_taxi_trip_data.
    The following table lists the taxi trip data:
    Table 1 Taxi trip data

    No.

    Field Name

    Field Description

    1

    VendorID

    Vendor ID.

    Possible values are:

    1=A Company

    2=B Company

    2

    tpep_pickup_datetime

    Time when a passenger gets on a taxi.

    3

    tpep_dropoff_datetime

    Time when a passenger gets off a taxi.

    4

    passenger_count

    Number of passengers.

    5

    trip_distance

    Driving distance.

    6

    ratecodeid

    Charge rate code.

    Possible values are:

    1=Standard rate

    2=JFK

    3=Newark

    4=Nassau or Westchester

    5=Negotiated fare

    6=Group ride

    7

    store_fwd_flag

    Store-and-forward flag.

    8

    PULocationID

    Location at which a passenger gets on a taxi.

    9

    DOLocationID

    Location at which a passenger gets off a taxi.

    10

    payment_type

    Payment type.

    Possible values are:

    1=Credit card

    2=Cash

    3=No charge

    4=Dispute

    5=Unknown

    6=Voided trip

    11

    fare_amount

    Fare amount.

    12

    extra

    Extra fee.

    13

    mta_tax

    MTA tax.

    14

    tip_amount

    Tip amount.

    15

    tolls_amount

    Toll amount.

    16

    improvement_surcharge

    Improvement surcharge.

    17

    total_amount

    Total amount.

The process of using DataArts Architecture is as follows:

  1. Preparations
    • Add reviewers: In the DataArts Architecture module, all business processes must be approved. Therefore, add reviewers first before conducting any operations. Only the workspace admin has the permissions required to add reviewers.
    • Configuration Center provides abundant custom options. You can customize the configuration to meet your demands.
  2. Data Survey: A data survey involves collecting data that is generated when sorting business requirements, creating business processes, and classifying data subjects based on the existing business data and industry status.
    • Subject design is a hierarchical architecture that classifies and defines data to help clarify data assets and specify relationships between business domains and business objects.
    • Process design: This example does not contain this. Process design is to generate a structured framework of data processing process, including the categories, levels, boundaries, scope, and input/output relationships, and reflect the business models and characteristics of your enterprise.
  3. Standards: Create lookup tables and data standards.
    • Create and publish a lookup table: A lookup table includes a series of allowed values and additional text descriptions that are generally associated with data standards to generate a range of values for the verification of quality monitoring rules.
    • Create and publish a data standard: A data standard refers to the description of attribute data meanings and business rules that enterprises must comply with. It describes the common understanding of certain data at the company level.
  4. Models: Use ER modeling and dimensional modeling methods to perform hierarchical modeling.
    • ER modeling: Create a model at the SDI and DWI layers, respectively.
      • SDI stands for Source Data Integration and is the source data layer. SDI is a simple implementation of source system data.
      • DWI stands for Data Warehouse Integration, also called the data consolidation layer. DWI integrates and cleans data from multiple source systems, and implements entity relationship modeling based on the three normal forms.
    • Dimensional modeling: Create and publish a dimension at the DWR layer. & Creating and Publishing a Fact Table for the DWR Layer.
      • Data Warehouse Report (DWR) is based on the multi-dimensional model and its data granularity is the same as that of the DWI layer.
      • Dimension is the perspective to observe and analyze business data and assist in data aggregation, drilling, slicing, and analysis, and used as a GROUP BY condition in SQL statements.
      • A fact table that belongs to a business process can enrich the affair information corresponding to the specific business process.
  5. Metric design: Create and publish a technical metric: Create and publish a business metric (not involved in this example) and a technical metric. Technical metrics are classified into atomic, derivative, and compound metrics.
    • A metric consists of its name and value. The metric name and its definition reflect the quality and quantity of the metric. The metric value reflects the quantifiable values of the specified time, location, and condition of the metric.

      Business metrics are used to guide technical metrics, and technical metrics are used to implement business metrics.

    • Atomic metrics are generated based on dimension tables and fact tables of a multidimensional model. The objects and the finest data granularity of an atomic metric are consistent with those of the multidimensional model.

      An atomic metric usually consists of measures and attributes related with measures and business objects, all of which aim to support agile self-service consumption of the metric.

    • Derivative metrics are aggregated from the definitions, modifiers, and dimensions of atomic metrics. Therefore, their definitions, modifiers, and dimensions are derived from the attributes of atomic metric associated tables as well.
    • Compound metrics are generated by adding one or more derivative metrics. The dimensions and modifiers of a compound metric are the same as those of the derivative metrics.

      New dimensions and modifiers cannot be generated outside the scope of derivative metrics, dimensions, and modifiers.

  6. Dimensional modeling: Create and publish a summary table at the DM layer.
    • Data Mart (DM) is where multiple types of data are summarized. DM is designed to display the summarized data.
    • A summary table consists of specific analysis objects (for example, members) and related statistical metrics. The statistical metrics included in a summary table have the same statistical granularity (for example, members). The summary table provides users with all statistics-granularity-themed data (such as a member theme market).

Adding Reviewers

In the DataArts Architecture module, all modeling steps must be reviewed. Therefore, you need to add a reviewer first. DAYU Administrator or the workspace administrator has the permission to add reviewers.

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Architecture.
    Figure 1 DataArts Architecture
  2. In the navigation pane on the left, choose Configuration Center. On the displayed Reviewers page, click Add.
  3. Select a reviewer (administrator or developer), enter the correct email address and phone number, and click OK.
    You can also add your current account as a reviewer. In this way, auto review is supported in subsequent operations. Add more reviewers, if required.
    Figure 2 Adding a reviewer

Configuration Center

DataArts Architecture configuration center provides abundant custom options. You can customize the configuration to meet your demands.
  1. On the DataArts Architecture page, choose Configuration Center in the left navigation pane.
  2. Click the Functions tab and configure functions as needed.
    Figure 3 Functions
  3. Click OK.

Designing a Subject

This section uses the subjects listed in Table 2 as an example.

  • There is a subject area group named City transportation.
  • Under City transportation, there are four subject areas: Trip records, Corporation, Time&Space, and Public dimensions.
  • Under Trip records, there are four business objects: Original records, Standard records, Trip facts, and Record statistics.
  • Under Corporation, there is one business object: Suppliers.
  • Under Time&Space, there is one business object: Time.
  • Under Public dimensions, there is one business object: Public dimensions.
Table 2 Subject design

Subject Area Group Name (L1)

Subject Area Group Code (L1)

Subject Area Name (L2)

Subject Area Code (L2)

Business Object Name (L3)

Business Object Code (L3)

City transportation

city_traffic

Trip records

stroke_reminder

Original records

origin_stroke

Standard records

stand_stroke

Trip facts

stroke_fact

Record statistics

stroke_statistic

Corporation

people

Suppliers

vendor

Time&Space

time_location

Time

date

Public dimensions

public_dimension

Public dimensions

public_dimension

Figure 4 Designing a subject

Procedure

  1. Log in to the DataArts Studio console. Locate the created DataArts Studio instance and click Access.
  2. In the workspace list, locate the target workspace and click DataArts Architecture.
  3. Choose Configuration Center in the navigation pane on the left. Click the Subject Levels tab, and use the default three levels.

    There can be a maximum of seven subject levels, a minimum of two subject levels, and three subject levels by default. L1 to L7 are used to represent the layers. The last level is Business Object and cannot be customized. The names of other levels can be customized. The levels configured in Configuration Center take effect on the Subjects page.

    Figure 5 Configuring the subject levels

  4. On the DataArts Architecture console, choose Data Survey > Subjects in the left navigation pane. On the page displayed, click Create to create an L1 subject, which is a subject area group.

    Figure 6 Creating an L1 subject

    In the dialog box displayed, set the parameters as shown in Figure 6 and click OK.

  5. Create four L2 subjects under the L1 subject City transportation: Trip records, Corporation, Time&Space, and Public dimensions.

    Perform the following procedure to create a subject area named Trip records. The procedure for creating other subject areas is similar.

    1. Right-click the L1 subject City transportation in the subject tree, and select Create from the shortcut menu. Alternatively, click Create in the right pane.
      Figure 7 Creating an L2 subject
    2. In the dialog box displayed, set Subject Name and Subject Code to the values of Subject Area Name and Subject Area Code in Table 2, set other parameters based on project requirements, and click OK.

  6. Create business objects.

    • Under Trip records, create four business objects: Original records, Standard records, Trip facts, and Record statistics.
    • Under Corporation, create one business object: Suppliers.
    • Under Time&Space, create one business object: Time.
    • Under Public dimensions, create one business object: Public dimensions.

    Perform the following procedure to create a business object named Original records in the subject area Trip records. The procedure for creating other business objects is similar.

    1. Right-click the L2 subject Trip records in the subject tree, and select Create from the shortcut menu. Alternatively, click Create in the right pane.
    2. In the dialog box displayed, set Subject Name and Subject Code to the values of Business Object Name and Business Object Code in Table 2, set other parameters based on project requirements, and click OK.

Creating and Publishing Lookup Tables

This section uses the lookup tables listed in Table 3 as an example.

Table 3 Lookup tables

Directory

*Table Name

* Table Code

Table Description

* Field Name

* Field Code

* Data Type

Field Description

payment_type

payment_type

payment_type

None

payment_type_id

payment_type_id

BIGINT

None

payment_type_value

payment_type_value

STRING

None

vendor

vendor

vendor

None

vendor_id

vendor_id

BIGINT

None

vendor_value

vendor_value

STRING

None

rate

rate_code

rate_code

None

rate_code_id

rate_code_id

BIGINT

None

rate_code_value

rate_code_value

STRING

None

Procedure

  1. On the DataArts Architecture console, choose Standards > Lookup Tables in the navigation pane on the left.
  2. Create three lookup table directories: payment_type, vendor, and rate.

    Perform the following procedure to create a directory named payment_type. The procedure for creating other directories is similar.

    1. On the Lookup Tables page, click above the directory tree to create a directory.
      Figure 8 Lookup table directory tree
    2. In the dialog box displayed, enter a directory name, select a parent directory, and click OK.
      Figure 9 Creating a directory for lookup tables

  3. Create three lookup tables: payment_type, vendor, and rate_code.

    Perform the following procedure to create a lookup table named payment_type. The procedure for creating other lookup tables is similar.

    1. On the Lookup Tables page, click payment_type in the directory tree, and click Create on the page displayed.
      Figure 10 Lookup Tables page
    2. Set the parameters based on Table 3 and click Save.
      Figure 11 Creating a lookup table
    3. Refer to 3.a to 3.b to create the lookup table vendor in the vendor directory and the lookup table rate_code in the rate directory.
      Figure 12 Creating a lookup table named vendor
      Figure 13 Creating a lookup table named rate_code

  4. Enter values for the three lookup tables payment_type, vendor, and rate_code.

    On the Lookup Tables page, locate the row that contains the lookup table payment_type, and choose More > Manage Value in the Operation column. On the page displayed, click Add to add the values listed in Table 4.

    Table 4 Values to be added for the lookup table payment_type

    payment_type_id

    payment_type_value

    1

    Credit card

    2

    Cash

    3

    No charge

    4

    Dispute

    5

    Unknown

    6

    Voided trip

    Return to the Lookup Tables page, locate the row that contains the lookup table vendor, and choose More > Manage Value in the Operation column. On the page displayed, click Add to add the values listed in Table 5.

    Table 5 Values to be added for the lookup table vendor

    vendor_id

    vendor_value

    1

    A Company

    2

    B Company

    Return to the Lookup Tables page, locate the row that contains the lookup table rate_code, and choose More > Manage Value in the Operation column. On the page displayed, click Add to add the values listed in Table 6.

    Table 6 Values to be added for the lookup table rate_code

    rate_code_id

    rate_code_value

    1

    Standard rate

    2

    JFK

    3

    Newark

    4

    Nassau or Westchester

    5

    Negotiated fare

    6

    Group ride

  5. Return to the Lookup Tables page, select the three lookup tables, and click Publish.
  6. In the Apply for Publication dialog box, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.

Creating and Publishing Data Standards

In this example, you need to create the three data standards listed in Table 7.

Table 7 Data standards

Directory

*Standard Name

* Standard Code (Custom)

*Data Type

Data Length

Lookup Table

*Lookup Table Field

Description

payment_type

payment_type

payment_type

Long integer (BIGINT)

None

payment_type

payment_type_id

None

vendor

vendor

vendor

Long integer (BIGINT)

None

vendor

vendor_id

None

rate

rate_code

rate_code

Long integer (BIGINT)

None

rate_code

rate_code_id

None

  1. On the DataArts Architecture console, choose Standards > Data Standards in the navigation pane on the left.
  2. If you access the Data Standards page for the first time, you must customize a template. The custom template can be modified in Configuration Center. Additionally, select Lookup table, as shown in the following figure.

    Figure 14 Customize Template

  3. Create three directories for data standards: payment_type, vendor, and rate_code.

    In the upper part of the directory tree on the Data Standards page, click . In the dialog box displayed, enter the directory name as payment_type, select a parent directory, and click OK.

    Figure 15 Creating a directory for data standards

  4. Create three data standards: payment method, Suppliers, and rate code.

    1. In the directory tree on the Data Standards page, select the required directory and click Create on the page displayed on the right.
    2. On the Create Data Standard page, configure the three data standards by referring to the following figures, and click Save. In this example, only a few parameters are selected for the data standard template. You can customize a data standard template by referring to Configuration Center.
    Figure 16 Creating a data standard named payment method
    Figure 17 Creating a data standard named Suppliers
    Figure 18 Creating a data standard named rate code

  5. Return to the Data Standards page, select the three data standards in the list, and click Publish.
  6. In the Apply for Publication dialog box, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.

ER Modeling: Creating a Model at the SDI and DWI Layers Respectively

During ER modeling, create an ER model at the SDI and DWI layer, respectively, and import the original data table to the ER model at the SDI layer through by reversing the database, and create a standard service table named standard travel data in the ER model at the DWI layer.

  1. On the DataArts Architecture page, choose Models > ER Modeling in the left navigation pane.

    • If no ER model has been created, a dialog box is displayed, asking you to create a hierarchical governance model. You can create an SDI ER model named sdi and then create a DWI ER model named dwi. Click OK.
      Figure 19 Dialog box for creating a hierarchical governance model
    • If you have created ER models before, click to create physical models, as shown in the following figure.
      Figure 20 ER Modeling page
      1. Create an ER model at the SDI layer named sdi. In the Physical Models area, click . In the displayed dialog box, configure required parameters and click OK.
        Figure 21 Creating a physical model named sdi
      2. Create an ER model at the DWI layer named dwi. In the Physical Models area, click . In the displayed dialog box, configure required parameters and click OK.
        Figure 22 Creating a physical model named dwi

  2. In the Data Warehouse Layers part, click the newly created SDI ER model. Choose city_traffic > stroke_reminder > origin_stroke, and click Reverse Database on the page displayed on the right to import the source table.

    Figure 23 Model directory

    In the Reverse Database dialog box, set the parameters and click OK. In this example, select the original data table in the source layer database demo_sdi_db.

    Figure 24 Reverse Database dialog box

    After the database is reversed successfully, click Close. You can view the imported table in the table list.

    Figure 25 Viewing the imported table

  3. Perform the following steps to create a standard service table named standard travel data:

    1. In the Data Warehouse Layers part, click the newly created DWI ER model. Choose city_traffic > stroke_reminder > origin_stroke, and click Create on the page displayed on the right.
    2. On the Basic Settings page, set the parameters as follows.
      Figure 26 Basic settings of the trip data table
    3. Click the Table Fields tab and then Add. Add the fields listed in Table 8. Then click in the Data Standard column of the rows where the vendor ID, rate code ID, and payment type reside to associate with the Vendor, Rate Code ID, and Payment Type standards, respectively. Figure 27 lists the fields to be added.
      Table 8 Fields in the standard travel data table

      No.

      Field Name

      Field Code

      Data Type

      Data Standard

      Primary Key

      Partition

      Not Null

      Tag

      1

      Vendor ID

      vendor_id

      Long integer (BIGINT)

      Vendor

      Deselected

      Deselected

      Selected

      -

      2

      Pickup Time

      tpep_pickup_datetime

      Timestamp (TIMESTAMP)

      -

      Deselected

      Deselected

      Selected

      -

      3

      Drop-off Time

      tpep_dropoff_datetime

      Timestamp (TIMESTAMP)

      -

      Deselected

      Deselected

      Selected

      -

      4

      Passenger Quantity

      passenger_count

      Character (STRING)

      -

      Deselected

      Deselected

      Selected

      -

      5

      Trip Distance

      trip_distance

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      6

      Rate Code

      rate_code_id

      Long integer (BIGINT)

      Rate code

      Deselected

      Deselected

      Selected

      -

      7

      Storage Forwarding Flag

      store_fwd_flag

      Character (STRING)

      -

      Deselected

      Deselected

      Selected

      -

      8

      Pickup Location

      pu_location_id

      Character (STRING)

      -

      Deselected

      Deselected

      Selected

      -

      9

      Drop-off Location

      do_location_id

      Character (STRING)

      -

      Deselected

      Deselected

      Selected

      -

      10

      Payment Type

      payment_type

      Long integer (BIGINT)

      Payment type

      Deselected

      Deselected

      Selected

      -

      11

      Fare

      fare_amount

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      12

      Extra Fee

      extra

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      13

      MTA Tax

      mta_tax

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      14

      Handling Fee

      tip_amount

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      15

      Toll

      tolls_amount

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      16

      Improvement Surcharge

      improvement_surcharge

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      17

      Total Fee

      total_amount

      High-precision (DECIMAL)(10,2)

      -

      Deselected

      Deselected

      Selected

      -

      Figure 27 Fields in the trip data table

      You can perform the following operations on the fields in the standard travel data table

      • Associating with data standards

        When creating or editing a table, click the Table Fields tab. Locate the row that contains a field and click in the Data Standard column to associate the field with a data standard. After the field is associated with a data standard and the table is published, a quality job is automatically generated, and a quality rule is generated for each field associated with a data standard. You can monitor the fields based on the data standards and view the field statuses on the Quality Jobs page of the DataArts Quality console. For more information about associating data standards, see .

      • Adding tags

        Tags are user-defined identifiers. After adding a tag, you can search for related data assets in the DataArts Catalog module with ease.

        When creating or editing a table, click the Table Fields tab, locate the row that contains a field, and click in the Tag column. In the displayed dialog box, enter a new tag name and press Enter or select an existing tag from the drop-down list.

      • Associating with quality rules

        After creating a table, you can associate fields in the table with quality rules. After the association is complete and the table is published, a quality job is automatically created on the DataArts Quality page after the table is published. If the table has been published, the system automatically updates the quality job. For more information about associating quality rules, see Associating with Quality Rules.

    4. Click the Mappings tab and create mappings to design data sources of the table.
      • If the table field comes from different relationship models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping.
      • If the table fields come from multiple tables in the same ER model, you can create a mapping. You can set Join for multiple tables of the mapping and set source fields for the fields in the table.

      In this example, you only need to create one mapping. Click Create to create a mapping.

      • Mapping is automatically generated, but is also configurable.
      • Model: Select sdi.
      • Table: Select the original data table sdi_taxi_trip_data, from where data of the standard travel table comes.
        Figure 28 Creating a mapping
      • Field Mapping

        In the Field Mapping area, set source fields for the fields in the table in sequence. The selected source fields must have the same meaning as the fields in the table. As shown in Figure 29, the generated SQL statement is displayed at the bottom of the Field Mapping area.

        • On the DataArts Architecture page, choose Metrics > Configuration Center in the left navigation pane, and click the Functions tab. On the page displayed, if Create data development jobs is selected for Model Design Process, the system creates an ETL job during data development based on the table mapping information during table release. An ETL node is generated for each mapping, and the job name starts with Database name_Table code. Currently, this function is in the internal test stage. Only DLI-to-DLI and DLI-to-DWS mapping jobs can be created.

          You can choose DataArts Factory > Job Development to view the created ETL jobs. By default, ETL jobs are scheduled at 00:00 every day.

        • In this example, the function of automatically creating ETL jobs is not enabled. The function provides only the data flow direction for data development. During data development, you can refer to the mapping to write SQL scripts.
        Figure 29 Field Mapping
    5. After configuring the mapping, you have finished configuring the taxi trip data table. Click Save.

  4. Select the created model and choose More > Export. In the displayed dialog box, select Table for Export and click OK to export the model. Then export the sdi model in the same way. The exported models can be used as backups and imported when needed in the future.

    Figure 30 Export Model dialog box

  5. Publish table models.

    1. Publish the source table imported to the SDI ER model in 2. After the table is published, you can use DataArts Studio to manage and monitor the source table.

      Return to the ER Modeling page, select the sdi model in the model directory. Select the sdi_taxi_trip_data table in the list on the right, and click Publish. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.

    2. Publish a table of the DWI ER model.

      Return to the ER Modeling page, select the dwi model in the model directory. Select the dwi_table_trip_data table in the list on the right, and click Publish. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.

  6. After the application is approved, you can view Status and Sync Status on the ER Modeling page.

    Publishing is an asynchronous operation. You can click to refresh the status. After table publishing application is approved, the system performs operations such as creating tables and synchronizing technical assets and business assets based on the configurations of Model Design Process on the Function Settings tab page in Configuration Center. The synchronization status is displayed in the Sync Status column of the table on the Information Architecture page.
    • If the Sync Status is successful, the table is published successfully. Move the cursor over in the Sync Status column. If the message Table created successfully is displayed, the table has been successfully created in the corresponding data source.
    • If one or more items in the in the Sync Status column fail to be synchronized, you can refresh the status. If the fault persists, choose More > View History to view logs.

      Locate the failure cause based on the error log and rectify the fault. Then return to the ER Modeling page, select the tables to be synchronized from the list, and choose More > Synchronize to synchronize the tables again. If the fault persists, contact technical support.

    Figure 31 Checking the table status

    Click a table name in the list to view the table details. Data Source indicates the location of the table.

    Figure 32 Table details

Creating and Publishing Dimensions for the DWR Layer

During dimension modeling, create three lookup table dimensions (vendor, rate_code, and payment_type) and one hierarchy dimension (date) for the DWR layer.

  1. On the DataArts Architecture console, choose Models > Dimensional Modeling in the navigation pane on the left.
  2. Create the three lookup table dimensions listed in Table 9.

    Table 9 Lookup table dimensions

    *Subject

    *Dimension Name

    * Dimension Code

    *Type

    *Owner

    Description

    *Data Connection Type

    *Data Connection Name

    *Database

    Lookup Table

    vendor

    vendor

    dim_vendor

    Lookup table

    -

    None

    MRS_HIVE

    mrs_hive_link

    demo_dwr_db

    vendor

    public_dimension

    rate_code

    dim_rate_code

    Lookup table

    -

    None

    MRS_HIVE

    mrs_hive_link

    demo_dwr_db

    rate

    public_dimension

    payment_type

    dim_payment_type

    Lookup table

    -

    None

    MRS_HIVE

    mrs_hive_link

    demo_dwr_db

    payment_type

    1. Click the Dimensions tab, choose City transportation > Corporation > Suppliers in the subject tree, and click Create to create a dimension named Suppliers.
      Figure 33 Dimensional modeling
    2. On the Create Dimension page, set the parameters as shown in the figure below and click Save.
      Figure 34 Creating a dimension named Suppliers
    3. Click the Dimensions tab, choose City transportation > Public dimensions > Public dimensions in the subject tree, and click Create to create a dimension named rate code. On the Create Dimension page, set the parameters as shown in the figure below and click Save.
      Figure 35 Creating a dimension named rate code
    4. Click the Dimensions tab, choose City transportation > Public dimensions > Public dimensions in the subject tree, and click Create to create a dimension named payment method. On the Create Dimension page, set the parameters as shown in the figure below and click Save.
      Figure 36 Creating a dimension named payment method

  3. Create a hierarchy dimension named date.

    1. On the Dimensional Modeling tab page, choose City transportation > Time&Space > Time in the subject tree. Then click Create on the Dimensions tab page to create a dimension named date dimension.
    2. Configure the basic settings and physicalization settings as shown in the figure below.
      Table 10 Date dimension

      *Subject

      *Dimension Name

      * Dimension Code

      *Type

      *Owner

      Description

      *Data Connection Type

      *Data Connection Name

      *Database

      date

      date dimension

      dim_date

      Hierarchy

      -

      None

      MRS_HIVE

      mrs_hive_link

      demo_dwr_db

      Figure 37 Date dimension
    3. In the Field Settings area, add fields as described in the table below.
      Table 11 Field settings

      No.

      Field Name

      Field Code

      Data Standard

      Data Type

      Surrogate Key

      Primary Key

      Partition

      Not Null

      1

      dim_date_key

      dim_date_key

      -

      TIMESTAMP

      Selected

      Selected

      Not selected

      Selected

      2

      real_time

      real_time

      -

      TIMESTAMP

      Not selected

      Not selected

      Not selected

      Not selected

      3

      minute_id

      minute_id

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      4

      minute

      minute

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      5

      hour_id

      hour_id

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      6

      hour

      hour

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      7

      day_id

      day_id

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      8

      day

      day

      -

      STRING

      Not selected

      Not selected

      Not selected

      Not selected

      9

      month_id

      month_id

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      10

      month

      month

      -

      STRING

      Not selected

      Not selected

      Not selected

      Not selected

      11

      year_id

      year_id

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      12

      year

      year

      -

      BIGINT

      Not selected

      Not selected

      Not selected

      Not selected

      Figure 38 Field settings
    4. In the Hierarchy Settings area, click Add to create two layers as shown in the figures below.
      Figure 39 Layer 1
      Figure 40 Layer 2
    5. Click Save.

  4. Return to the Dimensions tab page, select the four new dimensions in the dimension list, and click Publish.
  5. In the Apply for Publication dialog box, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
  6. After a dimension is published and approved, the system automatically creates a dimension table for the dimension. The name and code of the dimension table are the same as those of the dimension. On the Dimensional Modeling page, click the Dimension Tables tab to view the created dimension table.

    In the dimension table list, you can view Sync Status of the dimension tables.

    • If all items in Sync Status are displayed as Succeeded, the dimension is published and the dimension table is created in the database.
    • If an item in Sync Status is displayed as Failed, click View History in the row. On the page displayed, click the History tab to view logs. Troubleshoot the fault based on the logs. After the fault is rectified, select the dimension table, click Synchronize above the dimension table list, and click OK in the dialog box displayed. If the fault persists, contact technical support for assistance.
    Figure 41 Sync Status of the dimension tables

Creating and Publishing a Fact Table for the DWR Layer

During dimensional modeling, create a fact table named stroke_order for the DWR layer.

  1. On the DataArts Architecture console, choose Models > Dimensional Modeling in the navigation pane on the left.
  2. Click the Fact Tables tab, choose City transportation > Trip records > Trip facts in the subject tree, and click Create to create a fact table named stroke_order.

    In the Basic Settings area on the Create Fact Table page, set the following parameters:

    • Subject: Subject Area Group: City transportation > Subject Area: Trip records > Business Object: Trip facts
    • Table Name: stroke_order
    • Table Code: fact_stroke_order
    • Data Connection Type: MRS_HIVE
    • Data Connection Name: mrs_hive_link
    • Database: demo_dwr_db
    • Table Type: HIVE_TABLE
    • Owner: an owner in the drop-down list box
    • Description: None

    In the Field Settings area, choose Create > Dimension. In the dialog box displayed, select the dimensions rate_code, vendor, payment_type, and date, and click OK. Choose Create > Dimension. In the dialog box displayed, select the dimension date and click OK. In the dimension field list, adjust the sequence of the dimension fields and modify the information about the two date dimensions, as listed in Table 12.

    Table 12 Dimension fields

    No.

    Field Name

    Field Code

    Data Type

    Primary Key

    Partition

    Not Null

    Associated Standard

    Associated Dimension

    Role

    Description

    1

    rate_code_id

    rate_code_id

    BIGINT

    Not selected

    Not selected

    Not selected

    -

    rate_code

    dim_

    -

    2

    vendor_id

    vendor_id

    BIGINT

    Not selected

    Not selected

    Not selected

    -

    vendor

    dim_

    -

    3

    payment_type_id

    payment_type_id

    BIGINT

    Not selected

    Not selected

    Not selected

    -

    payment_type

    dim_

    -

    4

    pickup_date_key

    dim_pickup_date_key

    TIMESTAMP

    Not selected

    Not selected

    Not selected

    -

    Date

    dim_pickup

    Date dimension table

    5

    tpep_dropoff_datetime

    dim_dropoff_date_key

    TIMESTAMP

    Not selected

    Not selected

    Not selected

    -

    Date

    dim_dropoff

    Date dimension table

    In the Field Settings area, choose Create > Measure and create the fields listed in Table 13 in sequence.

    Table 13 Measure fields

    No.

    Field Name

    Field Code

    Data Type

    Primary Key

    Partition

    Not Null

    Associated Standard

    6

    pu_location_id

    pu_location_id

    STRING

    Not selected

    Not selected

    Not selected

    -

    7

    do_location_id

    do_location_id

    STRING

    Not selected

    Not selected

    Not selected

    -

    8

    fare_amount

    fare_amount

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    9

    extra

    extra

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    10

    mta_tax

    mta_tax

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    11

    tip_amount

    tip_amount

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    12

    tolls_amount

    tolls_amount

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    13

    improvement_surcharge

    improvement_surcharge

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    14

    total_amount

    total_amount

    DECIMAL (10,2)

    Not selected

    Not selected

    Not selected

    -

    Figure 42 Fact table fields

  3. After the configuration, click Publish.
  4. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
  5. Return to the Fact Tables tab page, find the new fact table in the list, and view Sync Status.

    • If all items in Sync Status are displayed as Succeeded, the fact table is published and created in the database.
    • If an item in Sync Status is displayed as Failed, choose More > View History. On the page displayed, click the History tab to view logs. Troubleshoot the fault based on the logs. After the fault is rectified, choose More > Synchronize above the fact table list, and click OK in the dialog box displayed. If the fault persists, contact technical support for assistance.

Creating and Publishing Technical Metrics

In this example, you need to create the technical metrics listed in Table 14 and Table 15.

Table 14 Atomic metrics

*Metric Name

* Metric Code

Data Table

*Subject

*Expression

Description

total_amount

sum_total_amount

Itinerary order

stroke_fact

sum (total amount)

None

Table 15 Derivative metrics

Metric

*Data Table

*Subject

*Atomic Metric

Statistical Dimension

Time Filter

General Filter

total_amount_(payment_type)

Itinerary order

stroke_statistic

total_amount

payment_type

None

None

total_amount_(rate_code)

Itinerary order

stroke_statistic

total_amount

rate_code

None

None

total_amount_(vendor,stroke_order.dim_dropoff_date_key)

Itinerary order

stroke_statistic

total_amount

vendor and stroke_order.dim_dropoff_date_key

None

None

  1. On the DataArts Architecture console, choose Metrics > Technical Metrics in the navigation pane on the left.
  2. Create an atomic metric named total_amount to collect statistics on fares.

    1. Click the Atomic Metrics tab and click Create.
    2. On the Create Atomic Metric page, set the parameters as shown in the figure below and click Publish.
      Figure 43 Creating an atomic metric
    3. Wait for the reviewer to review the application. After the application is approved, the atomic metric will be created.

  3. Create three derivative metrics.

    • Create total_amount_(payment_type) to collect statistics on the total fares based on payment_type.

      On the Technical Metrics page, click the Derivative Metrics tab and click Create. On the Create Derivative Metric page, set the parameters as shown in the figure below. After the configuration is complete, click Trial Run. In the dialog box displayed, click Execute. If the trial running is successful, click Save.

      Figure 44 Creating a derivative metric named total_amount_(payment_type)
    • Create total_amount_(rate_code) to collect statistics on the total fares based on rate_code.

      On the Technical Metrics page, click the Derivative Metrics tab and click Create. On the Create Derivative Metric page, set the parameters as shown in the figure below. After the configuration is complete, click Trial Run. In the dialog box displayed, click Execute. If the trial running is successful, click Save.

      Figure 45 Creating a derivative metric named total_amount_(rate_code)
    • Create total_amount_(vendor,stroke_order.dim_dropoff_date_key) to collect statistics on the total fares based on vendor.

      On the Technical Metrics page, click the Derivative Metrics tab and click Create. On the Create Derivative Metric page, set the parameters as shown in the figure below. After the configuration is complete, click Trial Run. In the dialog box displayed, click Execute. If the trial running is successful, click Save.

      Figure 46 Creating a derivative metric named total_amount_(vendor,stroke_order.dim_dropoff_date_key)

  4. Return to the Derivative Metrics tab page, select the three derivative metrics and click Publish. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.

Creating and Publish Summary Tables for the DM Layer

Create the three summary tables listed in Table 16 for the DM layer.

Table 16 Summary tables

*Subject

*Table Name

* Table Code

Statistical Dimension

Data Connection Type

*Data Connection Name

*Database

Owner

Description

stroke_statistic

payment_type

dws_payment_type

payment_type

MRS_HIVE

mrs_hive_link

demo_dm_db

-

None

stroke_statistic

rate_code

dws_rate_code

rate_code

MRS_HIVE

mrs_hive_link

demo_dm_db

-

None

stroke_statistics

vendor

dws_vendor

vendor and stroke_order.dim_dropoff_date_key

MRS_HIVE

mrs_hive_link

demo_dm_db

-

None

  1. On the DataArts Architecture console, choose Models > Dimensional Modeling in the left navigation pane.
  2. Click the Summary Tables tab.
  3. Create three summary tables: payment_type, rate_code, and vendor.

    1. On the Summary Tables page, choose City transportation > Trip records > Record statistics in the directory tree, and click Create to create a summary table named payment method statistics. On the Create Summary Table page, set the parameters and click Save.

      Set the basic settings as shown in the figure below.

      Figure 47 Creating a summary table named payment method statistics

      In the Time Partition area, enter the field code dtime and select the data type TIMESTAMP. After a table is published, data is written to the table based on the field added here.

      Figure 48 Time Partition settings

      In the Metric Settings area, click Add to add the derivative metric total_amount_(payment_mode). You can add only published derivative or compound metrics that are associated with the specified statistical dimension.

      Figure 49 Metric settings

      Click Save.

    2. On the Summary Tables page, choose City transportation > Trip records > Record statistics in the directory tree, and click Create to create a summary table named rate statistics. On the Create Summary Table page, set the parameters and click Save.
      Figure 50 Creating a summary table named rate statistics (Basic Settings)
      Figure 51 Creating a summary table named rate statistics (Metric Settings)
    3. On the Summary Tables page, choose City transportation > Trip records > Record statistics in the directory tree, and click Create to create a summary table named supplier statistics. On the Create Summary Table page, set the parameters and click Save.
      Figure 52 Creating a summary table named supplier statistics (Basic Settings)
      Figure 53 Creating a summary table named supplier statistics (Metric Settings)

  4. Return to the Dimensions tab page, select the three new summary tables in the dimension list, and click Publish.
  5. In the dialog box displayed, select a reviewer and click OK. After the reviewer approves the publishing application, the summary table is automatically created. If you have the reviewer permissions, select Auto-review and click Submit.
  6. Return to the Summary Tables tab page, find the new summary tables in the list, and view Sync Status.

    • If all items in Sync Status are displayed as Succeeded, the summary tables are published and created in the database.
    • If an item in Sync Status is displayed as Failed, choose More > View History in the row. On the page displayed, click the History tab to view logs. Troubleshoot the fault based on the logs. After the fault is rectified, choose More > Synchronize above the summary table list, and click OK in the dialog box displayed. If the fault persists, contact technical support for assistance.