DataArts Architecture Example
DataArts Architecture can be used to create entity-relationship (ER) models and dimensional models to standardize and visualize data development and output data governance methods that can guide development personnel to work with ease.
- Design a data model for the taxi travel data in an MRS Hive data lake.
- The original taxi travel data table sdi_taxi_trip_data is stored in the demo_sdi_db database.
- The following table lists the data fields in the original data table sdi_taxi_trip_data.
The following table lists the taxi trip data:
Table 1 Taxi trip data No.
Field Name
Field Description
1
VendorID
Vendor ID.
Possible values are:
1=A Company
2=B Company
2
tpep_pickup_datetime
Time when a passenger gets on a taxi.
3
tpep_dropoff_datetime
Time when a passenger gets off a taxi.
4
passenger_count
Number of passengers.
5
trip_distance
Driving distance.
6
ratecodeid
Charge rate code.
Possible values are:
1=Standard rate
2=JFK
3=Newark
4=Nassau or Westchester
5=Negotiated fare
6=Group ride
7
store_fwd_flag
Store-and-forward flag.
8
PULocationID
Location at which a passenger gets on a taxi.
9
DOLocationID
Location at which a passenger gets off a taxi.
10
payment_type
Payment type.
Possible values are:
1=Credit card
2=Cash
3=No charge
4=Dispute
5=Unknown
6=Voided trip
11
fare_amount
Fare amount.
12
extra
Extra fee.
13
mta_tax
MTA tax.
14
tip_amount
Tip amount.
15
tolls_amount
Toll amount.
16
improvement_surcharge
Improvement surcharge.
17
total_amount
Total amount.
The process of using DataArts Architecture is as follows:
- Preparations
- Add reviewers: In the DataArts Architecture module, all business processes must be approved. Therefore, add reviewers first before conducting any operations. Only the workspace admin has the permissions required to add reviewers.
- Configuration Center provides abundant custom options. You can customize the configuration to meet your demands.
- Data Survey: A data survey involves collecting data that is generated when sorting business requirements, creating business processes, and classifying data subjects based on the existing business data and industry status.
- Subject design is a hierarchical architecture that classifies and defines data to help clarify data assets and specify relationships between business domains and business objects.
- Process design: This example does not contain this. Process design is to generate a structured framework of data processing process, including the categories, levels, boundaries, scope, and input/output relationships, and reflect the business models and characteristics of your enterprise.
- Standards: Create lookup tables and data standards.
- Create and publish a lookup table: A lookup table includes a series of allowed values and additional text descriptions that are generally associated with data standards to generate a range of values for the verification of quality monitoring rules.
- Create and publish a data standard: A data standard refers to the description of attribute data meanings and business rules that enterprises must comply with. It describes the common understanding of certain data at the company level.
- Models: Use ER modeling and dimensional modeling methods to perform hierarchical modeling.
- ER modeling: Create a model at the SDI and DWI layers, respectively.
- SDI stands for Source Data Integration and is the source data layer. SDI is a simple implementation of source system data.
- DWI stands for Data Warehouse Integration, also called the data consolidation layer. DWI integrates and cleans data from multiple source systems, and implements entity relationship modeling based on the three normal forms.
- Dimensional modeling: Create and publish a dimension at the DWR layer. & Creating and Publishing a Fact Table for the DWR Layer.
- Data Warehouse Report (DWR) is based on the multi-dimensional model and its data granularity is the same as that of the DWI layer.
- Dimension is the perspective to observe and analyze business data and assist in data aggregation, drilling, slicing, and analysis, and used as a GROUP BY condition in SQL statements.
- A fact table that belongs to a business process can enrich the affair information corresponding to the specific business process.
- ER modeling: Create a model at the SDI and DWI layers, respectively.
- Metric design: Create and publish a technical metric: Create and publish a business metric (not involved in this example) and a technical metric. Technical metrics are classified into atomic, derivative, and compound metrics.
- A metric consists of its name and value. The metric name and its definition reflect the quality and quantity of the metric. The metric value reflects the quantifiable values of the specified time, location, and condition of the metric.
Business metrics are used to guide technical metrics, and technical metrics are used to implement business metrics.
- Atomic metrics are generated based on dimension tables and fact tables of a multidimensional model. The objects and the finest data granularity of an atomic metric are consistent with those of the multidimensional model.
An atomic metric usually consists of measures and attributes related with measures and business objects, all of which aim to support agile self-service consumption of the metric.
- Derivative metrics are aggregated from the definitions, modifiers, and dimensions of atomic metrics. Therefore, their definitions, modifiers, and dimensions are derived from the attributes of atomic metric associated tables as well.
- Compound metrics are generated by adding one or more derivative metrics. The dimensions and modifiers of a compound metric are the same as those of the derivative metrics.
New dimensions and modifiers cannot be generated outside the scope of derivative metrics, dimensions, and modifiers.
- A metric consists of its name and value. The metric name and its definition reflect the quality and quantity of the metric. The metric value reflects the quantifiable values of the specified time, location, and condition of the metric.
- Dimensional modeling: Create and publish a summary table at the DM layer.
- Data Mart (DM) is where multiple types of data are summarized. DM is designed to display the summarized data.
- A summary table consists of specific analysis objects (for example, members) and related statistical metrics. The statistical metrics included in a summary table have the same statistical granularity (for example, members). The summary table provides users with all statistics-granularity-themed data (such as a member theme market).
Adding Reviewers
In the DataArts Architecture module, all modeling steps must be reviewed. Therefore, you need to add a reviewer first. DAYU Administrator or the workspace administrator has the permission to add reviewers.
- On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Architecture.
Figure 1 DataArts Architecture
- In the navigation pane on the left, choose Configuration Center. On the displayed Reviewers page, click Add.
- Select a reviewer (administrator or developer), enter the correct email address and phone number, and click OK.
You can also add your current account as a reviewer. In this way, auto review is supported in subsequent operations. Add more reviewers, if required.Figure 2 Adding a reviewer
Configuration Center
- On the DataArts Architecture page, choose Configuration Center in the left navigation pane.
- Click the Functions tab and configure functions as needed.
Figure 3 Functions
- Click OK.
Designing a Subject
This section uses the subjects listed in Table 2 as an example.
- There is a subject area group named City transportation.
- Under City transportation, there are four subject areas: Trip records, Corporation, Time&Space, and Public dimensions.
- Under Trip records, there are four business objects: Original records, Standard records, Trip facts, and Record statistics.
- Under Corporation, there is one business object: Suppliers.
- Under Time&Space, there is one business object: Time.
- Under Public dimensions, there is one business object: Public dimensions.
Subject Area Group Name (L1) |
Subject Area Group Code (L1) |
Subject Area Name (L2) |
Subject Area Code (L2) |
Business Object Name (L3) |
Business Object Code (L3) |
---|---|---|---|---|---|
City transportation |
city_traffic |
Trip records |
stroke_reminder |
Original records |
origin_stroke |
Standard records |
stand_stroke |
||||
Trip facts |
stroke_fact |
||||
Record statistics |
stroke_statistic |
||||
Corporation |
people |
Suppliers |
vendor |
||
Time&Space |
time_location |
Time |
date |
||
Public dimensions |
public_dimension |
Public dimensions |
public_dimension |
Procedure
- Log in to the DataArts Studio console. Locate the created DataArts Studio instance and click Access.
- In the workspace list, locate the target workspace and click DataArts Architecture.
- Choose Configuration Center in the navigation pane on the left. Click the Subject Levels tab, and use the default three levels.
There can be a maximum of seven subject levels, a minimum of two subject levels, and three subject levels by default. L1 to L7 are used to represent the layers. The last level is Business Object and cannot be customized. The names of other levels can be customized. The levels configured in Configuration Center take effect on the Subjects page.
Figure 5 Configuring the subject levels
- On the DataArts Architecture console, choose
in the left navigation pane. On the page displayed, click Create to create an L1 subject, which is a subject area group.
In the dialog box displayed, set the parameters as shown in Figure 6 and click OK.
- Create four L2 subjects under the L1 subject City transportation: Trip records, Corporation, Time&Space, and Public dimensions.
Perform the following procedure to create a subject area named Trip records. The procedure for creating other subject areas is similar.
- Right-click the L1 subject City transportation in the subject tree, and select Create from the shortcut menu. Alternatively, click Create in the right pane.
Figure 7 Creating an L2 subject
- In the dialog box displayed, set Subject Name and Subject Code to the values of Subject Area Name and Subject Area Code in Table 2, set other parameters based on project requirements, and click OK.
- Right-click the L1 subject City transportation in the subject tree, and select Create from the shortcut menu. Alternatively, click Create in the right pane.
- Create business objects.
- Under Trip records, create four business objects: Original records, Standard records, Trip facts, and Record statistics.
- Under Corporation, create one business object: Suppliers.
- Under Time&Space, create one business object: Time.
- Under Public dimensions, create one business object: Public dimensions.
Perform the following procedure to create a business object named Original records in the subject area Trip records. The procedure for creating other business objects is similar.
- Right-click the L2 subject Trip records in the subject tree, and select Create from the shortcut menu. Alternatively, click Create in the right pane.
- In the dialog box displayed, set Subject Name and Subject Code to the values of Business Object Name and Business Object Code in Table 2, set other parameters based on project requirements, and click OK.
Creating and Publishing Lookup Tables
This section uses the lookup tables listed in Table 3 as an example.
Directory |
*Table Name |
* Table Code |
Table Description |
* Field Name |
* Field Code |
* Data Type |
Field Description |
---|---|---|---|---|---|---|---|
payment_type |
payment_type |
payment_type |
None |
payment_type_id |
payment_type_id |
BIGINT |
None |
payment_type_value |
payment_type_value |
STRING |
None |
||||
vendor |
vendor |
vendor |
None |
vendor_id |
vendor_id |
BIGINT |
None |
vendor_value |
vendor_value |
STRING |
None |
||||
rate |
rate_code |
rate_code |
None |
rate_code_id |
rate_code_id |
BIGINT |
None |
rate_code_value |
rate_code_value |
STRING |
None |
Procedure
- On the DataArts Architecture console, choose in the navigation pane on the left.
- Create three lookup table directories: payment_type, vendor, and rate.
Perform the following procedure to create a directory named payment_type. The procedure for creating other directories is similar.
- On the Lookup Tables page, click above the directory tree to create a directory.
Figure 8 Lookup table directory tree
- In the dialog box displayed, enter a directory name, select a parent directory, and click OK.
Figure 9 Creating a directory for lookup tables
- On the Lookup Tables page, click above the directory tree to create a directory.
- Create three lookup tables: payment_type, vendor, and rate_code.
Perform the following procedure to create a lookup table named payment_type. The procedure for creating other lookup tables is similar.
- On the Lookup Tables page, click payment_type in the directory tree, and click Create on the page displayed.
Figure 10 Lookup Tables page
- Set the parameters based on Table 3 and click Save.
Figure 11 Creating a lookup table
- Refer to 3.a to 3.b to create the lookup table vendor in the vendor directory and the lookup table rate_code in the rate directory.
Figure 12 Creating a lookup table named vendor
Figure 13 Creating a lookup table named rate_code
- On the Lookup Tables page, click payment_type in the directory tree, and click Create on the page displayed.
- Enter values for the three lookup tables payment_type, vendor, and rate_code.
On the Lookup Tables page, locate the row that contains the lookup table payment_type, and choose Table 4.
in the Operation column. On the page displayed, click Add to add the values listed inTable 4 Values to be added for the lookup table payment_type payment_type_id
payment_type_value
1
Credit card
2
Cash
3
No charge
4
Dispute
5
Unknown
6
Voided trip
Return to the Lookup Tables page, locate the row that contains the lookup table vendor, and choose Table 5.
in the Operation column. On the page displayed, click Add to add the values listed inTable 5 Values to be added for the lookup table vendor vendor_id
vendor_value
1
A Company
2
B Company
Return to the Lookup Tables page, locate the row that contains the lookup table rate_code, and choose Table 6.
in the Operation column. On the page displayed, click Add to add the values listed in - Return to the Lookup Tables page, select the three lookup tables, and click Publish.
- In the Apply for Publication dialog box, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
Creating and Publishing Data Standards
In this example, you need to create the three data standards listed in Table 7.
Directory |
*Standard Name |
* Standard Code (Custom) |
*Data Type |
Data Length |
Lookup Table |
*Lookup Table Field |
Description |
---|---|---|---|---|---|---|---|
payment_type |
payment_type |
payment_type |
Long integer (BIGINT) |
None |
payment_type |
payment_type_id |
None |
vendor |
vendor |
vendor |
Long integer (BIGINT) |
None |
vendor |
vendor_id |
None |
rate |
rate_code |
rate_code |
Long integer (BIGINT) |
None |
rate_code |
rate_code_id |
None |
- On the DataArts Architecture console, choose in the navigation pane on the left.
- If you access the Data Standards page for the first time, you must customize a template. The custom template can be modified in Configuration Center. Additionally, select Lookup table, as shown in the following figure.
Figure 14 Customize Template
- Create three directories for data standards: payment_type, vendor, and rate_code.
In the upper part of the directory tree on the Data Standards page, click . In the dialog box displayed, enter the directory name as payment_type, select a parent directory, and click OK.
Figure 15 Creating a directory for data standards
- Create three data standards: payment method, Suppliers, and rate code.
- In the directory tree on the Data Standards page, select the required directory and click Create on the page displayed on the right.
- On the Create Data Standard page, configure the three data standards by referring to the following figures, and click Save. In this example, only a few parameters are selected for the data standard template. You can customize a data standard template by referring to Configuration Center.
Figure 16 Creating a data standard named payment method
Figure 17 Creating a data standard named Suppliers
Figure 18 Creating a data standard named rate code
- Return to the Data Standards page, select the three data standards in the list, and click Publish.
- In the Apply for Publication dialog box, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
ER Modeling: Creating a Model at the SDI and DWI Layers Respectively
During ER modeling, create an ER model at the SDI and DWI layer, respectively, and import the original data table to the ER model at the SDI layer through by reversing the database, and create a standard service table named standard travel data in the ER model at the DWI layer.
- On the DataArts Architecture page, choose
in the left navigation pane.
- If no ER model has been created, a dialog box is displayed, asking you to create a hierarchical governance model. You can create an SDI ER model named sdi and then create a DWI ER model named dwi. Click OK.
Figure 19 Dialog box for creating a hierarchical governance model
- If you have created ER models before, click to create physical models, as shown in the following figure.
Figure 20 ER Modeling page
- Create an ER model at the SDI layer named sdi. In the Physical Models area, click . In the displayed dialog box, configure required parameters and click OK.
Figure 21 Creating a physical model named sdi
- Create an ER model at the DWI layer named dwi. In the Physical Models area, click . In the displayed dialog box, configure required parameters and click OK.
Figure 22 Creating a physical model named dwi
- Create an ER model at the SDI layer named sdi. In the Physical Models area, click . In the displayed dialog box, configure required parameters and click OK.
- If no ER model has been created, a dialog box is displayed, asking you to create a hierarchical governance model. You can create an SDI ER model named sdi and then create a DWI ER model named dwi. Click OK.
- In the Data Warehouse Layers part, click the newly created SDI ER model. Choose
, and click Reverse Database on the page displayed on the right to import the source table.Figure 23 Model directory
In the Reverse Database dialog box, set the parameters and click OK. In this example, select the original data table in the source layer database demo_sdi_db.
Figure 24 Reverse Database dialog box
After the database is reversed successfully, click Close. You can view the imported table in the table list.
Figure 25 Viewing the imported table
- Perform the following steps to create a standard service table named standard travel data:
- In the Data Warehouse Layers part, click the newly created DWI ER model. Choose , and click Create on the page displayed on the right.
- On the Basic Settings page, set the parameters as follows.
Figure 26 Basic settings of the trip data table
- Click the Table Fields tab and then Add. Add the fields listed in Table 8. Then click in the Data Standard column of the rows where the vendor ID, rate code ID, and payment type reside to associate with the Vendor, Rate Code ID, and Payment Type standards, respectively. Figure 27 lists the fields to be added.
Table 8 Fields in the standard travel data table No.
Field Name
Field Code
Data Type
Data Standard
Primary Key
Partition
Not Null
Tag
1
Vendor ID
vendor_id
Long integer (BIGINT)
Vendor
Deselected
Deselected
Selected
-
2
Pickup Time
tpep_pickup_datetime
Timestamp (TIMESTAMP)
-
Deselected
Deselected
Selected
-
3
Drop-off Time
tpep_dropoff_datetime
Timestamp (TIMESTAMP)
-
Deselected
Deselected
Selected
-
4
Passenger Quantity
passenger_count
Character (STRING)
-
Deselected
Deselected
Selected
-
5
Trip Distance
trip_distance
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
6
Rate Code
rate_code_id
Long integer (BIGINT)
Rate code
Deselected
Deselected
Selected
-
7
Storage Forwarding Flag
store_fwd_flag
Character (STRING)
-
Deselected
Deselected
Selected
-
8
Pickup Location
pu_location_id
Character (STRING)
-
Deselected
Deselected
Selected
-
9
Drop-off Location
do_location_id
Character (STRING)
-
Deselected
Deselected
Selected
-
10
Payment Type
payment_type
Long integer (BIGINT)
Payment type
Deselected
Deselected
Selected
-
11
Fare
fare_amount
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
12
Extra Fee
extra
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
13
MTA Tax
mta_tax
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
14
Handling Fee
tip_amount
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
15
Toll
tolls_amount
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
16
Improvement Surcharge
improvement_surcharge
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
17
Total Fee
total_amount
High-precision (DECIMAL)(10,2)
-
Deselected
Deselected
Selected
-
You can perform the following operations on the fields in the standard travel data table
- Associating with data standards
When creating or editing a table, click the Table Fields tab. Locate the row that contains a field and click in the Data Standard column to associate the field with a data standard. After the field is associated with a data standard and the table is published, a quality job is automatically generated, and a quality rule is generated for each field associated with a data standard. You can monitor the fields based on the data standards and view the field statuses on the Quality Jobs page of the DataArts Quality console. For more information about associating data standards, see .
- Adding tags
Tags are user-defined identifiers. After adding a tag, you can search for related data assets in the DataArts Catalog module with ease.
When creating or editing a table, click the Table Fields tab, locate the row that contains a field, and click in the Tag column. In the displayed dialog box, enter a new tag name and press Enter or select an existing tag from the drop-down list.
- Associating with quality rules
After creating a table, you can associate fields in the table with quality rules. After the association is complete and the table is published, a quality job is automatically created on the DataArts Quality page after the table is published. If the table has been published, the system automatically updates the quality job. For more information about associating quality rules, see Associating with Quality Rules.
- Associating with data standards
- Click the Mappings tab and create mappings to design data sources of the table.
- If the table field comes from different relationship models, you must create multiple mappings. In each mapping, you only need to set the source field for the field that comes from the current mapping.
- If the table fields come from multiple tables in the same ER model, you can create a mapping. You can set Join for multiple tables of the mapping and set source fields for the fields in the table.
In this example, you only need to create one mapping. Click Create to create a mapping.
- Mapping is automatically generated, but is also configurable.
- Model: Select sdi.
- Table: Select the original data table sdi_taxi_trip_data, from where data of the standard travel table comes.
Figure 28 Creating a mapping
- Field Mapping
In the Field Mapping area, set source fields for the fields in the table in sequence. The selected source fields must have the same meaning as the fields in the table. As shown in Figure 29, the generated SQL statement is displayed at the bottom of the Field Mapping area.
- On the DataArts Architecture page, choose
You can choose DataArts Factory > Job Development to view the created ETL jobs. By default, ETL jobs are scheduled at 00:00 every day.
in the left navigation pane, and click the Functions tab. On the page displayed, if Create data development jobs is selected for Model Design Process, the system creates an ETL job during data development based on the table mapping information during table release. An ETL node is generated for each mapping, and the job name starts with Database name_Table code. Currently, this function is in the internal test stage. Only DLI-to-DLI and DLI-to-DWS mapping jobs can be created.
- In this example, the function of automatically creating ETL jobs is not enabled. The function provides only the data flow direction for data development. During data development, you can refer to the mapping to write SQL scripts.
- On the DataArts Architecture page, choose
- After configuring the mapping, you have finished configuring the taxi trip data table. Click Save.
- Select the created model and choose More > Export. In the displayed dialog box, select Table for Export and click OK to export the model. Then export the sdi model in the same way. The exported models can be used as backups and imported when needed in the future.
Figure 30 Export Model dialog box
- Publish table models.
- Publish the source table imported to the SDI ER model in 2. After the table is published, you can use DataArts Studio to manage and monitor the source table.
Return to the ER Modeling page, select the sdi model in the model directory. Select the sdi_taxi_trip_data table in the list on the right, and click Publish. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
- Publish a table of the DWI ER model.
Return to the ER Modeling page, select the dwi model in the model directory. Select the dwi_table_trip_data table in the list on the right, and click Publish. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
- Publish the source table imported to the SDI ER model in 2. After the table is published, you can use DataArts Studio to manage and monitor the source table.
- After the application is approved, you can view Status and Sync Status on the ER Modeling page.
Publishing is an asynchronous operation. You can click to refresh the status. After table publishing application is approved, the system performs operations such as creating tables and synchronizing technical assets and business assets based on the configurations of Model Design Process on the Function Settings tab page in Configuration Center. The synchronization status is displayed in the Sync Status column of the table on the Information Architecture page.
- If the Sync Status is successful, the table is published successfully. Move the cursor over in the Sync Status column. If the message Table created successfully is displayed, the table has been successfully created in the corresponding data source.
- If one or more items in the in the Sync Status column fail to be synchronized, you can refresh the status. If the fault persists, choose
Locate the failure cause based on the error log and rectify the fault. Then return to the ER Modeling page, select the tables to be synchronized from the list, and choose More > Synchronize to synchronize the tables again. If the fault persists, contact technical support.
to view logs.
Figure 31 Checking the table status
Click a table name in the list to view the table details. Data Source indicates the location of the table.
Figure 32 Table details
Creating and Publishing Dimensions for the DWR Layer
During dimension modeling, create three lookup table dimensions (vendor, rate_code, and payment_type) and one hierarchy dimension (date) for the DWR layer.
- On the DataArts Architecture console, choose in the navigation pane on the left.
- Create the three lookup table dimensions listed in Table 9.
Table 9 Lookup table dimensions *Subject
*Dimension Name
* Dimension Code
*Type
*Owner
Description
*Data Connection Type
*Data Connection Name
*Database
Lookup Table
vendor
vendor
dim_vendor
Lookup table
-
None
MRS_HIVE
mrs_hive_link
demo_dwr_db
vendor
public_dimension
rate_code
dim_rate_code
Lookup table
-
None
MRS_HIVE
mrs_hive_link
demo_dwr_db
rate
public_dimension
payment_type
dim_payment_type
Lookup table
-
None
MRS_HIVE
mrs_hive_link
demo_dwr_db
payment_type
- Click the Dimensions tab, choose Figure 33 Dimensional modeling
in the subject tree, and click Create to create a dimension named Suppliers.
- On the Create Dimension page, set the parameters as shown in the figure below and click Save.
Figure 34 Creating a dimension named Suppliers
- Click the Dimensions tab, choose Figure 35 Creating a dimension named rate code
in the subject tree, and click Create to create a dimension named rate code. On the Create Dimension page, set the parameters as shown in the figure below and click Save.
- Click the Dimensions tab, choose Figure 36 Creating a dimension named payment method
in the subject tree, and click Create to create a dimension named payment method. On the Create Dimension page, set the parameters as shown in the figure below and click Save.
- Click the Dimensions tab, choose
- Create a hierarchy dimension named date.
- On the Dimensional Modeling tab page, choose in the subject tree. Then click Create on the Dimensions tab page to create a dimension named date dimension.
- Configure the basic settings and physicalization settings as shown in the figure below.
Table 10 Date dimension *Subject
*Dimension Name
* Dimension Code
*Type
*Owner
Description
*Data Connection Type
*Data Connection Name
*Database
date
date dimension
dim_date
Hierarchy
-
None
MRS_HIVE
mrs_hive_link
demo_dwr_db
Figure 37 Date dimension
- In the Field Settings area, add fields as described in the table below.
Table 11 Field settings No.
Field Name
Field Code
Data Standard
Data Type
Surrogate Key
Primary Key
Partition
Not Null
1
dim_date_key
dim_date_key
-
TIMESTAMP
Selected
Selected
Not selected
Selected
2
real_time
real_time
-
TIMESTAMP
Not selected
Not selected
Not selected
Not selected
3
minute_id
minute_id
-
BIGINT
Not selected
Not selected
Not selected
Not selected
4
minute
minute
-
BIGINT
Not selected
Not selected
Not selected
Not selected
5
hour_id
hour_id
-
BIGINT
Not selected
Not selected
Not selected
Not selected
6
hour
hour
-
BIGINT
Not selected
Not selected
Not selected
Not selected
7
day_id
day_id
-
BIGINT
Not selected
Not selected
Not selected
Not selected
8
day
day
-
STRING
Not selected
Not selected
Not selected
Not selected
9
month_id
month_id
-
BIGINT
Not selected
Not selected
Not selected
Not selected
10
month
month
-
STRING
Not selected
Not selected
Not selected
Not selected
11
year_id
year_id
-
BIGINT
Not selected
Not selected
Not selected
Not selected
12
year
year
-
BIGINT
Not selected
Not selected
Not selected
Not selected
Figure 38 Field settings
- In the Hierarchy Settings area, click Add to create two layers as shown in the figures below.
Figure 39 Layer 1
Figure 40 Layer 2
- Click Save.
- Return to the Dimensions tab page, select the four new dimensions in the dimension list, and click Publish.
- In the Apply for Publication dialog box, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
- After a dimension is published and approved, the system automatically creates a dimension table for the dimension. The name and code of the dimension table are the same as those of the dimension. On the Dimensional Modeling page, click the Dimension Tables tab to view the created dimension table.
In the dimension table list, you can view Sync Status of the dimension tables.
- If all items in Sync Status are displayed as Succeeded, the dimension is published and the dimension table is created in the database.
- If an item in Sync Status is displayed as Failed, click View History in the row. On the page displayed, click the History tab to view logs. Troubleshoot the fault based on the logs. After the fault is rectified, select the dimension table, click above the dimension table list, and click OK in the dialog box displayed. If the fault persists, contact technical support for assistance.
Figure 41 Sync Status of the dimension tables
Creating and Publishing a Fact Table for the DWR Layer
During dimensional modeling, create a fact table named stroke_order for the DWR layer.
- On the DataArts Architecture console, choose in the navigation pane on the left.
- Click the Fact Tables tab, choose in the subject tree, and click Create to create a fact table named stroke_order.
In the Basic Settings area on the Create Fact Table page, set the following parameters:
- Subject: Subject Area Group: City transportation > Subject Area: Trip records > Business Object: Trip facts
- Table Name: stroke_order
- Table Code: fact_stroke_order
- Data Connection Type: MRS_HIVE
- Data Connection Name: mrs_hive_link
- Database: demo_dwr_db
- Table Type: HIVE_TABLE
- Owner: an owner in the drop-down list box
- Description: None
In the Field Settings area, choose Table 12.
. In the dialog box displayed, select the dimensions rate_code, vendor, payment_type, and date, and click OK. Choose . In the dialog box displayed, select the dimension date and click OK. In the dimension field list, adjust the sequence of the dimension fields and modify the information about the two date dimensions, as listed inTable 12 Dimension fields No.
Field Name
Field Code
Data Type
Primary Key
Partition
Not Null
Associated Standard
Associated Dimension
Role
Description
1
rate_code_id
rate_code_id
BIGINT
Not selected
Not selected
Not selected
-
rate_code
dim_
-
2
vendor_id
vendor_id
BIGINT
Not selected
Not selected
Not selected
-
vendor
dim_
-
3
payment_type_id
payment_type_id
BIGINT
Not selected
Not selected
Not selected
-
payment_type
dim_
-
4
pickup_date_key
dim_pickup_date_key
TIMESTAMP
Not selected
Not selected
Not selected
-
Date
dim_pickup
Date dimension table
5
tpep_dropoff_datetime
dim_dropoff_date_key
TIMESTAMP
Not selected
Not selected
Not selected
-
Date
dim_dropoff
Date dimension table
In the Field Settings area, choose Table 13 in sequence.
and create the fields listed inTable 13 Measure fields No.
Field Name
Field Code
Data Type
Primary Key
Partition
Not Null
Associated Standard
6
pu_location_id
pu_location_id
STRING
Not selected
Not selected
Not selected
-
7
do_location_id
do_location_id
STRING
Not selected
Not selected
Not selected
-
8
fare_amount
fare_amount
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
9
extra
extra
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
10
mta_tax
mta_tax
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
11
tip_amount
tip_amount
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
12
tolls_amount
tolls_amount
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
13
improvement_surcharge
improvement_surcharge
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
14
total_amount
total_amount
DECIMAL (10,2)
Not selected
Not selected
Not selected
-
Figure 42 Fact table fields
- After the configuration, click Publish.
- In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
- Return to the Fact Tables tab page, find the new fact table in the list, and view Sync Status.
- If all items in Sync Status are displayed as Succeeded, the fact table is published and created in the database.
- If an item in Sync Status is displayed as Failed, choose . On the page displayed, click the History tab to view logs. Troubleshoot the fault based on the logs. After the fault is rectified, choose above the fact table list, and click OK in the dialog box displayed. If the fault persists, contact technical support for assistance.
Creating and Publishing Technical Metrics
In this example, you need to create the technical metrics listed in Table 14 and Table 15.
*Metric Name |
* Metric Code |
Data Table |
*Subject |
*Expression |
Description |
---|---|---|---|---|---|
total_amount |
sum_total_amount |
Itinerary order |
stroke_fact |
sum (total amount) |
None |
Metric |
*Data Table |
*Subject |
*Atomic Metric |
Statistical Dimension |
Time Filter |
General Filter |
---|---|---|---|---|---|---|
total_amount_(payment_type) |
Itinerary order |
stroke_statistic |
total_amount |
payment_type |
None |
None |
total_amount_(rate_code) |
Itinerary order |
stroke_statistic |
total_amount |
rate_code |
None |
None |
total_amount_(vendor,stroke_order.dim_dropoff_date_key) |
Itinerary order |
stroke_statistic |
total_amount |
vendor and stroke_order.dim_dropoff_date_key |
None |
None |
- On the DataArts Architecture console, choose in the navigation pane on the left.
- Create an atomic metric named total_amount to collect statistics on fares.
- Click the Atomic Metrics tab and click Create.
- On the Create Atomic Metric page, set the parameters as shown in the figure below and click Publish.
Figure 43 Creating an atomic metric
- Wait for the reviewer to review the application. After the application is approved, the atomic metric will be created.
- Create three derivative metrics.
- Create total_amount_(payment_type) to collect statistics on the total fares based on payment_type.
On the Technical Metrics page, click the Derivative Metrics tab and click Create. On the Create Derivative Metric page, set the parameters as shown in the figure below. After the configuration is complete, click Trial Run. In the dialog box displayed, click Execute. If the trial running is successful, click Save.
Figure 44 Creating a derivative metric named total_amount_(payment_type)
- Create total_amount_(rate_code) to collect statistics on the total fares based on rate_code.
On the Technical Metrics page, click the Derivative Metrics tab and click Create. On the Create Derivative Metric page, set the parameters as shown in the figure below. After the configuration is complete, click Trial Run. In the dialog box displayed, click Execute. If the trial running is successful, click Save.
Figure 45 Creating a derivative metric named total_amount_(rate_code)
- Create total_amount_(vendor,stroke_order.dim_dropoff_date_key) to collect statistics on the total fares based on vendor.
On the Technical Metrics page, click the Derivative Metrics tab and click Create. On the Create Derivative Metric page, set the parameters as shown in the figure below. After the configuration is complete, click Trial Run. In the dialog box displayed, click Execute. If the trial running is successful, click Save.
Figure 46 Creating a derivative metric named total_amount_(vendor,stroke_order.dim_dropoff_date_key)
- Create total_amount_(payment_type) to collect statistics on the total fares based on payment_type.
- Return to the Derivative Metrics tab page, select the three derivative metrics and click Publish. In the dialog box displayed, select a reviewer and click OK. Wait for the reviewer to review the application. If you have the reviewer permissions, select Auto-review and click Submit.
Creating and Publish Summary Tables for the DM Layer
Create the three summary tables listed in Table 16 for the DM layer.
*Subject |
*Table Name |
* Table Code |
Statistical Dimension |
Data Connection Type |
*Data Connection Name |
*Database |
Owner |
Description |
---|---|---|---|---|---|---|---|---|
stroke_statistic |
payment_type |
dws_payment_type |
payment_type |
MRS_HIVE |
mrs_hive_link |
demo_dm_db |
- |
None |
stroke_statistic |
rate_code |
dws_rate_code |
rate_code |
MRS_HIVE |
mrs_hive_link |
demo_dm_db |
- |
None |
stroke_statistics |
vendor |
dws_vendor |
vendor and stroke_order.dim_dropoff_date_key |
MRS_HIVE |
mrs_hive_link |
demo_dm_db |
- |
None |
- On the DataArts Architecture console, choose in the left navigation pane.
- Click the Summary Tables tab.
- Create three summary tables: payment_type, rate_code, and vendor.
- On the Summary Tables page, choose
Set the basic settings as shown in the figure below.
Figure 47 Creating a summary table named payment method statistics
In the Time Partition area, enter the field code dtime and select the data type TIMESTAMP. After a table is published, data is written to the table based on the field added here.
Figure 48 Time Partition settings
In the Metric Settings area, click Add to add the derivative metric total_amount_(payment_mode). You can add only published derivative or compound metrics that are associated with the specified statistical dimension.
Figure 49 Metric settings
Click Save.
in the directory tree, and click Create to create a summary table named payment method statistics. On the Create Summary Table page, set the parameters and click Save.
- On the Summary Tables page, choose Figure 50 Creating a summary table named rate statistics (Basic Settings)
Figure 51 Creating a summary table named rate statistics (Metric Settings)
in the directory tree, and click Create to create a summary table named rate statistics. On the Create Summary Table page, set the parameters and click Save.
- On the Summary Tables page, choose Figure 52 Creating a summary table named supplier statistics (Basic Settings)
Figure 53 Creating a summary table named supplier statistics (Metric Settings)
in the directory tree, and click Create to create a summary table named supplier statistics. On the Create Summary Table page, set the parameters and click Save.
- On the Summary Tables page, choose
- Return to the Dimensions tab page, select the three new summary tables in the dimension list, and click Publish.
- In the dialog box displayed, select a reviewer and click OK. After the reviewer approves the publishing application, the summary table is automatically created. If you have the reviewer permissions, select Auto-review and click Submit.
- Return to the Summary Tables tab page, find the new summary tables in the list, and view Sync Status.
- If all items in Sync Status are displayed as Succeeded, the summary tables are published and created in the database.
- If an item in Sync Status is displayed as Failed, choose in the row. On the page displayed, click the History tab to view logs. Troubleshoot the fault based on the logs. After the fault is rectified, choose above the summary table list, and click OK in the dialog box displayed. If the fault persists, contact technical support for assistance.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.