Updated on 2022-09-23 GMT+08:00

Overview

Introduction to DataArts Architecture

DataArts Architecture lets you create entity-relationship (ER) models and dimensional models. These models standardize and visualize data development, and they produce data governance methods that guide developers in their work.

DataArts Architecture is for processing and commercializing data, and is the core module of data governance. It consists of four parts: data survey, standards design, model design, and metric design. DataArts Architecture supports DLI, POSTGRESQL, DWS, MRS Hive, and MRS Spark connections. (It supports MRS Hudi data sources through MRS Spark.)

DataArts Architecture aims to build:

  • A unified data classification system to manage all business data in directories for easier data classification, search, evaluation, and use.
  • A unified data standards system that complies with national or industrial standards to standardize each row of data and each field value and improve data quality and usability.
  • A unified data model system and a tiered enterprise data system from top to bottom based on standards definitions and data modeling. These systems can be used to construct enterprises' public data layers and subject libraries, facilitating data flow, sharing, creation, and innovation. They will make data usage more efficient, greatly reducing data redundancy, disorder, isolation, inconsistencies, and inaccuracies.

Model Design Method Overview

A data model reflects the relationships between objects. It incorporates the key information features extracted from business requirements and visually represents how the internal information of an enterprise is organized. A data model must be able to simulate scenarios, be easy to understand, and be easy to implement in IT systems.

DataArts Architecture uses both ER modeling and dimensional modeling.

  • ER modeling

    ER modeling describes the business processes within an enterprise. Compliant with the third normal form (3NF), ER modeling is designed for data integration: it combines and merges similar data by subject. ER modeling results cannot be used directly for decision-making, but they remain a useful basis for it.

    ER modeling involves designing three kinds of models: conceptual models, logical models, and physical models.

    • Conceptual model: represents the business processes and business data involved in various activities. A conceptual model illustrates the relationships between business entities.
    • Logical model: much more detailed than the conceptual model. A logical model is a set of standardized logical table structures. Based on business rules, it outlines business objects, the data items of those objects, and the relationships between them in terms of entities, attributes, and relationships. Logical models enable communication between IT and business staff.
    • Physical model: a refinement of the logical model, used to design the database architecture for data storage with full consideration of technical factors, for example, whether the selected data warehouse is DWS or MRS Hive.
  • Dimensional modeling

    Dimensional modeling is the construction of models based on analysis and decision-making requirements. It is mainly used for data analysis. Dimensional modeling is focused on how to quickly analyze user requirements and respond rapidly to complicated, large-scale queries.

    A multidimensional model is built around a fact table that consists of numeric metrics. The fact table is associated, through primary and foreign keys, with a group of dimension tables that contain descriptive attributes.

    Typical dimensional models include star models and, for some special scenarios, snowflake models.

    In the DataArts Architecture module of DataArts Studio, dimensional modeling involves constructing bus matrices to extract business facts and dimensions for model creation. You need to sort out business requirements for constructing metric systems and creating summary models.
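
The fact-and-dimension structure described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for demonstration and is not how DataArts Architecture creates tables. All table names, column names, and sample rows (dim_date, dim_product, fact_sales) are hypothetical.

```python
import sqlite3

# Hypothetical star model: every table and column name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- primary key referenced by the fact table
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT NOT NULL       -- descriptive attribute
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,   -- numeric metric
    amount      REAL NOT NULL       -- numeric metric
);
""")

# Made-up sample rows, used only to exercise the query below.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20220901, 2022, 9), (20221001, 2022, 10)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "books"), (2, "games")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20220901, 1, 2, 30.0),
                  (20220901, 2, 1, 50.0),
                  (20221001, 1, 3, 45.0)])


def sales_by_category(connection):
    """Aggregate the fact table's metrics by a dimension attribute."""
    return connection.execute("""
        SELECT p.category, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


totals = sales_by_category(conn)
```

In this sketch the fact table holds only keys and numeric metrics, while each dimension table holds the attributes used for grouping and filtering. A snowflake model would further split a dimension such as dim_product into additional related tables.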

DataArts Architecture Overview Page

On the DataArts Studio console, locate a workspace and click DataArts Architecture. The Overview page is displayed.

Figure 1 DataArts Architecture Overview page
  • My To-Dos
    • The My To-Dos area displays the number of items under My Applications and Pending Review.
    • Click the numbers above My Applications and Pending Review to access the My Applications and Pending Review pages, respectively.
  • Assets
    • The Assets area displays all the objects in DataArts Architecture.
    • Click the number next to each object name to access the object management page.
  • Quick Start

    The Quick Start area displays the overall process for data governance. You can click a specific operation under the process to go to the corresponding page.

  • DataArts Architecture Process
    • This area displays the DataArts Architecture process and how the DataArts Architecture module interacts with other modules of DataArts Studio. For details about the DataArts Architecture process, see DataArts Architecture Use Process.
    • You can move the cursor over the name of an object to view its description.
    • You can click the name of any object supported by DataArts Studio to access the object management page.

Information Architecture of DataArts Architecture

An information architecture is a set of component specifications that describe various types of information required for business operations and management decision-making as well as the relationships of business entities. On the Information Architecture page, you can view and manage all tables, including business tables, dimension tables, fact tables, and summary tables.

On the DataArts Studio console, locate a workspace and click DataArts Architecture. In the navigation pane, choose Information Architecture.

Perform the following operations on the Information Architecture page.
  • Search

    At the top of the Information Architecture page, click Advanced Search, set the table name, type, data source, and other filters, and click Search to find a specific table. Then click the table name to access its details page.

  • Create

    Click Create to create a logical model, physical model, dimension table, fact table, or summary table. For details, see Designing Logical Models, Designing Physical Models, Creating Dimensions, Creating Fact Tables, or Creating Summary Tables.

  • Import

    Choose More > Import. (Currently, only tables can be imported.) Download the table template, fill it in, and upload it. Then click Close. For details, see Importing/Exporting Tables.

  • Export

    Choose More > Export to export a physical table model or DDL. For details, see Exporting a Table or DDL.

  • Synchronize

    Choose More > Synchronize to synchronize table information to DataArts Catalog as technical assets or synchronize logical models to DataArts Catalog as logical assets.

  • Modify Subject

    Choose More > Modify Subject to move the selected table to another subject.

  • Delete

    Choose More > Delete to delete a data table. A data table in the pending publishing, published, or pending suspension state cannot be deleted. A referenced data table cannot be deleted either.

  • Suspend

    Choose More > Suspend to suspend a published data table. A referenced data table cannot be suspended.

    An edited version is a version of a data table that has been edited again after the table was published.

  • Publish

    Click Publish to publish a data table. Data tables in the pending publishing, pending suspension, or published (without edited versions) state cannot be published.

  • Associate Rule

    Click Associate Rule and set the parameters to associate a quality rule with the object you select. For details, see Associating Quality Rules.

    Figure 2 Associating a quality rule with an object

    Generate Anomaly Data: If this option is selected, anomaly data is stored in the specified database based on the configured parameters.
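
The restrictions on the Delete, Suspend, and Publish operations listed above can be summarized as a few checks. The following Python sketch only restates those rules and is not a DataArts Studio API. The class, field, and function names are hypothetical, and the DRAFT and SUSPENDED states are assumptions added to make the example complete (the text above names only the pending publishing, published, and pending suspension states).

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DRAFT = "draft"                            # assumed state, not named in the text
    PENDING_PUBLISHING = "pending publishing"
    PUBLISHED = "published"
    PENDING_SUSPENSION = "pending suspension"
    SUSPENDED = "suspended"                    # assumed state, not named in the text


@dataclass
class DataTable:
    """Hypothetical stand-in for a table on the Information Architecture page."""
    name: str
    state: State
    referenced: bool = False          # True if another object references this table
    has_edited_version: bool = False  # True if edited again after being published


def can_delete(table: DataTable) -> bool:
    # Pending publishing, published, and pending suspension tables cannot be
    # deleted, and neither can referenced tables.
    blocked = {State.PENDING_PUBLISHING, State.PUBLISHED, State.PENDING_SUSPENSION}
    return table.state not in blocked and not table.referenced


def can_suspend(table: DataTable) -> bool:
    # Only published tables can be suspended, and only if they are not referenced.
    return table.state is State.PUBLISHED and not table.referenced


def can_publish(table: DataTable) -> bool:
    # Pending publishing and pending suspension tables cannot be published.
    if table.state in {State.PENDING_PUBLISHING, State.PENDING_SUSPENSION}:
        return False
    # A published table can be published again only if it has an edited version.
    if table.state is State.PUBLISHED:
        return table.has_edited_version
    return True
```

For example, under these assumptions a published table with no edited version fails can_publish, while the same table passes can_suspend as long as nothing references it.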