Updated on 2024-08-30 GMT+08:00

Notes and Constraints

Browser Constraints

The following table lists the recommended browsers for logging in to DataArts Studio.
Table 1 Browser compatibility

| Browser | Recommended Version | Recommended OS | Remarks |
|---|---|---|---|
| Google Chrome | 126, 125, and 124 | Windows 10 | 1920x1080 and 2560x1440 are recommended. |
| Firefox | 127 and 126 | Windows 10 | 1920x1080 and 2560x1440 are recommended. |
| Microsoft Edge | N/A (The version is updated with Windows 10.) | Windows 10 | 1920x1080 and 2560x1440 are recommended. |

Use Constraints

Before using DataArts Studio, you must read and understand the following restrictions:
Table 2 Restrictions for using DataArts Studio

Component

Restriction

Public

  1. DataArts Studio must be deployed on Huawei Cloud. If resources are isolated, DataArts Studio can be deployed in a full-stack DeC. In addition, DataArts Studio can be deployed on Huawei Cloud Stack or Huawei Cloud Stack Online.

    For more information about the application scenarios and differences between the full-stack DeC, Huawei Cloud Stack, and Huawei Cloud Stack Online, contact sales.

  2. DataArts Studio is a one-stop platform that provides data integration, development, and governance capabilities. DataArts Studio has no storage or computing capability and relies on the data lake base.
  3. Only one DataArts Studio instance can be bound to an enterprise project. If an enterprise project already has an instance, no more instances can be added to it.
  4. Different components of DataArts Studio support different data sources. You need to select a data lake foundation based on your service requirements. For details about the data lakes supported by DataArts Studio, see Data Sources Supported by DataArts Studio.

Management Center

  1. Due to the constraints of Management Center, other components (such as DataArts Architecture, DataArts Quality, and DataArts Catalog) do not support databases or tables whose names contain Chinese characters or periods (.).
  2. The free CDM cluster provided by a DataArts Studio instance has limited specifications. You are advised to use it only as an agent for a data connection in Management Center.
  3. You are advised to use different CDM clusters for a data connection agent in Management Center and a CDM migration job. If an agent and CDM job use the same cluster, they may contend for resources during peak hours, resulting in service unavailability.
  4. If a CDM cluster functions as the agent for a data connection in Management Center, the cluster cannot connect to multiple MRS security clusters. You are advised to plan multiple agents, each mapped to exactly one MRS security cluster.
  5. If a CDM cluster functions as the agent for a data connection in Management Center, the cluster supports a maximum of 200 concurrent active threads. If multiple data connections share an agent, a maximum of 200 SQL, Shell, and Python scripts submitted through the connections can run concurrently. Excess tasks will be queued. You are advised to plan multiple agents based on the workload.

  6. A maximum of 200 data connections can be created in a workspace.
  7. The concurrency restriction for APIs in Management Center is 100 QPS.
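
The naming and agent concurrency constraints in items 1 and 5 can be checked before data connections are planned. The following is a minimal sketch and not part of DataArts Studio: the function names `is_supported_name` and `agents_needed` are hypothetical, and "Chinese characters" is approximated, as an assumption, by the basic CJK Unified Ideographs block.

```python
import math

# Limit taken from item 5 of the Management Center constraints.
MAX_ACTIVE_THREADS_PER_AGENT = 200


def is_supported_name(name: str) -> bool:
    """Return False if a database or table name contains a period or a Chinese character.

    Assumption: Chinese characters are approximated by the basic
    CJK Unified Ideographs block (U+4E00 to U+9FFF).
    """
    if "." in name:
        return False
    return not any("\u4e00" <= ch <= "\u9fff" for ch in name)


def agents_needed(peak_concurrent_scripts: int) -> int:
    """Estimate how many agents let all scripts run at peak without queuing."""
    if peak_concurrent_scripts < 0:
        raise ValueError("peak_concurrent_scripts must be non-negative")
    # At least one agent is always required for a data connection.
    return max(1, math.ceil(peak_concurrent_scripts / MAX_ACTIVE_THREADS_PER_AGENT))
```

For example, a workload that peaks at 450 concurrent SQL, Shell, and Python scripts would call for three agents under this estimate, because two agents would leave 50 scripts queued.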

DataArts Migration

  1. You can enable automatic backup and restoration of CDM jobs. Backups of CDM jobs are stored in OBS buckets. For details, see Automatic Backup and Restoration of CDM Jobs.
  2. There is no quota limit for CDM jobs. However, it is recommended that the number of jobs be less than or equal to twice the number of vCPUs in the CDM cluster. Otherwise, job performance may be affected.
  3. The DataArts Migration cluster is deployed in standalone mode. A cluster fault may cause service and data loss. You are advised to use the CDM Job node of DataArts Factory to invoke CDM jobs and select two CDM clusters to improve reliability. For details, see CDM Job.
  4. If changes occur in the connected data source (for example, the MRS cluster capacity is expanded), you need to edit and save the connection.
  5. If you have uploaded an updated version of a driver, you must restart the CDM cluster for the new driver to take effect.
  6. The number of concurrent extraction tasks for a job ranges from 1 to 300, and the total number of concurrent extraction tasks for a cluster ranges from 1 to 1,000. The maximum number of concurrent extraction tasks for a cluster depends on the CDM cluster specifications. You are advised to set the maximum number of concurrent extraction tasks to no larger than twice the number of vCPUs. The number of concurrent extraction tasks for a job should not exceed that for a cluster. If the number of concurrent extraction tasks is too large, memory overflow may occur. Exercise caution when changing the maximum number of concurrent extraction tasks.

For more constraints on DataArts Migration, see CDM Constraints.
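
The concurrency limits in items 2 and 6 interact, so it can help to check a planned configuration against all of them at once. The sketch below is illustrative only: `check_extraction_concurrency` and `recommended_max_jobs` are hypothetical helper names, and the numeric limits are copied from the list above rather than read from any CDM API.

```python
from typing import List

# Ranges taken from item 6 of the DataArts Migration constraints.
JOB_CONCURRENCY_RANGE = (1, 300)
CLUSTER_CONCURRENCY_RANGE = (1, 1000)


def recommended_max_jobs(vcpus: int) -> int:
    """Item 2: keep the number of jobs at or below twice the vCPU count."""
    return 2 * vcpus


def check_extraction_concurrency(job_concurrency: int,
                                 cluster_max_concurrency: int,
                                 vcpus: int) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    low, high = JOB_CONCURRENCY_RANGE
    if not low <= job_concurrency <= high:
        problems.append(f"job concurrency must be between {low} and {high}")
    low, high = CLUSTER_CONCURRENCY_RANGE
    if not low <= cluster_max_concurrency <= high:
        problems.append(f"cluster concurrency must be between {low} and {high}")
    # Recommendation, not a hard limit: at most twice the number of vCPUs.
    if cluster_max_concurrency > 2 * vcpus:
        problems.append("cluster concurrency exceeds twice the number of vCPUs")
    if job_concurrency > cluster_max_concurrency:
        problems.append("job concurrency exceeds cluster concurrency")
    return problems
```

On a cluster with 8 vCPUs, a cluster-level maximum of 16 and a job-level setting of 8 pass every check, while raising the cluster-level maximum to 32 triggers the vCPU warning.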

DataArts Factory

  1. You can enable backup of assets such as scripts and jobs to OBS buckets. For details, see Managing Backups.
  2. The execution history of scripts, jobs, and nodes is stored in OBS buckets. If no OBS bucket is available, you cannot view the execution history.
  3. Resources stored in HDFS can be used only by MRS Spark, MRS Flink Job, and MRS MapReduce nodes.
  4. A workspace can contain a maximum of 10,000 scripts, a maximum of 5,000 script directories, and a maximum of 10 directory levels.
  5. A workspace can contain a maximum of 10,000 jobs, a maximum of 5,000 job directories, and a maximum of 10 directory levels.
  6. A maximum of 1,000 execution results can be displayed for RDS SQL, DWS SQL, Hive SQL, DLI SQL, and Spark SQL scripts, and the displayed data volume must be less than 3 MB. If the number of execution results exceeds 1,000, you can dump them. A maximum of 10,000 execution results can be dumped.
  7. Only data of the last six months can be displayed on the Monitor Instance and Monitor PatchData pages.
  8. Only notification records of the last 30 days can be displayed.
  9. The download records age out every seven days. When aged out, download records and the data dumped to OBS are both deleted.

DataArts Architecture

  1. DataArts Architecture supports ER modeling, dimensional modeling (star models only), and data marts.
  2. The maximum size of a file to be imported is 4 MB. A maximum of 3,000 metrics can be imported. A maximum of 500 tables can be exported at a time.
  3. The quotas for the objects in a workspace are as follows:
    • Subjects: 5,000
    • Data standard directories: 500; data standards: 20,000
    • Business metrics: 100,000
    • Atomic, derivative, and compound metrics: 5,000 for each
  4. The quotas for different custom objects are as follows:
    • Custom subjects: 10
    • Custom tables: 30
    • Custom attributes: 10
    • Custom business metrics: 50

DataArts Quality

  1. The execution duration of data quality jobs depends on the data engine. If the data engine does not have sufficient resources, the execution of data quality jobs may be slow.
  2. A maximum of 50 rules can be configured for a data quality job. If necessary, you can create multiple quality jobs.
  3. By default, a maximum of 1,000 SQL statements associated with a quality job of a data connection can be executed concurrently. Excess SQL statements will be queued. The value ranges from 10 to 1,000.
  4. By default, a maximum of 10,000 SQL statements associated with a quality job in a region can be executed concurrently. Excess SQL statements will be queued.
  5. In the Instance Running Status and Instance Alarm Status areas on the Dashboard page on the Metric Monitoring page, data of the last seven days is displayed. On the Alarms, Scenarios, and Metrics pages, data of the last seven, 15, or 30 days can be displayed.
  6. In the Quantity Changes area on the Dashboard page on the Quality Monitoring page, data of the last 30 days can be displayed. In the Alarm Trend by Severity and Rule Quantity Trend areas, data of the last seven days can be displayed.
  7. Quality reports are generated in batches on the T+1 day and retained for 90 days.
  8. If you export a quality report to OBS, the report is exported to the OBS path for storing job logs configured for the workspace. The exported record is retained for three months.
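
Item 2 implies that a large rule set has to be spread across several quality jobs. One way to do that is fixed-size batching, sketched below. The function name `split_rules_into_jobs` is hypothetical, and rules are represented as plain list elements because the source does not describe a rule format.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Limit taken from item 2 of the DataArts Quality constraints.
MAX_RULES_PER_JOB = 50


def split_rules_into_jobs(rules: Sequence[T],
                          max_per_job: int = MAX_RULES_PER_JOB) -> List[List[T]]:
    """Group rules into consecutive batches of at most max_per_job rules each."""
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")
    return [list(rules[i:i + max_per_job])
            for i in range(0, len(rules), max_per_job)]
```

With 120 rules, this yields three batches of 50, 50, and 20 rules, each of which fits in one quality job.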

DataArts Catalog

  1. A maximum of 100 metadata collection tasks can be created in a workspace.
  2. Metadata collection tasks obtain metadata through DDL SQL statements of the engine. You are advised not to collect more than 1,000 tables through a single task. If necessary, you can create multiple collection tasks. In addition, you need to set the scheduling time and frequency properly based on your requirements to avoid heavy access and connection pressure on the engine. The recommended settings are as follows:
    • If your service requires a metadata validity period of one day, set the scheduling period to max(one day, one-off collection period). This rule also applies to other scenarios.
    • If your service mainly runs in the daytime, set a scheduling time in the night during which the metadata collection has the minimum impact on the data source. This rule also applies to other scenarios.
  3. Only the jobs that are scheduled and executed in DataArts Factory generate data lineages. Tested jobs do not generate data lineages.
  4. Historical data connections of the last seven days, 15 days, or 30 days can be displayed on the Dashboard page on the Metadata Collection page.
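
The scheduling rule and the per-task table recommendation in item 2 can be expressed as two small calculations. This is a sketch under stated assumptions: the helper names are hypothetical, and "one-off collection period" is interpreted here as the time one collection run takes to finish.

```python
import math
from datetime import timedelta

# Figures taken from items 1 and 2 of the DataArts Catalog constraints.
RECOMMENDED_MAX_TABLES_PER_TASK = 1000
MAX_TASKS_PER_WORKSPACE = 100


def scheduling_period(validity_period: timedelta,
                      collection_duration: timedelta) -> timedelta:
    """Return max(metadata validity period, duration of one collection run)."""
    return max(validity_period, collection_duration)


def collection_tasks_needed(table_count: int) -> int:
    """Number of tasks needed to keep each task at or below 1,000 tables."""
    if table_count < 0:
        raise ValueError("table_count must be non-negative")
    tasks = math.ceil(table_count / RECOMMENDED_MAX_TABLES_PER_TASK)
    if tasks > MAX_TASKS_PER_WORKSPACE:
        # The per-task recommendation and the workspace limit cannot both be met.
        raise ValueError("more than 100 tasks would be needed in one workspace")
    return tasks
```

If metadata must be at most one day old and a full collection run takes three hours, the scheduling period is one day. If a run takes 30 hours, the period grows to 30 hours so that runs do not overlap.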

DataArts DataService

  1. The shared edition is designed only for development and testing. You are advised to use the exclusive edition, which is superior to the shared edition.
  2. A maximum of five DataArts DataService Exclusive clusters can be created in a DataArts Studio instance. Each cluster must be associated with a workspace and cannot belong to multiple workspaces.
  3. After a DataArts DataService Exclusive cluster is created, its specifications cannot be modified, and its version cannot be upgraded.
  4. The maximum number of DataArts DataService Exclusive APIs that can be created in a DataArts Studio instance is the quota of DataArts DataService Exclusive APIs (5,000 by default) or the total API quotas of the clusters in the instance, whichever is smaller. For example, if the quota of DataArts DataService Exclusive APIs for a DataArts Studio instance is 5,000, and two clusters whose API quotas are 500 and 2,000 respectively have been created in the instance, a maximum of 2,500 DataArts DataService Exclusive APIs can be created in the instance.
  5. The maximum number of DataArts DataService Exclusive APIs that can be created in a workspace is the quota of DataArts DataService Exclusive APIs (configured in the workspace information) or the total API quotas of the clusters in the instance, whichever is smaller. For example, if the quota of DataArts DataService Exclusive APIs for a workspace is 800, and two clusters whose API quotas are both 500 have been created in the workspace, a maximum of 800 DataArts DataService Exclusive APIs can be created in the workspace.
  6. A maximum of 1,000 applications can be created in a workspace.
  7. A maximum of 500 throttling policies can be created in a workspace.
  8. DataArts DataService allows you to trace and save events. For each event, DataArts DataService records information such as the date, description, and time source (a cluster). Events are retained for 30 days.
  9. From the log of a DataArts DataService Exclusive cluster, you can obtain only the last 100 access records of the cluster, drawn evenly from all nodes of the cluster.
  10. In the APIs Called, APIs Published, Top 5 APIs by Call Rate, Top 5 APIs by Call Duration, and Top 5 APIs by Call Quantity areas on the Overview page, data of the last 12 hours, one day, seven days, or 30 days can be displayed. The total number of API calls is the sum of the API calls made in the last seven days (excluding the current day).
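
The "whichever is smaller" rule in items 4 and 5 reduces to taking the minimum of the configured API quota and the sum of the cluster API quotas. The sketch below reproduces the two worked examples from those items. The function name `effective_api_limit` is hypothetical and does not correspond to a DataArts DataService API.

```python
from typing import Iterable


def effective_api_limit(api_quota: int, cluster_api_quotas: Iterable[int]) -> int:
    """Return the smaller of the configured quota and the total cluster quota."""
    return min(api_quota, sum(cluster_api_quotas))


# Item 4: instance quota 5,000, clusters with quotas 500 and 2,000.
instance_limit = effective_api_limit(5000, [500, 2000])   # 2500

# Item 5: workspace quota 800, two clusters with quota 500 each.
workspace_limit = effective_api_limit(800, [500, 500])    # 800
```

In the first case the clusters are the bottleneck (500 + 2,000 = 2,500 is less than 5,000). In the second case the workspace quota is the bottleneck (800 is less than 500 + 500 = 1,000).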