What Is DataArts Fabric?

DataArts Fabric (DataArtsFabric) is Huawei Cloud's comprehensive, one-stop platform for data and AI development. It provides full lifecycle management, covering data processing, analysis, model fine-tuning, inference, deployment, and rollout. Data engineers, data scientists, and AI application developers can collaborate efficiently with familiar tools in a unified workspace, which shortens the path from development to production. DataArts Fabric scales automatically to meet demanding application requirements, adding resources incrementally as actual demand grows. Compared with services whose resource pools are provisioned in advance for peak loads, it can reduce costs by up to 50%.

DataArts Fabric is serverless: diverse data and AI workloads share resource pools, including heterogeneous CPU and NPU resources and shared development and production environments, which optimizes resource investment. Offline and online workloads can be deployed together, and training and inference share the same resources, smoothing out peaks and troughs in resource demand and significantly improving utilization. Customers get a frictionless experience with zero resource thresholds: there are no clusters to manage, and experimenting at low cost is easy even in fast-changing business environments.
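The benefit of sharing one pool can be illustrated with a toy calculation. The sketch below assumes two hypothetical workloads whose peaks do not overlap (an online service busy during the day and offline batch jobs busy at night); the hourly numbers are invented for illustration and are not DataArts Fabric measurements. Dedicated pools must each be sized for their own peak, whereas a shared pool only needs to cover the peak of the combined demand.

```python
# Toy model: dedicated vs shared resource pools.
# Hypothetical hourly demand in compute units over 24 hours (illustrative only).
ONLINE = [2] * 8 + [10] * 12 + [4] * 4   # online inference: daytime peak
OFFLINE = [8] * 8 + [3] * 12 + [6] * 4   # offline batch jobs: night-time peak


def dedicated_capacity(*profiles):
    """Each workload gets its own pool sized for its own peak."""
    return sum(max(p) for p in profiles)


def shared_capacity(*profiles):
    """One pool sized for the peak of the combined hourly demand."""
    return max(sum(hour) for hour in zip(*profiles))


def used_unit_hours(*profiles):
    """Compute actually consumed, i.e. what a pay-per-use model would bill."""
    return sum(sum(p) for p in profiles)


if __name__ == "__main__":
    hours = len(ONLINE)
    used = used_unit_hours(ONLINE, OFFLINE)
    for label, cap in (("dedicated", dedicated_capacity(ONLINE, OFFLINE)),
                       ("shared", shared_capacity(ONLINE, OFFLINE))):
        provisioned = cap * hours
        print(f"{label}: {cap} units, utilization {used / provisioned:.0%}")
```

With these made-up profiles the shared pool needs 13 units instead of 18, and utilization rises from about 64% to about 88%; a pay-per-use model would bill only the 276 unit-hours actually consumed. Real savings depend entirely on the actual workload mix.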

Architecture

DataArts Fabric offers a high-performance, highly reliable, low-latency, and cost-effective mass storage system. When integrated with Huawei Cloud big data services, it significantly reduces costs and simplifies big data management for enterprises.

SQL engine
DataArts Fabric's distributed SQL engine decouples metadata services, compute, caching, and storage into separate layers, so resources at each layer can be allocated elastically without affecting performance or availability. With statement-level elastic scaling and a high-performance distributed analysis engine, TB-scale queries complete in seconds and PB-scale queries in minutes.
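Statement-level elastic scaling means compute can be sized per query rather than per cluster. The document does not describe how DataArts Fabric makes that decision, so the following is only a toy heuristic under stated assumptions: a hypothetical per-worker scan throughput, a target latency, and a cap on the number of workers.

```python
import math

TB = 10**12
GB = 10**9


def workers_for_statement(scan_bytes, target_seconds, bytes_per_worker_per_sec, max_workers):
    """Toy per-statement sizing (hypothetical, not the product's algorithm).

    Allocate just enough workers to scan `scan_bytes` within `target_seconds`,
    assuming each worker scans at `bytes_per_worker_per_sec`, capped at `max_workers`.
    """
    needed = math.ceil(scan_bytes / (bytes_per_worker_per_sec * target_seconds))
    return max(1, min(needed, max_workers))


if __name__ == "__main__":
    # Assumed throughput of 1 GB/s per worker (illustrative only).
    print(workers_for_statement(1 * TB, 10, 1 * GB, max_workers=1000))  # 100
    print(workers_for_statement(5 * GB, 10, 1 * GB, max_workers=1000))  # 1
```

A small statement gets a single worker while a large scan fans out, and the cap keeps one statement from claiming the whole pool.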
Distributed Ray
To address the challenges of distributed computing in data processing and ML/DL workloads, DataArts Fabric supports the Ray framework, providing a unified workflow for data engineering and machine learning engineering. The Ray-Data, Ray-Train, and Ray-Serve modules of DataArts Fabric Ray support distributed data preprocessing, model training, and inference services, respectively.
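Ray-Data parallelizes preprocessing by applying a function to batches of records across workers; in open-source Ray this pattern is exposed through methods such as `Dataset.map_batches`. The sketch below reproduces the same batch-parallel pattern locally with Python's standard `concurrent.futures`, so it runs without a Ray cluster. It is an analogy for illustration, not Ray or DataArts Fabric code, and the `normalize` step is a hypothetical transformation.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def normalize(batch):
    """Hypothetical per-batch preprocessing step: trim and lowercase text."""
    return [text.strip().lower() for text in batch]


def parallel_map_batches(items, fn, batch_size=2, workers=4):
    """Apply fn to each batch concurrently and flatten results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        return [row for out in pool.map(fn, batches(items, batch_size)) for row in out]


if __name__ == "__main__":
    raw = ["  Hello ", "WORLD", " Ray ", "Data  ", "Fabric"]
    print(parallel_map_batches(raw, normalize))
```

Threads keep the example simple; CPU-heavy transformations would use processes locally, or Ray tasks when distributed across a cluster.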
Online inference
DataArts Fabric includes a proprietary, high-performance elastic inference engine. Users can run inference jobs either through the default inference service or by deploying their own custom models.
Heterogeneous resource management
DataArts Fabric provides unified management and allocation of CPU and NPU resources. Resources can be scheduled at container or actor granularity. DataArts Fabric also provides secure sandboxes for resource isolation, along with robust fault tolerance.
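Scheduling heterogeneous resources comes down to matching each container's or actor's CPU/NPU request against the free capacity of the available nodes. The following toy first-fit scheduler illustrates that matching; the node names, capacities, and the first-fit policy are assumptions for illustration, since the document does not describe the actual scheduling algorithm. In open-source Ray, an actor declares such needs with `@ray.remote(num_cpus=..., resources={...})`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free: dict  # remaining capacity per resource type, e.g. {"CPU": 8, "NPU": 2}


def fits(request, node):
    """True if the node has enough free capacity for every requested resource."""
    return all(node.free.get(kind, 0) >= amount for kind, amount in request.items())


def place(requests, nodes):
    """First-fit placement of (name, request) pairs; None means unschedulable now."""
    placement = {}
    for name, request in requests:
        for node in nodes:
            if fits(request, node):
                for kind, amount in request.items():
                    # Reserve the resources on the chosen node.
                    node.free[kind] = node.free.get(kind, 0) - amount
                placement[name] = node.name
                break
        else:
            placement[name] = None
    return placement


if __name__ == "__main__":
    nodes = [Node("cpu-node", {"CPU": 16}), Node("npu-node", {"CPU": 8, "NPU": 4})]
    requests = [
        ("preprocess", {"CPU": 4}),
        ("train", {"CPU": 4, "NPU": 4}),
        ("serve", {"CPU": 2, "NPU": 1}),
    ]
    print(place(requests, nodes))
```

In this run the training job takes all four NPUs, so the serving actor cannot be placed; a real scheduler would queue it or scale out rather than drop it.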
Multi-semantic cache acceleration
DataArts Fabric delivers cross-engine, multi-modal, and multi-semantic acceleration through various caching mechanisms, including data, model, and checkpoint caches.
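One way to picture a multi-semantic cache is a single store whose entries are tagged by kind (data, model, or checkpoint), so that repeated reads of the same dataset shard, model weights, or checkpoint avoid reloading from remote storage. The sketch below assumes a least-recently-used (LRU) eviction policy and a caller-supplied loader; both are illustrative assumptions, and `MultiKindCache` is a hypothetical class, not a DataArts Fabric API.

```python
from collections import OrderedDict


class MultiKindCache:
    """Hypothetical LRU cache shared by data, model, and checkpoint entries."""

    KINDS = ("data", "model", "checkpoint")

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # (kind, key) -> value, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, kind, key, loader):
        """Return the cached value, or call loader(key) on a miss and cache it."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown cache kind: {kind}")
        entry = (kind, key)
        if entry in self._entries:
            self._entries.move_to_end(entry)  # mark as most recently used
            self.hits += 1
            return self._entries[entry]
        self.misses += 1
        value = loader(key)
        self._entries[entry] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    cache = MultiKindCache(capacity=2)
    load = lambda key: f"bytes-of-{key}"  # stand-in for a slow remote read
    cache.get("model", "llm-weights", load)    # miss
    cache.get("model", "llm-weights", load)    # hit
    cache.get("data", "shard-0", load)         # miss
    cache.get("checkpoint", "step-100", load)  # miss, evicts the model entry
    print(cache.hits, cache.misses)            # 1 3
```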

Figure 1 Product architecture

Access Methods

DataArts Fabric offers multiple access methods: a web-based management console, HTTPS-based APIs, and SDK clients for seamless integration with compute engines.

Management console
On the DataArts Fabric management console, you can manage Ray jobs, SQL jobs, model deployments, and model inference, carrying out end-to-end data and AI development entirely from the console.
APIs
To integrate DataArts Fabric into third-party systems or for secondary development, use the HTTPS APIs it provides.
SDKs
To integrate DataArts Fabric functions into third-party systems for secondary development, you can also use the SDKs. They wrap the REST APIs in Python and Java, which simplifies development.
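An SDK typically hides URL construction, authentication headers, and JSON encoding behind method calls. The following hypothetical client shows that idea with the Python standard library: it only builds `urllib.request.Request` objects and never sends them. The endpoint, the `/v1/{project_id}/jobs` path, and the payload fields are invented placeholders, not the real DataArts Fabric API; the `X-Auth-Token` header follows the IAM token convention common to Huawei Cloud APIs, but check the official API reference before relying on any of it.

```python
import json
import urllib.request


class FabricClient:
    """Hypothetical thin client that wraps REST calls the way an SDK might.

    Host, paths, and payload fields are placeholders, not the real API.
    """

    def __init__(self, endpoint, project_id, token):
        self.endpoint = endpoint.rstrip("/")
        self.project_id = project_id
        self.token = token

    def _request(self, method, path, body=None):
        """Build (but do not send) an authenticated JSON request."""
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            url=f"{self.endpoint}/v1/{self.project_id}{path}",
            data=data,
            method=method,
            headers={"Content-Type": "application/json", "X-Auth-Token": self.token},
        )

    def build_submit_job(self, name, entrypoint):
        """Hypothetical call for submitting a Ray job."""
        return self._request("POST", "/jobs", {"name": name, "entrypoint": entrypoint})


if __name__ == "__main__":
    client = FabricClient("https://fabric.example.com", "my-project", "<IAM token>")
    req = client.build_submit_job("demo", "python main.py")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes such a wrapper easy to test without network access.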