Updated on 2025-12-19 GMT+08:00

Overview

In AI scenarios, data scientists often need to manage and analyze multimodal data, such as text, images, audio, video, and point clouds, that is scattered across diverse sources. This fragmentation makes data integration and processing inefficient. Fabric SQL, a serverless multimodal AI data lake service, was developed to address this challenge. It provides a unified platform for efficiently managing, processing, and analyzing multimodal data for AI applications. Its core mission is to integrate heterogeneous data sources so that cross-modal data can be stored, processed, and analyzed in one place, supporting AI model training, inference, and deployment.

In addition, Fabric Data provides a set of Dataset APIs tailored to the workflows of data scientists and analysts, offering intuitive, declarative methods for data management, manipulation, and analysis.

Core Capabilities

  • Unified multimodal processing: Natively handles structured, semi-structured, and unstructured data with consistent processing capabilities.
  • Strong type system: Enforces type safety, which reduces debugging effort and improves development efficiency.
  • Functional programming interface: Treats functional programming APIs such as map and flatMap as first-class citizens, making them straightforward to use.
  • Lazy execution & high-performance engine: Combines lazy execution (operations are deferred until results are requested) with Fabric SQL's distributed engine to deliver high performance.
  • Deep integration with AI toolchains: Seamlessly integrates with leading AI frameworks (e.g., TensorFlow, PyTorch) to support end-to-end AI development workflows.
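The functional-interface and lazy-execution capabilities above can be illustrated with a small, generic sketch. The `LazyDataset` class, its `from_items`, `map`, `flat_map`, and `collect` methods are hypothetical and are not the Fabric Data Dataset API; they only show how map- and flatMap-style transformations can be recorded as typed steps that do no work until results are requested.

```python
from __future__ import annotations

from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class LazyDataset(Generic[T]):
    """Hypothetical illustration only; not the Fabric Data Dataset API.

    Each transformation returns a new dataset wrapping a deferred
    iterator factory, so no element is processed until collect() runs.
    """

    def __init__(self, source: Callable[[], Iterator[T]]) -> None:
        self._source = source

    @staticmethod
    def from_items(items: Iterable[T]) -> LazyDataset[T]:
        data = list(items)
        return LazyDataset(lambda: iter(data))

    def map(self, fn: Callable[[T], U]) -> LazyDataset[U]:
        # Record a one-to-one transformation; nothing executes yet.
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def flat_map(self, fn: Callable[[T], Iterable[U]]) -> LazyDataset[U]:
        # Record a one-to-many transformation (flatMap-style); still lazy.
        return LazyDataset(lambda: (y for x in self._source() for y in fn(x)))

    def collect(self) -> List[T]:
        # Execution of the whole recorded chain is triggered only here.
        return list(self._source())


calls: List[str] = []


def tokenize(caption: str) -> List[str]:
    calls.append(caption)  # track when the function actually runs
    return caption.split()


captions = LazyDataset.from_items(["a red car", "two dogs"])
tokens = captions.flat_map(tokenize).map(str.upper)
assert calls == []  # building the chain did not run tokenize
print(tokens.collect())  # ['A', 'RED', 'CAR', 'TWO', 'DOGS']
```

Because each step only wraps the previous one, a distributed engine could in principle inspect and optimize the whole chain before running it; this sketch simply executes the chain in-process when `collect()` is called.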