Overview

Fabric Data can process multimodal data, such as images (PNG/JPG), audio (WAV/MP3), and video (MP4/AVI), through UDFs. Before a UDF can process this data, the data must be stored in a Fabric table. The following steps use image data as an example to show how to import it into DataArts Fabric:

Prepare image data.

Read the image data and write it into a Parquet file.

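import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq  # the parquet submodule must be imported explicitly

# Image metadata; the height and width shown here are sample values.
data = {"img": [
    {'filename': "image.png", 'format': 'png', 'height': 1, 'width': 2},
    ]
}
# Read the raw image bytes into the 'data' field of the record.
with open("image.png", 'rb') as file:
    data["img"][0]["data"] = file.read()
df = pd.DataFrame(data)
# A single struct column that holds the image metadata and its binary content.
schema = pa.schema([('img', pa.struct([('filename', pa.string()), ('format', pa.string()), ('height', pa.int64()), ('width', pa.int64()), ('data', pa.binary())]))])
table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, "image_type.parquet")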
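
The sample above hard-codes the height and width fields. As a minimal sketch, assuming the input is a standard PNG file, the real dimensions can be read from the IHDR chunk at the start of the file. The helper name png_dimensions is hypothetical and is not part of Fabric Data.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then the 4-byte big-endian width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

After the image bytes are read, the result of png_dimensions(data["img"][0]["data"]) can be used to fill in the width and height fields before the DataFrame is built. JPG files use a different header layout and would need a different approach.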

Upload the generated Parquet file to OBS.

Create a table with a column of the image type, and set the table location to the OBS path from the previous step.

import logging
import os

from fabric_data.multimodal import ai_lake
from fabric_data.multimodal.types import image

# Set the target database name
target_database = "multimodal_lake"

# Connect to DataArts Fabric; all connection settings are read from environment variables.
con = ai_lake.connect(
    fabric_endpoint=os.getenv("fabric_endpoint"),
    fabric_endpoint_id=os.getenv("fabric_endpoint_id"),
    fabric_workspace_id=os.getenv("fabric_workspace_id"),
    lf_catalog_name=os.getenv("lf_catalog_name"),
    lf_instance_id=os.getenv("lf_instance_id"),
    access_key=os.getenv("access_key"),
    secret_key=os.getenv("secret_key"),
    default_database=target_database,
    use_single_cn_mode=True,
    logging_level=logging.WARNING,
)
# Set the OBS staging workspace for functions.
con.set_function_staging_workspace(
    obs_directory_base=os.getenv("obs_directory_base"),
    obs_bucket_name=os.getenv("obs_bucket_name"),
    obs_server=os.getenv("obs_server"),
    access_key=os.getenv("access_key"),
    secret_key=os.getenv("secret_key"))
# Create an external table whose img column uses the image type.
# The location is the OBS path that holds the uploaded Parquet file.
con.create_table("image_table", schema={"img": image.Image}, external=True, location="obs://image_type")
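
The connection code reads every setting with os.getenv, which returns None for an unset variable instead of raising an error. The following sketch checks the variables up front so that a missing setting is reported before the connection is attempted. The helper name missing_env_vars and the REQUIRED_ENV_VARS list are illustrative assumptions based on the variable names used in the sample; they are not part of Fabric Data.

```python
import os

# Variable names taken from the sample above.
REQUIRED_ENV_VARS = [
    "fabric_endpoint", "fabric_endpoint_id", "fabric_workspace_id",
    "lf_catalog_name", "lf_instance_id", "access_key", "secret_key",
    "obs_directory_base", "obs_bucket_name", "obs_server",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling missing_env_vars(REQUIRED_ENV_VARS) before ai_lake.connect and stopping when the returned list is not empty makes configuration mistakes easier to locate.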
