Updated on 2025-12-19 GMT+08:00

Getting Started

Installing Fabric Data SDK

  • Online installation:

    Use pip with Huawei's source for installation:

    pip install huawei_fabric_data --trusted-host pypi.cloudartifact.dgg.dragon.tools.huawei.com -i https://pypi.cloudartifact.dgg.dragon.tools.huawei.com/artifactory/cbu-pypi-public/simple/ 
  • Offline installation:

    Download the SDK package and install it using the following command:

    pip install huawei_fabric_data-0.1.0-py3-none-any.whl 

    Python 3.11 is recommended.
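
After either installation path, a quick check can confirm that the interpreter and the package are in place. The sketch below assumes the installed distribution is named huawei_fabric_data, as in the pip commands above, and treats Python 3.11 or later as meeting the recommendation. The helper names installed_version and meets_recommended_python are hypothetical and not part of the SDK.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def meets_recommended_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.11 or later (assumed reading)."""
    return tuple(version_info[:2]) >= (3, 11)


if __name__ == "__main__":
    print("Python OK:", meets_recommended_python())
    # Distribution name taken from the pip commands above.
    print("huawei_fabric_data:", installed_version("huawei_fabric_data"))
```

If the second line prints None, the package is not visible to the interpreter that ran the check, which often means pip installed it into a different environment.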

Using Fabric Data

  1. Create a session. Refer to Table 1 for input parameters.

    import ibis  # Import Ibis dependencies.
    from fabric_data.multimodal import ai_lake
    import logging
    import os
    con = ai_lake.connect(
        fabric_endpoint=os.getenv("fabric_endpoint"), 
        fabric_endpoint_id=os.getenv("fabric_endpoint_id"), 
        fabric_workspace_id=os.getenv("fabric_workspace_id"), 
        lf_catalog_name=os.getenv("lf_catalog_name"), 
        lf_instance_id=os.getenv("lf_instance_id"),
        access_key=os.getenv("access_key"),
        secret_key=os.getenv("secret_key"), 
        use_single_cn_mode=True,  # Whether single CN mode is enabled.
        logging_level=logging.INFO, # Set the log level.
    )

  2. Load the dataset.

    target_database = "test"
    dataset_name = "test_data"
    ds = con.load_dataset(dataset_name, database=target_database)
    ds.show()
       a  b  c
    0  1  3  1

  3. Process the data.

    import ibis.expr.datatypes as dt
    import fabric_data as fabric

    # Declare the UDF's input parameters and its struct return type.
    func_signature = fabric.Signature(
        parameters=[
            fabric.Parameter(name="start", annotation=int),
            fabric.Parameter(name="endnum", annotation=int),
            fabric.Parameter(name="step", annotation=int),
        ],
        return_annotation=dt.Struct({"col1": int}),
    )

    # Register a Python UDF that yields one struct per value in the series.
    @fabric.udf.python(database=target_database, signature=func_signature)
    def py_generate_series(start, endnum, step):
        current = start
        while current <= endnum:
            yield {'col1': current}
            current += step

    # Apply the UDF to columns a, b, and c; each yielded value becomes a row.
    ds = ds.flat_map(
        fn=py_generate_series,
        on=[ds.a, ds.b, ds.c],
        as_col='new_col',
        dpu=0.33,
    )
    ds.show()
       a  b  c  new_col
    0  1  3  1   1
    1  1  3  1   2
    2  1  3  1   3

  4. Close the session.

    con.close()
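
To see why step 3 turns one input row into three output rows, the row expansion can be pictured in plain Python with no SDK involved. This is only a sketch of the semantics suggested by the sample output, not the SDK's implementation: flat_map_rows is a hypothetical stand-in for ds.flat_map, and unpacking the single struct field col1 into the new column is an assumption based on the output shown in step 3.

```python
def py_generate_series(start, endnum, step):
    """Same generator body as the UDF in step 3, without the decorator."""
    current = start
    while current <= endnum:
        yield {"col1": current}
        current += step


def flat_map_rows(rows, fn, on, as_col):
    """Hypothetical stand-in: emit one output row per value yielded by fn."""
    out = []
    for row in rows:
        # Pass the selected column values as positional arguments.
        args = [row[name] for name in on]
        for item in fn(*args):
            new_row = dict(row)              # keep the original columns
            new_row[as_col] = item["col1"]   # assumed unpacking of the struct
            out.append(new_row)
    return out


rows = [{"a": 1, "b": 3, "c": 1}]  # the single row of test_data
result = flat_map_rows(rows, py_generate_series, on=["a", "b", "c"], as_col="new_col")
for r in result:
    print(r)
```

With a=1, b=3, and c=1, the generator yields 1, 2, and 3, so the original row is repeated three times with new_col set to each value in turn. A step of 0 would never terminate the loop, so inputs for the step column are worth validating.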

Table 1 Parameters of the connect() API

| Parameter | Description | Type | Default Value | Remarks |
|---|---|---|---|---|
| fabric_endpoint | Endpoint of the connection. | str \| None | None | For more details, refer to Regions and Endpoints. |
| fabric_endpoint_id | Endpoint ID of the connection. | str \| None | None | For more details, refer to Obtaining an Endpoint ID. |
| fabric_workspace_id | Workspace ID of the connection. | str \| None | None | For more details, refer to Obtaining a Workspace ID. |
| lf_instance_id | LakeFormation service instance ID of the connection. | str \| None | None | - |
| lf_catalog_name | Catalog of the connection. | str \| None | None | - |
| access_key | Access key ID (AK) of the connection. | str \| None | None | For more details, refer to Obtaining an AK/SK. |
| secret_key | Secret access key (SK) of the connection. | str \| None | None | - |
| security_token | Security token of the connection (required only when the access key ID and secret access key are temporary). | str \| None | None | - |
| logging_level | Log level. | int | logging.INFO | - |
| use_single_cn_mode | Whether single CN mode is enabled. | bool | False | - |
| verify | Authentication of the connection. | bool | False | - |
| default_database | Name of the database to connect to. | str \| None | None | - |
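
Because every credential parameter in Table 1 defaults to None, an unset environment variable in step 1 passes None to connect() without any local error. The sketch below shows one way to fail early instead. It treats the seven values that step 1 reads from the environment as required, which is an assumption, since the table does not mark any parameter as mandatory. The helper name connect_kwargs_from_env is hypothetical and not part of the SDK.

```python
import os

# Values the session example in step 1 reads from identically named
# environment variables. Treating them as required is an assumption.
REQUIRED = (
    "fabric_endpoint",
    "fabric_endpoint_id",
    "fabric_workspace_id",
    "lf_catalog_name",
    "lf_instance_id",
    "access_key",
    "secret_key",
)
# Per Table 1, only needed when the AK/SK pair is temporary.
OPTIONAL = ("security_token",)


def connect_kwargs_from_env(env=None):
    """Collect connect() keyword arguments, raising if a value is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    kwargs = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        if env.get(name):
            kwargs[name] = env[name]
    return kwargs
```

The resulting dictionary could then be unpacked into the call from step 1, for example ai_lake.connect(**connect_kwargs_from_env(), use_single_cn_mode=True), so that a missing variable is reported by name before any connection attempt.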