Updated on 2025-12-19 GMT+08:00

API Overview

Fabric Data APIs are divided into five core components: data processing, data management, input/output, utility, and action.

  • Data processing: Offers a variety of transformation operations such as map(), flat_map(), filter(), join(), and groupby() to perform feature engineering and structured data manipulation on datasets.
  • Data management: Provides create, read, update, and delete (CRUD) operations on datasets, including insert(), update(), delete(), and index creation/deletion, covering the full dataset lifecycle.
  • Input/output: Writes data to popular lakehouse formats such as Parquet, Iceberg, and Data Formation.
  • Utility: Includes functions like schema(), columns(), and explain_plan() for metadata queries and execution plan analysis, aiding in debugging and optimization.
  • Action: Provides methods such as execute(), limit(), and take() that initiate computation and retrieve results. Transformations are evaluated lazily, so work is done only on demand.
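The lazy, on-demand execution model described for the action component can be illustrated with a small toy in plain Python. This is only a sketch of the general idea: the LazyPipeline class and its method signatures are hypothetical and are not the Fabric Data API, whose actual signatures are not given in this overview.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Fabric)."""

    def __init__(self, source: Iterable[Any], steps: Optional[List[Step]] = None):
        self._source = source
        self._steps: List[Step] = list(steps or [])

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Only records the step; nothing is computed here.
        return LazyPipeline(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyPipeline":
        # Only records the step; nothing is computed here.
        return LazyPipeline(self._source, self._steps + [("filter", pred)])

    def _iterate(self) -> Iterator[Any]:
        # Chain the recorded steps into one lazy iterator.
        rows: Iterator[Any] = iter(self._source)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return rows

    def execute(self) -> List[Any]:
        # The action: this is the point where the recorded steps actually run.
        return list(self._iterate())


calls: List[int] = []


def double(x: int) -> int:
    calls.append(x)  # track when the function is really invoked
    return x * 2


pipeline = LazyPipeline([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_execute = len(calls)  # no work has been done yet
result = pipeline.execute()        # the computation happens here
print(calls_before_execute, result)
```

In this toy, building the pipeline calls double() zero times; the function runs only when execute() is invoked. A real engine would also use the recorded steps to plan and optimize the query, which this sketch does not attempt.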
Table 1

| Operation Type | API | Description |
|---|---|---|
| Dataset – Data processing | map | Applies a one-to-one function mapping to single-row input data. |
| Dataset – Data processing | map_batchs | Applies a one-to-one function mapping to batch input data. |
| Dataset – Data processing | flat_map | Applies a one-to-many function mapping to single-row input data. |
| Dataset – Data processing | filter | Filters rows based on specified conditions. |
| Dataset – Data processing | join | Joins multiple datasets. |
| Dataset – Data processing | order_by | Sorts the dataset. |
| Dataset – Data processing | aggregate | Aggregates the entire dataset. |
| Dataset – Data processing | groupby | Groups the dataset. |
| Dataset – Data processing | min | Calculates the minimum value of a specified column. |
| Dataset – Data processing | max | Calculates the maximum value of a specified column. |
| Dataset – Data processing | mean | Calculates the mean value of a specified column. |
| Dataset – Data processing | unique | Retrieves a list of unique values from a specified column. |
| Dataset – Data processing | select_columns | Selects specific columns from the dataset. |
| Dataset – Data processing | add_column | Adds a column to the dataset. |
| Dataset – Data processing | drop_columns | Removes specified columns from the dataset. |
| Dataset – Data processing | rename_columns | Renames columns in the dataset. |
| Table – Data management | insert | Inserts data into the dataset. |
| Table – Data management | delete | Deletes data from the dataset. |
| Table – Data management | update | Updates existing data within the dataset. |
| Input/Output | write_parquet | Writes data to a Parquet table. |
| Input/Output | write_iceberg | Writes data to an Iceberg table. |
| Utility | schema | Displays the schema of the dataset. |
| Utility | columns | Lists all column names in the dataset. |
| Utility | count | Returns the total number of rows in the dataset. |
| Utility | explain_plan | Prints the execution plan for a query. |
| Utility | explain_performance | Executes and prints detailed performance metrics for a query. |
| Utility | stats | Displays statistical information about the executed query (requires prior execution). |
| Action | show | Triggers execution and displays the results. |
| Action | execute | Triggers execution and returns the results. |
| Action | limit | Outputs up to a specified number of records. |
| Action | take | Triggers execution and returns a single-row iterator. |
| Action | take_batch | Triggers execution and returns a batch iterator. |
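
The difference between the one-to-one, one-to-many, and batch mappings listed in Table 1 may be easier to see in a small example. The sketch below uses plain Python generators over a list of dictionaries. The helper names map_rows, flat_map_rows, and map_in_batches, and the batch_size parameter, are hypothetical and only mimic the semantics described in the table; they are not the Fabric Data signatures.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def map_rows(rows: Iterable[Row], fn: Callable[[Row], Row]) -> Iterator[Row]:
    # One-to-one: each input row yields exactly one output row.
    for row in rows:
        yield fn(row)


def flat_map_rows(
    rows: Iterable[Row], fn: Callable[[Row], Iterable[Row]]
) -> Iterator[Row]:
    # One-to-many: each input row may yield zero or more output rows.
    for row in rows:
        yield from fn(row)


def map_in_batches(
    rows: Iterable[Row],
    fn: Callable[[List[Row]], List[Row]],
    batch_size: int,
) -> Iterator[Row]:
    # Batch mapping: the function receives a list of rows at a time.
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield from fn(batch)


rows = [{"text": "a b"}, {"text": "c"}, {"text": "d e f"}]

# Three rows in, three rows out.
mapped = list(map_rows(rows, lambda r: {"n_words": len(r["text"].split())}))

# Three rows in, one row out per word.
flat = list(flat_map_rows(rows, lambda r: [{"word": w} for w in r["text"].split()]))

# Rows are processed two at a time; the row count is unchanged.
batched = list(
    map_in_batches(
        rows,
        lambda batch: [{"text": r["text"].upper()} for r in batch],
        batch_size=2,
    )
)

print(len(mapped), len(flat), len(batched))
```

Batch mapping is typically preferred when the function has a per-call overhead that can be amortized over many rows, while row-wise mapping keeps the function simpler.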