API Overview
Fabric Data APIs are divided into five core components: data processing, data management, input/output, utility, and action.
- Data processing: Provides transformation operations such as map(), flat_map(), filter(), join(), and groupby() for feature engineering and structured data manipulation on datasets.
- Data management: Provides create, read, update, and delete (CRUD) operations on datasets, including insert(), update(), delete(), and index creation and deletion, to manage the full data lifecycle.
- Input/output: Writes data to lakehouse formats such as Parquet, Iceberg, and Data Formation.
- Utility: Includes functions such as schema(), columns(), and explain_plan() for querying metadata and analyzing execution plans, which helps with debugging and optimization.
- Action: Includes methods such as execute(), limit(), and take() that trigger computation and retrieve results. Actions support lazy loading and on-demand execution: computation runs only when an action is invoked.
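The split between transformations and actions can be modeled in a few lines of plain Python. The sketch below is a hypothetical in-memory `ToyDataset`, not the Fabric Data API. It assumes that transformations only record a step and return a new dataset, and that actions such as `execute()` and `take()` are what actually pull rows through the pipeline. The method names mirror those listed above for readability only; their signatures here are illustrative assumptions.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


class ToyDataset:
    """Hypothetical lazy dataset used only to illustrate the idea (not the Fabric API)."""

    def __init__(self, source: Callable[[], Iterator[Row]], plan: Optional[List[str]] = None):
        self._source = source  # zero-arg callable that yields rows when the pipeline runs
        self._plan = plan if plan is not None else ["scan"]

    # Transformations: record a step and return a new dataset; no rows are read here.
    def map(self, fn: Callable[[Row], Row]) -> "ToyDataset":
        src = self._source
        return ToyDataset(lambda: (fn(r) for r in src()), self._plan + ["map"])

    def flat_map(self, fn: Callable[[Row], Iterable[Row]]) -> "ToyDataset":
        src = self._source
        return ToyDataset(lambda: (o for r in src() for o in fn(r)), self._plan + ["flat_map"])

    def filter(self, pred: Callable[[Row], bool]) -> "ToyDataset":
        src = self._source
        return ToyDataset(lambda: (r for r in src() if pred(r)), self._plan + ["filter"])

    # Utility: describe the recorded plan without executing it.
    def explain_plan(self) -> str:
        return " -> ".join(self._plan)

    # Actions: pull rows through the whole generator chain.
    def execute(self) -> List[Row]:
        return list(self._source())

    def take(self, n: int) -> List[Row]:
        # islice stops pulling from the generator chain once n rows are produced
        return list(islice(self._source(), n))


produced: List[int] = []  # records which source rows were actually read


def numbers() -> Iterator[Row]:
    for i in range(5):
        produced.append(i)
        yield {"id": i}


ds = (
    ToyDataset(numbers)
    .map(lambda r: {**r, "sq": r["id"] ** 2})
    .filter(lambda r: r["sq"] % 2 == 0)
)
print(produced)           # [] : building the pipeline read nothing
print(ds.explain_plan())  # scan -> map -> filter
print(ds.take(2))         # [{'id': 0, 'sq': 0}, {'id': 2, 'sq': 4}]
print(produced)           # [0, 1, 2] : only enough rows to satisfy take(2)
```

Because `take(2)` stops as soon as two rows survive the filter, the source produces only three of its five rows. That is the practical benefit of on-demand execution in this toy model.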
| Operation Type | API | Description |
|---|---|---|
| Dataset – Data processing | map() | Applies a one-to-one function mapping to single-row input data. |
| | | Applies a one-to-one function mapping to batch input data. |
| | flat_map() | Applies a one-to-many function mapping to single-row input data. |
| | filter() | Filters rows based on specified conditions. |
| | join() | Joins multiple datasets. |
| | | Sorts the dataset. |
| | | Aggregates the entire dataset. |
| | groupby() | Groups the dataset. |
| | | Calculates the minimum value of a specified column. |
| | | Calculates the maximum value of a specified column. |
| | | Calculates the mean value of a specified column. |
| | | Retrieves a list of unique values from a specified column. |
| | | Selects specific columns from the dataset. |
| | | Adds a column to the dataset. |
| | | Removes specified columns from the dataset. |
| | | Renames columns in the dataset. |
| Table – Data management | insert() | Inserts data into the dataset. |
| | delete() | Deletes data from the dataset. |
| | update() | Updates existing data in the dataset. |
| Input/Output | | Writes data to a Parquet table. |
| | | Writes data to an Iceberg table. |
| Utility | schema() | Displays the schema of the dataset. |
| | columns() | Lists all column names in the dataset. |
| | | Returns the total number of rows in the dataset. |
| | explain_plan() | Prints the execution plan of a query. |
| | | Executes a query and prints detailed performance metrics. |
| | | Displays statistics about an executed query (the query must be executed first). |
| Action | | Triggers execution and displays the results. |
| | | Triggers execution and returns the results. |
| | | Outputs up to a specified number of records. |
| | | Triggers execution and returns a single-row iterator. |
| | | Triggers execution and returns a batch iterator. |
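The first three data processing rows differ in what the user function receives and how many rows it may return. The helpers below (`map_rows`, `map_batches`, `flat_map_rows`) are hypothetical pure-Python illustrations of those semantics, not Fabric functions. In particular, the columnar batch layout (column name mapped to a list of values) and the `batch_size` parameter are assumptions made for this sketch.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Batch = Dict[str, List[object]]  # assumed columnar layout: column name -> values


def map_rows(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """One-to-one, single row: fn is called once per row and returns one row."""
    return [fn(r) for r in rows]


def map_batches(rows: List[Row], fn: Callable[[Batch], Batch], batch_size: int) -> List[Row]:
    """One-to-one, batch: fn receives up to batch_size rows as columns and
    must return the same number of rows."""
    out: List[Row] = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        batch = {col: [r[col] for r in chunk] for col in chunk[0]}
        result = fn(batch)
        if any(len(values) != len(chunk) for values in result.values()):
            raise ValueError("batch function must return one value per input row")
        # Convert the columnar result back into rows.
        out.extend({col: result[col][i] for col in result} for i in range(len(chunk)))
    return out


def flat_map_rows(rows: List[Row], fn: Callable[[Row], Iterable[Row]]) -> List[Row]:
    """One-to-many: fn may return zero, one, or several rows per input row."""
    return [o for r in rows for o in fn(r)]


docs = [{"id": 1, "text": "a b"}, {"id": 2, "text": ""}, {"id": 3, "text": "c"}]

print(map_rows(docs, lambda r: {**r, "n_words": len(r["text"].split())}))
print(map_batches(docs, lambda b: {**b, "id2": [i * 2 for i in b["id"]]}, batch_size=2))
# One output row per word; id 2 has no words and therefore yields no rows.
print(flat_map_rows(docs, lambda r: [{"id": r["id"], "word": w} for w in r["text"].split()]))
```

Batch-style mapping generally pays off when a function has high per-call overhead or can operate on whole columns at once. Row-style mapping is simpler when each row is processed independently.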