Help Center/
DataArts Fabric/
Developer Guide/
Data-AI Convergence/
DataArts Fabric DataFrame/
Scenario Practices/
Examples of DataFrames Without UDFs
Updated on 2025-08-25 GMT+08:00
Examples of DataFrames Without UDFs
Below is an example using query1 from tpch to demonstrate the usage of DataFrame.
The query SQL is as follows:
SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= CAST('1998-09-02' AS date) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
The corresponding DataFrame logic is as follows:
import ibis # Import Ibis dependencies. con = ibis.fabric.connect( # Call the DataArts Fabric backend connection to establish a connection. endpoint=FABRIC_ENDPOINT, # Specify the service region. Refer to Regions and Endpoints. endpoint_id=FABRIC_ENDPOINT_ID, # Query endpoint_id. See "Appendix" in the API Reference. domain=FABRIC_DOMAIN, # Tenant name user=FABRIC_USER, # IAM username password=FABRIC_PASS, # IAM password project_id=FABRIC_PROJECT_ID, # How Do I Obtain a Project ID? catelog_name=IBIS_TEST_FABRIC_CATELOG, # Connect to a specified catalog. workspace_id=FABRIC_WORKSPACE_ID, # Get workspace_id. See "Appendix" in the API Reference. lakeformation_instance_id=IBIS_TEST_FABRIC_LAKEFORMATION_INSTANCE_ID, # LakeFormation instance ID obs_directory_base=OBS_DIRECTORY_BASE, # Storage path for UDFs in OBS obs_bucket_name=OBS_BUCKET_NAME, # OBS bucket name obs_server=OBS_SERVER, # OBS access address. Refer to Endpoints and Domain Names. ) t = con.table("lineitem", database="tpch") # Retrieve table information by connecting to the backend and create a table object. q = t.filter(t.l_shipdate <= add_date("1998-12-01", dd=-90)) discount_price = t.l_extendedprice * (1 - t.l_discount) charge = discount_price * (1 + t.l_tax) q = q.group_by(["l_returnflag", "l_linestatus"]) q = q.aggregate( sum_qty=t.l_quantity.sum(), sum_base_price=t.l_extendedprice.sum(), sum_disc_price=discount_price.sum(), sum_charge=charge.sum(), avg_qty=t.l_quantity.mean(), avg_price=t.l_extendedprice.mean(), avg_disc=t.l_discount.mean(), count_order=lambda t: t.count(), ) q = q.order_by(["l_returnflag", "l_linestatus"]) sql = q.compile() # Compile DataFrame into a SQL string. df = q.execute() # Execute the expression and return the result set.
Parent topic: Scenario Practices
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
The system is busy. Please try again later.
For any further questions, feel free to contact us through the chatbot.
Chatbot