Configuring DBT to Connect to DLI for Data Scheduling and Analysis
Data Build Tool (DBT) is an open-source data modeling and transformation tool that runs in Python environments. Once DBT is connected to DLI, you can use it to define and execute SQL transformations on DLI, covering the entire data lifecycle from integration to analysis. This makes it suitable for large-scale data analysis projects and complex analysis scenarios.
This section describes how to configure DBT to connect to DLI.
Preparations
- Environment requirements
Make sure that your system environment meets the following requirements:
- Obtaining the dli-dbt driver package
Download the JDBC driver huaweicloud-dli-jdbc-xxx-dependencies.jar from the DLI management console.
- Connection information:
Table 1 Connection information
Item | Description | How to Obtain
---|---|---
DLI's AK/SK | AK/SK-based authentication uses an AK/SK pair to sign requests for identity authentication. |
DLI's endpoint address | Endpoint of the DLI service in a region. |
DLI's project ID | Project ID, which is used for resource isolation. |
DLI's region information | Region where the DLI service is deployed. |
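You may prefer to keep the items in Table 1 in environment variables instead of hard-coding the AK/SK, and to confirm they are set before configuring DBT. The following is a minimal Python sketch of that check. The variable names DLI_AK, DLI_SK, DLI_PROJECT_ID, DLI_REGION, and DLI_ENDPOINT are hypothetical, and DLI and dli-dbt do not read them on their own.

```python
import os
from typing import List, Mapping

# Hypothetical environment variable names for the items in Table 1.
# DLI and dli-dbt do not define these names; choose your own.
REQUIRED_VARS = {
    "DLI_AK": "DLI's AK",
    "DLI_SK": "DLI's SK",
    "DLI_PROJECT_ID": "DLI's project ID",
    "DLI_REGION": "DLI's region information",
    "DLI_ENDPOINT": "DLI's endpoint address",
}


def missing_connection_info(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_connection_info()
    if missing:
        for name in missing:
            print(f"Missing {REQUIRED_VARS[name]}: set {name}")
    else:
        print("All DLI connection information is available.")
```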
Step 1: Create a DBT Environment
- Install dbt-core.
Install the recommended version of dbt-core:
pip install dbt-core==1.7.9
pip is a package management tool for Python that is typically installed alongside Python.
If pip is not installed, install it using Python's built-in ensurepip module.
python -m ensurepip
- Install dli-sdk-python.
In the root directory of the dli-sdk-python package (the directory that contains setup.py), run the following installation command:
python setup.py install
- Install dli-dbt.
Download the dli-dbt driver package from the DLI management console.
In the root directory of the dli-dbt package (the directory that contains setup.py), run the following installation command:
python setup.py install
Run the following command to check whether dbt is successfully installed:
dbt --version
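Instead of reading the dbt --version output by eye, you can use a short Python sketch that compares the installed dbt-core distribution with the recommended 1.7.9 through the standard-library importlib.metadata module. It also reports whether pip is importable. It checks only dbt-core because the distribution names of dli-sdk-python and dli-dbt are not given here.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

RECOMMENDED_DBT_CORE = "1.7.9"  # version recommended in Step 1


def pip_available() -> bool:
    """True if this interpreter can import the pip module."""
    return importlib.util.find_spec("pip") is not None


def installed_version(distribution: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def check_dbt_core(found: Optional[str], expected: str = RECOMMENDED_DBT_CORE) -> str:
    """Compare an installed dbt-core version with the recommended one."""
    if found is None:
        return f"dbt-core is not installed; run: pip install dbt-core=={expected}"
    if found != expected:
        return f"dbt-core {found} is installed; the recommended version is {expected}"
    return f"dbt-core {found} matches the recommended version"


if __name__ == "__main__":
    if not pip_available():
        print("pip is missing; run: python -m ensurepip")
    print(check_dbt_core(installed_version("dbt-core")))
```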
Step 2: Connect DBT to DLI
The profiles.yml file stores the information DBT uses to connect to DLI.
Go to the .dbt directory in the home directory of the server where DBT is installed, and create or edit the profiles.yml file there.
For example, on Windows, the path may be C:\Users\Username\.dbt\profiles.yml.
The file must contain the configuration for the connection between DBT and DLI. For example:
dbt_dli:
  target: dev
  outputs:
    dev:
      type: dli
      region: your-region-name
      project_id: your-project_id
      access_id: your-ak
      secret_key: your-sk
      queue: your-queue-name
      database: your-dli-database
      schema: your-dli-schema
Parameter | Mandatory | Description | Example Value
---|---|---|---
type | Yes | Data source type. Set it to dli in this example. | dli
region | Yes | Region name. | ap-southeast-2
project_id | Yes | ID of the project where the DLI resources reside. | 0b33ea2a7e0010802fe4c009bb05076d
access_id and secret_key | Yes | AK/SK used for authentication. | -
queue | Yes | DLI queue name. | dli_test
database | Yes | Data catalog name. The default value is dli. If LakeFormation metadata is used, enter the name of the data catalog. | dli
schema | Yes | Name of the DLI database used to submit jobs. | tpch
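The following Python sketch shows how the parameters in the table above map onto the profiles.yml layout. It uses only the standard library and renders the file text rather than writing it. The example values use dbt's env_var() Jinja function so that the AK/SK come from the environment variables DLI_AK and DLI_SK (hypothetical names) instead of being stored in plain text. The path lookup is deliberately simplified: it honors DBT_PROFILES_DIR and otherwise falls back to ~/.dbt.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Keys from the parameter table, in the order used in the example profile.
FIELD_ORDER = ["type", "region", "project_id", "access_id", "secret_key",
               "queue", "database", "schema"]


def render_profile(profile: str, target: str, fields: Mapping[str, str]) -> str:
    """Render a single-target dbt profile as YAML text (stdlib only, no PyYAML)."""
    missing = [key for key in FIELD_ORDER if key not in fields]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    lines = [f"{profile}:", f"  target: {target}", "  outputs:", f"    {target}:"]
    lines += [f"      {key}: {fields[key]}" for key in FIELD_ORDER]
    return "\n".join(lines) + "\n"


def profiles_path(env: Mapping[str, str] = os.environ,
                  home: Optional[Path] = None) -> Path:
    """Simplified lookup: $DBT_PROFILES_DIR if set, otherwise <home>/.dbt."""
    profiles_dir = env.get("DBT_PROFILES_DIR")
    if profiles_dir:
        return Path(profiles_dir) / "profiles.yml"
    return (home or Path.home()) / ".dbt" / "profiles.yml"


if __name__ == "__main__":
    text = render_profile("dbt_dli", "dev", {
        "type": "dli",
        "region": "ap-southeast-2",
        "project_id": "your-project_id",
        # dbt resolves env_var() when it loads the profile, so the AK/SK
        # stay out of the file. DLI_AK and DLI_SK are hypothetical names.
        "access_id": "\"{{ env_var('DLI_AK') }}\"",
        "secret_key": "\"{{ env_var('DLI_SK') }}\"",
        "queue": "dli_test",
        "database": "dli",
        "schema": "tpch",
    })
    print(f"# Save as {profiles_path()}")
    print(text, end="")
```

The env_var() values are wrapped in double quotes because an unquoted value starting with { would be parsed by YAML as a flow mapping.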
Step 3: Use DBT to Submit a Job to DLI
- Initialize a DBT project.
Run the following command in an empty directory to initialize a DBT project:
dbt init
- Configure the dbt_project.yml file.
Create or edit the dbt_project.yml file in the root directory of the project.
For details about the available settings, see the dbt_project.yml reference in the dbt documentation.
Make sure that the profile setting in dbt_project.yml matches the profile name defined in profiles.yml in Step 2: Connect DBT to DLI (dbt_dli in this example).
Figure 1 profile file
Figure 2 profile configured in the dbt_project.yml file
- Verify the configuration.
Run the following command to check whether the DBT configuration is correct:
dbt debug
- Run the job.
After the check passes, run the following command to execute your data models:
dbt run
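The steps above can be scripted. The following Python sketch covers three things. render_dbt_project shows a minimal dbt_project.yml that uses standard dbt keys (the project name dli_demo is hypothetical). profile_is_defined performs a rough text check, not a YAML parse, that the profile name is a top-level key in profiles.yml. debug_then_run runs dbt run only after dbt debug exits with code 0. The runner is injectable so that the control flow can be tested without a DLI connection.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def render_dbt_project(name: str, profile: str) -> str:
    """Minimal dbt_project.yml text; `profile` must name a profile in profiles.yml."""
    return (
        f"name: {name}\n"
        "version: '1.0.0'\n"
        "config-version: 2\n"
        f"profile: {profile}\n"
        "model-paths: ['models']\n"
    )


def profile_is_defined(profiles_text: str, profile: str) -> bool:
    """Rough text check (not a YAML parser): is `profile:` a top-level key?"""
    pattern = rf"^{re.escape(profile)}:\s*$"
    return re.search(pattern, profiles_text, flags=re.MULTILINE) is not None


def run_with_subprocess(args: Sequence[str]) -> int:
    """Run one dbt CLI command in the current directory and return its exit code."""
    return subprocess.run(["dbt", *args], check=False).returncode


def debug_then_run(runner: Runner = run_with_subprocess) -> Optional[str]:
    """Run `dbt debug`, then `dbt run` only if debug succeeded.

    Returns the first command that failed, or None if both succeeded.
    """
    for args in (["debug"], ["run"]):
        if runner(args) != 0:
            return "dbt " + " ".join(args)
    return None


if __name__ == "__main__":
    profiles = Path.home() / ".dbt" / "profiles.yml"  # default location from Step 2
    if not profiles.is_file() or not profile_is_defined(profiles.read_text(), "dbt_dli"):
        raise SystemExit(f"Profile dbt_dli not found in {profiles}")
    if shutil.which("dbt") is None:
        raise SystemExit("dbt is not on PATH; complete Step 1 first.")
    failed = debug_then_run()
    print("dbt debug and dbt run succeeded." if failed is None else f"{failed} failed.")
```

Run the script from the project root created by dbt init. dbt-core 1.5 and later also provide a programmatic entry point (dbtRunner in dbt.cli.main). This sketch calls the CLI through subprocess because that depends only on the dbt command installed in Step 1.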