Help Center/ Data Warehouse Service / Best Practices/ Overview

Updated on 2026-01-06 GMT+08:00

View PDF

Overview

This manual is a comprehensive collection of best practices in using DWS. For each task, a detailed description of the solution and operation procedures is provided. Check this manual based on the task at hand.

Import and Export

**Table 1** DWS import and export
Document	Description
Importing Data to DWS	Describes how to import data to and export data from DWS using tools.
Transferring Data Between OBS Buckets and DWS Clusters	Describes how to upload sample data to OBS, access data in OBS buckets using OBS foreign tables, import data to DWS, or export data from a DWS table to an OBS bucket.
Using GDS to Import Table Data from a Remote Server to a DWS Cluster	Describes how to use GDS to import data on a remote server to DWS.
Importing Table Data from MRS Hive to a DWS Cluster	Describes how to enable DWS to remotely access or read MRS data sources using an HDFS foreign table.
Enabling Cross-Cluster Access of Hive Metastore Through an External Schema	Describes how to enable a decoupled storage and compute cluster to remotely access HiveMetaStore metadata using an external schema.
Importing Table Data from DLI to a DWS Cluster	Describes how to import data from DLI to DWS using the DWS foreign table function.
Exporting ORC Data from a DWS Cluster to an MRS Cluster	Describes how to export data in ORC format to MRS using an HDFS foreign table.

Data Migration

**Table 2** DWS data migration
Document	Description
Using CDM to Migrate Oracle Data to a DWS Cluster	Describes how to migrate data from Oracle tables to DWS.
Using CDM to Migrate MySQL Data to a DWS Cluster	Describes how to use Cloud Data Migration (CDM) to migrate MySQL data to DWS in batches.
Using a Flink Job of DLI to Synchronize MySQL Data to a DWS Cluster in Real Time	Describes how to use a Flink job of DLI to synchronize MySQL data to DWS in real time.
Using CDM to Migrate Data from Hologres to a DWS Cluster	Describes how to use CDM to migrate data from Hologres to DWS.
Using Kettle to Migrate Small Tables from AWS Redshift to a DWS Cluster	Describes how to use the open-source tool Kettle to migrate data from AWS Redshift to DWS.
Using CDM to Migrate Data from AnalyticDB for MySQL to a DWS Cluster	Describes how to use CDM to migrate data from AnalyticDB for MySQL to DWS.
Using a Flink Job of DLI to Synchronize Kafka Data to a DWS Cluster in Real Time	Describes how to use a Flink job of DLI to synchronize consumption data of DMS for Kafka to DWS in real time.
Migrating Data Between DWS Clusters Using GDS	Describes how to migrate 15 million rows of data between two DWS clusters within minutes based on the high concurrency of GDS import and export.

Data Analysis

**Table 3** DWS data analysis
Document	Description
Using DWS to Query Vehicle Routes at Traffic Checkpoints in Seconds	Describes how to analyze passing vehicles at checkpoints. In this practice, 890 million data records from checkpoints are loaded to a single database table on DWS for accurate and fuzzy query, demonstrating the ability of DWS to perform high-performance query for historical data.
Using DWS to Analyze the Supply Chain Requirements of a Company	Describes how to load the sample data set from OBS to a DWS cluster and perform data queries, demonstrating multi-table analysis and theme analysis of DWS in data analysis scenarios.
Using DWS to Analyze the Operational Status of a Retail Department Store	Describes how to load the daily business data of each retail store from OBS to the corresponding table in a DWS cluster, and then summarize and query KPIs such as the store revenue, customer flow, monthly sales ranking, monthly customer flow conversion rate, monthly price-rent ratio, and sales per unit area.
Interconnecting DWS with Power BI	Describes how to install Power BI on a Windows ECS and connect it to DWS using an on-premises data gateway.

Decoupled Storage and Compute

**Table 4** DWS with decoupled storage and compute
Document	Description
Usage Suggestions and Performance Optimization for DWS 3.0 Storage-Compute Decoupled Clusters	Describes the performance optimization and precautions of the clusters with decouple storage and compute.
Using Disk Cache to Improve Query Performance	Describes disk cache of DWS. In the DWS clusters with decoupled storage and compute, service data is stored in OBS buckets. If users frequently access OBS, a performance bottleneck may occur. You can use the local disk cache function of DWS to store frequently used hot data in the cache and sort the data with high traffic based on the actual usage frequency to improve query performance.

Data Development

**Table 5** DWS data development
Document	Description
Using Hybrid Data Warehouses and HStore Tables	Describes hybrid data warehouses and HStore tables. Based on the large-scale data query and analysis capabilities, hybrid data warehouses of DWS feature high concurrency, high performance, low latency, and low cost in transaction processing. HStore tables play a key role in the digital transformation of the Internet, IoT, and traditional industries.
Converting a Time Series Table to an HStore Table	Describes the benefits of HStore tables over time series tables in terms of data import, compression ratio, and query performance. Therefore, you are advised to replace time series tables with HStore tables.
Using Hybrid Row-Column Storage	Describes hybrid row-column storage provided by DWS. It can store both row- and column-format data in one table. The data of the two formats is independently maintained and synchronously updated. The query optimizer selects the optimal access mode based on the actual query path. Describes how users design and use hybrid row-column storage structures based on typical scenarios and performance comparison for efficient data handling.
Cutting Costs by Switching Between Cold and Hot Data Storage in DWS	Describes how DWS classifies data into hot data and cold data by time based on the data usage frequency to improve analysis performance and reduce costs.
Cutting Partition Maintenance Costs for the E-commerce and IoT Industries by Leveraging Automatic Partition Management Feature	Describes how DWS automatically manages partitions. Based on the table-level parameters (period and ttl), DWS can automatically create partitions and delete expired partitions. This function is available for time-based partitioned tables (such as orders and IoT data). This function frees users from maintaining partitioned tables, significantly reducing O&M costs and improving query performance.
Improving Development Efficiency by Leveraging the View Decoupling and Rebuilding Function	Describes when and how to use the automatic view rebuilding function. Base table objects cannot be modified independently due to view and table dependency. To solve this problem, DWS supports view decoupling and rebuilding.
Best Practices of GIN Index	Describes how to use GIN indexes to search array and JSONB types, as well as how to conduct full-text searches.
Encrypting and Decrypting Data Columns	Describes SQL function encryption provided by DWS. Data encryption is widely used in information systems to prevent unauthorized access and data leakage. As the core of an information system, DWS also provides data encryption functions, including transparent encryption and SQL function encryption.
Managing Data Permissions Through Views	Describes how to use views to allow various users to access specific data within a given table, ensuring data permissions management and security.

Database Management

**Table 6** DWS database management
Document	Description
Role-based Access Control (RBAC)	Provides an example of applying the role-based access control (RBAC) model to DWS. RBAC grants permissions to roles so that users can obtain the permissions of these roles by associating with the roles.
Configuring Read-Only Permissions	Describes how to configure read-only users on the console to manage account permissions. These users can view and connect to clusters but cannot execute high-risk actions like deleting them.
Best Practices for User Management	Describes the permissions of system administrators and common users as well as how to create users and query user information.
Viewing Table and Database Information	Provides SQL examples for querying tables and databases.
Best Practices of Database SEQUENCE	Describes best practices and examples for creating and managing sequences.

Performance Tuning

**Table 7** DWS performance tuning
Document	Description
Optimizing Table Structure Design to Enhance DWS Query Performance	Describes how to optimize table performance in DWS by properly designing the table structure (for example, by selecting the table model, table storage mode, compression level, distribution mode, distribution column, partitioned table, and local clustering).
Excellent Practices for SQL Queries	Describes how to adjust SQL statements according to certain rules to improve the SQL execution while ensuring correct results.
Using Bitmap Indexes to Reduce Space and Write Costs in Point Queries	Describes bitmap indexes of DWS. Point queries on high-selectivity columns such as primary keys, order numbers, and user IDs are common in real-world applications. Indexing these columns is the standard approach to optimize query performance However, in large data volume, wide table structure, or batch import scenarios, the conventional indexing mechanism performs poorly. To resolve such issues, DWS provides lightweight column-level bitmap indexes for HStore tables. Such indexes speed up queries and minimize space and write costs. They are especially suitable for optimizing the performance of point queries on columns.
Using the Turbo Engine to Improve Data Query Performance	Describes the Turbo engine. Compared to the original column storage execution engine, the Turbo engine optimizes memory and disk storage formats for strings and numeric types, and enhances the performance of common operators like sort, aggregate (agg), join, and scan. This results in doubling the overall performance of the executor and significantly reducing computing costs.
Data Skew Queries	Provides methods for querying data skew.
Using PGXC_STAT_ACTIVITY to Analyze SQL Statements That Are Being Executed to Handle DWS Performance Issues	Describes how to use the PG_STAT_ACTIVITY view to analyze and locate SQL problems, such as excessive SQL connections, long SQL query time, and SQL query blocking, that developers may encounter during development.

Cluster and Resource Load Management

**Table 8** DWS cluster and resource load management
Document	Description
Binding Different Resource Pools to Two Types of Jobs to Balance Load for DWS	Demonstrates how to use DWS for resource management, helping enterprises eliminate bottlenecks in concurrent queries. SQL jobs can run smoothly without affecting each other and consume less resources than before.
Scaling Options for DWS with a Coupled Storage-Compute Architecture	Describes the scalability, a critical feature of DWS. It indicates that DWS can increase or decrease compute and storage resources to meet changing demand, achieving a balance between performance and cost.
On-demand VW Scaling in the Storage and Compute Decoupling: Cost-Effective Adaptation to Flexible and Changing Service Requirements	This section describes how to add or delete virtual warehouses (VWs) for DWS clusters with decoupled storage and compute. In this way, VWs can be automatically scaled in or out based on service peaks, enabling flexible usage and cost savings.
Querying and Analyzing Top SQL	Describes the Top SQL monitoring function of DWS. The function is used to locate slow SQL queries in the database, detect job queues affecting CPU, disk, and I/O resources, and give suggestions on optimizing or manually intervening these SQL tasks to enhance cluster performance.

Security Management

**Table 9** DWS security management
Document	Description
DWS Security Best Practices	Provides guidance for enhancing the overall security of your service data when using DWS.

Next topic: Import and Export

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.