Features
DWS provides various methods to access the service, such as the management console, clients, and REST APIs. This section describes the main functions of DWS.
Enterprise-Level Data Warehouses and Compatibility with Standard SQL
After a DWS cluster is created, you can use a SQL client to connect to the cluster and perform operations such as creating and managing databases, importing and exporting data, and querying data.
DWS provides high-performance databases that can handle petabytes of data, with the following features:
- An MPP computing framework, hybrid row-column storage, and vectorized execution, enabling second-level responses to correlation analysis on billions of data records
- In-memory computing optimized with Bloom filter-based hash joins, improving performance by 2 to 10 times
- A symmetrically distributed, active-active multi-node cluster architecture, ensuring no single points of failure (SPOFs)
- Communication between large-scale clusters optimized using telecommunication technologies, improving data transmission efficiency between compute nodes
- A cost-based intelligent optimizer that generates the optimal execution plan based on the cluster scale and data volume to improve execution efficiency
DWS has comprehensive SQL capabilities:
- Supports the ANSI/ISO SQL-92, SQL:1999, and SQL:2003 standards, stored procedures, the GBK and UTF-8 character sets, and standard SQL functions as well as OLAP analysis functions.
- Compatible with the PostgreSQL/Oracle/Teradata/MySQL ecosystems and supports interconnection with mainstream third-party ETL and BI tools.
- Supports roaring bitmaps and the functions commonly used with them, which are widely applied to user feature extraction, user profiling, and other applications in the Internet, retail, education, and gaming industries.
- List partitioning (PARTITION BY LIST (partition_key,[...])) and range partitioning are supported (see the example after this list).
- Read-only HDFS and OBS foreign tables in JSON file format are supported.
- Permissions on system catalogs can be granted to common users, and the VACUUM permission can be granted separately. Predefined, extensible permissions and roles are supported, including:
- ALTER, DROP, and VACUUM permissions at the table level.
- ALTER and DROP permissions at the schema level.
- The preset roles role_signal_backend and role_read_all_stats.
For details about the SQL syntax and database operation guidance, see the Data Warehouse Service Developer Guide.
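As a hedged illustration of two of the capabilities above, the following minimal sketch creates a list-partitioned table and grants the table-level VACUUM permission separately. The table, partition, and user names are illustrative assumptions, and the exact syntax and supported options depend on your DWS version; refer to the Developer Guide for the authoritative definitions.

```sql
-- Hypothetical example: a list-partitioned fact table keyed by region.
CREATE TABLE sales_by_region (
    sale_id INTEGER,
    region  VARCHAR(32),
    amount  NUMERIC(10,2)
)
PARTITION BY LIST (region)
(
    PARTITION p_east  VALUES ('east'),
    PARTITION p_west  VALUES ('west'),
    PARTITION p_north VALUES ('north', 'northeast')
);

-- Assuming a DWS version in which table-level maintenance permissions are
-- separately grantable, VACUUM on this table can be given to a common user.
GRANT VACUUM ON TABLE sales_by_region TO common_user;
```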
Cluster Management
A DWS cluster contains nodes of the same flavor in the same subnet. These nodes jointly provide services. DWS offers a professional, efficient, and centralized management console that enables you to quickly request clusters, manage data warehouses with ease, and concentrate on data and services.
Main functions of cluster management are described as follows:
- Creating Clusters
To use data warehouse services on the cloud, create a DWS cluster first. You can select product and node specifications to quickly create a cluster. You can also purchase a yearly/monthly package to create a cluster.
- Managing Snapshots
A snapshot is a complete backup that records the point-in-time configuration data and service data of a DWS cluster. It can be used to restore a cluster to that point in time. You can create snapshots manually or enable automated (periodic) snapshot creation. Automated snapshots have a limited retention period; you can copy them for long-term retention.
When you restore a cluster from a snapshot, the system can restore the snapshot data to a new cluster or to the original cluster.
On the console, you can delete snapshots that are no longer needed to release storage space. Automated snapshots cannot be deleted manually.
- Managing Nodes
You can check the nodes in a cluster, including the status, specifications, and usage of each node. To prepare for a large scale-out, you can add nodes in batches. For example, to add 180 nodes, add them in three batches of 60 nodes each. If any nodes fail to be added, retry adding them. Once all 180 nodes have been added, use them to scale out the cluster. Adding nodes does not interrupt cluster services.
- Scaling Out Clusters
As the service volume increases, the current scale of a cluster may not meet service requirements. In this case, you can scale out the cluster by adding compute nodes to it. Services are not interrupted during the scale-out. You can enable online scale-out and automatic redistribution if necessary.
- Managing Redistribution
By default, redistribution starts automatically after a cluster scale-out. For enhanced reliability, you can disable automatic redistribution and manually start a redistribution task after the scale-out succeeds. Data redistribution can accelerate service response. Currently, DWS supports offline redistribution (the default mode) and online redistribution.
- Storage Space Scaling
As customer services evolve, disk space often becomes the first bottleneck. In scenarios where other resources are still ample, the conventional scale-out process is both time-consuming and resource-inefficient. Disk capacity expansion quickly increases storage space without interrupting services. You can also expand disk capacity when no other services are running. If the disk space is still insufficient after an expansion, you can continue to expand the capacity. If an expansion fails, you can retry it.
- Resource Management
When multiple database users run queries at the same time, complex queries may occupy cluster resources for a long time and affect the performance of other queries. For example, if one group of users continuously submits complex, time-consuming queries while another group frequently submits short queries, the short queries may have to wait in the queue for the time-consuming ones to complete. To improve efficiency, you can use the DWS resource management function to handle such problems: create different resource pools for different types of services, configure different resource ratios for these pools, and then add database users to the corresponding pools to restrict their resource usage (see the sketch after this list).
- Logical Clusters
A physical cluster can be divided into logical clusters that use the node-group mechanism. Tables in a database can be allocated to different physical nodes by logical cluster. A logical cluster can contain tables from multiple databases.
- Restarting Clusters
Restarting a cluster may cause data loss in running services. If you have to restart a cluster, ensure that there is no running service and all data has been saved.
- Deleting Clusters
You can delete a cluster when you do not need it. Deleting a cluster is risky and may cause data loss. Therefore, exercise caution when performing this operation.
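As a hedged sketch of the resource management capability described in this list, the following creates a resource pool and binds a database user to it. The pool name, parameter values, and user name are assumptions for illustration, and the available options depend on your DWS version; see the resource management documentation for the authoritative syntax.

```sql
-- Hypothetical example: limit concurrency and memory for a reporting workload.
-- Option names and values are illustrative; verify them for your DWS version.
CREATE RESOURCE POOL report_pool
    WITH (ACTIVE_STATEMENTS = 10, MEM_PERCENT = 20);

-- Bind an existing database user to the pool so that the user's queries are
-- constrained by the pool's resource quotas.
ALTER USER report_user RESOURCE POOL 'report_pool';
```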
DWS allows you to manage clusters in either of the following ways:
- Management console
Use the management console to access DWS clusters. After registering an account, log in to the management console and choose Data Warehouse Service.
For more information about cluster management, see Managing Clusters.
- REST APIs
Use REST APIs provided by DWS to manage clusters. In addition, if you need to integrate DWS into a third-party system for secondary development, use APIs to access the service.
For details, see Data Warehouse Service API Reference.
Diverse Data Import Modes
DWS supports efficient data import from multiple data sources. The following lists typical data import modes. For details, see Data Migration to DWS.
- Importing data from OBS in parallel
- Using GDS to import data from a remote server
- Importing data from MRS to a data warehouse cluster
- Importing data from one DWS cluster to another
- Using the gsql meta-command \COPY to import data
- Running the COPY FROM STDIN statement to import data (a brief sketch of this mode and the \COPY mode follows the list)
- Using DLI to import data to DWS
- Migrating data to DWS using CDM
- Using Database Schema Convertor (DSC) to migrate SQL scripts
- Using gs_dump and gs_dumpall to export metadata
- Using gs_restore to import data
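As a hedged illustration of the \COPY and COPY FROM STDIN modes listed above, the following minimal sketch loads a local CSV file into an existing table. The schema, table name, file path, and options are assumptions for illustration; see Data Migration to DWS for the authoritative syntax and options.

```sql
-- Hypothetical example: client-side load of a local CSV file through the
-- gsql meta-command \COPY (the file resides on the client host).
\copy demo.sales FROM '/tmp/sales.csv' CSV HEADER;

-- Server-driven variant: COPY ... FROM STDIN is typically issued by drivers
-- or scripts that stream rows to the server over an existing connection.
COPY demo.sales FROM STDIN CSV HEADER;
```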
APIs
You can call standard APIs, such as JDBC and ODBC, to access databases in DWS clusters.
For details, see Using JDBC to Connect to a Cluster and Using ODBC to Connect to a Cluster.
High Reliability
- Supports instance and data redundancy, ensuring zero single points of failure (SPOFs) in the entire system.
- Supports multiple data backups, and all data can be manually backed up to OBS.
- Automatically isolates faulty nodes, restores data from backups, and replaces faulty nodes when necessary.
- Automated snapshots work with OBS to implement intra-region disaster recovery (DR). If the production cluster cannot provide read and write services due to a natural disaster in the region or an internal cluster fault, the DR cluster takes over as the production cluster to ensure service continuity.
- In the Unbalanced state, some nodes host more primary instances than others and therefore bear higher load. In this case, you can perform a primary/standby switchback for the cluster during off-peak hours to improve performance.
- If the internal IP address or EIP of a single CN is used to connect to a cluster, the failure of that CN makes the cluster unreachable. To avoid such single-CN failures, DWS uses Elastic Load Balance (ELB). An ELB distributes access traffic across multiple ECSs based on forwarding policies, improving the fault tolerance of applications.
- After a cluster is created, the number of required CNs varies with service requirements. DWS allows you to add or delete CNs as needed.
Security Management
- Isolates tenants and controls access permissions based on network isolation, security group rules, and security hardening measures to protect the privacy and data security of systems and users.
- Supports SSL network connections, user permission management, and password management, ensuring data security at the network, management, application, and system layers (a brief sketch of user permission management follows below).
For details, see "Establishing Secure TCP/IP Connections in SSL Mode" and "Enabling Separation of Duties for DWS Database Users".
Monitoring and Auditing
- Monitoring Clusters
DWS integrates with Cloud Eye, allowing you to monitor compute nodes and databases in the cluster in real time. For details, see Monitoring Clusters Using Cloud Eye.
- Database Monitoring
DMS is provided by DWS to ensure the fast and stable running of databases. It collects, monitors, and analyzes the disk, network, and OS metrics used by the service database, as well as key performance metrics of cluster running. It also diagnoses database hosts, instances, and service SQL statements based on the collected metrics to expose key faults and performance problems in a database in a timely manner and to guide customers in optimizing and resolving them. For details, see Database Monitoring (DMS).
- Alarms
You can check and configure alarm rules and subscribe to alarm notifications. Alarm rules display the alarm statistics and details of the past week so that you can view the alarms of your tenant. This feature monitors common DWS alarms with preset rules and allows you to customize alarm thresholds based on your service needs. For details, see Alarms.
- Event Notifications
DWS interconnects with Simple Message Notification (SMN) so that you can subscribe to events and view events that are triggered. For details, see Event Notifications.
- Audit Logs
- DWS can be integrated with Cloud Trace Service (CTS) to audit management console operations and API calls. For details, see Viewing Audit Logs of Key Operations on the Management Console.
- DWS records all SQL operations, including connection attempts, query attempts, and database changes. For details, see Viewing Database Audit Logs.
Multiple Database Tools
DWS provides the following self-developed tools. You can download the tool packages on the DWS console. For how to use the tools, see the Data Warehouse Service (DWS) Tool Guide.
- SQL editor
The DWS SQL editor provides one-stop data development, ingestion, and processing functions. With the editor, you can connect to a cluster database from the DWS console to edit and execute SQL statements.
- gsql
gsql is a CLI SQL client tool running on the Linux OS. It helps connect to, operate, and maintain the database in a DWS cluster.
- Data Studio
Data Studio is a SQL client tool with a graphical user interface (GUI) that runs on Windows. It is used to connect to databases in a DWS cluster, manage database objects, edit, run, and debug SQL scripts, and view execution plans.
- GDS
GDS is a data service tool offered by DWS that uses the foreign table mechanism for fast data import and export.
The GDS tool package needs to be installed on the server where the data source file is located. This server is called the data server or the GDS server.
- DSC SQL syntax migration tool
The DSC is a CLI tool running on the Linux or Windows OS. It is dedicated to providing customers with simple, fast, and reliable application SQL script migration services. It parses the SQL scripts of source database applications using the built-in syntax migration logic, and converts them to SQL scripts applicable to DWS databases.
The DSC can migrate SQL scripts of Teradata, Oracle, Netezza, MySQL, and DB2 databases.
- gs_dump and gs_dumpall
gs_dump exports a single database or its objects. gs_dumpall exports all databases or global objects in a cluster.
To migrate database information, you can use the corresponding import tool to import the exported metadata into the target database.
- gs_restore
During database migration, you can export files using the gs_dump tool and import them into DWS using gs_restore. In this way, metadata, such as table definitions and database object definitions, can be imported.