Installing, Configuring, and Starting GDS

Updated on 2023-03-17 GMT+08:00

Scenario

GaussDB(DWS) uses GDS to allocate the source data for parallel data import. Deploy GDS on the data server.

If a large volume of data is stored on multiple data servers, install, configure, and start GDS on each server. Then, data on all the servers can be imported in parallel. The procedure for installing, configuring, and starting GDS is the same on each data server. This section describes how to perform this procedure on one data server.
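
Because the procedure is identical on every data server, it lends itself to scripting. The sketch below only prints the command it would run for each host and does not connect anywhere. The second IP address and the script name install_gds.sh are hypothetical placeholders and are not part of the GDS package.

```shell
# Print (not execute) the command that would run the same setup script
# on each data server passed as an argument.
# install_gds.sh is a hypothetical script holding the steps of this section.
plan_gds_rollout() {
    for host in "$@"; do
        echo "ssh root@${host} 'bash -s' < install_gds.sh"
    done
}

# 192.168.0.90 is the example server used in this section;
# 192.168.0.91 is an assumed second data server.
plan_gds_rollout 192.168.0.90 192.168.0.91
```

Review the printed commands before running them, since each one executes as user root on the target server.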

Context

  1. The GDS version must match the cluster version. For example, GDS V100R008C00 matches DWS 1.3.X. If the versions do not match, the import or export may fail or stop responding.

    Therefore, use the latest version of GDS. After the database is upgraded, download the latest version of GaussDB(DWS) GDS as instructed in Procedure. When the import or export starts, GaussDB(DWS) checks the GDS versions. If the versions do not match, an error message is displayed and the import or export is terminated.

    To obtain the version number of GDS, run the following command in the GDS decompression directory:

    gds -V

    To view the database version, run the following SQL statement after connecting to the database:

    SELECT version();
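
The GDS version check can be wrapped in a small helper that fails with a clear message when the package has not been decompressed yet. This is a minimal sketch: the default path assumes the package layout used later in this section (/opt/bin/dws/gds/bin/gds), and the function name print_gds_version is made up for illustration. On 8.1.x, the environment script (gds_env) may need to be sourced first.

```shell
# Assumed location of the gds binary after decompression; override as needed.
GDS_BIN="${GDS_BIN:-/opt/bin/dws/gds/bin/gds}"

# Print the GDS version, or return 1 with a hint if the binary is missing.
print_gds_version() {
    if [ ! -x "$1" ]; then
        echo "GDS binary not found or not executable: $1" >&2
        return 1
    fi
    "$1" -V
}

print_gds_version "$GDS_BIN" || echo "Decompress the GDS package first."
```

Compare the printed version with the result of SELECT version() before starting an import or export.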
    

Procedure

  1. For details about how to import or export data using GDS, see "Tutorial: Using GDS to Import Data > Step 1: Preparing an ECS as the GDS Server" in the Data Warehouse Service User Guide.
  2. Log in as user root to the data server where GDS is to be installed and run the following command to create the directory for storing the GDS package:

    mkdir -p /opt/bin/dws

  3. Upload the GDS package to the created directory.

    Use the SUSE Linux package as an example. Upload the GDS package dws_client_8.1.x_suse_x64.zip to the directory created in the previous step.

  4. (Optional) If SSL is used, upload the SSL certificates to the directory created in step 2.
  5. Go to the directory and decompress the package.

    cd /opt/bin/dws
    unzip dws_client_8.1.x_suse_x64.zip

  6. Create a GDS user and the user group to which the user belongs. This user is used to start GDS and read source data.

    groupadd gdsgrp
    useradd -g gdsgrp gds_user

  7. Change the owner of the GDS package directory and source data file directory to the GDS user.

    chown -R gds_user:gdsgrp /opt/bin/dws/gds
    chown -R gds_user:gdsgrp /input_data 

  8. Switch to user gds_user.

    su - gds_user

    If the current cluster version is 8.0.x or earlier, skip step 9 and go to step 10.

    If the current cluster version is 8.1.x, go to the next step.

  9. Execute the script on which the environment depends (applicable only to 8.1.x).

    cd /opt/bin/dws/gds/bin
    source gds_env

  10. Start GDS.

    GDS requires no installation and can be started as soon as the package is decompressed. There are two ways to start GDS: pass the startup parameters directly to the gds command, or write them into the gds.conf configuration file and start GDS with the gds_ctl.py command.

    Method 1 is recommended for a one-off import. Method 2 is recommended when you need to import data regularly.
    • Method 1: Run the gds command to start GDS.
      • If data is transmitted in non-SSL mode, run the following command to start GDS:
        gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num

        Example:

        /opt/bin/dws/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D -t 2
      • If data is transmitted in SSL mode, run the following command to start GDS:
        gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num --enable-ssl --ssl-dir Cert_file

        Example:

        The following command assumes that the SSL certificates mentioned in step 4 are stored in /opt/bin/:
        /opt/bin/dws/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D --enable-ssl --ssl-dir /opt/bin/

      Replace the placeholders (dir, ip:port, address_string, log_file, worker_num, and Cert_file) as required.

      • -d dir: directory for storing data files that contain data to be imported. This tutorial uses /input_data/ as an example.
      • -p ip:port: listening IP address and port for GDS. The default IP address is 127.0.0.1. Replace it with the IP address of a 10GE network that can communicate with GaussDB(DWS). The port number ranges from 1024 to 65535, and the default port is 8098. This tutorial uses 192.168.0.90:5000 as an example.
      • -H address_string: specifies the hosts that are allowed to connect to and use GDS. The value must be in CIDR format. Configure this parameter to enable a GaussDB(DWS) cluster to access GDS for data import. Ensure that the network segment covers all hosts in a GaussDB(DWS) cluster.
      • -l log_file: GDS log directory and log file name. This tutorial uses /opt/bin/dws/gds/gds_log.txt as an example.
      • -D: runs GDS in daemon mode. This parameter is used only in Linux.
      • -t worker_num: number of concurrent GDS threads. If the data server and GaussDB(DWS) have available I/O resources, you can increase the number of concurrent GDS threads.

        GDS determines the number of threads based on the number of concurrent import transactions. Even if multi-thread import is configured before GDS startup, the import of a single transaction will not be accelerated. By default, an INSERT statement is an import transaction.

      • --enable-ssl: enables SSL for data transmission.
      • --ssl-dir Cert_file: SSL certificate directory. Set this parameter to the certificate directory in step 4.
      • For details about GDS parameters, see "GDS - Parallel Data Loader > gds" in the Data Warehouse Service (DWS) Tool Guide.
    • Method 2: Write the startup parameters into the gds.conf configuration file and run the gds_ctl.py command to start GDS.
      1. Run the following command to open the gds.conf configuration file in the config directory of the GDS package, and then modify it. For details about the parameters in the gds.conf configuration file, see Table 1.
        vim /opt/bin/dws/gds/config/gds.conf

        Example:

        The gds.conf configuration file contains the following information:

        <?xml version="1.0"?>
        <config>
        <gds name="gds1" ip="192.168.0.90" port="5000" data_dir="/input_data/" err_dir="/err" data_seg="100MB" err_seg="100MB" log_file="/log/gds_log.txt" host="10.10.0.1/24" daemon='true' recursive="true" parallel="32"></gds>
        </config>

        Information in the configuration file is described as follows:

        • The data server IP address is 192.168.0.90 and the GDS listening port is 5000.
        • Data files are stored in the /input_data/ directory.
        • Error log files are stored in the /err directory. The directory must be created by a user who has the GDS read and write permissions.
        • The size of a single data segment is 100 MB.
        • The size of a single error log file is 100 MB.
        • Logs are stored in the /log/gds_log.txt file. The directory must be created by a user who has the GDS read and write permissions.
        • Only nodes with the IP address 10.10.0.* can be connected.
        • The GDS process is running in daemon mode.
        • Recursive data file directories are used.
        • The number of concurrent import threads is 32.
      2. Start GDS and check whether it has been started.
        python3 gds_ctl.py start

        Example:

        cd /opt/bin/dws/gds/bin
        python3 gds_ctl.py start
        Start GDS gds1                  [OK]
        gds [options]:
         -d dir            Set data directory.
         -p port           Set GDS listening port.
            ip:port        Set GDS listening ip address and port.
         -l log_file       Set log file.
         -H secure_ip_range
                           Set secure IP checklist in CIDR notation. Required for GDS to start.
         -e dir            Set error log directory.
         -E size           Set size of per error log segment.(0 < size < 1TB)
         -S size           Set size of data segment.(1MB < size < 100TB)
         -t worker_num     Set number of worker thread in multi-thread mode, the upper limit is 32. If without setting, the default value is 1.
         -s status_file    Enable GDS status report.
         -D                Run the GDS as a daemon process.
         -r                Read the working directory recursively.
         -h                Display usage.
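
Steps 2 and 5 to 7 of the procedure above can be collected into one script. The following is a minimal sketch rather than an official installer: the run wrapper, the DRY_RUN switch, and the function name prepare_gds_host are made up for illustration, and the paths and package name are the example values used in this section. With DRY_RUN=1 (the default here), the commands are only printed.

```shell
# Set DRY_RUN=0 and run as user root to execute the commands for real.
DRY_RUN="${DRY_RUN:-1}"
GDS_DIR="/opt/bin/dws"                     # directory created in step 2
GDS_PKG="dws_client_8.1.x_suse_x64.zip"    # example package name from step 3
DATA_DIR="/input_data"                     # source data file directory

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_gds_host() {
    run mkdir -p "$GDS_DIR"                            # step 2
    run unzip "$GDS_DIR/$GDS_PKG" -d "$GDS_DIR"        # step 5
    run groupadd gdsgrp                                # step 6
    run useradd -g gdsgrp gds_user
    run chown -R gds_user:gdsgrp "$GDS_DIR/gds"        # step 7
    run chown -R gds_user:gdsgrp "$DATA_DIR"
}

prepare_gds_host
```

Uploading the package (step 3) and the optional SSL certificates (step 4) remains a manual task. The sketch does not check whether the group or user already exists, so groupadd and useradd report an error on a second run.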

gds.conf Parameter Description

Table 1 gds.conf configuration description

| Attribute | Description | Value Range |
| --- | --- | --- |
| name | Identifier | - |
| ip | Listening IP address | The IP address must be valid. Default value: 127.0.0.1 |
| port | Listening port | Value range: 1024 to 65535 (integer). Default value: 8098 |
| data_dir | Data file directory | - |
| err_dir | Error log file directory | Default value: data file directory |
| log_file | Log file path | - |
| host | Host IP address allowed to connect to GDS. The value must be in CIDR format. This parameter is available for the Linux OS only. | - |
| recursive | Whether the data file directories are recursive | Value range: true (recursive) or false (not recursive). Default value: false |
| daemon | Whether the process runs in daemon mode | Value range: true (daemon mode) or false (not daemon mode). Default value: false |
| parallel | Number of concurrent data import threads | Value range: 0 to 32 (integer). Default value: 1 |
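
The numeric ranges in Table 1 can be checked before a value is written into gds.conf. The helpers below are an illustrative sketch: the function names in_range, valid_port, and valid_parallel are hypothetical and are not provided by GDS.

```shell
# Return 0 if $1 is a non-negative integer within [$2, $3].
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Ranges taken from Table 1.
valid_port()     { in_range "$1" 1024 65535; }
valid_parallel() { in_range "$1" 0 32; }

# Values from the example gds.conf in this section.
valid_port 5000   && echo "port 5000: within range"
valid_parallel 32 && echo "parallel 32: within range"
valid_port 80     || echo "port 80: out of range"
```

These checks cover only the documented ranges. They do not confirm that the port is free or that the IP address is reachable from the cluster.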
