Installing, Configuring, and Starting GDS

Updated at: Jul 15, 2020 GMT+08:00

Scenarios

DWS uses GDS to distribute source data for parallel import. Deploy GDS on the server that stores the source data.

If a large volume of data is stored on multiple servers, install, configure, and start GDS on each of them so that data on all servers can be imported in parallel. The procedure is the same on every data server; this section describes it for one server.

Context

  1. GDS can be installed on the following operating systems:

    Kunpeng:

    • Community Enterprise Operating System 7.6
    • EulerOS 2.0 SP8
    • Red Hat Enterprise Linux Server release 7.5
    • NeoKylin 7.5/7.6
    x86:
    • SUSE Linux Enterprise Server 10 SP4 x86_64
    • SUSE Linux Enterprise Server 11 SP1/SP2/SP3/SP4 x86_64
    • SUSE Linux Enterprise Server 12 SP0/SP1/SP2/SP3 x86_64
    • Red Hat Enterprise Linux Server release 6.4/6.5/6.6/6.7/6.8/6.9/7.0/7.1/7.2/7.3/7.4/7.5 x86_64
    • Community Enterprise Operating System 6.4/6.5/6.6/6.7/6.8/6.9/7.0/7.1/7.2/7.3/7.4 x86_64
  2. The GDS version must match the cluster version. For example, GDS V100R008C00 matches DWS 1.3.X. If the versions do not match, the import or export may fail or stop responding.

    Therefore, do not use an earlier version of GDS. After the database is upgraded, download the matching version of GDS as described in Procedure. When an import or export starts, DWS checks the GDS version. If it does not match the DWS version, DWS displays an error message and terminates the import or export.

    To obtain the GDS version number, run the following command in the directory where the GDS package was decompressed:

    gds -V

    To view the database version, run the following SQL statement after connecting to the database:

    SELECT version();
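DWS rejects a mismatched GDS only when an import or export starts. If you automate deployments, you may want to fail earlier. The following is a minimal Python sketch under stated assumptions: the compatibility table contains only the single pairing documented above (GDS V100R008C00 with DWS 1.3.X), the names `KNOWN_COMPATIBLE` and `is_compatible` are hypothetical, and the caller passes the version strings in directly, because the exact output formats of `gds -V` and `SELECT version();` are not described here.

```python
# Hypothetical pre-check. Only the pairing documented in this section is listed;
# add further pairings from the release notes of your GDS and DWS versions.
KNOWN_COMPATIBLE = {
    "V100R008C00": "1.3.",  # GDS V100R008C00 matches DWS 1.3.X
}


def is_compatible(gds_version: str, dws_version: str) -> bool:
    """Return True only if the GDS version is known to match the DWS version.

    Unknown GDS versions are treated as incompatible so the check fails safe.
    """
    dws_prefix = KNOWN_COMPATIBLE.get(gds_version.strip())
    if dws_prefix is None:
        return False
    # "1.3." matches 1.3.0, 1.3.2, ... but not 1.30.0.
    return dws_version.strip().startswith(dws_prefix)


if __name__ == "__main__":
    print(is_compatible("V100R008C00", "1.3.2"))
    print(is_compatible("V100R008C00", "8.1.0"))
```

Treating unknown versions as incompatible is a deliberate choice: a missing table entry stops the script instead of letting DWS fail later in the middle of an import.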

Procedure

  1. Before using GDS to import or export data, perform the "Preparing an ECS as the GDS Data Server" and "Downloading the GDS Package and SSL Certificate" operations in Step 1: Preparing an ECS as the GDS Server.
  2. Log in as user root to the data server where GDS is to be installed and run the following command to create the directory for storing the GDS package:

    mkdir -p /opt/bin/dws

  3. Upload the GDS package to the created directory.

    The following uses the Red Hat x86 package as an example. Upload the GDS package dws_client_redhat_x64.tar.gz to the directory created in the previous step.

  4. (Optional) If SSL is used, upload the SSL certificate to the directory created in step 2.

  5. Go to the new directory and decompress the package.

    cd /opt/bin/dws
    tar -zxvf dws_client_redhat_x64.tar.gz

  6. Create a GDS user and the user group it belongs to. This user starts GDS and reads the source data.

    groupadd gdsgrp
    useradd -g gdsgrp gds_user

  7. Change the owner of the GDS package directory and of the source data directory (/input_data in this tutorial) to the GDS user.

    chown -R gds_user:gdsgrp /opt/bin/dws/gds 
    chown -R gds_user:gdsgrp /input_data 

  8. Switch to user gds_user.

    su - gds_user

  9. Start GDS.

    GDS requires no installation: it can be started as soon as the package is decompressed. There are two ways to start GDS.

    Method 1 is recommended when you only need to import data occasionally.

    Method 2 is recommended when you need to import data regularly.

    Method 1: Run the gds command to set startup parameters.
    • Run the gds command to start GDS.
      • If data is transmitted in non-SSL mode, run the following command to start GDS:
        gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num

        Example:

        /opt/bin/dws/gds/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D -t 2
      • If data is transmitted in SSL mode, run the following command to start GDS:
        gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num --enable-ssl --ssl-dir Cert_file

        Example:

        In this example, the SSL certificate uploaded in step 4 is stored in /opt/bin/dws/, the directory created in step 2:
        /opt/bin/dws/gds/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D --enable-ssl --ssl-dir /opt/bin/dws/

      Replace the placeholder values (dir, ip:port, address_string, log_file, worker_num, and Cert_file) as required.

      • -d dir: directory storing data files that contain data to be imported. It is /input_data/ in this tutorial.
      • -p ip:port: listening IP address and port for GDS. The default IP address is 127.0.0.1; replace it with an IP address on a 10GE network that can communicate with DWS. The port can be any value from 1024 to 65535; the default is 8098. This parameter is set to 192.168.0.90:5000 in this tutorial.
      • -H address_string: network segment for hosts that can connect to and use GDS. The value must be in CIDR format. Set this parameter to enable DWS to access GDS for data import. Ensure that the network segment covers all hosts in DWS.
      • -l log_file: GDS log directory and log file name. It is /opt/bin/dws/gds/gds_log.txt in this tutorial.
      • -D: runs GDS in daemon mode. This parameter is supported only on Linux.
      • -t worker_num: number of concurrent GDS threads. The upper limit is 32 and the default is 1. If the data server and DWS have spare I/O resources, you can increase the number of concurrent GDS threads.

        GDS allocates threads according to the number of concurrent import transactions, so additional threads speed up only concurrent imports: even if multi-threading is configured at startup, a single import transaction is not accelerated. By default, each INSERT statement is one import transaction.

      • --enable-ssl: enables SSL for data transmission.
      • --ssl-dir Cert_file: SSL certificate directory. Set it to the certificate directory mentioned in 4.
      • For details about GDS parameters, see gds Command Introduction.

    Method 2: Write the startup parameters into the gds.conf configuration file and run the gds_ctl.py command to start GDS.

    • Run the gds_ctl.py command to start GDS.
      1. Edit the gds.conf configuration file in the config directory of the GDS package. For details about the parameters in gds.conf, see Table 1.
        vim /opt/bin/dws/gds/config/gds.conf

        Example:

        The gds.conf configuration file contains the following information:

        <?xml version="1.0"?>
        <config>
        <gds name="gds1" ip="192.168.0.90" port="5000" data_dir="/input_data/" err_dir="/err" data_seg="100MB" err_seg="100MB" log_file="/log/gds_log.txt" host="10.10.0.1/24" daemon="true" recursive="true" parallel="32"></gds>
        </config>

        Information in the configuration file is described as follows:

        • The data server IP address is 192.168.0.90 and the GDS listening port is 5000.
        • Data files are stored in the /input_data/ directory.
        • Error log files are stored in the /err directory.
        • The size of a single data file is 100 MB.
        • The size of a single error log file is 100 MB.
        • Logs are stored in the /log/gds_log.txt file.
        • Only hosts in the 10.10.0.0/24 segment (IP addresses 10.10.0.*) can connect to GDS.
        • The GDS process runs in daemon mode.
        • Data file directories are read recursively.
        • The number of concurrent import threads is 32.
      2. Start GDS and check whether it has started:
        python gds_ctl.py start

        Example:

        cd /opt/bin/dws/gds
        python gds_ctl.py start
        Start GDS gds1                  [OK]
        gds [options]:
         -d dir            Set data directory.
         -p port           Set GDS listening port.
            ip:port        Set GDS listening ip address and port.
         -l log_file       Set log file.
         -H secure_ip_range
                           Set secure IP checklist in CIDR notation. Required for GDS to start.
         -e dir            Set error log directory.
         -E size           Set size of per error log segment.(0 < size < 1TB)
         -S size           Set size of data segment.(1MB < size < 100TB)
         -t worker_num     Set number of worker thread in multi-thread mode, the upper limit is 32. If without setting, the default value is 1.
         -s status_file    Enable GDS status report.
         -D                Run the GDS as a daemon process.
         -r                Read the working directory recursively.
         -h                Display usage.
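The Method 1 parameter rules (port range, thread limit, and a -H segment that covers every DWS host) can be checked before GDS is started. The following Python sketch is illustrative only: the helper names `check_host_coverage`, `data_dir_readable`, and `build_gds_command` and the DWS host addresses are hypothetical, and the lower bound of 1 for worker_num is an assumption (the help output above states only the upper limit of 32 and the default of 1). It uses only the gds options documented in this section and returns the argument list instead of starting GDS.

```python
import ipaddress
import os
from typing import List, Optional, Sequence


def check_host_coverage(cidr: str, dws_hosts: Sequence[str]) -> List[str]:
    """Return the DWS host IPs that the -H network segment does NOT cover."""
    # strict=False accepts values such as 10.10.0.1/24 (host bits set), as used
    # in this section, and treats them as the network 10.10.0.0/24.
    network = ipaddress.ip_network(cidr, strict=False)
    return [ip for ip in dws_hosts if ipaddress.ip_address(ip) not in network]


def data_dir_readable(path: str) -> bool:
    """True if the user running this script (e.g. gds_user) can list and read path."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


def build_gds_command(
    gds_bin: str,
    data_dir: str,
    ip: str,
    port: int,
    allowed_cidr: str,
    log_file: str,
    worker_num: int = 1,
    ssl_dir: Optional[str] = None,
) -> List[str]:
    """Validate Method 1 parameters and return the gds argument list (daemon mode)."""
    if not 1024 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1024-65535")
    if not 1 <= worker_num <= 32:  # lower bound of 1 is an assumption
        raise ValueError(f"worker_num {worker_num} is outside 1-32")
    ipaddress.ip_address(ip)  # raises ValueError for an invalid listening IP
    ipaddress.ip_network(allowed_cidr, strict=False)  # raises ValueError for bad CIDR

    cmd = [gds_bin, "-d", data_dir, "-p", f"{ip}:{port}",
           "-H", allowed_cidr, "-l", log_file, "-D", "-t", str(worker_num)]
    if ssl_dir is not None:
        cmd += ["--enable-ssl", "--ssl-dir", ssl_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical DWS node addresses; replace them with the IPs of your cluster.
    uncovered = check_host_coverage("10.10.0.1/24", ["10.10.0.11", "10.10.0.12"])
    if uncovered:
        raise SystemExit(f"-H does not cover these DWS hosts: {uncovered}")
    if not data_dir_readable("/input_data/"):
        print("warning: /input_data/ is not readable by the current user")
    cmd = build_gds_command(
        "/opt/bin/dws/gds/gds", "/input_data/", "192.168.0.90", 5000,
        "10.10.0.1/24", "/opt/bin/dws/gds/gds_log.txt", worker_num=2,
    )
    # To launch GDS as gds_user, cmd could be passed to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Returning a list rather than a shell string avoids quoting problems if a directory name contains spaces, and keeps the validation separate from actually starting the process.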

gds.conf Parameter Description

Table 1 gds.conf configuration description

| Attribute | Description | Value range |
|---|---|---|
| name | Identifier | - |
| ip | Listening IP address | Must be a valid IP address. Default value: 127.0.0.1 |
| port | Listening port number | Integer from 1024 to 65535. Default value: 8098 |
| data_dir | Data file directory | - |
| err_dir | Error log file directory | Default value: data file directory |
| log_file | Log file path | - |
| host | Hosts allowed to connect to GDS, in CIDR format. Supported only on Linux. | - |
| recursive | Whether data file directories are read recursively | true: recursive; false: not recursive. Default value: false |
| daemon | Whether the process runs in daemon mode | true: runs in daemon mode; false: does not run in daemon mode. Default value: false |
| parallel | Number of concurrent data import threads | Integer from 0 to 32. Default value: 32 |
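Table 1 can also be applied mechanically when gds.conf is generated or reviewed by a script. The following Python sketch is an assumption-laden illustration, not part of the GDS package: the function names are hypothetical, it checks only the ranges listed in Table 1, and it passes other attributes (such as data_seg and err_seg) through unchanged. ElementTree writes an empty element as `<gds ... />`, which is equivalent XML to the `<gds ...></gds>` form shown earlier; that gds_ctl.py accepts both forms is assumed, not confirmed here.

```python
import ipaddress
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Defaults taken from Table 1; attributes without a documented default are omitted.
TABLE1_DEFAULTS = {
    "ip": "127.0.0.1",
    "port": "8098",
    "recursive": "false",
    "daemon": "false",
    "parallel": "32",
}


def validate_gds_attrs(attrs: Dict[str, str]) -> None:
    """Raise ValueError if a <gds> element's attributes violate the Table 1 ranges."""
    if "ip" in attrs:
        ipaddress.ip_address(attrs["ip"])  # must be a valid IP address
    if "port" in attrs and not 1024 <= int(attrs["port"]) <= 65535:
        raise ValueError(f"port {attrs['port']} is outside 1024-65535")
    if "parallel" in attrs and not 0 <= int(attrs["parallel"]) <= 32:
        raise ValueError(f"parallel {attrs['parallel']} is outside 0-32")
    for key in ("recursive", "daemon"):
        if key in attrs and attrs[key] not in ("true", "false"):
            raise ValueError(f"{key} must be 'true' or 'false', got {attrs[key]!r}")
    if "host" in attrs:
        ipaddress.ip_network(attrs["host"], strict=False)  # must be CIDR notation


def effective_settings(attrs: Dict[str, str]) -> Dict[str, str]:
    """Return attrs with the Table 1 defaults filled in for omitted attributes."""
    merged = dict(TABLE1_DEFAULTS)
    merged.update(attrs)
    if "err_dir" not in merged and "data_dir" in merged:
        merged["err_dir"] = merged["data_dir"]  # err_dir defaults to the data directory
    return merged


def render_gds_conf(instances: Iterable[Dict[str, str]]) -> str:
    """Build gds.conf text with one validated <gds> element per instance."""
    root = ET.Element("config")
    for attrs in instances:
        validate_gds_attrs(attrs)
        ET.SubElement(root, "gds", attrs)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def validate_gds_conf_text(text: str) -> int:
    """Parse existing gds.conf text, validate every <gds> element, return their count."""
    root = ET.fromstring(text)
    elements = root.findall("gds")
    for element in elements:
        validate_gds_attrs(dict(element.attrib))
    return len(elements)


if __name__ == "__main__":
    print(render_gds_conf([{
        "name": "gds1", "ip": "192.168.0.90", "port": "5000",
        "data_dir": "/input_data/", "host": "10.10.0.1/24",
        "daemon": "true", "recursive": "true", "parallel": "32",
    }]))
```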
