Updated on 2024-10-14 GMT+08:00

Exporting Data In Parallel

In high-concurrency scenarios, you can use GDS to export a large volume of data from a database to a common file system. To export data in parallel using a foreign table, you must enable the stream operator first.

Overview

Using foreign tables: A foreign table specifies the export mode and the data format, and these determine the data files that are written. Multiple DNs export data in parallel from the database to the data files, which improves overall export performance.
  • The CN plans data export tasks and delivers the tasks to DNs. Then the CN is released to process other tasks.
  • The computing capabilities and bandwidths of all the DNs are fully leveraged to export data.
    Figure 1 Exporting data using foreign tables
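
The division of work described above can be pictured with a toy model. The sketch below is not GaussDB code: the names `export_shard` and `plan_and_dispatch` and the DN labels are hypothetical, and threads merely stand in for DNs that each format their own share of the data while the coordinator only hands out tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def export_shard(dn_name, rows):
    """Stand-in for one DN: format only the rows that this DN holds."""
    lines = [",".join(str(value) for value in row) for row in rows]
    return dn_name, "\n".join(lines)


def plan_and_dispatch(shards):
    """Stand-in for the CN: hand one export task to each DN.

    The results are collected here only so the example can be inspected;
    in the real system the CN is released once the tasks are delivered.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [pool.submit(export_shard, dn, rows)
                   for dn, rows in shards.items()]
        return dict(future.result() for future in futures)


# Hypothetical rows that are already distributed across two DNs.
shards = {
    "dn_1": [(1, "a"), (3, "c")],
    "dn_2": [(2, "b")],
}
exported = plan_and_dispatch(shards)
```

Each worker touches only its own shard, which is why adding DNs adds export throughput in this model.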

Concepts

  • Data file: A TEXT, CSV, or FIXED file that stores data exported from GaussDB.
  • Foreign table: a table that stores information about a data file, such as its format, location, and encoding.
  • GDS: a data service tool. To export data, deploy GDS on the server where data files are stored.
  • Table: a table in the database, including row-store tables and column-store tables. Data in data files is exported from these tables.
  • Local mode: Service data in a cluster is exported to hosts in the cluster.
  • Remote mode: Service data in a cluster is exported to hosts outside the cluster.

Export Modes

In GaussDB, data can be exported in local or remote mode.

  • Remote mode: Service data in a cluster is exported to hosts outside the cluster.
    • In this mode, multiple GDS instances export data concurrently. One GDS instance can export data for only one cluster at a time.
    • The data export rate of a GDS that resides on the same intranet as cluster nodes is limited by the network bandwidth. A 10 GE configuration is recommended.
    • Data files in TEXT, CSV, or FIXED format are supported. The size of data in a single row must be less than 1 GB.
  • Local mode: Service data in a cluster is exported to hosts in the cluster. Local mode is intended for exports that produce a large number of small files.
    • In this mode, data is evenly divided and stored in specified directories on cluster nodes, occupying the disk space of these cluster nodes.
    • Data files in TEXT, CSV, or FIXED format are supported. The size of data in a single row must be less than 1 GB.
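
As a rough illustration of how the two modes differ when the export target is described, the Python sketch below builds a location string for each mode and checks the 1 GB row limit. The `gsfs://` and `file://` URL forms, the `|` separator between GDS endpoints, and the helper names `location_option` and `row_fits` are assumptions made for this example; confirm the exact syntax in Creating a GDS Foreign Table.

```python
MAX_ROW_BYTES = 1024 ** 3  # a single row must be smaller than 1 GB in both modes


def location_option(mode, gds_endpoints=(), local_dir=""):
    """Build a location string for a foreign table (URL forms are assumed)."""
    if mode == "remote":
        if not gds_endpoints:
            raise ValueError("remote mode needs at least one GDS endpoint")
        # Assumed form: one gsfs:// URL per GDS instance, joined with '|'.
        return " | ".join(f"gsfs://{endpoint}/" for endpoint in gds_endpoints)
    if mode == "local":
        if not local_dir.startswith("/"):
            raise ValueError("local mode needs an absolute directory")
        # Assumed form: a file:// URL naming a directory on the cluster nodes.
        return f"file://{local_dir.rstrip('/')}/"
    raise ValueError(f"unknown export mode: {mode!r}")


def row_fits(row_text, encoding="utf-8"):
    """Check the encoded size of one row against the 1 GB limit."""
    return len(row_text.encode(encoding)) < MAX_ROW_BYTES
```

For example, `location_option("remote", ["192.0.2.10:5000", "192.0.2.11:5000"])` lists two GDS instances (the addresses are placeholders), while `location_option("local", local_dir="/output_data")` names a directory that must exist on the cluster nodes.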

Export Process

Figure 2 Process of parallel data export
Table 1 Process description

| Process | Description | Sub-task |
|---|---|---|
| Plan data export | Prepare data to export and plan the export path. For details, see Planning Data Export. | N/A |
| Check whether the Local mode is selected | Check the export mode specified during foreign table creation to determine whether the Local mode is selected. | N/A |
| Start GDS | If the Remote mode is selected, install, configure, and start GDS on data servers. For details, see Installing, Configuring, and Starting GDS. | N/A |
| Create a foreign table | Create a foreign table to help GDS specify information about a data file. The foreign table stores information, such as the location, format, encoding, and inter-data delimiter of a data file. For details, see Creating a GDS Foreign Table. | N/A |
| Export data | After the foreign table is created, run the INSERT statement to efficiently export data to data files. For details, see Exporting Data. | N/A |
| Stop GDS | Stop GDS after data is exported. For details, see Stopping GDS. | N/A |
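
The process in Table 1 can be summarized as a short Python sketch. It assumes that GDS is started and stopped only in Remote mode, and that the export is an `INSERT INTO ... SELECT` from the source table into the foreign table. The function names and the table names in the usage note are hypothetical.

```python
def export_steps(mode):
    """Return the Table 1 steps that apply to the given export mode."""
    if mode not in ("local", "remote"):
        raise ValueError(f"unknown export mode: {mode!r}")
    steps = ["Plan data export", "Check whether the Local mode is selected"]
    if mode == "remote":
        steps.append("Start GDS")  # assumption: no GDS is needed in Local mode
    steps += ["Create a foreign table", "Export data"]
    if mode == "remote":
        steps.append("Stop GDS")
    return steps


def export_statement(foreign_table, source_table):
    """Build the INSERT that writes the source table through the foreign table.

    Names are inserted verbatim, so pass only trusted identifiers.
    """
    return f"INSERT INTO {foreign_table} SELECT * FROM {source_table};"
```

Calling `export_statement("foreign_reasons", "reasons")` produces the statement that would be run in the Export data step once the foreign table exists.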