
Importing and Exporting Data Through GDS Pipe Files

Updated at: Jul 14, 2021 GMT+08:00
  • The current version does not support data import and export through GDS in SSL mode. Do not use GDS in SSL mode.
  • All pipe files mentioned in this section refer to named pipes on Linux.

In the current GDS version, data can be imported from a pipe or exported from a database to a pipe. This function makes the import and export through GDS more flexible.

  • When the local disk space of the GDS user is insufficient:
    • Data in HDFS can be directly written to a pipe without occupying extra disk space.
    • The data exported from GDS can be compressed through the pipe so that it occupies less disk space.
    • The exported data is transferred through the pipe to the HDFS server for storage.
  • If you need to cleanse data before importing and exporting data:
    • You can write a program tailored to your requirements that streams data into the pipe in real time, or processes the content read from the pipe in real time. In this way, you can cleanse the data as it is imported or exported.
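
The streaming cleansing described above can be sketched in a few lines of shell. This is a minimal, self-contained illustration: the pipe and file names are invented for the demo, and a background printf stands in for GDS writing exported rows to the pipe.

```shell
# Minimal sketch of streaming cleansing through a named pipe.
mkfifo demo_cleanse.pipe
printf '1,2,alice\n2,,bob\n' > demo_cleanse.pipe &   # simulated GDS export writer
# Streaming consumer: drop rows whose second field is empty while reading.
awk -F, '$2 != ""' demo_cleanse.pipe > cleansed.txt
wait
rm -f demo_cleanse.pipe
```

Because the consumer reads the pipe as a stream, no intermediate copy of the full export ever lands on disk.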

Precautions

  • GDS supports concurrent import and export. The gds -t parameter is used to set the size of the thread pool and control the maximum number of concurrent working threads. But it does not accelerate a single SQL task. The default value of gds -t is 8, and the upper limit is 200. When using the pipe function to import and export data, ensure that the value of -t is greater than or equal to the number of concurrent services. In the dual-cluster interconnection scenario, the value of -t must be greater than or equal to twice the number of concurrent services.
  • Data in pipes is deleted once read. Therefore, ensure that no other program except GDS reads data in the pipe during import or export. Otherwise, data may be lost, task errors may occur, or the exported files may be disordered.
  • Concurrent import and export of foreign tables with the same location are not supported. That is, multiple threads of GDS cannot read or write pipe files at the same time.
  • A single GDS import or export task identifies only one pipe. Therefore, the location address set for the GDS foreign table must not contain wildcard characters ({}[]?). The following foreign table is invalid because its location contains a wildcard:
    CREATE FOREIGN TABLE foreign_test_pipe_tr( like test_pipe ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://127.0.0.1:7789/foreign_test_*', FORMAT 'text', DELIMITER ',',  NULL '', EOL '0x0a' ,file_type 'pipe',auto_create_pipe 'false');
  • When the -r recursion parameter is enabled for GDS, only one pipe can be identified. That is, GDS identifies only one pipe in the current data directory and does not recursively search for it. Therefore, the -r parameter does not take effect in the pipe import and export scenarios.
  • CN retry is not supported during the import and export through a pipe, because GDS cannot control the operations performed by peer users and programs on pipes.
  • During the import, if the peer program does not write data into the pipe for more than one hour, the import task times out and an error is reported.
  • During the export, if the peer program does not read data from the pipe within the default timeout of one hour, the export task times out and an error is reported.
  • Ensure that the GDS version and kernel version support the function of importing and exporting data through pipes.
  • If the auto_create_pipe parameter of the foreign table is set to true, a delay may occur when GDS automatically creates a pipe. Before any operation on a pipe, check whether the automatically created pipe exists and whether it is a pipe file.
  • Once an import or export task through a GDS pipe is complete, the pipe is automatically deleted. However, if you manually terminate the task, the pipe deletion is delayed until the timeout interval expires.
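
The check recommended above for automatically created pipes can be sketched as a short poll loop. Everything here is illustrative: the delayed mkfifo in a subshell stands in for GDS creating the pipe late with auto_create_pipe set to true.

```shell
# Poll until the pipe appears and confirm it really is a FIFO before use.
( sleep 1; mkfifo demo_auto.pipe ) &   # simulated delayed pipe creation by GDS
STATUS="pipe missing"
for attempt in 1 2 3 4 5; do
    if [ -p demo_auto.pipe ]; then     # -p tests specifically for a FIFO
        STATUS="pipe ready"
        break
    fi
    sleep 1
done
echo "$STATUS"
wait
rm -f demo_auto.pipe
```

The `-p` test matters: a regular file with the expected name would pass a plain existence check but fail the import or export.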

Exporting Data Through GDS Pipe Files

Example:

  1. Start the GDS.

    gds -d /***/gds_data/ -D -p 127.0.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10

    If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.

  2. Export data.

    1. Log in to the database, create an internal table, and write data to the table.
      CREATE TABLE test_pipe( id integer not null, sex text not null, name text ) ;
      CREATE TABLE
      
      INSERT INTO test_pipe values(1,2,'11111111111111');
      INSERT INTO test_pipe values(2,2,'11111111111111');
      INSERT INTO test_pipe values(3,2,'11111111111111');
      INSERT INTO test_pipe values(4,2,'11111111111111');
      INSERT 0 1
    2. Create a write-only foreign table.
      CREATE FOREIGN TABLE foreign_test_pipe_tw( id integer not null, age text not null, name  text ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://127.0.0.1:7789/', FORMAT 'text', DELIMITER ',',  NULL '', EOL '0x0a' ,file_type 'pipe', auto_create_pipe 'false') WRITE ONLY;
    3. Execute the export statement. In this case, the statements are blocked.
      INSERT INTO foreign_test_pipe_tw select * from test_pipe; 

  3. Export data through the GDS pipes.

    1. Log in to the GDS server and go to the GDS data directory.
      cd /***/gds_data/  
    2. Create a pipe. If auto_create_pipe is set to true, skip this step.
      mkfifo postgres_public_foreign_test_pipe_tw.pipe 
    3. Read data from the pipe and write it to a new file.
      cat postgres_public_foreign_test_pipe_tw.pipe > postgres_public_foreign_test_pipe_tw.txt
    4. To compress the exported files, run the following command:
      gzip -9 -c < postgres_public_foreign_test_pipe_tw.pipe  > out.gz 
    5. To export the content from the pipe to the HDFS server, run the following command:
      cat postgres_public_foreign_test_pipe_tw.pipe  | hdfs dfs -put -  /user/hive/***/test_pipe.txt
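
Steps 3.3 and 3.4 above can be condensed into a self-contained sketch. The names are illustrative, and a background printf stands in for GDS writing the export data to the pipe.

```shell
# Drain a pipe directly into a compressed file, as in the export steps above.
mkfifo demo_export.pipe
printf '3,2,a\n1,2,b\n' > demo_export.pipe &   # simulated GDS export writer
gzip -9 -c < demo_export.pipe > demo_out.gz    # compress while streaming
wait
gzip -d -c demo_out.gz                         # inspect the compressed export
rm -f demo_export.pipe
```

Because gzip reads the pipe as GDS writes it, the uncompressed export never touches the disk.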

  4. Verify the exported data.

    1. Check whether the exported file is correct.
      cat postgres_public_foreign_test_pipe_tw.txt
      3,2,11111111111111
      1,2,11111111111111
      2,2,11111111111111
      4,2,11111111111111
    2. View the compressed file.
      vim out.gz
      3,2,11111111111111
      1,2,11111111111111
      2,2,11111111111111
      4,2,11111111111111
    3. View the data exported to the HDFS server.
      hdfs dfs -cat /user/hive/***/test_pipe.txt
      3,2,11111111111111
      1,2,11111111111111
      2,2,11111111111111
      4,2,11111111111111

Importing Data Through GDS Pipe Files

Example:

  1. Start the GDS.

    gds -d /***/gds_data/ -D -p 127.0.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10

    If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.

  2. Import data.

    1. Log in to the database and create an internal table.
      CREATE TABLE test_pipe_1( id integer not null, sex text not null, name  text );
    2. Create a read-only foreign table.
      CREATE FOREIGN TABLE foreign_test_pipe_tr( like test_pipe ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://127.0.0.1:7789/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',',  NULL '', EOL '0x0a' ,file_type 'pipe',auto_create_pipe 'false');
    3. Execute the import statement. The statement is blocked.
      INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr;

  3. Import data through the GDS pipes.

    1. Log in to the GDS server and go to the GDS data directory.
      cd /***/gds_data/   
    2. Create a pipe. If auto_create_pipe is set to true, skip this step.
      mkfifo foreign_test_pipe.pipe
    3. Write data to the pipe.
      cat postgres_public_foreign_test_pipe_tw.txt > foreign_test_pipe.pipe
    4. To read the compressed file to the pipe, run the following command:
      gzip -d < out.gz > foreign_test_pipe.pipe
    5. To read the HDFS file to the pipe, run the following command:
      hdfs dfs -cat /user/hive/***/test_pipe.txt > foreign_test_pipe.pipe
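
Step 3.4 above can be sketched as a self-contained demo. The names are illustrative, and a foreground cat stands in for GDS draining the pipe.

```shell
# Feed a compressed file into a pipe that the importer drains.
printf '1,2,x\n2,2,y\n' | gzip -c > demo_in.gz   # stand-in compressed source
mkfifo demo_import.pipe
cat demo_import.pipe > imported.txt &            # simulated GDS import reader
gzip -d < demo_in.gz > demo_import.pipe          # decompress straight into the pipe
wait
rm -f demo_import.pipe demo_in.gz
```

The decompressed rows flow through the pipe to the reader without ever being written out as an uncompressed file.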

  4. View the result returned by the import statement.

    INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr;
    INSERT 0 4
    SELECT * FROM test_pipe_1;
     id | sex |      name
    ----+-----+----------------
      3 | 2   | 11111111111111
      1 | 2   | 11111111111111
      2 | 2   | 11111111111111
      4 | 2   | 11111111111111
    (4 rows)
    

Exporting Data Through Multi-Process Pipes

GDS also supports importing and exporting data through multi-process pipes. That is, one foreign table corresponds to multiple GDSs.

The following takes exporting a local file as an example.

  1. Start multiple GDSs.

    gds -d /***/gds_data/ -D -p 127.0.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10
    gds -d /***/gds_data_1/ -D -p 127.0.0.1:7790 -l /***/gds_log_1/aa.log -H 0/0 -t 10

    If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.

  2. Export data.

    1. Log in to the database and create an internal table.
      CREATE TABLE test_pipe (id integer not null, sex text not null, name  text);
    2. Write data.
      INSERT INTO test_pipe values(1,2,'11111111111111');
      INSERT INTO test_pipe values(2,2,'11111111111111');
      INSERT INTO test_pipe values(3,2,'11111111111111');
      INSERT INTO test_pipe values(4,2,'11111111111111');
    3. Create a write-only foreign table.
      CREATE FOREIGN TABLE foreign_test_pipe_tw( id integer not null, age text not null, name  text ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://127.0.0.1:7789/|gsfs://127.0.0.1:7790/', FORMAT 'text', DELIMITER ',',  NULL '', EOL '0x0a' ,file_type 'pipe', auto_create_pipe 'false') WRITE ONLY;
    4. Execute the export statement. In this case, the statements are blocked.
      INSERT INTO foreign_test_pipe_tw select * from test_pipe; 

  3. Export data through the GDS pipes.

    1. Log in to the GDS server and go to each GDS data directory.
      cd /***/gds_data/ 
      cd /***/gds_data_1/ 
    2. Create a pipe. If auto_create_pipe is set to true, skip this step.
      mkfifo postgres_public_foreign_test_pipe_tw.pipe 
    3. In each data directory, read data from the pipe and write it to a new file.
      cat postgres_public_foreign_test_pipe_tw.pipe > postgres_public_foreign_test_pipe_tw.txt

  4. Verify the exported data.

    cat /***/gds_data/postgres_public_foreign_test_pipe_tw.txt
    3,2,11111111111111
    cat /***/gds_data_1/postgres_public_foreign_test_pipe_tw.txt
    1,2,11111111111111
    2,2,11111111111111
    4,2,11111111111111

Importing Data Through Multi-Process Pipes

GDS also supports importing data through multi-process pipes. That is, one foreign table corresponds to multiple GDSs.

The following takes importing a local file as an example.

  1. Start multiple GDSs. If the GDSs have been started, skip this step.

    gds -d /***/gds_data/ -D -p 127.0.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10
    gds -d /***/gds_data_1/ -D -p 127.0.0.1:7790 -l /***/gds_log_1/aa.log -H 0/0 -t 10

    If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.

  2. Import data.

    1. Log in to the database and create an internal table.
      CREATE TABLE test_pipe_1( id integer not null, sex text not null, name  text );
    2. Create a read-only foreign table.
      CREATE FOREIGN TABLE foreign_test_pipe_tr( like test_pipe ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://127.0.0.1:7789/foreign_test_pipe.pipe|gsfs://127.0.0.1:7790/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' , file_type 'pipe', auto_create_pipe 'false');
    3. Execute the import statement. The statement is blocked.
      INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr;

  3. Import data through the GDS pipes.

    1. Log in to the GDS server and go to each GDS data directory.
      cd /***/gds_data/ 
      cd /***/gds_data_1/
    2. Create a pipe. If auto_create_pipe is set to true, skip this step.
      mkfifo foreign_test_pipe.pipe
    3. Write the data file to the pipe in each data directory.
      cat postgres_public_foreign_test_pipe_tw.txt > foreign_test_pipe.pipe

  4. View the result returned by the import statement.

    INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr;
    INSERT 0 4
    SELECT * FROM test_pipe_1;
     id | sex |      name
    ----+-----+----------------
      3 | 2   | 11111111111111
      1 | 2   | 11111111111111
      2 | 2   | 11111111111111
      4 | 2   | 11111111111111
    (4 rows)

Direct Data Import and Export Between Clusters

  1. Start the GDS. (If the process has been started, skip this step.)

    gds -d /***/gds_data/ -D -p GDS_IP:GDS_PORT -l /***/gds_log/aa.log -H 0/0 -t 10

    If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.

  2. Export data from the source database.

    1. Log in to the source database, create an internal table, and write data to the table.
      CREATE TABLE test_pipe( id integer not null, sex text not null, name  text );
      INSERT INTO test_pipe values(1,2,'11111111111111');
      INSERT INTO test_pipe values(2,2,'11111111111111');
      INSERT INTO test_pipe values(3,2,'11111111111111');
      INSERT INTO test_pipe values(4,2,'11111111111111');
      INSERT INTO test_pipe values(5,2,'11111111111111');
    2. Create a write-only foreign table.
      CREATE FOREIGN TABLE foreign_test_pipe( id integer not null, age text not null, name  text ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe') WRITE ONLY;
    3. Execute the export statement. The statement is blocked.
      INSERT INTO foreign_test_pipe SELECT * FROM test_pipe;

  3. Import data to the target cluster.

    1. Create an internal table.
      CREATE TABLE test_pipe (id integer not null, sex text not null, name text);
    2. Create a read-only foreign table.
      CREATE FOREIGN TABLE foreign_test_pipe(like test_pipe) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' , file_type 'pipe', auto_create_pipe 'false');
    3. Run the following command to import data to the table:
      INSERT INTO test_pipe SELECT * FROM foreign_test_pipe;

  4. View the result returned by the import statement from the target cluster.

    SELECT * FROM test_pipe;
     id | sex |      name
    ----+-----+----------------
      3 | 2   | 11111111111111
      6 | 2   | 11111111111111
      7 | 2   | 11111111111111
      1 | 2   | 11111111111111
      2 | 2   | 11111111111111
      4 | 2   | 11111111111111
      5 | 2   | 11111111111111
      8 | 2   | 11111111111111
      9 | 2   | 11111111111111
    (9 rows)

By default, the pipe file exported from or imported to GDS is named in the format Database name_Schema name_Foreign table name.pipe. Therefore, the database name and schema name of the target cluster must be the same as those of the source cluster. If the database or schema differs, you can specify the same pipe file in the URL of the location.
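
A quick sketch of how the default pipe file name is composed (the database, schema, and table values below are the ones used in the examples in this section):

```shell
# Compose the default pipe name: <database>_<schema>_<foreign table>.pipe
DB=postgres
SCHEMA=public
TABLE=foreign_test_pipe_tw
PIPE_NAME="${DB}_${SCHEMA}_${TABLE}.pipe"
echo "$PIPE_NAME"
```

If either cluster uses a different database or schema name, the derived names no longer match, which is why the location URL must then name the pipe explicitly.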

Example:

  • Pipe name specified by a write-only foreign table.
    CREATE FOREIGN TABLE foreign_test_pipe(id integer not null, age text not null, name  text) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',',  NULL '', EOL '0x0a' ,file_type 'pipe') WRITE ONLY;
  • Pipe name specified by a read-only foreign table.
    CREATE FOREIGN TABLE foreign_test_pipe(like test_pipe) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',',  NULL '', EOL '0x0a' ,file_type 'pipe',auto_create_pipe 'false');

Common Troubleshooting Methods:

  • Issue 1: "/***/postgres_public_foreign_test_pipe_tr.pipe" must be named pipe.

    Locating method: The file_type of the GDS foreign table is pipe, but the file being operated on is a regular file. Check whether the postgres_public_foreign_test_pipe_tr.pipe file is a pipe file.

  • Issue 2: could not open pipe "/***/postgres_public_foreign_test_pipe_tw.pipe" cause by Permission denied.

    Locating method: GDS does not have the permission to open the pipe file.

  • Issue 3: could not open source file /*****/postgres_public_foreign_test_pipe_tw.pipe because timeout 300s for WRITING.

    Locating method: Opening the pipe times out when GDS is used to export data. This is because the pipe is not created within 300 seconds after auto_create_pipe is set to false, or the pipe is created but is not read by any program within 300 seconds.

  • Issue 4: could not open source file /*****/postgres_public_foreign_test_pipe_tw.pipe because timeout 300s for READING.

    Locating method: Opening the pipe times out when GDS is used to import data. This is because the pipe is not created within 300 seconds after auto_create_pipe is set to false, or the pipe is created but is not written by any program within 300 seconds.

  • Issue 5: could not poll writing source pipe file "/****/postgres_public_foreign_test_pipe_tw.pipe" timeout 300s.

    Locating method: GDS did not receive any write event on the pipe within 300 seconds during data export, that is, the pipe was not read for more than 300 seconds.

  • Issue 6: could not poll reading source pipe file "/****/postgres_public_foreign_test_pipe_tw.pipe" timeout 300s.

    Locating method: GDS did not receive any read event on the pipe within 300 seconds during data import, that is, the pipe was not written for more than 300 seconds.

  • Issue 7: could not open pipe file "/***/postgres_public_foreign_test_pipe_tw.pipe" for "WRITING" with error No such device or address.

    Locating method: The /***/postgres_public_foreign_test_pipe_tw.pipe file is not being read by any program. As a result, GDS cannot open the pipe file for writing.
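
The checks for Issues 1 and 2 can be run locally before retrying a task. This is an illustrative sketch: the pipe name is a placeholder, and a freshly created FIFO owned by the current user is used so the happy path is exercised.

```shell
# Confirm the target is a FIFO and that the current user can open it.
mkfifo demo_check.pipe
if [ ! -p demo_check.pipe ]; then
    RESULT="not a named pipe"      # would surface as Issue 1
elif [ ! -r demo_check.pipe ] || [ ! -w demo_check.pipe ]; then
    RESULT="permission denied"     # would surface as Issue 2
else
    RESULT="pipe looks usable"
fi
echo "$RESULT"
rm -f demo_check.pipe
```

Run the same checks as the OS user that the GDS process runs as, since that is the user whose permissions matter.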
