gs_restore for Importing Data

Description

gs_restore, provided by GaussDB, is used to import data that was exported using gs_dump. It can also be used to import files exported by gs_dump.

It has the following functions:

Imports data to the database.
If a database is specified, data is imported to the database. For parallel import, the password for connecting to the database is required. During data import, the generated columns are automatically updated and saved as ordinary columns.

If a PDB is specified in the connection parameters, data is imported to the specified PDB.
- If data is imported to a specified PDB, the -C, --create option is not supported.
- If the specified PDB is in the close state, the import fails.
- If the transaction of importing data to a specified PDB is set to read-only, the import fails.
- In M-compatible mode, do not import or export data between instances with different lower_case_table_names parameter settings. Otherwise, data may be lost.
- The current design of gs_restore is incompatible with separation of duties. In separation of duties, gs_restore can be used only by the initial user. The file to be imported must also be executed by the initial user.
Imports data to an archive file.
If the -l parameter is specified, an archive file containing a brief summary of the data is generated.

gs_restore supports SSL encrypted communication. The method is the same as that of using gsql.

Prerequisites

Before using gs_restore, ensure that the gs_restore version is consistent with the gs_dump version and database version.

Format

gs_restore [OPTION]... FILE

The FILE does not have a short or long option. It is used to specify the location for the archive files.
The dbname or -l option is required as prerequisites. Users cannot enter dbname and -l parameters at the same time.
gs_restore incrementally imports data by default. To prevent data exception caused by consecutive imports, use the -e and -c parameters for each import. -c indicates that the database objects that already exist in the database to be restored are cleared (deleted) before the database objects are rebuilt. -e indicates that if an error occurs when an SQL statement is sent to the database, the system exits. By default, the system continues to import data and displays a series of error information after the import is complete.
If the owner of the schema object has the OPRADMIN system permission, the initial user is required for the import.
During data import, if invalid characters are found and do not need to be transcoded, you can set the database compatibility parameter copy_special_character_version to 'no_error' to import the data. Otherwise, an error is reported but the import is not interrupted.

Parameters

Common parameters:

-d, --dbname=NAME
Connects to the dbname database and imports data to the database.
-f, --file=<FILE_NAME>
Specifies the output file for the generated archive, or the output file in the list specified using -l.

The default is the standard output.

-f cannot be used with -d.
-F, --format=c|d|t
Specifies the format of the archive. The format does not need to be specified because the gs_restore determines the format automatically.

Value range:
- c/custom: The archive form is the customized format in gs_dump.
- d/directory: The archive format is directory.
- t/tar: The archive format is tar.
-l, --list
Lists the formats of the archive. The operation output can be used for the input of the -L option. If filtering parameters, such as -n or -t, are used together with -l, they will restrict the listed items.
-v, --verbose
Specifies the verbose mode.
-V, --version
Prints the gs_restore version and exits.
-?, --help
Displays help information about gs_restore parameters and exits.

Import parameters:

-a, -data-only
Imports only the data, not the schema (data definition). gs_restore incrementally imports data.
-c, --clean
Cleans (deletes) existing database objects in the database to be restored before recreating them. If the target database does not contain the objects involved in the deletion operation, some promptive error information may be displayed.
-C, --create
Before data is imported to a database, CREATE DATABASE is used to create the database. (After this option is specified, the database specified by -d is only used to execute the CREATE DATABASE command, and all data is still imported to the created database.)
- In multi-tenancy scenarios, this option is not supported when gs_restore is used to import data to a PDB.
- This option is not supported in M-compatible databases. You must create a database in M-compatible mode on the target instance, export data from the source instance, and connect to the newly created M-compatible database on the target instance to import data.
-e, --exit-on-error
Exits if an error occurs when you send the SQL statement to the database. If you do not exit, the commands will still be sent and error information will be displayed when the import ends.
-I, --index=NAME
Imports only the definition of the specified index. Multiple indexes can be imported. Enter -I index multiple times to import multiple indexes.

For example:
```
gs_restore -h host_name -p port_number -d testdb -I Index1 -I Index2 backup/MPPDB_backup.tar
```
In this example, Index1 and Index2 will be imported.
-j, --jobs=NUM
Specifies the number of concurrent, the most time-consuming jobs of gs_restore (such as loading data, creating indexes, or creating constraints). This parameter can greatly reduce the time to import a large database to a server running on a multi-processor machine.

Each task may be a process or a thread, which is determined by the OS. Each task is connected to the server separately.

The optimal value for this option depends on the server hardware settings, the client, the network, the number of CPU cores, and disk settings. It is recommended that the parameter be set to the number of CPU cores on the server. In addition, a larger value can also lead to faster import in many cases. However, an overly large value will lead to decreased performance because of thrashing.

This option supports the customized archive format only. The input file must be a regular file (not the pipe file). This parameter can be ignored when you select the script method rather than connecting to a database server. In addition, multiple jobs cannot be used with the --single-transaction option.
- This parameter applies to multi-table, multi-index, and multi-constraint scenarios. In practice, the number of created processes (or threads) is related to the number of tables, indexes, and constraints. The maximum number of concurrent jobs does not exceed the specified number of jobs.
- When gs_restore imports data to a PDB, the optimal value of this parameter depends on the resources allocated to the PDB.
-L, --use-list=<FILE_NAME>
Imports only archive elements that are listed in list-file and imports them in the order that they appear in the file. If filtering parameters, such as -n or -t, are used with -L, they will further limit the items to be imported.

list-file is normally created by editing the output of a previous -l parameter. File lines can be modified or removed, and can also be commented out by placing a semicolon (;) at the beginning of the row.
-n, --schema=NAME
Restores only objects that are listed in schemas.

This parameter can be used with the -t parameter to import a specific table.

Entering -n schemaname multiple times can import multiple schemas.

For example:
```
gs_restore -h host_name -p port_number -d testdb -n sch1 -n sch2 backup/MPPDB_backup.tar
```
In this example, sch1 and sch2 will be imported.
- In M-compatible mode, if a database with templatem is created by running CREATE DATABASE, data is imported by specifying db_name using the -d option; if a database is created by running CREATE DATABASE db_name, data is imported by specifying -n because such database is equivalent to a schema.
- The specified schema name must exist in the archive file that is used as the input. If a schema name that does not exist in the archive file is specified, the import of the schema does not take effect.
- Different from the case in gs_dump, this parameter is fully matched in gs_restore. When the schema name contains uppercase letters, no extra quotation marks are required, for example, -n "Sch1".
- In M-compatible mode, the value of this parameter is affected by the GUC parameter lower_case_table_names. For example, in case-sensitive mode (lower_case_table_names set to 0), the value of this parameter must be case-sensitive. If the value contains uppercase letters, no extra quotation marks are required; in case-insensitive mode (lower_case_table_names set to 1), the value of this parameter must be in lowercase.
-O, --no-owner
Do not output commands to set ownership of objects to match the original database. By default, gs_restore issues the ALTER OWNER or SET SESSION AUTHORIZATION statement to set ownership of created schema elements. Unless the system administrator or the user who has all the objects in the script initially connects to the database. Otherwise, the statement will fail. Any username can be used for the initial connection using -O, and this user will own all the created objects.
-P, --function=NAME(args)
Imports only listed functions. You need to correctly spell the function name and the parameter based on the contents of the dump file in which the function exists.

Entering -P alone means importing all function-name(args) functions in a file. Entering -P with -n means importing the function-name(args) functions in a specified schema. Entering -P multiple times and using -n once means that all imported functions are in the -n schema by default.

You can enter -n schema-name -P 'function-name(args)' multiple times to import functions in specified schemas.

For example:
```
./gs_restore -h host_name -p port_number -d testdb -n test1 -P 'Func1(integer)' -n test2 -P 'Func2(integer)' backup/MPPDB_backup.tar
```
In this example, both Func1 (i integer) in the test1 schema and Func2 (j integer) in the test2 schema will be imported.
-s, --schema-only
Imports only schemas (data definitions), instead of data (table content). The current sequence value will not be imported.
-S, --sysadmin=NAME
Reserved for function extension. The option is not recommended.
-t, --table=NAME
Imports only listed table definitions or data, or both. This parameter can be used with the -n parameter to specify a table object in a schema. When -n is not entered, the default schema is PUBLIC. Entering -n schemaname -t tablename multiple times can import multiple tables in a specified schema.

For example:

Import table1 in the PUBLIC schema.
```
gs_restore -h host_name -p port_number -d testdb -t table1 backup/MPPDB_backup.tar
```
Import test1 in the test1 schema and test2 in the test2 schema.
```
gs_restore -h host_name -p port_number -d testdb -n test1 -t test1 -n test2 -t test2 backup/MPPDB_backup.tar
```
Import table1 in the PUBLIC schema and test1 in the test1 schema.
```
gs_restore -h host_name -p port_number -d testdb -n PUBLIC -t table1 -n test1 -t table1 backup/MPPDB_backup.tar
```
- -t does not support the schema_name.table_name input format. If this format is specified, no error is reported but the setting does not take effect.
- When -t is specified, gs_restore does not import any other database objects that are attached to the selected table. Therefore, there is no guarantee that the results of a specific-table dump can be automatically imported to an empty database.
- -t tablename does not import trigger information from a table.
- Different from the case in gs_dump, this parameter is fully matched in gs_restore. When the table name contains uppercase letters, no extra quotation marks are required, for example, -t "Tbl1".
- In M-compatible mode, the value of this parameter is affected by the GUC parameter lower_case_table_names. For example, in case-sensitive mode (lower_case_table_names set to 0), the value of this parameter must be case-sensitive. If the value contains uppercase letters, no extra quotation marks are required; in case-insensitive mode (lower_case_table_names set to 1), the value of this parameter must be in lowercase.
-T, --trigger=NAME
This parameter is reserved for extension.
-x, --no-privileges/--no-acl
Prevents the import of access permissions (GRANT/REVOKE commands).
-1, --single-transaction
Executes import as a single transaction (that is, commands are wrapped in BEGIN/COMMIT).

This option ensures that either all the commands are completed successfully, or no application is changed. This option means --exit-on-error.
--disable-triggers
Reserved for function extension. The option is not recommended.
--no-data-for-failed-tables
By default, table data will be imported even if the statement to create a table fails (for example, the table exists). Data in such table is skipped using this parameter. This operation is useful if the target database already contains the desired table contents.

This parameter takes effect only when you import data directly to a database, not when you output SQL scripts.
--no-security-labels
Reserved for function extension. The option is not recommended.
--no-tablespaces
Does not select any tablespaces. All the objects will be created during the import process, no matter which tablespace is selected when using this option.
--section=SECTION
Imports the listed sections (such as pre-data, data, or post-data).
--use-set-session-authorization
This option is used for backing up the plain-text format.

Generates the SET SESSION AUTHORIZATION statement as the output instead of the ALTER OWNER statement to determine object ownership. This parameter makes dump more standards-compatible. If the records of objects in exported files are referenced, import may fail. Only administrators can use the SET SESSION AUTHORIZATION statement to dump data, and the administrators must manually change and verify the passwords of exported files by referencing the SET SESSION AUTHORIZATION statement before import. The ALTER OWNER statement requires lower permissions.

If the --use-set-session-authorization parameter is used by a system administrator to import data and the imported data contains objects of the initial user, "ERROR: permission denied to set session authorization" is reported and the import is interrupted when the system administrator switches to the initial user due to security permission verification. In this scenario, you are advised to use the initial user to import and export data.
--pipeline
Uses a pipe to transmit the password. This parameter cannot be used on devices.

If any local additions need to add to the template1 database during the installation, restore the output of gs_restore into an empty database with caution. Otherwise, you are likely to obtain errors due to duplicate definitions of the added objects. To create an empty database without any local additions, copy data from template0 rather than template1. For example:

gaussdb=# CREATE DATABASE foo WITH TEMPLATE template0;

gs_restore cannot import large objects selectively. For example, it can only import the objects of a specified table. If an archive form contains large objects, all large objects will be imported. If this archive object is excluded by -L, -t, or other options, none of the large objects will be imported.

The -d/--dbname and -f/--file options do not coexist.
The -s/--schema-only and -a/--data-only options do not coexist.
The -c/--clean and -a/--data-only options do not coexist.
When the --single-transaction option is used, -j/--jobs must be a single job.
--role must be used with --rolepassword.

Connection parameters:

-h, --host=HOSTNAME
Specifies the host name. If the value begins with a slash (/), it is used as the UDS directory. The default value is taken from the PGHOST environment variable. If it is not set, a UDS connection is attempted.

This parameter is used only for defining names of the hosts outside a database. For the localhost in the database, you can use IPv4 address 127.0.0.1 or IPv6 address ::1.

Environment variable: PGHOST
-p, --port=PORT
Specifies the TCP port listened on by the server or the local UDS file name extension to ensure a correct connection. The default value is the PGPORT environment variable.

If the thread pool is enabled, you are advised to use pooler port, that is, the listening port number plus 1.

Environment variable: PGPORT
-U, --username=NAME
Specifies the name of the connected user. The initial user cannot be used for cross-node execution.

Environment variable: PGUSER
-w, --no-password
Never issues a password prompt. The connection attempt fails if the server requires password for authentication and the password is not provided in other ways. This parameter is useful in batch jobs and scripts in which no user password is required.
-W, --password=PASSWORD
Specifies the user password for connection. If the authentication policy of the host is trust, the password of the system administrator is not verified. That is, you do not need to enter the -W parameter. If this parameter is not specified and you are not a system administrator, the system prompts you to enter the password in interactive mode. To ensure system security, you are advised to enter the password in interactive mode.
--role=ROLENAME
Specifies a role name for the import operation. If this parameter is selected, the SET ROLE statement will be issued after gs_restore connects to the database. It is useful when the authenticated user (specified by -U) lacks the permissions required by gs_restore. This parameter allows the user to switch to a role with the required permissions. Some installations have a policy against logging in directly as the initial user. This parameter allows data to be imported without violating the policy.
--rolepassword=ROLEPASSWORD
Specifies the password of the specific role.

Examples

gs_restore is used to import the files exported by gs_dump.

Example 1:
Execute the gs_restore tool to import the exported MPPDB_backup.dmp file (custom format) to the testdb database.
```
gs_restore backup/MPPDB_backup.dmp -p 8000 -d testdb
restore operation successful
total time: 13053  ms
```
Example 2:
Execute the gs_restore tool to import the exported MPPDB_backup.tar file (tar format) to the testdb database.
```
gs_restore backup/MPPDB_backup.tar -p 8000 -d testdb 
restore operation successful
total time: 21203  ms
```
Example 3:
Execute the gs_restore tool to import the exported MPPDB_backup file (directory format) to the testdb database.
```
gs_restore backup/MPPDB_backup -p 8000 -d testdb
restore operation successful
total time: 21003  ms
```

Example 4:

Execute the gs_restore tool and import the MPPDB_backup.dmp file (custom format). Specifically, import all the object definitions and data in the PUBLIC schema. Existing objects are deleted from the target database before the import. If an existing object references to an object in another schema, you need to manually delete the referenced object first.

gs_restore backup/MPPDB_backup.dmp -p 8000 -d testdb -e -c -n PUBLIC
Error while PROCESSING TOC:
Error from TOC entry 313; 1259 337399 TABLE table1 gaussdba
could not execute query: ERROR:  cannot drop table table1 because other objects depend on it
DETAIL:  view t1.v1 depends on table table1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
    Command was: DROP TABLE IF EXISTS public.table1;

Manually delete the referenced object and create it again after the import is complete.

gs_restore backup/MPPDB_backup.dmp -p 8000 -d testdb -e -c -n PUBLIC
restore operation successful
total time: 2203  ms

Example 5:
Execute the gs_restore tool and import the MPPDB_backup.dmp file (custom format). Specifically, import only the definition of table1 in the PUBLIC schema.
```
gs_restore backup/MPPDB_backup.dmp -p 8000 -d testdb -e -c -s -n PUBLIC -t table1
restore operation successful
total time: 21000  ms
```
Example 6:
Execute the gs_restore tool and import the MPPDB_backup.dmp file (custom format). Specifically, import only the data of table1 in the PUBLIC schema.
```
gs_restore backup/MPPDB_backup.dmp -p 8000 -d testdb -e -a -n PUBLIC -t table1
restore operation successful
total time: 20203  ms
```
Example 7:
In the multi-tenancy scenario, use gs_restore to import the specified PDB_backup.dmp file (user-defined archive format) to the PDB named testpdb.
```
gs_restore backup/PDB_backup.dmp -p 20000 -d testpdb
restore operation successful
total time: 371  ms
```
Example 8:
In the multi-tenancy scenario, use gs_restore to import the specified PDB_backup.tar file (tar format) to the PDB named testpdb.
```
gs_restore backup/PDB_backup.tar -p 20000 -d testpdb
restore operation successful
total time: 367  ms
```
Example 9:
In the multi-tenancy scenario, use gs_restore to import the specified PDB_backup directory file (directory archive format) to the PDB named testpdb.
```
gs_restore backup/PDB_backup -p 20000 -d testpdb
restore operation successful
total time: 370  ms
```
Example 10:
In the multi-tenancy scenario, use gs_restore with the -C, --create option to import data. gs_restore reports an error and exits.
```
gs_restore backup/PDB_backup -C -p 20000 -d testpdb
gs_restore unsupport the '-C, --create' option for pdb.
```

Special case:

Execute the gsql tool. Import the MPPDB_backup.sql file in the export folder (plain-text format) generated by gs_dump or gs_dumpall to the testdb database.

gsql -d testdb -p 8000 -f /home/omm/test/MPPDB_backup.sql
SET
SET
SET
SET
SET
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
CREATE INDEX
CREATE INDEX
CREATE INDEX
SET
CREATE INDEX
REVOKE
REVOKE
GRANT
GRANT
total time: 30476  ms
In the example, the file after -f is the exported file, and 8000 indicates the port number of the database server. testdb indicates the name of the database to be accessed.