Exporting Data Using SELECT INTO OUTFILE OBS
Scenarios
You can use the SELECT INTO OUTFILE OBS statement to query data from TaurusDB databases, export the data as text files, and store the files to OBS buckets. The files are first stored in local temporary storage and then exported to OBS. After the export is complete, the local files are deleted.
This statement exports only table data as text files and does not export table schemas.
Supported Versions
The kernel version must be 2.0.75.260300 or later.
For details about how to check the kernel version, see How Can I Check the Version of a TaurusDB Instance?
Prerequisites
- The parameter rds_load_from_obs_enable must be set to ON. For details about how to set it, see Modifying Parameters of a DB Instance.
- The user that issues the export statement must have SELECT and FILE privileges on the database table from which data will be exported.
- You must grant the TaurusDB cluster permission to access OBS first.
- If you enable the function of loading data from OBS buckets using a Huawei Cloud account, no additional configuration is required. If you enable this function as an IAM user for the first time, you must have permission to create an agency.
- To load data from OBS buckets, you must have the following IAM permissions:
Table 1 IAM permissions and agencies IAM Policy
IAM Permissions
Agency
Role/Policy-based policy
- iam:agencies:listAgencies
- iam:agencies:createAgency
- iam:permissions:listRolesForAgencyOnProject
- iam:permissions:grantRoleToGroupOnProject
- iam:permissions:grantRoleToAgencyOnProject
- iam:roles:listRoles
- iam:roles:createRole
If you do not have these permissions, create a custom policy.
The system automatically creates a DbsObsLoadAgency agency. This agency is visible to you. Deleting the agency will result in failures to load data from OBS buckets.
Identity policy-based policy
iam:agencies:list
If you do not have these permissions, create a custom identity policy.
Create a DbsObsLoadAgency agency. If you have already created the agency, skip this step.
Constraints
- The number of files exported to an OBS bucket depends on the amount of data selected by SELECT INTO OUTFILE OBS and the OBS file size threshold. The default file size threshold for OBS is about 6 GB.
- If the size of the selected data is below the threshold, a single OBS file is generated.
- If the size reaches or exceeds the threshold, set MANIFEST to ON. The system automatically creates multiple OBS files and generates a manifest file.
- TaurusDB ensures that row records in data files are never split across file boundaries. The system attempts to keep the size of each data file (except the last one) close to the preset threshold. To maintain row data integrity, if a row exceeds the remaining space of the current file, the entire row is written to that file. The file size will exceed the configured threshold.
- To export a large amount of data (more than 25 GB), you are advised to use multiple SELECT INTO OUTFILE OBS statements to export the data to OBS.
Each statement should select a different portion of the data and have a unique object name in the OBS_URI parameter. Partitioning the data with multiple statements makes it easier to recover from an error in one statement. Using multiple statements also helps avoid a single long-running transaction, which can improve database performance.
If an error occurs in one statement, only a portion of data needs to be re-exported.
- If multiple SELECT INTO OUTFILE OBS statements that use the same OBS_URI run in parallel, the export behavior is undefined. Only one SQL statement will succeed and the others will fail.
- To re-execute an interrupted SELECT INTO OUTFILE OBS statement, you must delete any existing files in the specified OBS bucket that match the OBS_URI parameter or use SELECT INTO OUTFILE OBS OVERWRITE ON to overwrite such files.
- If there are any files in the specified OBS bucket that match the OBS_URI parameter, the export will fail with a "file already exists" error. To resolve this, you can specify a unique object name in OBS_URI, remove those existing files from OBS, or include OVERWRITE ON in the statement.
- Currently, the export progress cannot be directly monitored. However, you can estimate the progress by comparing the size of the file being exported to OBS with the size of the data selected for the statement.
- If a crash occurs during the execution of SELECT INTO OUTFILE OBS, manually delete the residual files from OBS.
- All parameters in the syntax tree must be in a specific sequence. If the sequence is changed, a syntax error will be reported.
- Ensure that the URI in the SQL statement does not end with manifest. The system uses this suffix by default to generate manifest files, so avoid it to prevent conflicts.
Creating a DbsObsLoadAgency Agency
- Log in to the IAM console.
- In the navigation pane, choose Agencies. On the displayed page, search for DbsObsLoadAgency.
- On the displayed page, set agency parameters. Figure 1 Creating an agency
- Agency Name: Enter DbsObsLoadAgency.
- Agency Type: Select Cloud service.
- Cloud Service: Search for op_svc_rds and select Database Service (DBS).
- Validity Period: Select Unlimited.
- Click Done.
- In the displayed dialog box, click Authorize.
- Search for DbsObsLoad, select it in the list, and click Next.

- Select Region-specific projects and the target region, and click OK.

Using SELECT INTO OUTFILE OBS to Export Data
- Syntax
SELECT [ALL | DISTINCT | DISTINCTROW ] [HIGH_PRIORITY] [STRAIGHT_JOIN] [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT] [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS] select_expr [, select_expr ...] [FROM table_references [PARTITION partition_list] [WHERE where_condition] [GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP]] [HAVING where_condition] [ORDER BY {col_name | expr | position} [ASC | DESC], ...] [LIMIT {[offset,] row_count | row_count OFFSET offset}] INTO OUTFILE OBS 'OBS-URI' [CHARACTER SET charset_name] [export_options] [MANIFEST {ON | OFF}] [OVERWRITE {ON | OFF}]export_options:
[FORMAT {CSV|TEXT} [HEADER]] [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]The SELECT INTO OUTFILE OBS statement uses the following required and optional parameters. You can find more details in the MySQL official documentation.
- FORMAT {CSV|TEXT}: specifies the format of the output file, which can be CSV or TEXT.
- HEADER: specifies that the output file contains a header line. The headers correspond to the column names of the SELECT statement.
- select_expr: specifies the column or expression to be queried and exported.
- OBS_URI: specifies the path of an OBS file. It must follow the format: obs://bucket/object.
- bucket: the name of the OBS bucket where data will be stored.
- object: the name of the file to be stored in OBS.
- MANIFEST {ON | OFF}: specifies whether to create a manifest file. This field is optional. The default value is OFF.
If MANIFEST ON is specified in a query, a manifest file in JSON format will be automatically created in OBS after all data text files are created and uploaded. The manifest file contains the URLs of all uploaded text files.
The file name is suffixed with .part_xxxxx, where xxxxx is a five-digit integer, for example, obs://bucket/object.part_00000.
The manifest file path is obs://bucket/object.manifest.
A manifest file is a JSON file that lists the text files to be saved to OBS. Its format and content are as follows:
{ "entries": [ { "url": "obs://bucket/customermeta", "mandatory": true }, { "url": "obs://bucket/customerdata", "mandatory": true } ] } - OVERWRITE {ON | OFF}
Specifies whether to overwrite existing files in a bucket when uploading files to OBS. If OVERWRITE ON is specified, the files with the same URI will be overwritten. Otherwise, an error will be reported for duplicate files.
- FIELDS | COLUMNS: optional. It specifies how the fields in the file are delimited.
- TERMINATED BY: specifies the delimiter between fields. Fields are tab-delimited (\t) by default.
- ENCLOSED BY: specifies the character used to enclose field values. If OPTIONALLY is used, only character-type fields are enclosed. Otherwise, all fields are enclosed.
- ESCAPED BY: specifies the escape character. The default value is a backslash (\).
- LINES: optional. It determines how the lines in the file are delimited.
- STARTING BY: specifies the prefix string to be ignored in each line.
- TERMINATED BY: defines the line terminator. Lines are delimited by a newline character (\n) by default.
- Examples
- Exporting a single file
Export data from table t of the TaurusDB database test to the t.txt file in an OBS bucket.
SELECT * FROM test.t into OUTFILE OBS "obs://bucket/t.txt";
- Exporting a manifest file
- Export data from table t of the TaurusDB database to data files (such as test.part_00000 and test.part_00001) in an OBS bucket, and generate a manifest file named test.manifest.
SELECT * FROM t into OUTFILE OBS "obs://bucket/test" MANIFEST ON;
- Overwrite any files in the bucket that have the same name as the file you specify for the statement, including the data files (like test.part_00000) and the test.manifest file.
SELECT * FROM t into OUTFILE OBS "obs://bucket/test" MANIFEST ON OVERWRITE ON;
- Export data from table t of the TaurusDB database to data files (such as test.part_00000 and test.part_00001) in an OBS bucket, and generate a manifest file named test.manifest.
- Complex export statement The following SQL statement extracts deduplicated completed orders from specific partitions, performs grouping and aggregation (using WITH ROLLUP for grand total rows), sorts the results in descending order, and exports the data to OBS in a standard CSV format. This statement enables the overwrite mode and generates a manifest file.
SELECT DISTINCT -- Filter out duplicates. HIGH_PRIORITY -- High priority for the optimizer id, user_name, order_amount FROM orders PARTITION (p2023, p2024) -- Export data only from the specified partitions. WHERE order_status = 'COMPLETED' GROUP BY id, user_name, order_amount ASC WITH ROLLUP -- Group and sort data, and generate aggregate summary rows. HAVING order_amount > 100 ORDER BY order_amount DESC LIMIT 1000 -- Only the first 1,000 rows are extracted. INTO OUTFILE OBS 'obs://bucket/export/order_report/' CHARACTER SET utf8mb4 -- Specify the character set. FORMAT CSV HEADER -- CSV format with headers FIELDS TERMINATED BY ',' -- Comma-separated OPTIONALLY ENCLOSED BY '""' -- Character strings enclosed in double quotation marks ESCAPED BY '\\' -- Backslash escape LINES STARTING BY '' -- No prefix for rows TERMINATED BY '\n' -- Newline character MANIFEST ON -- Generate a manifest file. OVERWRITE ON; -- Overwrite old files.
- Export statement containing multiple keywords
This statement performs a STRAIGHT_JOIN between the users and orders tables, using the users table as the driving table. It groups the results by user_id (including grand total rows), removes duplicates, and exports the results to OBS in a pipe-delimited (using the vertical bar "|") text format with headers. The statement also generates a manifest file and enables the overwrite mode (OVERWRITE ON).
SELECT DISTINCTROW -- Same as DISTINCT, which specifies that duplicate rows will be removed from the result set. SQL_SMALL_RESULT -- Advise the optimizer that the result set is small and a temporary table in memory can be used. SQL_CALC_FOUND_ROWS -- Instruct the database to calculate the total number of rows in the result set, ignoring any LIMIT clause. The result can then be retrieved using SELECT FOUND_ROWS(). STRAIGHT_JOIN -- Force the database to join tables in the order they are listed in the statement. It treats the users table as the driving table and the orders table as the driven table. u.user_id, o.order_no FROM users u JOIN orders o ON u.user_id = o.user_id WHERE u.status = 'active' GROUP BY u.user_id DESC WITH ROLLUP -- Generate subtotal rows and grand total rows in the result set. When it is used with a GROUP BY clause, it appends an additional row containing the summary statistics for all groups. INTO OUTFILE OBS 'obs://bucket/full_test/data_' CHARACTER SET gbk -- Specify the character set as GBK. FORMAT TEXT HEADER -- Text format with headers COLUMNS TERMINATED BY '|' OPTIONALLY ENCLOSED BY ""'"" ESCAPED BY '!' -- Use an exclamation mark as the escape character. LINES STARTING BY '>>>' -- Add a prefix to each line. TERMINATED BY '\r\n' -- Windows-style newline character MANIFEST ON OVERWRITE ON;
- Exporting a single file
Common Errors
The following table lists OBS error codes that may appear in error messages when you import a text file.
| Error Code | Error Message | Measures |
|---|---|---|
| HttpErrorNotFound | Object not found. | Check whether the object exists in the OBS bucket. |
| InvalidAccessKeyId | Invalid AK. | The SQL statement execution takes an extended period, causing the temporary AK to expire. Avoid exporting a large amount of data at a time. |
For more information, see OBS Error Codes.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot