Importing and Exporting Data by Running COPY

Updated on 2025-01-26 GMT+08:00

COPY is a cqlsh command. It includes COPY TO and COPY FROM, which are used to copy data to and from Cassandra.

Precautions

COPY TO exports data from a table to a CSV, Parquet, or ORC file.
  • If the export format is CSV, data is written to the target file row by row, with fields separated by a delimiter.
  • If no field names are specified, all fields are exported.
  • To skip some fields, specify a column list.

COPY FROM imports data from a CSV file into an existing table.

  • The source file is imported row by row.
  • All rows in the dataset must contain the same number of fields, and the PRIMARY KEY field must have a value. During the import, the PRIMARY KEY is verified and existing records are updated.
  • If HEADER is set to false and no field names are specified, fields are imported in the order in which the table's columns are defined. If field names are specified, fields are imported in that order. Missing and empty fields are set to null.
  • The source file can have fewer fields than the target table, but not more.
  • When data is imported using COPY FROM alone, the number of rows in a dataset cannot exceed 2 million.
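For example, here is a minimal sketch of these rules in action, using the cycling.cyclist_name table created in the Procedure below (the file name and values are illustrative only). A two-column CSV file with a header row:

  id,firstname
  e7cd5752-bc0d-4157-a80f-7523add8dbcd,Anna
  5b6962dd-3f90-4c93-8f61-eabfa4a803e2,

can be imported with:

  COPY cycling.cyclist_name (id,firstname) FROM 'cyclist_sample.csv' WITH HEADER = TRUE;

Because the second data row leaves firstname empty, that field is set to null. The lastname column is not in the column list, so it is left untouched for existing rows and remains null for new ones.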

COPY Syntax

  • COPY TO

    COPY table_name [( column_list )] TO 'file_name' [, 'file2_name', ...] | STDOUT [WITH option = 'value' [AND ...]]

  • COPY FROM

    COPY table_name [( column_list )] FROM 'file_name' [, 'file2_name', ...] | STDIN [WITH option = 'value' [AND ...]]

NOTE:

COPY supports one or more comma-separated file names or a list of Python glob expressions.
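For example, the file list forms might look as follows (the file names are hypothetical, and the table is the one created in the Procedure below):

  -- Import two explicitly named CSV files into one table
  COPY cycling.cyclist_name FROM 'data1.csv', 'data2.csv' WITH HEADER = TRUE;

  -- Import every file matching a glob expression
  COPY cycling.cyclist_name FROM 'data*.csv' WITH HEADER = TRUE;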

For some common syntax symbols in the COPY command, see Table 1.

Table 1 Symbol conventions

  • Uppercase letters: Text keyword.
  • Lowercase letters: A variable; replace it with a user-defined value.
  • Italics: A variable value; replace it with a valid option or user-defined value.
  • [ ]: Optional. Square brackets ([]) enclose optional command arguments. Do not enter only the square brackets.
  • ( ): Group. Parentheses (()) identify a group from which to choose. Do not enter only the parentheses.
  • |: Or. Vertical bars (|) separate alternative elements. Enter any one of the elements. Do not enter only the vertical bar.
  • ...: Repeatable. An ellipsis (...) indicates that you can repeat the syntax element as often as required.
  • 'Literal string': Single quotation marks (') must enclose literal strings in CQL statements. Use single quotation marks to preserve uppercase letters.
  • { key : value }: Map collection. Braces ({}) enclose map collections or key-value pairs. A colon separates each key from its value.
  • <datatype1,datatype2>: Set, list, map, or tuple. Angle brackets (< >) enclose the data types in a set, list, map, or tuple. Separate the data types with commas (,).
  • cql_statement;: End of a CQL statement. A semicolon (;) terminates all CQL statements.
  • [--]: Two hyphens (--) separate command line options from command arguments. This syntax is useful when arguments might be mistaken for command line options.
  • ' <schema> ... </schema> ': Search CQL only; single quotation marks (') enclose the entire XML schema declaration.
  • @xml_entity='xml_entity_type': Search CQL only; identifies entities and literal values that overwrite XML elements in the schema and solrConfig files.

COPY Usage Suggestions

Table 2 COPY parameters

Command | Parameter | Description | Default Value | Applicability
--- | --- | --- | --- | ---
TO/FROM | DELIMITER | A single character used to separate fields. | Comma (,) | -
TO/FROM | QUOTE | A single character used to enclose field values. | " | -
TO/FROM | ESCAPE | A single character used to escape the QUOTE character. | \ | -
TO/FROM | HEADER | Boolean value (true/false) indicating whether the first row contains column names. For COPY FROM, true matches fields to the named columns; for COPY TO, true writes the column names into the first row of the exported data. | false | -
TO/FROM | NULL | Value used to fill fields whose query result is empty. Set this parameter as required. | Empty string | -
TO/FROM | DATETIMEFORMAT | Time format for reading or writing CSV time data. The timestamp uses the strftime format. If this parameter is not set, the default is the time_format value in the cqlshrc file. | %Y-%m-%d %H:%M:%S%z | -
TO/FROM | MAXATTEMPTS | Maximum number of retries when an error occurs. | 5 | -
TO/FROM | REPORTFREQUENCY | Frequency at which the status is displayed, in seconds. | 0.25 | -
TO/FROM | DECIMALSEP | Separator character for decimal values. | Period (.) | -
TO/FROM | THOUSANDSSEP | Separator for thousands digit groups. | None | -
TO/FROM | BOOLSTYLE | Representation of the boolean values True and False. The value is case-insensitive; for example, yes,no has the same effect as YES,NO. | True,False | -
TO/FROM | NUMPROCESSES | Number of worker processes. | 16 | The default is the number of cores on the machine minus one, with no maximum. Run dstat (for example, dstat -lvrn 10) to check for CPU idle time. If idle CPU time exists, keep the default number of worker processes; otherwise, increase the number of processes while monitoring the CPU usage of the instance (60% or lower is recommended). If the executor's CPU is idle but the instance's CPU usage exceeds the recommended value, expand the instance to further improve performance.
TO/FROM | CONFIGFILE | Specifies a cqlshrc configuration file from which to read the WITH options. NOTE: Command line options always override the cqlshrc file. | None, user-defined | -
TO/FROM | RATEFILE | File to which output statistics are printed. | None, user-defined | Adding this parameter when exporting data is recommended, to improve statistics efficiency.
TO/FROM | ORIGIN | Specifies whether the database to import from or export to is open-source Cassandra: true for open-source Cassandra, false for GeminiDB Cassandra. | false | -
FROM | CHUNKSIZE | Number of rows in each chunk passed to a worker process. | 5000 | Specifies the number of rows sent from the feeder process (which reads data from files) to the worker processes. Depending on the average row size of the dataset, increasing this value may be advantageous.
FROM | INGESTRATE | Approximate import rate, in rows per second. | 100000 | The rate at which the feeder process sends data to the worker processes. Generally, you do not need to change it unless the rate is too high and must be limited.
FROM | MAXBATCHSIZE | Maximum size of an import batch, in rows. | 20 | The value can be as large as needed, but MAXBATCHSIZE x size of a single row must stay below batch_size_fail_threshold_in_kb. If a batch is too large, a warning is reported and the batch is rejected. The thresholds are set in cassandra.yaml: batch_size_warn_threshold_in_kb (currently 5) and batch_size_fail_threshold_in_kb (currently 50).
FROM | MINBATCHSIZE | Minimum size of an import batch, in rows. | 2 | For each chunk, the worker process writes batches no smaller than this value. It may need to be adjusted based on the chunk size, the number of nodes in the cluster, and the number of vnodes on each node. If the chunk size is larger, increase this value accordingly.
FROM | MAXROWS | Maximum number of rows to import. The value -1 means no upper limit. | -1 | -
FROM | SKIPROWS | Number of rows to skip. | 0 | -
FROM | SKIPCOLS | Comma-separated list of column names to skip. | None, user-defined | -
FROM | MAXPARSEERRORS | Maximum number of global parsing errors. The value -1 means no upper limit. | -1 | -
FROM | MAXINSERTERRORS | Maximum number of global insertion errors. The value -1 means no upper limit. | -1 | -
FROM | ERRFILE | File that stores all rows that could not be imported. If no value is set, the rows are stored in import_<ks>_<table>.err, where <ks> is the keyspace and <table> is the table name. | import_<ks>_<table>.err | -
FROM | TTL | Time to live, in seconds. If not set, data does not expire. | 3600 | -
TO | ENCODING | Character encoding of the output. | UTF-8 | -
TO | PAGESIZE | Size of the page used to fetch results. | 1000 | The value is an integer. The larger the page size, the larger PAGETIMEOUT should be. If single rows are large, use a smaller value; if they are small, use a larger one. The optimal value also depends on the local batch write capability of the executor; if it is strong (for example, when Huawei Cloud obsfs is used), you can increase the value.
TO | PAGETIMEOUT | Timeout for fetching each page, in seconds. | 10 | The value is an integer. For a large page size or a large partition, increase the value; also increase it if timeouts occur. When the server times out, an exponential backoff policy starts automatically to avoid overloading the server further, so you may notice delays. The driver can also time out, in which case it does not know whether the server discarded the request or will return the result later; in rare cases this can lose or duplicate data. Increasing this value helps prevent driver timeouts.
TO | BEGINTOKEN | Minimum token to consider when exporting data. | None, user-defined | The value is a string. Records with smaller tokens are not exported. Empty by default, meaning there is no minimum token.
TO | ENDTOKEN | Maximum token to consider when exporting data. | None, user-defined | The value is a string. Records with larger tokens are not exported. Empty by default, meaning there is no maximum token.
TO | MAXREQUESTS | Maximum number of requests each worker process can handle concurrently. | 6 | The value is an integer. Total parallelism during export = number of worker processes x value of this parameter. Each request exports the data of an entire token range.
TO | MAXOUTPUTSIZE | Maximum size of an output file, in lines. When this value is exceeded, the output is split into multiple segments. The value -1 means no limit, so the output is a single file. | -1 | The value is an integer. Can be used together with MAXFILESIZE.
TO | MAXFILESIZE | Maximum size of an output file, in KB. When this value is exceeded, the output is split into multiple segments, each close to this size. If not set, there is no limit and the output is a single file. | None, user-defined | The value is an integer. Can be used together with MAXOUTPUTSIZE.
TO | dataformats | Output file format. Currently, only json is supported. | None, user-defined | -
TO | DATATYPE | Export file format: parquet or orc. | None, user-defined | -
TO | RESULTFILE | File that stores detailed export results. | None, user-defined | Adding this parameter when exporting data is recommended, to improve statistics efficiency.
TO | wherecondition | Filter condition applied during the export. | None, user-defined | -
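Multiple WITH options are combined with AND. The following sketch shows typical combinations of the Table 2 options (the file names and option values are illustrative only, not recommendations):

  -- Export with explicit formatting, output splitting, and rate statistics
  COPY cycling.cyclist_name TO 'cyclist.csv' WITH HEADER = TRUE AND DELIMITER = ',' AND NULL = 'N/A' AND MAXOUTPUTSIZE = 100000 AND RATEFILE = 'export_rate.log';

  -- Import with a bounded error budget and a custom error file
  COPY cycling.cyclist_name FROM 'cyclist.csv' WITH HEADER = TRUE AND CHUNKSIZE = 5000 AND MAXPARSEERRORS = 100 AND ERRFILE = 'import_errors.err';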

Procedure

The following example describes how to prepare data, export it, and then import it.

  1. Prepare data.

    1. Create a keyspace.
      CREATE KEYSPACE cycling WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    2. Create a table.
      CREATE TABLE cycling.cyclist_name (
         id UUID PRIMARY KEY,
         lastname text,
         firstname text
      );
    3. Insert data records.
      INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS','Marianne');
      INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN','Anna');
      INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME','Alex');
      INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (220844bf-4860-49d6-9a4b-6b5d3a79cbfb, 'TIRALONGO','Paolo');
      INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'KRUIKSWIJK','Steven');
      INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (fb372533-eb95-4bb4-8685-6ef61e994caa, 'MATTHEWS', 'Michael');

  2. Export data from and import data back into the cyclist_name table.

    1. Export the id and lastname columns from the cyclist_name table to a CSV file.
      COPY cycling.cyclist_name (id,lastname) TO '../cyclist_lastname.csv' WITH HEADER = TRUE;
      Figure 1 Exported successfully

      After the preceding command is executed successfully, the cyclist_lastname.csv file is created in the parent directory of the current directory. If the file already exists, it is overwritten.

    2. Export the id and firstname columns from the cyclist_name table to another CSV file.
      COPY cycling.cyclist_name (id,firstname) TO '../cyclist_firstname.csv' WITH HEADER = TRUE;
      Figure 2 Exported successfully

      After the preceding command is executed successfully, the cyclist_firstname.csv file is created in the parent directory of the current directory. If the file already exists, it is overwritten.

    3. Delete the data from the cyclist_name table. (To ensure data security, the TRUNCATE command is not supported, so delete rows by primary key; repeat for each id.)
      DELETE FROM cycling.cyclist_name WHERE id = fb372533-eb95-4bb4-8685-6ef61e994caa;
    4. Confirm that no data exists in the table.
      SELECT * FROM cycling.cyclist_name;
      Figure 3 Querying data
    5. Import the cyclist_firstname.csv file.
      COPY cycling.cyclist_name (id,firstname) FROM '../cyclist_firstname.csv' WITH HEADER = TRUE;
      Figure 4 Import succeeded
    6. Verify the imported data.
      SELECT * FROM cycling.cyclist_name;
      Figure 5 Import succeeded
    7. Import the cyclist_lastname.csv file.
      COPY cycling.cyclist_name (id,lastname) FROM '../cyclist_lastname.csv' WITH HEADER = TRUE;
      Figure 6 Importing data
    8. Check whether the data is updated.
      SELECT * FROM cycling.cyclist_name;

      The query result is displayed.

      Figure 7 Import succeeded
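      As an additional sanity check, you can count the rows instead of listing them (if all six sample rows were imported, the expected count is 6):

      SELECT COUNT(*) FROM cycling.cyclist_name;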
