Importing Data

Updated on 2024-07-04 GMT+08:00

Function

The LOAD DATA function can be used to import data in CSV, Parquet, ORC, JSON, and Avro formats. The data is converted into the Parquet data format for storage.

Syntax

LOAD DATA INPATH 'folder_path' INTO TABLE [db_name.]table_name
  OPTIONS(property_name=property_value, ...);

Keywords

  • INPATH: path of data to be imported
  • OPTIONS: list of properties

Parameters

Table 1 Parameters

Parameter     Description
folder_path   OBS path of the file or folder that stores the raw data.
db_name       Database name. If this parameter is not specified, the current database is used.
table_name    Name of the DLI table to which data is to be imported.

The following configuration options can be used during data import:

  • DATA_TYPE: specifies the type of data to be imported. Currently, CSV, Parquet, ORC, JSON, and Avro are supported. The default value is CSV.

    The configuration item is OPTIONS ('DATA_TYPE' = 'CSV').

    When importing a CSV file or a JSON file, you can select one of the following modes:

    • PERMISSIVE: If the data type of a column does not match that of the target table column, the column value is set to null.
    • DROPMALFORMED: If the data type of a column does not match that of the target table column, the row is not imported.
    • FAILFAST: If a column type does not match, an exception is thrown and the import fails.

    You can set the mode by adding OPTIONS ('MODE' = 'PERMISSIVE') to the OPTIONS parameter.
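
    For example, a minimal sketch (the bucket, database, and table names are placeholders) that imports CSV data and skips rows whose column types cannot be converted:

    LOAD DATA INPATH 'obs://mybucket/input/' INTO TABLE mydb.sales
      OPTIONS('DATA_TYPE'='CSV', 'MODE'='DROPMALFORMED');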

  • DELIMITER: You can specify a separator in the import statement. The default value is ,.

    The configuration item is OPTIONS('DELIMITER'=',').

    For CSV data, the following delimiters are supported:

    • Tab character, for example, 'DELIMITER'='\t'.
    • Any binary character, for example, 'DELIMITER'='\u0001(^A)'.
    • Single quotation mark ('). A single quotation mark must be enclosed in double quotation marks (" "). For example, 'DELIMITER'= "'".
    • \001(^A) and \017(^Q) are also supported, for example, 'DELIMITER'='\001(^A)' and 'DELIMITER'='\017(^Q)'.
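
    For example, a sketch (the path and table names are placeholders) that imports tab-separated data:

    LOAD DATA INPATH 'obs://mybucket/tsv/' INTO TABLE mydb.logs
      OPTIONS('DATA_TYPE'='CSV', 'DELIMITER'='\t');
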
  • QUOTECHAR: You can specify quotation marks in the import statement. The default value is double quotation marks (").

    The configuration item is OPTIONS('QUOTECHAR'='"').

  • COMMENTCHAR: You can specify the comment character in the import statement. During the import, if a row begins with the comment character, the row is treated as a comment and is not imported. The default value is a number sign (#).

    The configuration item is OPTIONS('COMMENTCHAR'='#').
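
    For example, a sketch (the path and table names are placeholders) that treats rows beginning with a semicolon as comments:

    LOAD DATA INPATH 'obs://mybucket/commented/' INTO TABLE mydb.t
      OPTIONS('COMMENTCHAR'=';');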

  • HEADER: Indicates whether the source file contains a header. The value can be true or false: true indicates that the source file contains a header, and false indicates that it does not. The default value is false. If no header exists, specify the FILEHEADER parameter in the LOAD DATA statement to add one.

    The configuration item is OPTIONS('HEADER'='true').

  • FILEHEADER: If the source file does not contain a header, specify one in the LOAD DATA statement.

    OPTIONS('FILEHEADER'='column1,column2')
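
    For example, a sketch (the column list, path, and table names are hypothetical) that supplies a header for a headerless CSV file:

    LOAD DATA INPATH 'obs://mybucket/no_header/' INTO TABLE mydb.users
      OPTIONS('HEADER'='false', 'FILEHEADER'='id,name,age');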

  • ESCAPECHAR: Used to perform strict validation of the escape character in CSV files. The default value is a backslash (\\).

    The configuration item is OPTIONS('ESCAPECHAR'='\\').

    NOTE:

    If ESCAPECHAR appears in the CSV data, it must be enclosed in double quotation marks (" "). For example, "a\b".

  • MAXCOLUMNS: This parameter is optional and specifies the maximum number of columns parsed by a CSV parser in a line.

    The configuration item is OPTIONS('MAXCOLUMNS'='400').

    Table 2 MAXCOLUMNS

    Optional Parameter    Default Value    Maximum Value
    MAXCOLUMNS            2000             20000

    NOTE:

    A larger MAXCOLUMNS value increases the executor memory required for the import. As a result, the import may fail due to insufficient executor memory.
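
    For example, a sketch (the path and table names are placeholders) that raises the parser limit for a wide CSV file:

    LOAD DATA INPATH 'obs://mybucket/wide/' INTO TABLE mydb.wide_table
      OPTIONS('MAXCOLUMNS'='5000');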

  • DATEFORMAT: Specifies the date format of a column.

    OPTIONS('DATEFORMAT'='dateFormat')

    NOTE:
    • The default value is yyyy-MM-dd.
    • The date format is specified by a Java date pattern string. In Java date and time pattern strings, unquoted letters A to Z and a to z are interpreted as pattern characters representing date or time elements. Text enclosed in single quotation marks (') is matched literally rather than parsed. For the definition of pattern characters in Java, see Table 3.
    Table 3 Definition of characters involved in the date and time patterns

    Character    Date or Time Element            Example
    G            Epoch ID                        AD
    y            Year                            1996; 96
    M            Month                           July; Jul; 07
    w            Week number in a year           27 (the twenty-seventh week of the year)
    W            Week number in a month          2 (the second week of the month)
    D            Day number in a year            189 (the 189th day of the year)
    d            Day number in a month           10 (the tenth day of the month)
    u            Day number in a week            1 (Monday), ..., 7 (Sunday)
    a            am/pm flag                      pm (12:00-24:00)
    H            Hour in a day (0-23)            2
    h            Hour in am/pm (1-12)            12
    m            Minute                          30
    s            Second                          55
    S            Millisecond                     978
    z            Time zone                       Pacific Standard Time; PST; GMT-08:00

  • TIMESTAMPFORMAT: Specifies the timestamp format of a column.

    OPTIONS('TIMESTAMPFORMAT'='timestampFormat')

    NOTE:
    • Default value: yyyy-MM-dd HH:mm:ss.
    • The timestamp format is specified by the Java time pattern string. For details, see Table 3.
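
    For example, a sketch (the path and table names are placeholders) that parses dates and timestamps written as day/month/year:

    LOAD DATA INPATH 'obs://mybucket/events/' INTO TABLE mydb.events
      OPTIONS('DATA_TYPE'='CSV', 'DATEFORMAT'='dd/MM/yyyy', 'TIMESTAMPFORMAT'='dd/MM/yyyy HH:mm:ss');
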
  • MODE: Specifies how error records are processed during the import. The options are PERMISSIVE, DROPMALFORMED, and FAILFAST.

    OPTIONS('MODE'='permissive')

    NOTE:
    • PERMISSIVE (default): Parses bad records as much as possible. If a field cannot be converted, the entire row is set to null.
    • DROPMALFORMED: Ignores bad records that cannot be parsed.
    • FAILFAST: If a record cannot be parsed, an exception is thrown and the job fails.
  • BADRECORDSPATH: Specifies the directory for storing error records during the import.

    OPTIONS('BADRECORDSPATH'='obs://bucket/path')

    NOTE:

    You are advised to use this option together with the DROPMALFORMED mode: records that can be converted are imported into the target table, and records that fail to be converted are stored in the specified error record directory, as in the sketch below.
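
    A sketch of that combination (the paths and table names are placeholders):

    LOAD DATA INPATH 'obs://mybucket/raw/' INTO TABLE mydb.target
      OPTIONS('MODE'='DROPMALFORMED', 'BADRECORDSPATH'='obs://mybucket/bad_records/');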

Precautions

  • When importing data to or creating an OBS table, you must specify a folder as the directory. If a file is specified, the data import may fail.
  • Only raw data stored in an OBS path can be imported.
  • You are advised not to import data into a table concurrently. Concurrent imports into the same table may conflict, causing the import to fail.
  • Only one path can be specified during data import. The path cannot contain commas (,).
  • If a folder and a file with the same name exist in the OBS bucket directory, the data is preferentially imported from the file rather than the folder.
  • When importing data in the PARQUET, ORC, or JSON format, you must specify DATA_TYPE. Otherwise, the data is parsed as the default format CSV, and the imported data will be incorrect.
  • If the data to be imported is in the CSV or JSON format and contains date and timestamp columns, specify DATEFORMAT and TIMESTAMPFORMAT. Otherwise, the data is parsed using the default date and timestamp formats.

Example

NOTE:

Before importing data, you must create a table. For details, see Creating an OBS Table or Creating a DLI Table.

  • To import a CSV file to a DLI table named t, run the following statement:
    LOAD DATA INPATH 'obs://dli/data.csv' INTO TABLE t
      OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'COMMENTCHAR'='#', 'HEADER'='false');
    
  • To import a JSON file to a DLI table named jsontb, run the following statement:
    LOAD DATA INPATH 'obs://dli/alltype.json' INTO TABLE jsontb
      OPTIONS('DATA_TYPE'='json', 'DATEFORMAT'='yyyy/MM/dd', 'TIMESTAMPFORMAT'='yyyy/MM/dd HH:mm:ss');
    
