ORC
Function
The Apache ORC format allows you to read and write ORC data. For details, see ORC Format.
Supported Connectors
- FileSystem
Parameter Description
| Parameter | Mandatory | Default Value | Data Type | Description |
|---|---|---|---|---|
| format | Yes | None | String | Format to use. Set this parameter to orc. |
The ORC format also supports table properties from Table properties. For example, you can configure orc.compress=SNAPPY to enable Snappy compression.
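Below is a minimal sketch of passing such a property through the WITH clause of a filesystem sink table; the table name, columns, and OBS path are placeholders, not part of the original example.

```sql
-- Hypothetical sink table; 'orc.compress' is the ORC table property
-- named above, forwarded to the writer through the WITH clause.
CREATE TABLE orc_snappy_sink (
  id   BIGINT,
  name STRING
) WITH (
  'connector' = 'filesystem',
  'format'    = 'orc',
  'path'      = 'obs://bucket/path',   -- placeholder output path
  'orc.compress' = 'SNAPPY'            -- enable Snappy compression
);
```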
Data Type Mapping
The ORC format type mapping is compatible with Apache Hive. The following table lists the mapping from Flink SQL types to ORC types.
| Flink SQL Type | ORC Physical Type | ORC Logical Type |
|---|---|---|
| CHAR | bytes | CHAR |
| VARCHAR | bytes | VARCHAR |
| STRING | bytes | STRING |
| BOOLEAN | long | BOOLEAN |
| BYTES | bytes | BINARY |
| DECIMAL | decimal | DECIMAL |
| TINYINT | long | BYTE |
| SMALLINT | long | SHORT |
| INT | long | INT |
| BIGINT | long | LONG |
| FLOAT | double | FLOAT |
| DOUBLE | double | DOUBLE |
| DATE | long | DATE |
| TIMESTAMP | timestamp | TIMESTAMP |
| ARRAY | - | LIST |
| MAP | - | MAP |
| ROW | - | STRUCT |
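For the composite types at the bottom of the table, the mapping is easiest to see in a table definition. The following is a hypothetical sink whose columns exercise the ARRAY, MAP, and ROW rows of the mapping; all names and the path are placeholders.

```sql
-- Illustrative sink for the composite-type rows of the mapping table:
-- ARRAY -> LIST, MAP -> MAP, ROW -> STRUCT.
CREATE TABLE orc_types_sink (
  tags    ARRAY<STRING>,                 -- written as ORC LIST
  counts  MAP<STRING, INT>,              -- written as ORC MAP
  address ROW<city STRING, zip STRING>   -- written as ORC STRUCT
) WITH (
  'connector' = 'filesystem',
  'format'    = 'orc',
  'path'      = 'obs://bucket/path'      -- placeholder path
);
```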
Example
Use Kafka to send data and write the data to OBS in ORC format.
- Create a datasource connection for communication with the VPC and subnet where Kafka is located, and bind the connection to the queue. Configure a security group and an inbound rule to allow access from the queue, and test the connectivity of the queue using the Kafka address. For example, locate a general-purpose queue where the job runs and choose More > Test Address Connectivity in the Operation column. If the connection is successful, the datasource is bound to the queue; otherwise, the binding fails.
- Create a Flink OpenSource SQL job and enable checkpointing. Copy the following statement and submit the job:
```sql
CREATE TABLE kafkaSource (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'kafka',
  'topic-pattern' = 'kafkaTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'properties.group.id' = 'GroupId',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'csv'
);

CREATE TABLE sink (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'filesystem',
  'format' = 'orc',
  'path' = 'obs://xx'
);

insert into sink select * from kafkaSource;
```
- Insert the following data into the source Kafka topic:
```
202103251505050001,appshop,2021-03-25 15:05:05,500.00,400.00,2021-03-25 15:10:00,0003,Cindy,330108
202103241606060001,appShop,2021-03-24 16:06:06,200.00,180.00,2021-03-24 16:10:06,0001,Alice,330106
```
- Read the ORC file in the OBS path configured in the sink table. The data results are as follows:
```
202103251505050001, appshop, 2021-03-25 15:05:05, 500.0, 400.0, 2021-03-25 15:10:00, 0003, Cindy, 330108
202103241606060001, appShop, 2021-03-24 16:06:06, 200.0, 180.0, 2021-03-24 16:10:06, 0001, Alice, 330106
```
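One way to verify the output is to define a filesystem source table over the same OBS path and print its rows with a second job. This is a sketch, not part of the original example; the table names are illustrative and obs://xx is the same placeholder path used in the sink table.

```sql
-- Read the ORC files back from the sink path for verification.
CREATE TABLE orcSource (
  order_id string, order_channel string, order_time string,
  pay_amount double, real_pay double, pay_time string,
  user_id string, user_name string, area_id string
) WITH (
  'connector' = 'filesystem',
  'format'    = 'orc',
  'path'      = 'obs://xx'   -- same placeholder path as the sink table
);

-- Print each row to the job logs.
CREATE TABLE printSink (
  order_id string, order_channel string, order_time string,
  pay_amount double, real_pay double, pay_time string,
  user_id string, user_name string, area_id string
) WITH (
  'connector' = 'print'
);

insert into printSink select * from orcSource;
```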