Source Table
Function
The file system connector can be used to read single files or entire directories into a single table.
When using a directory as the source path, there is no defined order of ingestion for the files inside the directory. For more information, see FileSystem SQL Connector.
Syntax
1 2 3 4 5 6 7 8 9 10 11 |
CREATE TABLE sink_table ( name string, num INT, p_day string, p_hour string ) partitioned by (p_day, p_hour) WITH ( 'connector' = 'filesystem', 'path' = 'obs://*** ', 'format' = 'parquet', 'source.monitor-interval'='' ); |
Parameter Description
- Directory watching
By default, the file system connector is bounded, that is it will scan the configured path once and then close itself.
You can enable continuous directory watching by configuring the source.monitor-interval parameter:
Key
Default Value
Data Type
Description
source.monitor-interval
None
Duration
The interval in which the source checks for new files. The interval must be greater than 0.
Each file is uniquely identified by its path, and will be processed once, as soon as it's discovered.
The set of files already processed is kept in state during the whole lifecycle of the source, so it's persisted in checkpoints and savepoints together with the source state.
Shorter intervals mean that files are discovered more quickly, but also imply more frequent listing or directory traversal of the file system/object store.
If this config option is not set, the provided path will be scanned once, hence the source will be bounded.
- Available Metadata
The following connector metadata can be accessed as metadata columns in a table definition. All the metadata are read only.
Key
Data Type
Description
file.path
STRING NOT NULL
Full path of the input file
file.name
STRING NOT NULL
Name of the file, that is the farthest element from the root of the filepath
file.size
STRING NOT NULL
Byte count of the file
file.modification-time
TIMESTAMP_LTZ(3) NOT NULL
Modification time of the file
Example
Read data from the OBS table as the data source and output it to the Print connector.
CREATE TABLE obs_source( name string, num INT, `file.path` STRING NOT NULL METADATA ) WITH ( 'connector' = 'filesystem', 'path' = 'obs://demo/sink_parquent_obs', 'format' = 'parquet', 'source.monitor-interval'='1 h' ); CREATE TABLE print ( name string, num INT, path STRING ) WITH ( 'connector' = 'print' ); insert into print select * from obs_source;
+I[0e72e, 841255524, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0] +I[53524, -2032270969, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0] +I[77225, 245599258, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0] +I[fc202, -545621464, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0] +I[07e9d, 1511139764, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0] +I[4e48b, 278014413, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot