OBS Sink Stream
Overview
Create a sink stream to export CS data to OBS. CS can export job analysis results to OBS. OBS applies to various scenarios, such as big data analytics, cloud-native application data, static website hosting, backup/active archive, and deep/cold archive.
OBS is an object-based storage service. It provides massive, secure, highly reliable, and low-cost data storage capabilities. For more information, see the Object Storage Service Console Operation Guide.
Precautions
Before exporting data, check the version of the OBS bucket. The OBS sink stream supports exporting data only to buckets running OBS 3.0 or a later version.
Syntax
CREATE SINK STREAM stream_id (
attr_name attr_type (',' attr_name attr_type)*
)WITH (
type = "obs",
region = "",
encode = "",
json_config = "",
field_delimiter = "",
row_delimiter = "",
obs_dir = "",
file_prefix = "",
rolling_size = "",
rolling_interval = "",
quote = "",
array_bracket = "",
append = "",
max_record_num_per_file = "",
dump_interval = "",
dis_notice_channel = "",
max_record_num_cache = "",
carbon_properties = ""
)

Description
| Parameter | Mandatory | Description |
|---|---|---|
| type | Yes | Output channel type. Value obs indicates that data is stored to OBS. |
| region | Yes | Region to which OBS belongs. |
| encode | Yes | Encoding format. Currently, formats CSV, JSON, ORC, and CarbonData are supported. |
| field_delimiter | No | Separator used between every two attributes. |
| row_delimiter | No | Row delimiter. This parameter does not need to be configured if the CarbonData or ORC encoding format is adopted. |
| json_config | No | If encode is set to json, you can set this parameter to specify the mapping between the JSON field and the stream definition field. An example of the format is as follows: field1=data_json.field1;field2=data_json.field2. |
| obs_dir | Yes | Directory for storing files. The directory is in the format of {Bucket name}/{Directory name}, for example, obs-a1/dir1/subdir. |
| file_prefix | No | Prefix of the data export file name. The generated file is named in the format of file_prefix.x, for example, file_prefix.1 and file_prefix.2. If this parameter is not specified, the file prefix is temp by default. This parameter is not applicable to CarbonData files. |
| rolling_size | No | Maximum size of a single file. If the data size exceeds this value, a new file is generated, for example, rolling_size = "100m" indicates a maximum file size of 100 MB. |
| rolling_interval | No | Time pattern for time-based output directories, for example, yyyy/MM/dd/HH/mm. Data is saved to the subdirectory corresponding to its time. |
| quote | No | Quote modifier, which is added before and after each attribute. This parameter can be configured only when the CSV encoding format is adopted. You are advised to use an invisible character, such as \u0007, as the value. |
| array_bracket | No | Array bracket, which can be configured only when the CSV encoding format is adopted. The available options are (), {}, and []. For example, if you set this parameter to {}, the array output format is {a1, a2}. |
| append | No | The value can be true or false. The default value is true. If OBS does not support the append mode, set this parameter to false. If append is set to false, max_record_num_per_file and dump_interval must be set. |
| max_record_num_per_file | No | Maximum number of records in a file. This parameter needs to be configured only when the ORC or CarbonData encoding format is adopted. After this parameter is specified, a new file is generated to store extra data records after the number of records stored in a file reaches the allowed quantity. |
| dump_interval | No | Triggering period. This parameter needs to be configured when the ORC encoding format is adopted or notification to DIS is enabled. |
| dis_notice_channel | No | DIS stream to which CS sends notification records. CS periodically sends a record containing the OBS directory to this stream, indicating that no more new files will be generated in that directory. |
| max_record_num_cache | No | Maximum number of cache records. This parameter can be set only when the CarbonData encoding format is adopted. The minimum value of this parameter cannot be less than that of max_record_num_per_file. The default value is max_record_num_per_file. |
| carbon_properties | No | Carbon attribute. This field can be configured only when the CarbonData encoding format is adopted. The value is in the format of k1=v1, k2=v2. All configuration items supported by the withTableProperties function in carbon-sdk are supported. In addition, the configuration items IN_MEMORY_FOR_SORT_DATA_IN_MB and UNSAFE_WORKING_MEMORY_IN_MB are supported. |
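To make the json_config mapping concrete, the following is a minimal Python sketch of its documented semantics: each `field=path` pair binds a stream field to a dotted path inside the incoming JSON record. The function name `apply_json_config` and the sample record are hypothetical, used only for illustration.

```python
import json

def apply_json_config(json_config, record):
    """Resolve each 'field=dotted.path' pair in json_config against a
    parsed JSON record and return the mapped stream fields."""
    result = {}
    for pair in json_config.split(";"):
        field, path = pair.split("=", 1)
        value = record
        for key in path.split("."):  # walk the nested JSON path
            value = value[key]
        result[field] = value
    return result

raw = json.loads('{"data_json": {"field1": "a", "field2": 7}}')
mapped = apply_json_config("field1=data_json.field1;field2=data_json.field2", raw)
# mapped == {"field1": "a", "field2": 7}
```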
Example
- Export the car_infos data to the obs-sink bucket in OBS. The output directory is car_infos, and the output file uses greater_30 as the file name prefix. The maximum size of a single file is 100 MB; if the data size exceeds 100 MB, a new file is generated. The data is encoded in CSV format, with the comma (,) as the attribute delimiter and the line break as the row delimiter.
CREATE SINK STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_price INT,
  car_timestamp LONG
) WITH (
  type = "obs",
  encode = "csv",
  region = "cn-north-1",
  field_delimiter = ",",
  row_delimiter = "\n",
  obs_dir = "obs-sink/car_infos",
  file_prefix = "greater_30",
  rolling_size = "100m"
);
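As a minimal illustration of how the delimiters above shape one exported row, the Python sketch below joins attribute values with field_delimiter, wraps them in the optional quote modifier, and appends row_delimiter. The sample values are hypothetical.

```python
def format_csv_row(values, field_delimiter=",", row_delimiter="\n", quote=""):
    """Assemble one CSV output row: quote each attribute (if a quote
    modifier is set), join with the field delimiter, end with the row
    delimiter."""
    return field_delimiter.join(f"{quote}{v}{quote}" for v in values) + row_delimiter

row = format_csv_row(["a1", "lily", "bmw320i", "28", "1526438880000"])
# row == "a1,lily,bmw320i,28,1526438880000\n"
```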
- Example of the CarbonData encoding format
CREATE SINK STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_price INT,
  car_timestamp LONG
) WITH (
  type = "obs",
  region = "cn-north-1",
  encode = "carbondata",
  obs_dir = "cs-append-2/carbondata",
  max_record_num_per_file = "1000",
  max_record_num_cache = "2000",
  dump_interval = "60",
  rolling_interval = "yyyy/MM/dd/HH/mm",
  dis_notice_channel = "dis-notice",
  carbon_properties = "long_string_columns=MessageBody, IN_MEMORY_FOR_SORT_DATA_IN_MB=512"
);
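A rolling_interval pattern such as yyyy/MM/dd/HH/mm produces time-based subdirectories under obs_dir. The Python sketch below shows the assumed pattern-to-directory mapping; the function name `rolling_dir` and the token translation table are illustrative assumptions, not part of the CS API.

```python
from datetime import datetime

# Assumed translation of the date-pattern tokens to strftime codes.
PATTERN_MAP = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}

def rolling_dir(obs_dir, pattern, ts):
    """Build the time-based output subdirectory for a given timestamp."""
    fmt = pattern
    for token, strf in PATTERN_MAP.items():
        fmt = fmt.replace(token, strf)
    return f"{obs_dir}/{ts.strftime(fmt)}"

d = rolling_dir("cs-append-2/carbondata", "yyyy/MM/dd/HH/mm",
                datetime(2018, 5, 16, 3, 8))
# d == "cs-append-2/carbondata/2018/05/16/03/08"
```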
- Example of the ORC encoding format
CREATE SINK STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_price INT,
  car_timestamp LONG
) WITH (
  type = "obs",
  region = "cn-north-1",
  encode = "orc",
  obs_dir = "cs-append-2/obsorc",
  file_prefix = "es_info",
  max_record_num_per_file = "100000",
  dump_interval = "60"
);