Elasticsearch Sink Stream

Overview

Cloud Stream Service (CS) exports job output data to Elasticsearch on Cloud Search Service (CSS). Elasticsearch is a popular enterprise-class search server built on Lucene that provides distributed, multi-user capabilities. Its functions include full-text retrieval, structured search, analytics, aggregation, and highlighting. With Elasticsearch, you can achieve stable, reliable, real-time search. It suits a wide range of scenarios, such as log analysis and site search.

CSS is a fully managed, distributed search service. It is fully compatible with open-source Elasticsearch and provides CS with structured and unstructured data search, statistics, and reporting capabilities. For more information about CSS, see the Cloud Search Service User Guide.

Prerequisites

  • Ensure that you have created a cluster on CSS using your account. For details about how to create a cluster on CSS, see Creating a Cluster in the Cloud Search Service User Guide.
  • In this scenario, jobs must run on a CS exclusive cluster. CS must therefore be interconnected with the VPC to which CSS is connected. You can also set security group rules as required.

    For details about how to set up the VPC peering connection, see VPC Peering Connection in the Cloud Stream Service User Guide.

    For details about how to configure security group rules, see Security Group in the Virtual Private Cloud User Guide.

Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "es",
  region = "",
  cluster_address = "",
  es_index = "",
  es_type = "",
  es_fields = "",
  batch_insert_data_num = ""
);

Description

Table 1 Syntax description

| Parameter | Mandatory | Description |
|---|---|---|
| type | Yes | Output channel type. The value es indicates that data is stored to CSS. |
| region | Yes | Region where CSS is located, for example, cn-north-1. |
| cluster_address | Yes | Private access address of the CSS cluster, for example, x.x.x.x:x. Use commas (,) to separate multiple addresses. |
| es_index | Yes | Index that stores the data to be inserted. |
| es_type | Yes | Document type of the data to be inserted. |
| es_fields | Yes | Keys of the data fields to be inserted, in the format "id,f1,f2,f3,f4". The entries must map one-to-one to the data columns in the sink stream. If the key is not used, remove the id keyword; the format is then "f1,f2,f3,f4,f5". |
| batch_insert_data_num | Yes | Number of data records written in one batch. The value must be a positive integer no greater than 100. The default value is 10. |

Precautions

None

Example

The following statement exports data of the qualified_cars stream to the cluster on CSS.
CREATE SINK STREAM qualified_cars (
  car_id STRING,
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT
)
WITH (
  type = "es",
  region = "cn-north-1",
  cluster_address = "192.168.0.212:9200",
  es_index = "china",
  es_type = "zhejiang",
  es_fields = "id,owner,age,speed,miles",
  batch_insert_data_num = "10"
);
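
Two options from the parameter descriptions are not exercised by the example above: omitting the id keyword in es_fields and listing more than one cluster address. The following is a minimal sketch of both, not a tested configuration. The stream name qualified_cars_no_key, the second address 192.168.0.213:9200, and the batch size of 50 are hypothetical values chosen for illustration. The leading -- comment lines are annotations for the reader; whether CS accepts them is an assumption, so remove them before submitting the statement.

```sql
-- Hypothetical variation of the qualified_cars example (placeholder values).
-- cluster_address: two private addresses separated by a comma.
-- es_fields: no leading id keyword, so each entry names the field for the
-- sink stream column in the same position (owner -> car_owner, and so on).
CREATE SINK STREAM qualified_cars_no_key (
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT
)
WITH (
  type = "es",
  region = "cn-north-1",
  cluster_address = "192.168.0.212:9200,192.168.0.213:9200",
  es_index = "china",
  es_type = "zhejiang",
  es_fields = "owner,age,speed,miles",
  batch_insert_data_num = "50"
);
```

Because the sink stream declares four columns, es_fields lists exactly four entries, which keeps the one-to-one mapping required by the parameter description. The batch size stays within the documented upper limit of 100.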