Kafka Sink Stream

Overview

CS exports the job output data to Kafka.

Apache Kafka is a fast, scalable, and fault-tolerant distributed message publishing and subscription system. It delivers high throughput and built-in partitions and provides data replicas and fault tolerance. Apache Kafka is applicable to scenarios of handling massive messages. Kafka clusters are deployed and hosted on MRS that is powered on Apache Kafka.

Prerequisites

  • If the Kafka server listens on the port using hostname, you need to add the mapping between the hostname and IP address of the Kafka Broker node to the CS cluster. Contact the Kafka service deployment personnel to obtain the hostname and IP address of the Kafka Broker node. For details about how to add an IP-domain mapping, see the description of Adding an IP-Domain Mapping in Cluster Management in the Cloud Stream Service User Guide.
  • When using offline Kafka clusters, use VPC peering connections to connect CS to Kafka.

    For details about how to set up the VPC peering connection, see VPC Peering Connection in the Cloud Stream Service User Guide.

Syntax

Syntax

CREATE SINK STREAM kafka_sink (name STRING) WITH(type = "kafka",kafka_bootstrap_servers = "",kafka_topic = "",encode = "json")

Description

Table 1 Syntax description

Parameter

Mandatory

Description

type

Yes

Output channel type. Value kafka indicates that data is stored to Kafka.

kafka_bootstrap_servers

Yes

Port that connects CS to Kafka. Use VPC peering connections to connect CS clusters with Kafka clusters.

kafka_topic

Yes

Kafka topic into which CS writes data.

encode

Yes

Encoding format. Currently, only JSON is supported.

Precautions

None

Example

Output data to Kafka.

CREATE SINK STREAM kafka_sink (name STRING) 
WITH (
  type="kafka",
  kafka_bootstrap_servers =  "ip1:port1,ip2:port2",
  kafka_topic = "testsink",
  encode = "json" 
);