Updated on 2024-11-29 GMT+08:00

Connecting Elasticsearch to Flume ESSink

Scenario

The Solr and Elasticsearch components of MRS depend on different versions of Lucene. Therefore, Flume can connect to only one of them, either Solr or Elasticsearch, at a time. For compatibility purposes, Flume connects to Solr by default. To connect Flume to Elasticsearch, you need to replace the Lucene JAR packages that Flume uses.

This section describes how to adjust the Lucene JAR package.

Procedure

The Flume service can run either on a server or on a client. Perform the following adjustments based on where ESSink runs:

  1. Go to the lib directory in the Flume installation directory, for example, ${BIGDATA_HOME}/FusionInsight_Porter_8.1.0.1/install/FusionInsight-Flume-Flume component version/flume/lib/. Record the permissions and owner groups of all lucene-* files, back up the files, and then delete all lucene-* files from the directory.

    If Flume is running on a client, perform this step in the lib directory of the Flume client installation directory.

  2. Go to the lib directory on the Elasticsearch server, for example, ${BIGDATA_HOME}/FusionInsight_Elasticsearch_8.1.0.1/install/FusionInsight-Elasticsearch-7.10.2/elasticsearch/lib, and collect all Lucene JAR packages on which Elasticsearch depends. The names of these packages start with lucene.
  3. Copy the JAR files collected in step 2 to the lib directory in the Flume installation directory, for example, ${BIGDATA_HOME}/FusionInsight_Porter_8.1.0.1/install/FusionInsight-Flume-Flume component version/flume/lib/, and change the permissions and owner group of the new JAR files to match those recorded for the original files in step 1.

    If Flume is running on a client, perform this step in the lib directory of the Flume client installation directory.

  4. Restart the corresponding Flume instance processes. If Flume is running on the client, restart the Flume agent on the client.

    Log in to FusionInsight Manager and choose Cluster > Services > Flume. On the page that is displayed, click the Instance tab. In the instance list, select the instance to be restarted and choose More > Restart Instance. In the displayed dialog box, enter the password and click OK. Wait until the instance is restarted.

    To deploy ESSink on multiple hosts, perform the preceding steps on each host.
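Steps 1 to 3 can also be scripted. The following Python sketch is one possible approach, not tooling shipped with MRS. The function name swap_lucene_jars and the backup directory are hypothetical, and the sketch assumes that all original lucene-* files share the same permissions and owner group, so it applies the values of the first recorded file to every new JAR file.

```python
import glob
import os
import shutil


def swap_lucene_jars(flume_lib, es_lib, backup_dir):
    """Replace the lucene-* files in flume_lib with those from es_lib.

    Returns a dict that maps each original file name to (mode, uid, gid),
    that is, the permissions and ownership recorded in step 1.
    """
    originals = sorted(glob.glob(os.path.join(flume_lib, "lucene-*")))
    replacements = sorted(glob.glob(os.path.join(es_lib, "lucene-*")))
    if not originals:
        raise FileNotFoundError("no lucene-* files in " + flume_lib)
    if not replacements:
        raise FileNotFoundError("no lucene-* files in " + es_lib)

    # Step 1: record permissions and ownership, back up, then delete.
    os.makedirs(backup_dir, exist_ok=True)
    recorded = {}
    for path in originals:
        st = os.stat(path)
        recorded[os.path.basename(path)] = (st.st_mode & 0o7777, st.st_uid, st.st_gid)
        shutil.copy2(path, backup_dir)
        os.remove(path)

    # Assumption: every original file had the same mode and owner group.
    mode, uid, gid = next(iter(recorded.values()))

    # Steps 2 and 3: copy the Elasticsearch Lucene JARs and align permissions.
    for src in replacements:
        dst = shutil.copy(src, flume_lib)
        os.chmod(dst, mode)
        st = os.stat(dst)
        if (st.st_uid, st.st_gid) != (uid, gid):
            os.chown(dst, uid, gid)  # changing the owner usually requires root
    return recorded
```

Call the function with the Flume lib directory, the Elasticsearch lib directory, and a backup location of your choice, and keep the returned record in case the original files must be restored.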

    Table 1 ESSink configuration

    | Parameter | Default Value | Description |
    | --- | --- | --- |
    | type | com.*.flume.sinks.elasticsearch.ESSink | Sink type. The default value is the ESSink type name. |
    | servers | - | EsNode list. Values are in the IP address:port format. The port is the TRANSPORT_TCP_PORT or SERVER_PORT of EsNode. NOTE: Configure the IP addresses and ports of all EsNodes so that Flume can fail over between them. |
    | client | transport | Elasticsearch client connection type. The value can be transport or rest. |
    | securityEnable | false | Whether security mode is enabled for the Elasticsearch cluster. |
    | clusterName | elasticsearch_cluster | Name of the Elasticsearch cluster. NOTE: If multiple Elasticsearch services are installed in the cluster, set this parameter to the name that corresponds to the target service instead of always using elasticsearch_cluster. For example, use elasticsearch_cluster for the Elasticsearch service and elasticsearch-1_cluster for the Elasticsearch-1 service. |
    | batchSize | 1000 | Number of events that the sink takes from the channel and writes to Elasticsearch in each batch. |
    | indexName | - | Index name. |
    | indexType | - | Index type. |
    | rest.callbackConnectTimeout | 5000 | Connection timeout set through the RequestConfigCallback of the REST client. |
    | rest.callbackSocketTimeout | 60000 | Socket (session) timeout set through the RequestConfigCallback of the REST client. |
    | rest.builderMaxTimeout | 60000 | Maximum retry timeout of the REST client RestClientBuilder. |
    | serializer | - | Serializer. Two options are provided: com.*.flume.sinks.elasticsearch.ElasticSearchLogStashEventSerializer and com.*.flume.sinks.elasticsearch.ElasticSearchDynamicSerializer. Default value: com.*.flume.sinks.elasticsearch.ElasticSearchLogStashEventSerializer |
    | indexNameBuilder | - | Index name builder. Two options are provided: com.*.flume.sinks.elasticsearch.TimeBasedIndexNameBuilder and com.*.flume.sinks.elasticsearch.SimpleIndexNameBuilder. Default value: com.*.flume.sinks.elasticsearch.TimeBasedIndexNameBuilder |
    | channel | - | Channel connected to ESSink. |
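The parameters in Table 1 are set as properties of the sink in the Flume agent configuration. The following Python sketch renders such a fragment using the usual Flume key layout, agent.sinks.sink.parameter. It is an illustration only: the agent, sink, channel, and index names, the EsNode addresses and ports, and the comma used to separate entries in servers are all assumptions, and the address check is a simplification that accepts only IPv4. The type value is copied verbatim from Table 1, including its elided * segment.

```python
import re

# Simplified check for the "IP address:port" format (IPv4 only, an assumption).
SERVER_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$")


def render_essink_properties(agent, sink, params):
    """Render ESSink parameters as Flume agent properties lines."""
    for entry in params.get("servers", "").split(","):
        match = SERVER_RE.match(entry.strip())
        if not match or int(match.group(2)) > 65535:
            raise ValueError("servers entry is not IP address:port: %r" % entry)
        if any(int(octet) > 255 for octet in match.group(1).split(".")):
            raise ValueError("invalid IP address in servers: %r" % entry)
    if params.get("client", "transport") not in ("transport", "rest"):
        raise ValueError("client must be transport or rest")

    prefix = "%s.sinks.%s." % (agent, sink)
    return "\n".join(prefix + key + " = " + str(value) for key, value in params.items())


# Hypothetical example values; replace them with those of your cluster.
example = {
    "type": "com.*.flume.sinks.elasticsearch.ESSink",
    "servers": "192.0.2.11:24100,192.0.2.12:24100",
    "client": "rest",
    "securityEnable": "true",
    "clusterName": "elasticsearch_cluster",
    "batchSize": "1000",
    "indexName": "flume_events",
    "channel": "ch1",
}
print(render_essink_properties("agent1", "es_sink", example))
```

Listing every EsNode in servers, as the example does with two addresses, follows the note in Table 1 about configuring all nodes.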

    When a running Flume instance connects to an Elasticsearch cluster in security mode, upload the configured jaas.conf, krb5.conf, and user keytab files to the corresponding directory using WinSCP.

    • If Flume runs on the server, upload the files to the etc directory on the server, for example, ${BIGDATA_HOME}/FusionInsight_Porter_8.1.0.1/1_11_Flume/etc/.
    • If Flume runs on the client, upload the files to the conf directory on the client, for example, /opt/flumeClient/fusioninsight-flume-Flume component version/conf.

    For details about the jaas.conf configuration, see Common Issues About Flume. The permissions of the uploaded files must be the same as those of the existing files in that directory, and the jaas.conf configuration file must start with EsClient.
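Before restarting Flume, the uploaded authentication files can be sanity-checked. The sketch below is a hypothetical helper, not part of MRS. It checks that the three files exist in the target directory and that the first non-blank line of jaas.conf that is not a // comment starts with EsClient. The keytab file name user.keytab is an assumed placeholder, and the sketch does not compare file permissions.

```python
import os


def check_auth_files(conf_dir, keytab_name="user.keytab"):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for name in ("jaas.conf", "krb5.conf", keytab_name):
        if not os.path.isfile(os.path.join(conf_dir, name)):
            problems.append("missing file: " + name)

    jaas_path = os.path.join(conf_dir, "jaas.conf")
    if os.path.isfile(jaas_path):
        with open(jaas_path, encoding="utf-8") as handle:
            # First line that is neither blank nor a // comment.
            first = next(
                (line.strip() for line in handle
                 if line.strip() and not line.strip().startswith("//")),
                "",
            )
        if not first.startswith("EsClient"):
            problems.append("jaas.conf does not start with EsClient")
    return problems
```

For example, check_auth_files("/opt/flumeClient/fusioninsight-flume-Flume component version/conf") would report any of the three files that is missing from the client conf directory.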