Updated on 2024-11-29 GMT+08:00

Optimization Suggestions on Solr over HBase

Scenario

As an MRS cluster administrator, optimize the environment configuration to improve performance when using Solr over HBase.

Prerequisites

HDFS, Solr, Yarn, and HBase have been installed.

Procedure

When using Solr over HBase, you can optimize the configuration from the following aspects:

  • Suggestions on real-time collections
    1. Increase the value of the <autoSoftCommit> configuration item in the solrconfig.xml file of the collection configuration set, based on your application scenario. A larger value means that soft commits happen less often. This improves indexing efficiency, but newly indexed documents take longer to become searchable.
    2. Modify the HBase service configuration and restart the HBase service.
      • replication.source.nb.capacity: 5000 (Maximum number of entries that the HBase cluster sends to HBaseIndexer in each batch. The recommended value is 5000. Adjust it based on the cluster scale. You can increase it depending on how HBaseIndexer is deployed.)
      • replication.source.size.capacity: 4194304 (Maximum size, in bytes, of each batch of entries that HBase sends to HBaseIndexer. 4194304 bytes is 4 MB. Do not set this value too large.)
    3. Modify the HDFS service configuration and restart the HDFS service.
      • hadoop.rpc.protection: authentication (Disables encryption of data in transit. The default value is privacy in security mode and authentication in common mode.)
      • ipc.server.handler.queue.size: number of calls per handler that can be queued. Set this parameter based on the cluster environment.
      • dfs.namenode.handler.count: number of NameNode RPC server threads that handle client requests. Set this parameter based on the cluster environment.
      • dfs.namenode.service.handler.count: number of NameNode service RPC server threads that handle requests from DataNodes and other non-client nodes. Set this parameter based on the cluster environment.
      • dfs.datanode.handler.count: number of server threads on each DataNode. Set this parameter based on the cluster environment.
    4. Modify the HBaseIndexer service configuration and restart the HBaseIndexer instance.
      • hbaseindexer.indexer.threads: 50 (Number of concurrent threads that the HBaseIndexer instance starts when it performs collection operations. The default value is 20.)
      • GC_OPTS: set the heap size to 4 GB (for example, -Xms4G -Xmx4G). If memory is sufficient, you can increase it.
  • Suggestions on batch collections and incremental collections
    1. Set the heap size in the GC parameters of SolrServer to -Xmx8G -Xms8G. If memory is sufficient, you can increase these values.
    2. Modify the Yarn configuration and restart the Yarn service.
      • mapreduce.reduce.memory.mb: 8192 (Change the value based on the node configuration.)
      • yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
    3. Go to the hbase-indexer directory under the Solr client installation directory (/opt/client/Solr/hbase-indexer in the following example) and run the following command to modify the hbase-indexer-site.xml file:

      vi /opt/client/Solr/hbase-indexer/conf/hbase-indexer-site.xml

      Set the following parameters in the hbase-indexer-site.xml file. Add any parameter that does not already exist in the file.

      Table 1 Modifying parameters in hbase-indexer-site.xml

      Parameter                             Value
      solr.record.writer.batch.size         500
      solr.record.writer.max.queues.size    300
      solr.record.writer.num.threads        5
      solr.record.writer.maxSegments        5

      Run the following command to modify the yarn-site.xml file in the same directory:

      vi /opt/client/Solr/hbase-indexer/conf/yarn-site.xml

      Set the following parameters in the yarn-site.xml file. Add any parameter that does not already exist in the file.

      Table 2 Modifying parameters in yarn-site.xml

      Parameter                       Value
      mapreduce.map.speculative       false
      mapreduce.reduce.speculative    false
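Step 1 of the suggestions on real-time collections (the <autoSoftCommit> item in solrconfig.xml) can also be scripted. The following Python sketch assumes that you work on a local copy of solrconfig.xml from the collection configuration set and that <autoSoftCommit> sits under <updateHandler>, as in the stock Solr configuration. The function names are hypothetical, and uploading the changed configuration set back to the cluster is not covered. When the file is rewritten, only comments inside the root element are kept.

```python
import xml.etree.ElementTree as ET


def _child(parent: ET.Element, tag: str) -> ET.Element:
    """Return the first direct child with this tag, creating it if it is missing."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node


def set_auto_soft_commit(root: ET.Element, max_time_ms: int) -> None:
    """Set <updateHandler>/<autoSoftCommit>/<maxTime> (milliseconds) on a parsed solrconfig.xml."""
    if max_time_ms <= 0:
        raise ValueError("max_time_ms must be a positive number of milliseconds")
    handler = root.find("updateHandler")
    if handler is None:
        raise ValueError("no <updateHandler> element under the root element")
    soft = _child(handler, "autoSoftCommit")
    # Overwrites any existing value, including a ${...} property placeholder.
    _child(soft, "maxTime").text = str(max_time_ms)


def update_solrconfig(path: str, max_time_ms: int) -> None:
    """Rewrite a local solrconfig.xml in place (hypothetical helper; back up the file first)."""
    # insert_comments=True keeps comments that appear inside the root element.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=parser)
    set_auto_soft_commit(tree.getroot(), max_time_ms)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, update_solrconfig("solrconfig.xml", 60000) sets a 60-second soft commit interval. The value 60000 is illustrative only; choose it based on how quickly new data must become searchable in your scenario.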
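As an alternative to editing the two files in vi, the property changes in Table 1 and Table 2 can be applied with a short Python sketch. It assumes that hbase-indexer-site.xml and yarn-site.xml both use the Hadoop <configuration>/<property>/<name>/<value> layout. The function and constant names are hypothetical. Back up both files first: comments outside the root element are dropped, and the whole file is re-indented when it is written back.

```python
import xml.etree.ElementTree as ET

# Values from Table 1 (hbase-indexer-site.xml) and Table 2 (yarn-site.xml).
INDEXER_SITE_PROPS = {
    "solr.record.writer.batch.size": "500",
    "solr.record.writer.max.queues.size": "300",
    "solr.record.writer.num.threads": "5",
    "solr.record.writer.maxSegments": "5",
}
YARN_SITE_PROPS = {
    "mapreduce.map.speculative": "false",
    "mapreduce.reduce.speculative": "false",
}


def apply_properties(root: ET.Element, props: dict[str, str]) -> None:
    """Set each name/value pair on a Hadoop-style <configuration> element."""
    if root.tag != "configuration":
        raise ValueError(f"expected <configuration>, got <{root.tag}>")
    remaining = dict(props)
    # Update properties that already exist in the file.
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name in remaining:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = remaining.pop(name)
    # Add the properties that do not exist yet as new <property> elements.
    for name, val in remaining.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = val


def update_site_file(path: str, props: dict[str, str]) -> None:
    """Rewrite a *-site.xml file in place, keeping comments inside <configuration>."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=parser)
    apply_properties(tree.getroot(), props)
    ET.indent(tree, space="    ")  # Python 3.9+; lays out newly added elements
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

With the client path used in the commands above, the calls would be update_site_file("/opt/client/Solr/hbase-indexer/conf/hbase-indexer-site.xml", INDEXER_SITE_PROPS) and update_site_file("/opt/client/Solr/hbase-indexer/conf/yarn-site.xml", YARN_SITE_PROPS). Existing values are overwritten and missing ones are appended, which matches the instruction to add parameters that do not exist.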