Optimization Suggestions on Solr over HBase

Scenario

As an MRS cluster administrator, you can optimize the environment configuration when using Solr over HBase.

Prerequisites

HDFS, Solr, Yarn, and HBase have been installed.

Procedure

When using Solr over HBase, you can optimize the configuration from the following aspects:

Suggestions on operating systems
If the Solr collection is stored on HDFS, configure the Solr collection by referring to Optimization Suggestions on Solr over HDFS.

Suggestions on real-time collections
1. Modify the <autoSoftCommit> configuration item in the solrconfig.xml configuration file of the collection configuration set. Set it to a larger value where the application scenario allows. A larger value improves indexing efficiency, but newly indexed documents take longer to become searchable.
2. Modify the HBase service configuration and restart the HBase service.
  - replication.source.nb.capacity: 5000 (Maximum number of entries that the HBase cluster sends to HBaseIndexer at a time. The recommended value is 5000. Adjust the value based on the cluster scale; you can increase it depending on the HBaseIndexer deployment.)
  - replication.source.size.capacity: 4194304 (Maximum size, in bytes, of the entry packet that HBase sends to HBaseIndexer at a time. Avoid setting an excessively large value.)
3. Modify the HDFS service configuration and restart the HDFS service.
  - hadoop.rpc.protection: authentication (This value disables data transmission encryption. The default value is privacy in security mode and authentication in common mode.)
  - ipc.server.handler.queue.size: Number of calls that can be queued for each handler. Set this parameter based on the cluster environment.
  - dfs.namenode.handler.count: Number of server threads on NameNode that process client requests. Set this parameter based on the cluster environment.
  - dfs.namenode.service.handler.count: Number of server threads on NameNode that process requests from DataNodes and other non-client nodes. Set this parameter based on the cluster environment.
  - dfs.datanode.handler.count: Number of server threads on DataNode. Set this parameter based on the cluster environment.
4. Modify the HBaseIndexer service configuration and restart the HBaseIndexer instance.
  - hbaseindexer.indexer.threads: 50 (Number of concurrent threads that the HBaseIndexer instance starts when it performs collection operations. The default value is 20.)
  - GC_OPTS: Set the JVM heap size in GC_OPTS to 4 GB. If the memory space is sufficient, you can increase the value.
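
As an illustration of step 1, the fragment below sketches how the <autoSoftCommit> item might look in solrconfig.xml. The 60000 ms interval is an assumption chosen only for illustration, not a recommended setting; choose a value that matches how quickly new documents must become searchable in your scenario.

```xml
<!-- Placed inside the <updateHandler> element of solrconfig.xml -->
<autoSoftCommit>
  <!-- Interval between soft commits, in milliseconds.
       60000 is an illustrative assumption, not a value from this guide. -->
  <maxTime>60000</maxTime>
</autoSoftCommit>
```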

Suggestions on batch collections and incremental collections

Modify the GC parameter configuration of SolrServer to -Xmx8G -Xms8G. You can increase the values if the memory is sufficient.
Modify the Yarn configuration and restart the Yarn service.
- mapreduce.reduce.memory.mb: 8192 (Change the value based on the node configuration.)
- yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

Go to the hbase-indexer/ directory in the Solr client installation directory and run the following command to modify the HBaseIndexer configuration file:

vi /opt/client/Solr/hbase-indexer/conf/hbase-indexer-site.xml

Set the following parameters in the hbase-indexer-site.xml file. (If some parameters do not exist, manually add them.)

**Table 1** Modifying parameters in hbase-indexer-site.xml
Parameter	Value
solr.record.writer.batch.size	500
solr.record.writer.max.queues.size	300
solr.record.writer.num.threads	5
solr.record.writer.maxSegments	5
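
The settings in Table 1 could be written as in the following sketch, which assumes that hbase-indexer-site.xml uses the Hadoop-style <configuration>/<property> layout. Keep any properties that already exist in the file and add or update only these entries.

```xml
<!-- Sketch of the Table 1 entries; assumes a Hadoop-style configuration file -->
<configuration>
  <property>
    <name>solr.record.writer.batch.size</name>
    <value>500</value>
  </property>
  <property>
    <name>solr.record.writer.max.queues.size</name>
    <value>300</value>
  </property>
  <property>
    <name>solr.record.writer.num.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>solr.record.writer.maxSegments</name>
    <value>5</value>
  </property>
</configuration>
```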

Run the following command to modify the yarn-site.xml configuration file:

vi /opt/client/Solr/hbase-indexer/conf/yarn-site.xml

Set the following parameters in the yarn-site.xml file. (If some parameters do not exist, manually add them.)

**Table 2** Modifying parameters in yarn-site.xml
Parameter	Value
mapreduce.map.speculative	false
mapreduce.reduce.speculative	false
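
For reference, the Table 2 entries might look like the following in yarn-site.xml. This is a minimal sketch showing only the two speculative-execution properties; any other properties already in the file should be left unchanged.

```xml
<!-- Sketch of the Table 2 entries; existing properties in the file are omitted here -->
<configuration>
  <property>
    <!-- Disable speculative execution of map tasks -->
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <property>
    <!-- Disable speculative execution of reduce tasks -->
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
</configuration>
```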
