Optimization Suggestions on Solr over HBase
Scenario
As an MRS cluster administrator, you can optimize the environment configuration when using Solr over HBase.
Prerequisites
HDFS, Solr, Yarn, and HBase have been installed.
Procedure
When using Solr over HBase, you can optimize the configuration from the following aspects:
- Suggestions on operating systems
If the Solr collection is stored on HDFS, configure the Solr collection by referring to Optimization Suggestions on Solr over HDFS.
- Suggestions on real-time collections
- Modify the <autoSoftCommit> configuration item in the solrconfig.xml file of the collection's configuration set. Set it to as large a value as the application scenario allows. A longer soft-commit interval improves indexing efficiency, but newly indexed documents take longer to become searchable.
- Modify the HBase service configuration and restart the HBase service.
- replication.source.nb.capacity: 5000 (Maximum number of entries that the HBase cluster sends to HBaseIndexer in each batch. The recommended value is 5000. Adjust it based on the cluster scale; it can be increased depending on the HBaseIndexer deployment.)
- replication.source.size.capacity: 4194304 (Maximum size, in bytes, of the entry packet that HBase sends to HBaseIndexer in each batch. Setting it too large is not recommended.)
- Modify the HDFS service configuration and restart the HDFS service.
- hadoop.rpc.protection: authentication (With this value, data transmission is not encrypted. The default value is privacy in security mode and authentication in common mode.)
- ipc.server.handler.queue.size: Number of calls that can be queued for each handler. Set this parameter based on the cluster environment.
- dfs.namenode.handler.count: Number of NameNode server threads that handle client requests. Set this parameter based on the cluster environment.
- dfs.namenode.service.handler.count: Number of NameNode server threads that handle requests from DataNodes and other non-client nodes. Set this parameter based on the cluster environment.
- dfs.datanode.handler.count: Number of server threads on DataNode. Set this parameter based on the cluster environment.
- Modify the HBaseIndexer service configuration and restart the HBaseIndexer instance.
- hbaseindexer.indexer.threads: 50 (Number of concurrent threads that the HBaseIndexer instance starts when it performs collection operations. The default value is 20.)
- In GC_OPTS, set the JVM heap size to 4 GB. If the memory space is sufficient, you can increase the value.
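As an illustration of the <autoSoftCommit> item mentioned in the real-time suggestions above, the following is a minimal sketch of how the setting typically appears in solrconfig.xml. The 60000 ms interval is an assumed example value, not a recommendation from this guide; choose a value that matches how quickly new documents must become searchable in your scenario.

```xml
<!-- Sketch only: <autoSoftCommit> sits inside the <updateHandler> element of solrconfig.xml. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <!-- Soft-commit interval in milliseconds. 60000 is an assumed example value. -->
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

After changing solrconfig.xml, the updated configuration set generally has to be uploaded again and the collection reloaded before the new interval takes effect.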
- Suggestions on batch collections and incremental collections
- Modify the GC parameter configuration of SolrServer to set the heap size: -Xmx8G -Xms8G. If the memory is sufficient, you can increase the values.
- Modify the Yarn configuration and restart the Yarn service.
- mapreduce.reduce.memory.mb: 8192 (Change the value based on the node configuration.)
- yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
- Go to the hbase-indexer/ directory under the Solr client installation directory and run the following command to modify the HBaseIndexer configuration file:
vi /opt/client/Solr/hbase-indexer/conf/hbase-indexer-site.xml
Set the following parameters in the hbase-indexer-site.xml file. (If some parameters do not exist, manually add them.)
Table 1 Modifying parameters in hbase-indexer-site.xml

| Parameter | Value |
| --- | --- |
| solr.record.writer.batch.size | 500 |
| solr.record.writer.max.queues.size | 300 |
| solr.record.writer.num.threads | 5 |
| solr.record.writer.maxSegments | 5 |
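The parameters in Table 1 could be written in hbase-indexer-site.xml roughly as follows. This is a sketch that assumes the file uses the standard Hadoop-style <configuration>/<property> layout; keep any properties that already exist in the file and add or update only these entries.

```xml
<configuration>
  <!-- Existing properties in the file are omitted here. -->
  <property>
    <name>solr.record.writer.batch.size</name>
    <value>500</value>
  </property>
  <property>
    <name>solr.record.writer.max.queues.size</name>
    <value>300</value>
  </property>
  <property>
    <name>solr.record.writer.num.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>solr.record.writer.maxSegments</name>
    <value>5</value>
  </property>
</configuration>
```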
Then run the following command to modify the yarn-site.xml file in the same directory:
vi /opt/client/Solr/hbase-indexer/conf/yarn-site.xml
Set the following parameters in the yarn-site.xml file. (If some parameters do not exist, manually add them.)
Table 2 Modifying parameters in yarn-site.xml

| Parameter | Value |
| --- | --- |
| mapreduce.map.speculative | false |
| mapreduce.reduce.speculative | false |
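The two entries in Table 2 could be added to yarn-site.xml as shown in this sketch, which again assumes the standard Hadoop <configuration>/<property> layout. Setting both values to false turns off speculative execution, so duplicate map and reduce task attempts are not launched for the indexing job.

```xml
<configuration>
  <!-- Existing properties in the file are omitted here. -->
  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
</configuration>
```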