Updated on 2024-11-29 GMT+08:00

Solr Overview

Solr is an independent enterprise-level search application server based on Apache Lucene. It offers a richer query language than Lucene, is configurable and scalable, and is optimized for query performance. Solr also provides a GUI with comprehensive management functions.

The common Solr service operation process is as follows:

  1. Update the default config set file and upload it to ZooKeeper.

    Solr requires a specified config set to create a collection. The set includes two configuration files: solrconfig.xml (defines the Solr request handlers and some extension components) and managed-schema (defines index fields and field types). You can obtain the default Solr config set confWithSchema, modify the configuration files in it, and then upload it to ZooKeeper.

    For details about the operation commands, see Shell Client Operation Commands.

  2. Create a collection.

    Specify a config set to create a collection based on service requirements.

    A collection can be created by using any of the methods described in Shell Client Operation Commands, Operations on the Solr Admin UI, Curl Commands in Linux, and REST Messages Sent in URLs Through Browsers. Shell Client Operation Commands is the recommended method.

  3. Query the collection status.

    Log in to Manager, choose Cluster > Name of the desired cluster > Service > Solr, and check that all Solr instances are working properly.

    On the Solr web UI, click either of the two SolrServerAdmin links to go to the Solr Admin page.

    On the Solr Admin page, choose Cloud > Graph to view the collection status.

  4. Import data and create indexes.

    Index creation falls into the following scenarios, depending on service requirements:

    • Solr over HBase: Solr collection data on HBase. For details, see Solr over HBase.
    • Solr over HDFS: Solr collection data in HDFS. For details, see Solr over HDFS.
    • Solrj: Create indexes through client development.

  5. Query collection data.

    For details about how to query collection data, see Operations on the Solr Admin UI.

  6. Delete collection data.

    Index data can be deleted by using any of the methods described in Shell Client Operation Commands, Operations on the Solr Admin UI, Curl Commands in Linux, and REST Messages Sent in URLs Through Browsers. Shell Client Operation Commands is the recommended method.

  7. Delete a collection.

    A collection can be deleted by using any of the methods described in Shell Client Operation Commands, Operations on the Solr Admin UI, Curl Commands in Linux, and REST Messages Sent in URLs Through Browsers. Shell Client Operation Commands is the recommended method.
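
The REST-style variants of steps 2, 3, 5, 6, and 7 can be sketched as follows. This is a minimal illustration, not the documented client commands: it only builds request URLs and a request body for the standard Solr Collections API, select handler, and update handler. The base URL, host, port, and the collection name in the usage section are hypothetical placeholders, and the sketch does not send any request or handle the authentication that security mode requires.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical base URL of one SolrServer instance; replace host and port with your own.
SOLR_BASE = "http://solr-host.example.com:8983/solr"


def collections_api_url(action, params):
    """Build a Collections API URL such as .../admin/collections?action=CREATE&..."""
    query = {"action": action, **params, "wt": "json"}
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


def create_collection_url(name, config_set, shards=1, replicas=1):
    """Step 2: create a collection from a config set already uploaded to ZooKeeper."""
    return collections_api_url("CREATE", {
        "name": name,
        "numShards": shards,
        "replicationFactor": replicas,
        "collection.configName": config_set,
    })


def collection_status_url(name):
    """Step 3: query shard and replica state for one collection."""
    return collections_api_url("CLUSTERSTATUS", {"collection": name})


def select_url(collection, query="*:*", rows=10):
    """Step 5: query collection data through the select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{quote(collection)}/select?{params}"


def delete_by_query_request(collection, query):
    """Step 6: return (url, JSON body) for a delete-by-query POST to the update handler."""
    url = f"{SOLR_BASE}/{quote(collection)}/update?commit=true"
    body = json.dumps({"delete": {"query": query}})
    return url, body


def delete_collection_url(name):
    """Step 7: delete the whole collection."""
    return collections_api_url("DELETE", {"name": name})


if __name__ == "__main__":
    # "demo" is a hypothetical collection name; confWithSchema is the default config set.
    print(create_collection_url("demo", "confWithSchema", shards=2, replicas=2))
    print(collection_status_url("demo"))
    print(select_url("demo", "*:*", rows=5))
    print(delete_by_query_request("demo", "id:42"))
    print(delete_collection_url("demo"))
```

The URLs can then be sent with curl or pasted into a browser; the delete-by-query body must be posted with the Content-Type header set to application/json. Keeping URL construction separate from sending makes it easy to review a destructive request, such as a collection deletion, before it is issued.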

  • It is recommended that the total number of Solr cores not exceed 40,000. If there are many Solr cores and the Solr service needs to be restarted, restart the SolrServer instances in batches to reduce the load on ZooKeeper.
  • Ensure that all Solr instances are working properly when creating a collection. Otherwise, some shards may fail to be created, causing collection creation failure.
  • Ensure that all Solr instances are working properly when deleting a collection. Otherwise, the data cannot be deleted completely, and after the Solr service is restarted the system displays a message indicating that garbage data exists. If garbage data is generated due to a misoperation, record the name of the core on which the garbage data is generated, go to the core directory that stores the garbage data on the backend, delete the folders whose names contain the core name, and restart the Solr service. Alternatively, after the instances recover, delete the corresponding collection again to clear the garbage data.
  • In security mode, the login user must have the required permissions to operate collections. A user without read/write permissions on a collection cannot operate it.
  • Solr uses the UTC time zone internally. If the time zone of the client on which a core or collection is created differs from the time zone used by Solr, the core start time displayed on the Solr Admin page differs from the client's local time. Services are not affected.
  • Created collections consume memory. If the memory usage of a Solr instance exceeds 90% of the -Xmx value set in the JVM configuration, the Solr cluster generates an alarm indicating that memory usage exceeds the threshold.
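
Two of the notes above involve simple arithmetic that a short sketch may make concrete: the 90% memory alarm threshold and the offset between UTC and a local time zone. The helper names, the sample -Xmx value, and the timestamp format below are assumptions chosen for illustration; the actual alarm is evaluated by the cluster, not by client code like this.

```python
from datetime import datetime, timedelta, timezone

ALARM_RATIO = 0.90  # threshold from the note above: 90% of the -Xmx value


def parse_xmx(option):
    """Convert a JVM heap option such as '-Xmx4G' or '-Xmx512m' to bytes."""
    if not option.startswith("-Xmx"):
        raise ValueError(f"not an -Xmx option: {option!r}")
    value = option[len("-Xmx"):]
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # no suffix means plain bytes


def exceeds_alarm_threshold(used_bytes, xmx_option):
    """Return True if heap usage is above 90% of the configured maximum heap."""
    return used_bytes > ALARM_RATIO * parse_xmx(xmx_option)


def utc_to_local(utc_text, offset_hours):
    """Convert a UTC timestamp (assumed format: 2024-11-29T02:00:00Z) to a fixed-offset local time."""
    utc = datetime.strptime(utc_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc.astimezone(timezone(timedelta(hours=offset_hours)))


if __name__ == "__main__":
    gib = 1024 ** 3
    # With a hypothetical -Xmx4G setting, the alarm threshold is 3.6 GiB.
    print(exceeds_alarm_threshold(int(3.7 * gib), "-Xmx4G"))
    print(exceeds_alarm_threshold(int(3.5 * gib), "-Xmx4G"))
    # A start time shown in UTC, viewed from a client in GMT+08:00.
    print(utc_to_local("2024-11-29T02:00:00Z", 8).isoformat())
```

The time conversion shows why the displayed core start time can differ from the local clock by a whole number of hours without indicating a fault.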