Updated on 2024-11-29 GMT+08:00

Word Filter Customization

Scenario

Solr provides several built-in word filters, including the following:

  • StopFilterFactory: removes words that appear in the stop word list. The default stop word list (stopwords.txt) is included in the configuration set.
  • LowerCaseFilterFactory: converts uppercase letters in words to lowercase.
  • SynonymFilterFactory: maps words to their synonyms based on the synonym list. The default synonym list (synonyms.txt) is included in the configuration set.
  • UpperCaseFilterFactory: converts lowercase letters in words to uppercase.
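To make the behavior of the four filters listed above concrete, the following is a dependency-free Java model that applies equivalent transformations to a plain list of tokens. It is a sketch for illustration only, not the Lucene/Solr API: the real filters wrap a TokenStream and are configured in the schema, and the class and method names below (DefaultFilterModel, lowerCase, stop, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Dependency-free model of what the default filters do to a token list.
// NOT the Lucene/Solr API; all names here are hypothetical.
final class DefaultFilterModel {

    // Models LowerCaseFilterFactory: lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Models UpperCaseFilterFactory: uppercase every token.
    static List<String> upperCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toUpperCase(Locale.ROOT));
        }
        return out;
    }

    // Models StopFilterFactory: drop tokens found in the stop word list
    // (stopwords.txt in a real configuration set).
    static List<String> stop(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Models SynonymFilterFactory as a simple one-to-one replacement;
    // synonyms.txt in a real configuration set supports richer rules.
    static List<String> synonym(List<String> tokens, Map<String, String> synonyms) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(synonyms.getOrDefault(t, t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "Quick", "TV");
        // Lowercase first so that the stop word and synonym lookups match.
        List<String> result = synonym(
                stop(lowerCase(tokens), Set.of("the")),
                Map.of("tv", "television"));
        System.out.println(result); // prints [quick, television]
    }
}
```

In this model the order of the filters matters: lowercasing runs first so that "The" matches the lowercase stop word "the" and "TV" matches the synonym key "tv".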

If none of the built-in filters meets your needs, you can develop a custom word filter. This section describes how to create a custom word filter and how to specify it at query time.

Prerequisites

You have installed the Solr service.

Procedure

  1. Customize a word filter.

    1. Extend the abstract class TokenFilterFactory.

      Create a factory class that extends TokenFilterFactory and override its create method so that it returns a new instance of your filter.

      public abstract TokenStream create(TokenStream input);

      Example:

      @Override
      public LowerCaseFilter create(TokenStream input) {
          return new LowerCaseFilter(input);
      }
    2. Extend the abstract class TokenFilter.

      Create a filter class that extends TokenFilter and override its incrementToken method to transform each token.

      public final boolean incrementToken() throws IOException;

      Example:

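      @Override
      public final boolean incrementToken() throws IOException {
          // Fetch the next token from the upstream stream; stop when it is exhausted.
          if (input.incrementToken()) {
              // termAtt is the filter's CharTermAttribute; lowercase its buffer in place.
              // In Lucene 8.x, CharacterUtils.toLowerCase is a static method.
              CharacterUtils.toLowerCase(termAtt.buffer(), 0, termAtt.length());
              return true;
          }
          return false;
      }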
    3. Deploy the customized word filter.
      • Compile the word filter classes and package them into a JAR file.
      • Copy the JAR file to the solr/WEB-INF/lib directory under the installation directory of every Solr instance in the cluster.

        For example, the directory of the SolrServerAdmin instance is Software installation path/FusionInsight_HD_8.1.0.1/install/FusionInsight-Solr-8.11.2/cluster/SolrServerAdmin/apache-tomcat-xxx/webapps/solr/WEB-INF/lib.

  2. Specify a word filter at query time.

    A word filter can be specified dynamically at query time. For example, the query name:Solr* returns documents whose name begins with Solr. To match documents whose name begins with the lowercase solr instead, set the query condition as follows:

    q=name:Solr*&name.filter=solr.LowerCaseFilterFactory

    This query applies LowerCaseFilterFactory to the name field dynamically, so the query term is converted to lowercase before matching.

    To specify a customized word filter, for example com.test.ExampleFilterFactory, set the query condition as follows:

    q=name:Solr*&name.filter=solr.ExampleFilterFactory

    • If multiple word filters are specified for the same field, they do not accumulate: only the last one specified takes effect.
    • A single query can specify different word filters for different fields at the same time.
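The following dependency-free Java model illustrates the pattern used in step 1 of the procedure above: a factory's create method wraps an upstream token stream in a filter, and the filter's incrementToken method pulls the next token from its input and rewrites the shared term buffer in place, as LowerCaseFilter does with termAtt. It is a sketch under the assumption that a simplified model is enough to show the control flow; all class names (MiniTokenStream, MiniLowerCaseFilter, MiniLowerCaseFilterFactory, and so on) are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for Lucene's TokenStream, TokenFilter and
// TokenFilterFactory. Not Lucene classes.
abstract class MiniTokenStream {
    // Shared term buffer, playing the role of Lucene's CharTermAttribute.
    protected final StringBuilder term;
    MiniTokenStream(StringBuilder term) { this.term = term; }
    // Advances to the next token; returns false when the stream is exhausted.
    abstract boolean incrementToken();
    String term() { return term.toString(); }
}

// Token source, standing in for a tokenizer.
final class ListTokenizer extends MiniTokenStream {
    private final Iterator<String> tokens;
    ListTokenizer(List<String> tokens) {
        super(new StringBuilder());
        this.tokens = tokens.iterator();
    }
    @Override
    boolean incrementToken() {
        if (!tokens.hasNext()) {
            return false;
        }
        term.setLength(0);
        term.append(tokens.next());
        return true;
    }
}

// Stand-in for TokenFilter: shares the term buffer of the stream it wraps.
abstract class MiniTokenFilter extends MiniTokenStream {
    protected final MiniTokenStream input;
    MiniTokenFilter(MiniTokenStream input) {
        super(input.term);
        this.input = input;
    }
}

// Stand-in for LowerCaseFilter: pull a token, then lowercase it in place.
final class MiniLowerCaseFilter extends MiniTokenFilter {
    MiniLowerCaseFilter(MiniTokenStream input) { super(input); }
    @Override
    boolean incrementToken() {
        if (!input.incrementToken()) {
            return false;
        }
        String lower = term.toString().toLowerCase(Locale.ROOT);
        term.setLength(0);
        term.append(lower);
        return true;
    }
}

// Stand-ins for TokenFilterFactory and LowerCaseFilterFactory.
abstract class MiniTokenFilterFactory {
    abstract MiniTokenStream create(MiniTokenStream input);
}

final class MiniLowerCaseFilterFactory extends MiniTokenFilterFactory {
    @Override
    MiniLowerCaseFilter create(MiniTokenStream input) { return new MiniLowerCaseFilter(input); }
}

final class MiniFilterDemo {
    // Runs tokens through the filter produced by the factory and collects the output.
    static List<String> analyze(List<String> tokens, MiniTokenFilterFactory factory) {
        MiniTokenStream stream = factory.create(new ListTokenizer(tokens));
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [solr, rocks]
        System.out.println(analyze(List.of("Solr", "Rocks"), new MiniLowerCaseFilterFactory()));
    }
}
```

Because each filter only wraps its input, several factories can be chained by passing one factory's output stream to the next factory's create method.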
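When the query condition from step 2 is sent over HTTP, the parameter values must be URL-encoded. The sketch below builds such a query string with the standard java.net.URLEncoder; the helper class FilterQueryBuilder is hypothetical, and only the q and <field>.filter parameters come from this section. It assumes Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds q=<query> plus one
// "<field>.filter=<factory>" parameter per field.
final class FilterQueryBuilder {

    static String build(String query, Map<String, String> filterByField) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + encode(query));
        // The map holds one filter per field: putting a second filter for the same
        // field replaces the first, mirroring the rule that only the last filter
        // specified for a field is used.
        for (Map.Entry<String, String> e : filterByField.entrySet()) {
            params.add(encode(e.getKey() + ".filter") + "=" + encode(e.getValue()));
        }
        return params.toString();
    }

    // URL-encodes a parameter name or value; ':' becomes %3A, while '*' and '.' are kept.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("name", "solr.LowerCaseFilterFactory");
        // prints q=name%3ASolr*&name.filter=solr.LowerCaseFilterFactory
        System.out.println(build("name:Solr*", filters));
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the generated string predictable when filters are specified for several fields in one query.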