Word Filter Customization
Scenario
Solr provides several built-in word filters, including the following:
- StopFilterFactory: removes words that appear in the stop word list. The default stop word list (stopwords.txt) is included in the configuration set.
- LowerCaseFilterFactory: converts uppercase letters in words to lowercase.
- SynonymFilterFactory: maps words to their synonyms based on the synonym list. The default synonym list (synonyms.txt) is included in the configuration set.
- UpperCaseFilterFactory: converts lowercase letters in words to uppercase.
In some scenarios, the built-in word filters are not sufficient and you need a customized one. This section describes how to customize a word filter and how to specify it at query time.
Prerequisites
You have installed the Solr service.
Procedure
- Customize a word filter.
- Extend the abstract class TokenFilterFactory.
Create a factory class and override the create method so that it returns a new filter instance.
public abstract TokenStream create(TokenStream input);
Example:
@Override
public LowerCaseFilter create(TokenStream input) {
    return new LowerCaseFilter(input);
}
- Extend the abstract class TokenFilter.
Create a filter class and override the incrementToken method to convert each word.
public final boolean incrementToken() throws IOException;
Example:
@Override
public final boolean incrementToken() throws IOException {
    if (input.incrementToken()) {
        charUtils.toLowerCase(termAtt.buffer(), 0, termAtt.length());
        return true;
    } else {
        return false;
    }
}
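The relationship between the factory and the filter can be illustrated with a small JDK-only model. This is a sketch, not Lucene code: SketchTokenStream, ListTokenStream, SketchLowerCaseFilter, and SketchLowerCaseFilterFactory are hypothetical stand-ins for the real TokenStream, TokenFilter, and TokenFilterFactory classes, introduced only to show how create wraps the input stream and how incrementToken pulls one word from the input and rewrites it in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FilterChainSketch {

    // Hypothetical stand-in for a token stream (not a Lucene class):
    // incrementToken() advances to the next word, term() exposes it.
    static abstract class SketchTokenStream {
        abstract boolean incrementToken();
        abstract StringBuilder term();
    }

    // Stand-in source that emits a fixed list of words.
    static final class ListTokenStream extends SketchTokenStream {
        private final Iterator<String> words;
        private final StringBuilder term = new StringBuilder();

        ListTokenStream(List<String> words) {
            this.words = words.iterator();
        }

        @Override
        boolean incrementToken() {
            if (!words.hasNext()) {
                return false;
            }
            term.setLength(0);
            term.append(words.next());
            return true;
        }

        @Override
        StringBuilder term() {
            return term;
        }
    }

    // Stand-in filter: wraps an input stream and lowercases each word in place.
    static final class SketchLowerCaseFilter extends SketchTokenStream {
        private final SketchTokenStream input;

        SketchLowerCaseFilter(SketchTokenStream input) {
            this.input = input;
        }

        @Override
        boolean incrementToken() {
            if (input.incrementToken()) {
                StringBuilder t = input.term();
                for (int i = 0; i < t.length(); i++) {
                    t.setCharAt(i, Character.toLowerCase(t.charAt(i)));
                }
                return true;
            }
            return false; // the input is exhausted
        }

        @Override
        StringBuilder term() {
            return input.term();
        }
    }

    // Stand-in factory: its only job is to wrap the input in a new filter.
    static final class SketchLowerCaseFilterFactory {
        SketchTokenStream create(SketchTokenStream input) {
            return new SketchLowerCaseFilter(input);
        }
    }

    // Runs the words through the factory-created filter and collects the results.
    public static List<String> analyze(List<String> words) {
        SketchTokenStream stream =
                new SketchLowerCaseFilterFactory().create(new ListTokenStream(words));
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [solr, word, filter]
        System.out.println(analyze(Arrays.asList("Solr", "WORD", "Filter")));
    }
}
```

In a real implementation the two classes would extend TokenFilterFactory and TokenFilter and be compiled against the Lucene libraries shipped with the Solr installation.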
- Configure the customized word filter.
- Compile the customized word filter and package it as a JAR file.
- Copy the JAR file to solr/WEB-INF/lib in the installation directory of each Solr instance in the cluster.
For example, the directory of the SolrServerAdmin instance is Software installation path/FusionInsight_HD_8.1.0.1/install/FusionInsight-Solr-8.11.2/cluster/SolrServerAdmin/apache-tomcat-xxx/webapps/solr/WEB-INF/lib.
- Specify a word filter at query time.
A word filter can be specified dynamically at query time. For example, the query name:Solr* returns content whose name begins with Solr. To return content whose name begins with lowercase solr instead, set the query condition as follows:
q=name:Solr*&name.filter=solr.LowerCaseFilterFactory
In the preceding query, LowerCaseFilterFactory is specified dynamically through the name.filter parameter.
If you want to specify the customized word filter (com.test.ExampleFilterFactory), set the query condition as follows:
q=name:Solr*&name.filter=solr.ExampleFilterFactory
- If multiple word filters are specified for the same field name, they override one another and only the last one specified takes effect.
- A single query can specify word filters for several different field names at the same time.
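The query conditions above can be assembled programmatically. The following is a minimal sketch, assuming the conditions are sent as ordinary URL query parameters; FilterQuerySketch and the title field are hypothetical names used only for illustration. The map keyed by field name mirrors the rule that the last word filter specified for a field is the one that is used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FilterQuerySketch {

    private final String q;
    // Field name -> filter factory. put() replaces an earlier entry for the
    // same field, so only the last filter specified per field is sent.
    private final Map<String, String> filters = new LinkedHashMap<>();

    public FilterQuerySketch(String q) {
        this.q = q;
    }

    public FilterQuerySketch filter(String field, String factory) {
        filters.put(field, factory);
        return this;
    }

    public String toQueryString() {
        StringBuilder sb = new StringBuilder("q=").append(encode(q));
        for (Map.Entry<String, String> e : filters.entrySet()) {
            sb.append('&').append(encode(e.getKey() + ".filter"))
              .append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String qs = new FilterQuerySketch("name:Solr*")
                .filter("name", "solr.UpperCaseFilterFactory")
                .filter("name", "solr.LowerCaseFilterFactory")  // replaces the previous one
                .filter("title", "solr.LowerCaseFilterFactory") // hypothetical second field
                .toQueryString();
        // prints q=name%3ASolr*&name.filter=solr.LowerCaseFilterFactory&title.filter=solr.LowerCaseFilterFactory
        System.out.println(qs);
    }
}
```

The colon in name:Solr* is percent-encoded as %3A so that the string is safe to append to a request URL; the server decodes it back to the original query condition.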