Sensitive Word Filtering
Scenario
Sensitive words in query results are filtered.
- Sensitive word filtering is not supported if distrib=false is set in the query.
- Sensitive word filtering applies only to String and Text fields.
Prerequisites
You have installed the Solr service.
Procedure
- Configure and update sensitive words.
Configuring sensitive words
- Download a config set template.
- Configure the sensitive word list excludeWords.txt.
- The maximum number of sensitive words is 10,000.
- For text fields, sensitive word matching depends on the analyzer: a sensitive word in a text field is matched after the field content has been tokenized.
- If a sensitive word appears more than once in the list, keep only the entry with the highest level.
- At query time, sensitive words whose level is greater than or equal to the value of excludePriority are filtered.
vi ./test_exclude/conf/excludeWords.txt
# For example: sensitive word,sensitive level
you,4
daisy,10
me,2
solr,7
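The list rules above (one word,level pair per entry, at most 10,000 words, the highest level wins for a repeated word, and filtering at levels greater than or equal to excludePriority) can be sketched as follows. This is only an illustration of the rules as stated in this section, not the Solr component's actual parser. The function names parse_exclude_words and words_to_filter are hypothetical, and treating lines that start with # as comments is an assumption based on the example above.

```python
MAX_WORDS = 10_000  # limit stated in this section


def parse_exclude_words(text):
    """Parse 'word,level' lines into a dict, keeping the highest level per word.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    levels = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        word, sep, level = line.rpartition(",")
        if not sep or not word.strip():
            raise ValueError(f"expected 'word,level', got: {line!r}")
        word = word.strip()
        level = int(level)
        # A repeated word keeps the higher of the two levels.
        levels[word] = max(level, levels.get(word, level))
    if len(levels) > MAX_WORDS:
        raise ValueError(f"more than {MAX_WORDS} sensitive words")
    return levels


def words_to_filter(levels, exclude_priority):
    """Return the words whose level is >= exclude_priority."""
    return {word for word, level in levels.items() if level >= exclude_priority}


sample = """# sensitive word,sensitive level
you,4
daisy,10
me,2
solr,7
"""
levels = parse_exclude_words(sample)
print(sorted(words_to_filter(levels, 7)))  # ['daisy', 'solr']
```

With the sample list, excludePriority=7 selects daisy and solr, while excludePriority=2 selects all four words. For text fields the real match also depends on the analyzer, which this sketch does not model.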
- Configure the sensitive word filtering component in the request handler. In test_exclude/conf/solrconfig.xml, add the following configuration:
<requestHandler name="/exclude" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <bool name="preferLocalShards">false</bool>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>exclude_terms</str>
    <str>facet</str>
    <str>facet_module</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
    <str>expand</str>
  </arr>
</requestHandler>
- The configuration has been added to confWithSchema/conf/solrconfig.xml.
- The sensitive word filtering component is named exclude_terms.
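Before creating the config set, it can help to confirm that the edited solrconfig.xml really lists exclude_terms for the /exclude handler. The following is a minimal sketch using only the Python standard library. The function name handler_components is hypothetical, and the embedded XML is a shortened copy of the handler shown above, wrapped in a config root element for illustration.

```python
import xml.etree.ElementTree as ET


def handler_components(solrconfig_xml, handler_name):
    """Return the component names listed for the given requestHandler."""
    root = ET.fromstring(solrconfig_xml)
    # iter() also yields the root itself, so a bare <requestHandler> fragment works.
    for handler in root.iter("requestHandler"):
        if handler.get("name") == handler_name:
            arr = handler.find("arr[@name='components']")
            if arr is None:
                return []
            return [s.text.strip() for s in arr.findall("str") if s.text]
    raise KeyError(f"requestHandler {handler_name!r} not found")


SAMPLE = """
<config>
  <requestHandler name="/exclude" class="solr.SearchHandler">
    <arr name="components">
      <str>query</str>
      <str>exclude_terms</str>
      <str>facet</str>
    </arr>
  </requestHandler>
</config>
"""

components = handler_components(SAMPLE, "/exclude")
print("exclude_terms" in components)  # True
```

To check a real file, read test_exclude/conf/solrconfig.xml into a string and pass it to the same function.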
- Create a config set.
- Create a collection and specify the exclude_conf config set.
solrctl collection --create exclude_coll -c exclude_conf -s 3 -r 1 -m 1
Updating sensitive words
- Download the exclude_conf config set to the local update directory.
solrctl confset --get exclude_conf ./update
- Update the sensitive word list in the update directory.
vi ./update/conf/excludeWords.txt
- Update the config set to ZooKeeper.
- Reload the collection that uses the config set to make sensitive word changes take effect.
- When querying on the Solr Admin UI (see Solr Admin UI Usage Examples), enter /exclude in Request-Handler (qt) and enter excludePriority=2 in Raw Query Parameters.
In the result set shown in Figure 1, excludeResultNum indicates the number of filtered results.
- In Request-Handler (qt), enter /exclude, which is the requestHandler name configured in solrconfig.xml (step 1.c).
- In Raw Query Parameters, enter useExclude=false to disable sensitive word filtering for query results.
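The same query can also be issued outside the Admin UI. The sketch below only builds the request URL and sends nothing. It assumes the standard Solr URL layout in which a handler registered as /exclude is reachable under the collection path. The function name build_exclude_query, the base URL, and the collection name my_collection are hypothetical placeholders. Only excludePriority and useExclude come from this section. distrib=false is deliberately never added, because filtering is not supported with it.

```python
from urllib.parse import urlencode


def build_exclude_query(base_url, collection, q,
                        exclude_priority=None, use_exclude=True,
                        handler="/exclude"):
    """Build a query URL for the sensitive word filtering request handler."""
    params = {"q": q}
    if exclude_priority is not None:
        # Words with a level >= this value are filtered from the results.
        params["excludePriority"] = exclude_priority
    if not use_exclude:
        # Turns sensitive word filtering off for this query.
        params["useExclude"] = "false"
    return f"{base_url.rstrip('/')}/{collection}{handler}?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = build_exclude_query("https://solr.example.com/solr", "my_collection",
                          "title:solr", exclude_priority=2)
print(url)
```

Passing use_exclude=False corresponds to entering useExclude=false in Raw Query Parameters.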