Updated on 2024-11-29 GMT+08:00

Sensitive Word Filtering

Scenario

Sensitive words in query results are filtered.

  • Sensitive word filtering is not supported for queries that set distrib=false.
  • Sensitive word filtering applies only to String and Text fields.

Prerequisites

You have installed the Solr service.

Procedure

  1. Configure and update sensitive words.

    Configuring sensitive words
    1. Download a config set template.

      solrctl confset --get confWithSchema ./test_exclude

    2. Configure the sensitive word list excludeWords.txt.
      • The maximum number of sensitive words is 10,000.
      • For Text fields, sensitive word matching depends on the field's analyzer: the field content is tokenized and analyzed before it is matched against the sensitive words.
      • If a sensitive word appears more than once in the list, the entry with the higher level is kept.
      • At query time, sensitive words whose level is greater than or equal to the excludePriority value are filtered.

      vi ./test_exclude/conf/excludeWords.txt

      # Example entries, one per line in the format: sensitive word,sensitivity level
      you,4
      daisy,10
      me,2
      solr,7
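      The list rules above can be expressed as a short model. The following Python sketch is illustrative only, not the plugin's implementation: the function names are hypothetical, and it assumes one "word,level" entry per line, with blank lines and lines starting with # ignored.

```python
"""Hypothetical model of the excludeWords.txt rules listed in this step.

Not the Solr plugin's code. Assumptions: one "word,level" entry per line,
blank lines and lines starting with "#" are skipped, levels are integers.
"""
from typing import Dict, Iterable, List

MAX_SENSITIVE_WORDS = 10_000  # documented maximum number of sensitive words


def load_exclude_words(lines: Iterable[str]) -> Dict[str, int]:
    """Parse "word,level" lines; a repeated word keeps its highest level."""
    levels: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank or comment line
        parts = line.rsplit(",", 1)
        if len(parts) != 2 or not parts[0].strip():
            raise ValueError(f"expected 'word,level', got {raw!r}")
        word, level = parts[0].strip(), int(parts[1])
        # Duplicate entries: keep the higher sensitivity level.
        if word not in levels or level > levels[word]:
            levels[word] = level
    if len(levels) > MAX_SENSITIVE_WORDS:
        raise ValueError(
            f"{len(levels)} words exceed the limit of {MAX_SENSITIVE_WORDS}"
        )
    return levels


def words_to_filter(levels: Dict[str, int], exclude_priority: int) -> List[str]:
    """Words filtered for a query: level >= excludePriority."""
    return sorted(w for w, lvl in levels.items() if lvl >= exclude_priority)
```

      For the example list above, words_to_filter(levels, 5) returns ['daisy', 'solr'], while excludePriority=2 covers all four words.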
    3. Configure the sensitive word filtering component in the request handler. In test_exclude/conf/solrconfig.xml, add the following configuration:
      <requestHandler name="/exclude" class="solr.SearchHandler">
        <lst name="defaults">
          <str name="echoParams">explicit</str>
          <int name="rows">10</int>
          <bool name="preferLocalShards">false</bool>
        </lst>

        <arr name="components">
          <str>query</str>
          <str>exclude_terms</str>
          <str>facet</str>
          <str>facet_module</str>
          <str>mlt</str>
          <str>highlight</str>
          <str>stats</str>
          <str>debug</str>
          <str>expand</str>
        </arr>
      </requestHandler>
      • This configuration is already present in confWithSchema/conf/solrconfig.xml.
      • The sensitive word filtering component is named exclude_terms.
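      Before creating the config set, you may want to confirm that the edited solrconfig.xml still registers exclude_terms in the /exclude handler. The following Python sketch uses the standard xml.etree.ElementTree module; the helper names are hypothetical, and requiring exclude_terms to follow query only mirrors the example above rather than a documented constraint.

```python
"""Check that solrconfig.xml wires exclude_terms into the /exclude handler.

A minimal sketch: it only inspects the XML layout shown in this step and does
not validate the file the way Solr does when loading it.
"""
import xml.etree.ElementTree as ET
from typing import List


def handler_components(solrconfig_xml: str, handler: str = "/exclude") -> List[str]:
    """Return the <arr name="components"> entries of the named requestHandler."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find(f".//requestHandler[@name='{handler}']")
    if node is None:
        raise LookupError(f"requestHandler {handler!r} not found")
    arr = node.find("arr[@name='components']")
    if arr is None:
        return []  # handler declares no explicit component list
    return [(item.text or "").strip() for item in arr.findall("str")]


def exclude_terms_wired(solrconfig_xml: str) -> bool:
    """True if exclude_terms is listed after query, as in the example above."""
    components = handler_components(solrconfig_xml)
    return (
        "query" in components
        and "exclude_terms" in components
        and components.index("exclude_terms") > components.index("query")
    )
```

      For example, read ./test_exclude/conf/solrconfig.xml into a string and pass it to exclude_terms_wired().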
    4. Create a config set named exclude_conf from the test_exclude directory.

      solrctl confset --create exclude_conf ./test_exclude

    5. Create a collection and specify the exclude_conf config set.

      solrctl collection --create exculde_coll -c exclude_conf -s 3 -r 1 -m 1

    Updating sensitive words

    1. Download the exclude_conf config set to the local update directory.

      solrctl confset --get exclude_conf ./update

    2. Edit the sensitive word list in the update directory.

      vi ./update/conf/excludeWords.txt
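      To review an edit before uploading it, you can compare the new list with a copy of excludeWords.txt saved before editing. This hypothetical Python helper assumes the same "word,level" format (lines starting with # treated as comments) and collapses duplicates to their highest level, as the list rules under Configuring sensitive words describe.

```python
"""Summarize changes between two versions of excludeWords.txt.

Hypothetical review helper. Assumes one "word,level" entry per line and
treats blank lines and lines starting with "#" as non-entries.
"""
from typing import Dict, Iterable, Tuple


def parse_levels(lines: Iterable[str]) -> Dict[str, int]:
    """Map each word to its level; duplicates keep the highest level."""
    levels: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        word, level_text = line.rsplit(",", 1)  # ValueError if no comma
        word, level = word.strip(), int(level_text)
        if word not in levels or level > levels[word]:
            levels[word] = level
    return levels


def diff_levels(
    old_lines: Iterable[str], new_lines: Iterable[str]
) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, Tuple[int, int]]]:
    """Return (added, removed, changed); changed maps word -> (old, new)."""
    old, new = parse_levels(old_lines), parse_levels(new_lines)
    added = {w: new[w] for w in new.keys() - old.keys()}
    removed = {w: old[w] for w in old.keys() - new.keys()}
    changed = {
        w: (old[w], new[w]) for w in old.keys() & new.keys() if old[w] != new[w]
    }
    return added, removed, changed
```

      Both arguments can be open file objects, for example the saved copy and ./update/conf/excludeWords.txt.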

    3. Upload the updated config set to ZooKeeper.

      solrctl confset --update exclude_conf ./update/

    4. Reload the collection that uses the config set to make sensitive word changes take effect.

      solrctl collection --reload exculde_coll

  2. Run a query on the Solr Admin UI (see Solr Admin UI Usage Examples): enter /exclude in Request-Handler (qt) and excludePriority=2 in Raw Query Parameters.

    Figure 1 Solr Admin UI

    In the result set shown in Figure 1, excludeResultNum indicates the number of filtered results.

    • In Request-Handler (qt), enter /exclude, the requestHandler name configured in step 1.c.
    • In Raw Query Parameters, enter useExclude=false to disable sensitive word filtering for query results.
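    The same query can also be sent over HTTP, because Solr routes /solr/<collection>/exclude to the requestHandler named /exclude. The Python sketch below is hedged: the base URL is a placeholder, it assumes plain HTTP without authentication (a security-mode cluster may require Kerberos or SSL), and because this page does not say where excludeResultNum appears in the JSON response, the helper checks both the top level and the response object. The function names are hypothetical.

```python
"""Query the /exclude handler with sensitive word filtering parameters.

Sketch only: assumes unauthenticated HTTP and a JSON response (wt=json).
"""
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_exclude_query_url(
    base_url: str,
    collection: str,
    q: str = "*:*",
    exclude_priority: Optional[int] = None,
    use_exclude: bool = True,
    rows: int = 10,
) -> str:
    """Build /solr/<collection>/exclude?... with excludePriority/useExclude."""
    params: Dict[str, Any] = {"q": q, "wt": "json", "rows": rows}
    if exclude_priority is not None:
        params["excludePriority"] = exclude_priority
    if not use_exclude:
        params["useExclude"] = "false"  # disables filtering for this query
    return f"{base_url.rstrip('/')}/{collection}/exclude?{urlencode(params)}"


def exclude_result_num(resp: Dict[str, Any]) -> Optional[int]:
    """Read excludeResultNum; its position in the response is an assumption."""
    for container in (resp, resp.get("response", {})):
        if isinstance(container, dict) and "excludeResultNum" in container:
            return int(container["excludeResultNum"])
    return None


def run_query(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the GET request and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

    For example, exclude_result_num(run_query(build_exclude_query_url("http://solr-host:8983/solr", "exculde_coll", exclude_priority=2))), where the host and port are placeholders for your Solr instance.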