Poisson Analyzer Usage Guide

Poisson Analyzer provides two analyzers (poisson_index and poisson_search), two tokenizers (poisson_index and poisson_search), and one synonym filter (poisson_synonyms).

You can use the two built-in analyzers directly, without extra configuration. For example, the built-in poisson_index analyzer is configured as follows:

"poisson_index": {
    "tokenizer": "poisson_index",
    "filter": [
        "stemmer",
        "poisson_synonyms",
        "stop"
    ]
}

To customize the configuration, for example to use your own dictionaries, specify the tokenizer and filter in the index settings. The following is an example:

{
    "settings": {
        "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "analysis": {
                "filter": {
                    "my_filter": {
                        "type": "poisson_synonyms",
                        "poisson_synonyms_dict": [
                            "word-1, word-2 => word-3",
                            "word-4, word-5",
                            "word-6, word-7, word-8"
                        ],
                        "poisson_synonyms_dict_paths": "synonyms.txt"
                    }
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "poisson_index",
                        "poisson_dict": [
                            "keyword-1",
                            "keyword-2",
                            "keyword-3"
                        ],
                        "poisson_dict_paths": "main_dict1.txt,main_dict2.txt",
                        "poisson_stopword_dict_paths": "stopword.txt",
                        "ignore": true
                    }
                },
                "analyzer": {
                    "my_analyzer": {
                        "filter": [
                            "my_filter"
                        ],
                        "tokenizer": "my_tokenizer"
                    }
                }
            }
        }
    }
}
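
The settings body above can also be assembled in code, which avoids hand-editing nested JSON. The following is a minimal Python sketch and not part of Poisson Analyzer: the function name build_analysis_settings and its arguments are hypothetical, the parameter names inside the body are taken from this guide, and the sketch only produces the request body without sending it anywhere.

```python
import json


def build_analysis_settings(synonym_rules, keywords, shards=5, replicas=1):
    """Return an index-settings body that mirrors the example above.

    Hypothetical helper for illustration only; the keys inside the body
    are the ones documented in this guide.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": str(shards),
                "number_of_replicas": str(replicas),
                "analysis": {
                    "filter": {
                        "my_filter": {
                            "type": "poisson_synonyms",
                            "poisson_synonyms_dict": list(synonym_rules),
                            "poisson_synonyms_dict_paths": "synonyms.txt",
                        }
                    },
                    "tokenizer": {
                        "my_tokenizer": {
                            "type": "poisson_index",
                            "poisson_dict": list(keywords),
                            "poisson_dict_paths": "main_dict1.txt,main_dict2.txt",
                            "poisson_stopword_dict_paths": "stopword.txt",
                            "ignore": True,
                        }
                    },
                    "analyzer": {
                        "my_analyzer": {
                            "filter": ["my_filter"],
                            "tokenizer": "my_tokenizer",
                        }
                    },
                },
            }
        }
    }


body = build_analysis_settings(
    ["word-1, word-2 => word-3", "word-4, word-5", "word-6, word-7, word-8"],
    ["keyword-1", "keyword-2", "keyword-3"],
)
# json.dumps takes care of commas, quoting, and the true/false literals.
print(json.dumps(body, indent=4))
```

Because the body is built from ordinary Python lists, adding or removing a synonym rule or keyword cannot leave the JSON malformed.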

Table 1 Parameter description

Type: filter
Parameter Name: poisson_synonyms_dict
Parameter Type: Array of strings
Value: ["word-1, word-2 => word-3", "word-4, word-5", "word-6, word-7, word-8"]
Description: Synonym dictionary. Each array element is one synonym rule, configured as follows:

  • To configure word-4 and word-5 as synonyms, use the rule word-4, word-5.
  • To replace words with another word, use a rule similar to the following:

    word-1, word-2 => word-3

    In this case, word-1 and word-2 are replaced with word-3.

Type: filter
Parameter Name: poisson_synonyms_dict_paths
Parameter Type: String
Value: synonyms.txt
Description: Synonym dictionary path. Separate multiple paths with commas (,). The file must meet the following requirements:

  • The file is encoded in UTF-8 without BOM.
  • Each synonym rule is on its own line.

Type: tokenizer
Parameter Name: poisson_dict
Parameter Type: Array of strings
Value: ["keyword-1", "keyword-2", "keyword-3"]
Description: Custom word segmentation dictionary. To switch a word from coarse-grained to fine-grained segmentation, append <= to the word. For example:

    Mobile navigation<=

Type: tokenizer
Parameter Name: poisson_dict_paths
Parameter Type: String
Value: main_dict1.txt,main_dict2.txt
Description: Custom dictionary path. Separate multiple paths with commas (,). The file must meet the following requirements:

  • The file is encoded in UTF-8 without BOM.
  • Each custom word is on its own line.

Type: tokenizer
Parameter Name: poisson_stopword_dict_paths
Parameter Type: String
Value: stopword.txt
Description: Custom stop word dictionary path. Separate multiple paths with commas (,).

Type: tokenizer
Parameter Name: ignore
Parameter Type: String
Value: true or false
Description: Whether to ignore case when matching custom stop words. The value true means case is ignored; the value false means case is not ignored.
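
The rule formats in Table 1 can be illustrated with a short Python sketch. It only shows how the rule text and the dictionary file requirements read, under the assumptions that terms are comma-separated and that => separates the words to replace from the replacement. The helpers parse_synonym_rule and read_dictionary_entries are hypothetical and say nothing about how Poisson Analyzer implements these features internally.

```python
import codecs


def parse_synonym_rule(rule):
    """Split one synonym rule into (sources, targets).

    "a, b => c" gives (["a", "b"], ["c"]): a and b are replaced with c.
    "a, b"      gives (["a", "b"], ["a", "b"]): a and b are synonyms.
    """
    def terms(text):
        return [t.strip() for t in text.split(",") if t.strip()]

    if "=>" in rule:
        left, right = rule.split("=>", 1)
        return terms(left), terms(right)
    group = terms(rule)
    return group, group


def read_dictionary_entries(data):
    """Return the entries of a dictionary file given its raw bytes.

    Enforces the two documented requirements: UTF-8 without BOM, and
    one entry per line (blank lines are skipped in this sketch).
    """
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("dictionary file must be UTF-8 without BOM")
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    return [line.strip() for line in text.splitlines() if line.strip()]


for rule in ["word-1, word-2 => word-3", "word-4, word-5"]:
    print(parse_synonym_rule(rule))

print(read_dictionary_entries("keyword-1\nkeyword-2\n".encode("utf-8")))
```

A check like read_dictionary_entries can be run on synonyms.txt, main_dict1.txt, or stopword.txt before they are uploaded, so that encoding problems surface early.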