Introduction to Word Dictionaries

Poisson Word Splitter

Poisson Analyzer is a plug-in developed by Poisson Lab on top of the Java version of the open-source Jieba word splitter. It provides both word splitting and synonym extension. Compared with Jieba, it optimizes multi-granularity word splitting and improves both the identification of out-of-vocabulary (new) words and the splitting of mixed Chinese-English phrases. Highlighting and phrase queries are also supported for synonyms.

Poisson Analyzer has the following features:

  • Supports multi-granularity word splitting.
  • Supports synonym extension.
  • Supports custom word splitting and synonym configurations in settings.
  • Supports index-level dictionary settings in word dictionary management. Indices can share a dictionary, or each index can be configured with its own. Dictionary files support hot updates.
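
The first two features can be pictured with a small sketch. This is a toy illustration only, not Poisson Analyzer's actual algorithm: the sample dictionary, the synonym table, and the function names split_coarse, split_fine, and expand_synonyms are all hypothetical. Coarse-grained splitting is approximated here by forward maximum matching, and fine-grained splitting by emitting every dictionary word found in the text.

```python
# Toy illustration only: this is NOT Poisson Analyzer's algorithm.
# DICTIONARY and SYNONYMS are hypothetical sample data.
DICTIONARY = {"北京", "大学", "北京大学"}
SYNONYMS = {"北京大学": ["北大"]}


def split_coarse(text, dictionary):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character if nothing matches
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


def split_fine(text, dictionary):
    """Emit every dictionary word that occurs anywhere in the text."""
    tokens = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
    return tokens


def expand_synonyms(tokens, synonyms):
    """Keep each token and append any synonyms configured for it."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(synonyms.get(token, []))
    return expanded


print(split_coarse("北京大学", DICTIONARY))  # ['北京大学']
print(split_fine("北京大学", DICTIONARY))    # ['北京', '北京大学', '大学']
print(expand_synonyms(split_coarse("北京大学", DICTIONARY), SYNONYMS))  # ['北京大学', '北大']
```

A real multi-granularity splitter relies on statistical models as well as dictionaries, so the sketch only conveys the difference in output granularity and the effect of synonym extension on a token stream.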

IK Word Splitter

IK Analyzer is an open-source, lightweight Chinese word splitting tool written in Java. By default, it ships with a built-in library of common Chinese words and two analyzers: ik_max_word (fine-grained word splitting) and ik_smart (coarse-grained word splitting). IK Analyzer is a globally shared word splitter: its word dictionary applies to all indices.
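
One way to compare the two analyzers is to send the same text through each of them. The sketch below assumes the cluster exposes an Elasticsearch-style `_analyze` API and has the IK plug-in installed; this page does not confirm either point, so treat both as assumptions. The helper build_analyze_request is hypothetical and only builds the request, without sending it.

```python
import json

# The two analyzer names come from the text above. Everything else here is
# an assumption: an Elasticsearch-style _analyze endpoint with IK installed.
IK_ANALYZERS = ("ik_max_word", "ik_smart")


def build_analyze_request(analyzer, text):
    """Return (method, path, JSON body) for an _analyze call; nothing is sent."""
    if analyzer not in IK_ANALYZERS:
        raise ValueError(f"unknown IK analyzer: {analyzer}")
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return "POST", "/_analyze", body


for name in IK_ANALYZERS:
    method, path, body = build_analyze_request(name, "北京大学")
    print(method, path, body)
```

Sending both requests with any HTTP client and comparing the returned token lists should show ik_max_word producing more, shorter tokens than ik_smart for the same input.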

Differences Between Poisson Analyzer and IK Analyzer

  • Poisson Analyzer supports multi-granularity word splitting, while IK Analyzer supports only two levels: coarse-grained (ik_smart) and fine-grained (ik_max_word).
  • Poisson Analyzer supports synonym extension, while IK Analyzer does not.
  • Poisson Analyzer supports identification of new words and word splitting for Chinese-English hybrid phrases, while IK Analyzer does not.
  • Poisson Analyzer supports index-level custom word dictionaries. Multiple indices can share a word dictionary, or each index can be configured with its own. In addition, word dictionaries support hot updates. In IK Analyzer, one word dictionary is shared by all indices.
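
The last difference, dictionary scope, can be modeled in a few lines. This is a conceptual sketch under stated assumptions, not either plug-in's configuration interface: the class DictionaryRegistry, its methods, and the index and dictionary names are hypothetical. Replacing a dictionary's word set in place stands in for a hot update.

```python
class DictionaryRegistry:
    """Hypothetical model of index-level dictionaries; not a real plug-in API."""

    def __init__(self):
        self.dictionaries = {}   # dictionary name -> set of words
        self.bindings = {}       # index name -> dictionary name

    def put_dictionary(self, name, words):
        # Replacing the word set models a hot update: bound indices see the
        # new words on their next lookup, with no rebinding.
        self.dictionaries[name] = set(words)

    def bind(self, index, dictionary_name):
        self.bindings[index] = dictionary_name

    def words_for(self, index):
        name = self.bindings.get(index)
        return self.dictionaries.get(name, set())


registry = DictionaryRegistry()
registry.put_dictionary("shared", {"北京"})
registry.put_dictionary("private", {"大学"})

# Two indices share one dictionary; a third has its own.
registry.bind("index_a", "shared")
registry.bind("index_b", "shared")
registry.bind("index_c", "private")

# A hot update to the shared dictionary reaches both indices bound to it.
registry.put_dictionary("shared", {"北京", "北京大学"})
print(registry.words_for("index_a") == registry.words_for("index_b"))  # True
print(sorted(registry.words_for("index_c")))  # ['大学']
```

In this model, the IK Analyzer behavior corresponds to binding every index to the same single dictionary.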