更新时间:2024-03-06 GMT+08:00
Synonym词典
Synonym词典用于定义、识别token的同义词并转化,不支持词组(词组形式的同义词可用Thesaurus词典定义,详细请参见Thesaurus词典)。
示例
- Synonym词典可用于解决语言学相关问题,例如,为避免使单词"Paris"变成"pari",可在Synonym词典文件中定义一行"Paris paris",并将该词典放置在预定义的english_stem词典之前。
认证用的AK和SK硬编码到代码中或者明文存储都有很大的安全风险,建议在配置文件或者环境变量中密文存放,使用时解密,确保安全。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+----------------+--------------+--------- asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari} (1 row) CREATE TEXT SEARCH DICTIONARY my_synonym ( TEMPLATE = synonym, SYNONYMS = my_synonyms, FILEPATH = 'obs://bucket01/obs.xxx.xxx.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx' ); ALTER TEXT SEARCH CONFIGURATION english ALTER MAPPING FOR asciiword WITH my_synonym, english_stem; SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris} (1 row) SELECT * FROM ts_debug('english', 'paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris} (1 row) ALTER TEXT SEARCH DICTIONARY my_synonym ( CASESENSITIVE=true); SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris} (1 row) SELECT * FROM ts_debug('english', 'paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {pari} (1 row)
其中,同义词词典文件全名为my_synonyms.syn,所在目录为 'obs://bucket01/obs.xxx.xxx.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx'。关于创建词典的语法和更多参数,请参见CREATE TEXT SEARCH DICTIONARY。
- 星号(*)可用于词典文件中的同义词结尾,表示该同义词是一个前缀。在to_tsvector()中该星号将被忽略,但在to_tsquery()中会匹配该前缀并对应输出结果(参照处理查询一节)。
假设词典文件synonym_sample.syn内容如下:
1 2 3 4 5
postgres pgsql postgresql pgsql postgre pgsql gogle googl indices index*
创建并使用词典:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
CREATE TEXT SEARCH DICTIONARY syn ( TEMPLATE = synonym, SYNONYMS = synonym_sample ); SELECT ts_lexize('syn','indices'); ts_lexize ----------- {index} (1 row) CREATE TEXT SEARCH CONFIGURATION tst (copy=simple); ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn; SELECT to_tsvector('tst','indices'); to_tsvector ------------- 'index':1 (1 row) SELECT to_tsquery('tst','indices'); to_tsquery ------------ 'index':* (1 row) SELECT 'indexes are very useful'::tsvector; tsvector --------------------------------- 'are' 'indexes' 'useful' 'very' (1 row) SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices'); ?column? ---------- t (1 row)
父主题: 词典