Thesaurus Dictionary

Updated on 2024-10-14 GMT+08:00

A Thesaurus dictionary (sometimes abbreviated as TZ) is a collection of relationships between words and phrases, such as broader terms (BT), narrower terms (NT), preferred terms, non-preferred terms, and related terms. Based on the definitions in the dictionary file, a TZ replaces all non-preferred terms with one preferred term and, optionally, preserves the original terms for indexing as well. A TZ is an extension of a Synonym dictionary with added phrase support.

Precautions

A TZ can recognize phrases, so it must remember its state and interact with the parser to decide whether to handle the next token or stop accumulating. A TZ must therefore be configured carefully. For example, if a TZ is assigned to handle only asciiword tokens, a TZ definition such as one 7 will not work, because the token type uint is not assigned to the TZ.
TZs are used during indexing, so any change to a TZ's parameters requires reindexing. For most other dictionary types, small changes such as adding or removing stop words do not force reindexing.
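
As a rough illustration of the second precaution, the statement below rebuilds an index after a TZ change. The index name astro_docs_body_idx is hypothetical. It stands for any index whose expression calls to_tsvector with a configuration that uses the TZ, and it is not an object defined on this page.

```sql
-- Assumption: astro_docs_body_idx is a hypothetical GIN index built on
-- to_tsvector('russian', ...) for a column that is indexed through the TZ.
-- After the TZ file or its parameters change, the lexemes stored in the
-- index may be stale, so the index is rebuilt from the table data.
REINDEX INDEX astro_docs_body_idx;
```

If the tsvector values are stored in a table column rather than computed in an index expression, the column would presumably need to be recomputed as well before the index is rebuilt.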

Procedure

Create a TZ named thesaurus_astro.

thesaurus_astro is a simple astronomical TZ that defines two astronomical word combinations (word+synonym).

supernovae stars : sn
crab nebulae : crab

Run the following statement to create the TZ:

openGauss=# CREATE TEXT SEARCH DICTIONARY thesaurus_astro (
    TEMPLATE = thesaurus,
    DictFile = thesaurus_astro,
    Dictionary = pg_catalog.english_stem,
    FILEPATH = 'file:///home/dicts/'
);

The full name of the TZ file is thesaurus_astro.ths, and the file is stored in the /home/dicts/ directory on the connected CN. pg_catalog.english_stem is the subdictionary (a Snowball English stemmer) used for input normalization. The subdictionary has its own configuration (for example, stop words), which is not shown here. For details about the syntax and parameters for creating a Thesaurus dictionary, see CREATE TEXT SEARCH DICTIONARY.
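
One way to check that the dictionary was registered with the intended options is to look it up in the pg_ts_dict system catalog. This is a minimal sketch that assumes the catalog exposes the dictname and dictinitoption columns, as it does in PostgreSQL-derived systems. The exact text stored in dictinitoption may differ between versions, so no output is shown.

```sql
-- Look up the newly created TZ and the options it was created with.
SELECT dictname, dictinitoption
FROM pg_ts_dict
WHERE dictname = 'thesaurus_astro';
```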

Bind the TZ to the desired token types in the text search configuration.

openGauss=# ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH thesaurus_astro, english_stem;
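
If the dictionary file contained entries with digits, such as the one 7 case mentioned under Precautions, the TZ would also need to receive uint tokens. The following is a minimal sketch. It assumes that such an entry exists in the file (thesaurus_astro as defined above has none) and that the built-in simple dictionary is an acceptable fallback for numbers.

```sql
-- Assumption: the TZ file has an entry containing a number, for example
-- a left-hand side like "one 7". Numbers are parsed as uint tokens, so
-- uint must be mapped to the TZ for such an entry to match.
ALTER TEXT SEARCH CONFIGURATION russian
    ALTER MAPPING FOR uint
    WITH thesaurus_astro, simple;
```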

Use the TZ.

Test the TZ.

The ts_lexize function is not very useful for testing the TZ because it processes its input as a single token. Instead, use the plainto_tsquery, to_tsvector, or to_tsquery function, each of which breaks its input string into multiple tokens.

openGauss=# SELECT plainto_tsquery('russian','supernova star');
 plainto_tsquery 
-----------------
 'sn'
(1 row)

openGauss=# SELECT to_tsvector('russian','supernova star');
 to_tsvector 
-------------
 'sn':1
(1 row)

openGauss=# SELECT to_tsquery('russian','''supernova star''');
 to_tsquery 
------------
 'sn'
(1 row)

supernova star matches supernovae stars in thesaurus_astro because the english_stem stemmer is specified in the thesaurus_astro definition: the stemmer removes the final e from supernovae and the final s from stars.
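
If a phrase does not match as expected, it can help to confirm that its tokens are classified as a type the TZ is mapped to. The sketch below assumes the ts_debug function is available with its usual alias, token, and dictionaries columns. It is meant only for checking token types and the dictionary list. Like ts_lexize, it may evaluate tokens one at a time, so it is not a reliable way to observe the phrase substitution itself.

```sql
-- Show the token type (alias) assigned to each word and the list of
-- dictionaries mapped to that token type in the russian configuration.
SELECT alias, token, dictionaries
FROM ts_debug('russian', 'supernova star');
```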

To index the original phrase as well, include it in the right-hand part of the definition.

supernovae stars : sn supernovae stars

Then update the dictionary so that the modified file takes effect:

openGauss=# ALTER TEXT SEARCH DICTIONARY thesaurus_astro (
    DictFile = thesaurus_astro,
    FILEPATH = 'file:///home/dicts/');

openGauss=# SELECT plainto_tsquery('russian','supernova star');
       plainto_tsquery       
-----------------------------
 'sn' & 'supernova' & 'star'
(1 row)
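
To show how the TZ might be used beyond one-off function calls, the sketch below applies the russian configuration to a table. The table astro_docs, its columns, the sample rows, and the index name are all hypothetical and are not part of the original procedure.

```sql
-- Hypothetical table holding short astronomy notes.
CREATE TABLE astro_docs (id int, body text);

INSERT INTO astro_docs VALUES
    (1, 'A supernova star was observed last night'),
    (2, 'The crab nebulae is a supernova remnant');

-- Expression index that uses the configuration the TZ is bound to.
CREATE INDEX astro_docs_body_idx ON astro_docs
    USING gin (to_tsvector('russian', body));

-- Both the document text and the query string pass through the TZ,
-- so the phrase is compared in its normalized form.
SELECT id, body
FROM astro_docs
WHERE to_tsvector('russian', body) @@ plainto_tsquery('russian', 'supernovae stars');
```

The query repeats the same to_tsvector expression as the index definition so that the planner can consider the index. If the TZ definition changes later, this index is one of the objects that would need to be rebuilt.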
