Gathering Document Statistics

The function ts_stat is useful for checking your configuration and for finding stop-word candidates.

     ts_stat(sqlquery text, [ weights text, ]
        OUT word text, OUT ndoc integer,
        OUT nentry integer) returns setof record

sqlquery is a text value containing an SQL query which must return a single tsvector column. ts_stat executes the query and returns statistics about each distinct lexeme (word) contained in the tsvector data. The columns returned are:

word text: the value of a lexeme.
ndoc integer: number of documents (tsvectors) the word occurred in.
nentry integer: total number of occurrences of the word.
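
The difference between ndoc and nentry is easiest to see on a tiny collection. The following Python sketch is only a model of the counting rules described above, not how the server implements ts_stat; the function name doc_stats and the sample documents are hypothetical, and each document is assumed to be already reduced to a list of lexemes.

```python
from collections import Counter

def doc_stats(documents):
    """Return {word: (ndoc, nentry)} for a list of tokenized documents."""
    ndoc = Counter()
    nentry = Counter()
    for words in documents:
        nentry.update(words)      # every occurrence counts toward nentry
        ndoc.update(set(words))   # each document counts once per word
    return {w: (ndoc[w], nentry[w]) for w in nentry}

# Hypothetical sample data: three documents, already split into lexemes.
docs = [["cat", "sat", "cat"], ["cat", "dog"], ["dog"]]
stats = doc_stats(docs)

# Same ordering as the queries in this section: nentry DESC, ndoc DESC, word.
for word, (nd, ne) in sorted(stats.items(),
                             key=lambda kv: (-kv[1][1], -kv[1][0], kv[0])):
    print(word, nd, ne)
# cat 2 3
# dog 2 2
# sat 1 1
```

Here "cat" appears in two documents but three times in total, so its nentry is larger than its ndoc.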

If weights are supplied, only occurrences having one of those weights are counted. For example, to find the ten most frequent words in a document collection:

     SELECT * FROM ts_stat('SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk < 10') ORDER BY nentry DESC, ndoc DESC, word LIMIT 10;
 word | ndoc | nentry 
------+------+--------
 |    2 |      2
 |    2 |      2
  |    1 |      1
 |    1 |      1
 |    1 |      1
 |    1 |      1
 |    1 |      1
 |    1 |      1
 |    1 |      1
 |    1 |      1
(10 rows)

The same, but counting only word occurrences with weight A:

     SELECT * FROM ts_stat('SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk < 10', 'a') ORDER BY nentry DESC, ndoc DESC, word LIMIT 10;
 word | ndoc | nentry 
------+------+--------
(0 rows)
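
An empty result like this is consistent with to_tsvector assigning the default weight D to every position unless setweight is applied, so a filter on weight A matches nothing. The Python sketch below models the weight filter under that reading; it is an illustration, not the server's implementation. The function name weighted_stats and the sample data are hypothetical, and treating the weights argument case-insensitively is an assumption of the sketch.

```python
from collections import defaultdict

def weighted_stats(documents, weights=None):
    """Return {word: (ndoc, nentry)}.

    documents: list of documents, each a list of (word, weight) occurrences,
    where weight is one of 'A', 'B', 'C', 'D'.
    weights: optional string such as 'a' or 'ab'; only occurrences carrying
    one of these weights are counted.
    """
    allowed = set(weights.upper()) if weights else None
    ndoc = defaultdict(int)
    nentry = defaultdict(int)
    for occurrences in documents:
        per_doc = defaultdict(int)
        for word, weight in occurrences:
            if allowed is None or weight in allowed:
                per_doc[word] += 1
        # A word with no matching occurrence in this document adds nothing.
        for word, n in per_doc.items():
            ndoc[word] += 1
            nentry[word] += n
    return {w: (ndoc[w], nentry[w]) for w in nentry}

# Hypothetical sample data: only one occurrence of "cat" carries weight A.
docs = [[("cat", "A"), ("cat", "D"), ("dog", "D")], [("cat", "D")]]
print(weighted_stats(docs))        # {'cat': (2, 3), 'dog': (1, 1)}
print(weighted_stats(docs, "a"))   # {'cat': (1, 1)}

# Every position has the default weight D, so filtering on A yields no rows.
print(weighted_stats([[("cat", "D")]], "a"))   # {}
```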
