Gathering Document Statistics
The function ts_stat is useful for checking your configuration and for finding stop-word candidates.
1 2 3 |
ts_stat(sqlquery text, [ weights text, ] OUT word text, OUT ndoc integer, OUT nentry integer) returns setof record |
sqlquery is a text value containing an SQL query which must return a single tsvector column. ts_stat executes the query and returns statistics about each distinct lexeme (word) contained in the tsvector data. The columns returned are
- word text: the value of a lexeme
- ndoc integer: number of documents (tsvectors) the word occurred in
- nentry integer: total number of occurrences of the word
If weights are supplied, only occurrences having one of those weights are counted. For example, to find the ten most frequent words in a document collection:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT * FROM ts_stat('SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk < 10') ORDER BY nentry DESC, ndoc DESC, word LIMIT 10;; word | ndoc | nentry ------+------+-------- 32 | 2 | 2 33 | 2 | 2 1 | 1 | 1 10 | 1 | 1 13 | 1 | 1 14 | 1 | 1 15 | 1 | 1 17 | 1 | 1 20 | 1 | 1 22 | 1 | 1 (10 rows) |
The same, but counting only word occurrences with weight A or B:
1 2 3 4 |
SELECT * FROM ts_stat('SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk < 10', 'a') ORDER BY nentry DESC, ndoc DESC, word LIMIT 10; word | ndoc | nentry ------+------+-------- (0 rows) |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.