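Searching a Table
This section describes how to use the text search operators to search a database table.
- A simple query: return every row whose body column contains the word science.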
```sql
DROP SCHEMA IF EXISTS tsearch CASCADE;

CREATE SCHEMA tsearch;

CREATE TABLE tsearch.pgweb(id int, body text, title text, last_mod_date date);

INSERT INTO tsearch.pgweb VALUES(1, 'Philology is the study of words, especially the history and development of the words in a particular language or group of languages.', 'Philology', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(2, 'Mathematics is the science that deals with the logic of shape, quantity and arrangement.', 'Mathematics', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(3, 'Computer science is the study of processes that interact with data and that can be represented as data in the form of programs.', 'Computer science', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(4, 'Chemistry is the scientific discipline involved with elements and compounds composed of atoms, molecules and ions.', 'Chemistry', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(5, 'Geography is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets.', 'Geography', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(6, 'History is a subject studied in schools, colleges, and universities that deals with events that have happened in the past.', 'History', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(7, 'Medical science is the science of dealing with the maintenance of health and the prevention and treatment of disease.', 'Medical science', '2010-1-1');
INSERT INTO tsearch.pgweb VALUES(8, 'Physics is one of the most fundamental scientific disciplines, and its main goal is to understand how the universe behaves.', 'Physics', '2010-1-1');

SELECT id, body, title FROM tsearch.pgweb WHERE to_tsvector('english', body) @@ to_tsquery('english', 'science');
 id |                                                               body                                                                |      title
----+-----------------------------------------------------------------------------------------------------------------------------------+------------------
  2 | Mathematics is the science that deals with the logic of shape, quantity and arrangement.                                          | Mathematics
  3 | Computer science is the study of processes that interact with data and that can be represented as data in the form of programs.   | Computer science
  5 | Geography is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets. | Geography
  7 | Medical science is the science of dealing with the maintenance of health and the prevention and treatment of disease.             | Medical science
(4 rows)
```
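Other inflected forms of science, such as sciences, are matched as well, because they are all reduced to the same normalized lexeme. Words that normalize to a different lexeme are not matched: rows 4 and 8 contain scientific and are not returned.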
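As an illustrative sketch, you can inspect the normalization directly. The sample strings below are arbitrary and do not come from the table. to_tsvector shows the lexemes stored for a document, and to_tsquery shows the lexemes that will be searched for.

```sql
-- Show the lexemes the english configuration produces for a few related words.
SELECT to_tsvector('english', 'science sciences scientific');

-- Show the lexeme a query word is reduced to before matching.
SELECT to_tsquery('english', 'sciences');

-- @@ returns true only when the query lexeme appears in the document's lexemes.
SELECT to_tsvector('english', 'Medical sciences') @@ to_tsquery('english', 'science');
```

Comparing the first two results shows which surface forms collapse into the same lexeme and which do not.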
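The query above explicitly specifies the english configuration for parsing and normalizing the strings. The configuration argument can also be omitted, in which case the configuration set by default_text_search_config is used: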
```sql
SHOW default_text_search_config;
 default_text_search_config
----------------------------
 pg_catalog.english
(1 row)

SELECT id, body, title FROM tsearch.pgweb WHERE to_tsvector(body) @@ to_tsquery('science');
 id |                                                               body                                                                |      title
----+-----------------------------------------------------------------------------------------------------------------------------------+------------------
  2 | Mathematics is the science that deals with the logic of shape, quantity and arrangement.                                          | Mathematics
  3 | Computer science is the study of processes that interact with data and that can be represented as data in the form of programs.   | Computer science
  5 | Geography is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets. | Geography
  7 | Medical science is the science of dealing with the maintenance of health and the prevention and treatment of disease.             | Medical science
(4 rows)
```
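You can also change the default for a single session. This sketch assumes that default_text_search_config can be changed with SET, as it can in PostgreSQL. The value pg_catalog.english is the one shown above.

```sql
-- Change the default text search configuration for the current session only.
SET default_text_search_config = 'pg_catalog.english';

-- Confirm which configuration is now in effect.
SHOW default_text_search_config;

-- One-argument to_tsvector/to_tsquery calls now use the session default.
SELECT id, title
FROM tsearch.pgweb
WHERE to_tsvector(body) @@ to_tsquery('science');
```

Because the one-argument form depends on this setting, the same query can return different rows in sessions with different defaults. Passing the configuration explicitly avoids that.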
- A more complex query: retrieve the 10 most recently modified documents that contain both treatment and science in the title or body column:
```sql
SELECT title
FROM tsearch.pgweb
WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('treatment & science')
ORDER BY last_mod_date DESC
LIMIT 10;
      title
-----------------
 Medical science
(1 row)
```
For clarity, the example omits the coalesce function calls that would be needed to find rows in which one of the two columns is NULL.
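A NULL-safe variant of the same query might look like the following. It assumes PostgreSQL-style semantics, in which concatenating a NULL with || yields NULL. Under those semantics, a row whose title is NULL would drop out of the search without coalesce, even if its body matches.

```sql
-- coalesce() replaces a NULL column with an empty string, so the concatenated
-- document is never NULL and the remaining column can still match.
SELECT title
FROM tsearch.pgweb
WHERE to_tsvector(coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('treatment & science')
ORDER BY last_mod_date DESC
LIMIT 10;
```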
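All of the examples above run without an index. For most applications this approach is too slow. Apart from occasional ad hoc searches, practical text search therefore usually requires an index.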
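As a minimal sketch, a GIN expression index over the body column could look like the following. The index name pgweb_body_tsv_idx is hypothetical. The one-argument form of to_tsvector depends on default_text_search_config, so the index is built on the two-argument form with an explicit configuration. Queries must repeat that exact expression for the index to be usable.

```sql
-- Hypothetical index name; indexes the tsvector computed with an explicit configuration.
CREATE INDEX pgweb_body_tsv_idx ON tsearch.pgweb USING gin(to_tsvector('english', body));

-- This WHERE clause matches the indexed expression, so the planner can consider the index.
SELECT id, title
FROM tsearch.pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'science');
```

On a table with only eight rows, the planner may still prefer a sequential scan. Running EXPLAIN on the query shows which plan is chosen.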