GIN Tips and Tricks
Create vs. Insert
Insertion into a GIN index can be slow due to the likelihood of many keys being inserted for each item. So, for bulk insertions into a table it is advisable to drop the GIN index and recreate it after finishing bulk insertion. GUC parameters related to GIN index creation and query performance as follows:
- maintenance_work_mem
Build time for a GIN index is very sensitive to the maintenance_work_mem setting;
- work_mem
During a series of insertions into an existing GIN index that has fastupdate enabled, the system will clean up the pending-entry list whenever the list grows larger than work_mem. To avoid fluctuations in observed response time, it is desirable to have pending-list cleanup occur in the background (that is, via autovacuum). Foreground cleanup operations can be avoided by increasing work_mem or making autovacuum more aggressive. However, increasing work_mem means that if a foreground cleanup occurs, it will take even longer.
- gin_fuzzy_search_limit
The primary goal of developing GIN indexes was to support highly scalable full-text search in GaussDB. A full-text search often returns a very large set of results. This often happens when the query contains very frequent words, so that the large result set is not even useful. Since reading many tuples from the disk and sorting them could take a lot of time, this is unacceptable for production.
To facilitate controlled execution of such queries, GIN has a configurable soft upper limit on the number of rows returned: the gin_fuzzy_search_limit configuration parameter. The default value 0 indicates that there is no limit on the returned set. If a non-zero limit is set, then the returned set is a subset of the whole result set, chosen at random.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot