Cache Table Usage Guide
Question
What is the function of a cache table? What should I keep in mind when deciding to cache a table?
Answer
Spark SQL supports caching tables in memory and can store cached data in a compressed format, which helps reduce memory pressure. Once a table is cached, subsequent queries can retrieve data directly from memory rather than reading from disk. This significantly reduces I/O overhead.
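For example, a table can be cached with the CACHE TABLE statement; the table name below is a placeholder for illustration only:

-- Cache a table in memory in Spark SQL's compressed columnar format
CACHE TABLE sales_2023;

-- Optionally cache only the result of a query under a new name
CACHE TABLE hot_orders AS
SELECT * FROM orders WHERE order_date >= '2023-01-01';

-- Subsequent queries read the cached data from executor memory instead of disk
SELECT COUNT(*) FROM sales_2023;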
However, it's important to note that cached tables consume executor memory. Although Spark SQL leverages compressed storage to mitigate memory overhead and reduce garbage collection (GC) pressure, caching large tables or caching many tables simultaneously can still impact executor stability.
A best practice is to uncache tables once they are no longer needed for query acceleration, thereby freeing up executor memory. You can run the UNCACHE TABLE table_name command to uncache a table.
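As a sketch, using the same placeholder table name as above:

-- Release the executor memory held by a cached table once it is no longer needed
UNCACHE TABLE sales_2023;

-- Or clear all cached tables and query results at once
CLEAR CACHE;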

You can also check which tables are currently cached on the Storage page of the Spark driver web UI.