Updated on 2025-08-22 GMT+08:00

Cache Table Usage Guide

Question

What is the function of a cache table? What should I keep in mind when deciding to cache a table?

Answer

Spark SQL supports caching tables in memory and can store the cached data in a compressed columnar format, which helps reduce memory pressure. Once a table is cached, subsequent queries retrieve its data directly from memory rather than reading from disk, which significantly reduces I/O overhead.
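As a minimal sketch, a table can be cached with the `CACHE TABLE` statement (the table name `sales` here is illustrative):

```sql
-- Cache the table eagerly: Spark scans it immediately and stores it in memory
CACHE TABLE sales;

-- Alternatively, cache lazily: data is cached on first use rather than up front
CACHE LAZY TABLE sales;

-- Subsequent queries on the table read the in-memory cached data
SELECT count(*) FROM sales;
```

The lazy form avoids paying the full scan cost at cache time, at the price of a slower first query.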

However, it's important to note that cached tables consume executor memory. Although Spark SQL leverages compressed storage to mitigate memory overhead and reduce garbage collection (GC) pressure, caching large tables or caching many tables simultaneously can still impact executor stability.

A best practice is to uncache tables once they are no longer needed for query acceleration, thereby freeing executor memory. You can run the `UNCACHE TABLE table_name` command to uncache a table.
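For example, assuming the same illustrative table name as above:

```sql
-- Release the cached data for a single table
UNCACHE TABLE sales;

-- Or remove all cached tables and views at once
CLEAR CACHE;
```

`CLEAR CACHE` is useful at the end of an interactive session, but use it with care on shared clusters since it drops cache entries other jobs may still rely on.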

You can also view the currently cached tables, along with their size and memory usage, on the Storage page of the Spark driver web UI.