Updated on 2024-05-29 GMT+08:00

Configuring Structured Streaming to Use RocksDB for State Store

Scenario

If the default HDFSBackedStateStore holds a large amount of state data and JVM garbage collection (GC) pauses become long, you can switch the state backend to RocksDB as described below.

Parameters

Set the following parameters in the spark-defaults.conf file of the Spark client.

| Parameter | Description | Default Value |
| --- | --- | --- |
| spark.sql.streaming.stateStore.providerClass | Class that manages state data for queries that require stateful streaming. This class must be a subclass of StateStoreProvider and must have a zero-argument constructor. Set this parameter to org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider to select RocksDB as the state backend. | org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider |
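
As a minimal sketch, the resulting entry in spark-defaults.conf could look like the following. The comment line is illustrative only, and the location of the file depends on where the Spark client is installed, which this page does not specify.

```
# Use RocksDB instead of the default HDFS-backed provider for Structured Streaming state
spark.sql.streaming.stateStore.providerClass  org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
```

Applications submitted from this client after the change should pick up the setting. The same property can also be supplied for a single application through the --conf option of spark-submit instead of editing the file.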