Updated on 2024-10-09 GMT+08:00

Configuring Structured Streaming to Use RocksDB for State Store

This section applies only to MRS 3.3.0 or later.

Scenarios

If a large amount of state data is kept in the default HDFSBackedStateStore and JVM garbage collection (GC) takes a long time, you can use RocksDB as the state store backend instead by configuring the parameter below.

Parameters

Configure the following parameter in the spark-defaults.conf file of the Spark client.

Parameter: spark.sql.streaming.stateStore.providerClass

Description: Class that manages state data in stateful streaming queries. The class must be a subclass of StateStoreProvider and must have a zero-argument constructor. To use RocksDB as the state store backend, set this parameter to org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider.

Default value: org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider
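For example, switching to RocksDB comes down to one line in spark-defaults.conf. The fragment below is a minimal sketch that assumes the usual format of that file (one key and value per line, separated by whitespace, with # starting a comment); merge it into the existing file on the Spark client rather than replacing the file.

```properties
# spark-defaults.conf on the Spark client (sketch; keep your other settings)
# Replace the default HDFSBackedStateStoreProvider with the RocksDB-based provider
spark.sql.streaming.stateStore.providerClass org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
```

The same key can also be passed to a single application with spark-submit --conf, which takes precedence over the value in spark-defaults.conf.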