Updated on 2023-01-11 GMT+08:00
Spark Streaming Task Issues
Symptom
- A message is displayed indicating that a class cannot be found when connecting to Kafka.
- An authentication error is reported when connecting to Kafka with Kerberos authentication enabled.
- After a Spark Streaming task runs for a period of time, a message is displayed indicating that the token has expired.
Cause Analysis
- Symptom 1: By default, Spark does not load the Kafka JAR package when submitting tasks. Therefore, --jars must be added to the startup command to specify the JAR package that matches the Kafka version.
- Symptom 2: Spark's authentication information cannot be used to connect to Kafka. The JVM parameters required for Kafka authentication must be set separately.
- Symptom 3: By default, Spark submits tasks using the authentication information of the current client; alternatively, the application can log in from code. Neither mode can renew the token used by the task, so once the token generated at submission time expires, it can no longer be used and an error is reported.
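This section does not name the JVM parameters needed for Symptom 2. A common approach, assumed here, is to point the standard java.security.auth.login.config system property at a JAAS file that contains a KafkaClient section, and to pass that property to the driver and executors through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. The sketch below renders such a file and the matching settings. It assumes yarn-cluster mode (so files shipped with --files sit in the container working directory) and a Kafka 0.10-style client; the principal and keytab names are hypothetical, and your cluster may require additional properties described in Compiling and Running Applications.

```python
from typing import Dict


def kafka_jaas_conf(principal: str, keytab_name: str, service_name: str = "kafka") -> str:
    """Render a KafkaClient JAAS section that logs in from a keytab via Kerberos."""
    # keyTab is relative to the YARN container working directory, where files
    # shipped with --files are localized (assumption: yarn-cluster mode).
    return (
        "KafkaClient {\n"
        "  com.sun.security.auth.module.Krb5LoginModule required\n"
        "  useKeyTab=true\n"
        f'  keyTab="./{keytab_name}"\n'
        f'  principal="{principal}"\n'
        "  storeKey=true\n"
        "  useTicketCache=false\n"
        f'  serviceName="{service_name}";\n'
        "};\n"
    )


def jaas_java_options(jaas_file_name: str = "jaas.conf") -> Dict[str, str]:
    """Spark settings that pass the JAAS file to the driver and executor JVMs."""
    option = f"-Djava.security.auth.login.config=./{jaas_file_name}"
    return {
        "spark.driver.extraJavaOptions": option,
        "spark.executor.extraJavaOptions": option,
    }


if __name__ == "__main__":
    # Hypothetical principal and keytab name, for illustration only.
    print(kafka_jaas_conf("sparkuser@EXAMPLE.COM", "user_kafka.keytab"))
    for key, value in jaas_java_options().items():
        print(f"--conf {key}={value}")
```

Setting the option on both the driver and the executors matters because Kafka consumers can run in either JVM, and each JVM reads its own system properties.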
Procedure
- Symptom 1: Add --jars to the startup command to specify the JAR package that matches the Kafka version. The JAR packages are generally located in the /jars/streamingClient directory (Kafka 0.8) or the /jars/streamingClient010 directory (Kafka 0.10) under the Spark client directory.
- Symptom 2: Compile and run the application. For details, see Compiling and Running Applications.
- Symptom 3: Use --keytab and --principal to add the keytab file and the corresponding user to the task. If this keytab file is the same as the one configured in jaas.conf for Kafka, Spark reports an error indicating that a file is uploaded multiple times. To resolve this, make a copy of the keytab file so that --files and --keytab upload different files.
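The three fixes can be combined in one spark-submit invocation. The sketch below assembles the argument list under stated assumptions: yarn-cluster deploy mode; Kafka JARs taken from the client sub-directories listed above (file names are not assumed, so the directory is globbed); a jaas.conf shipped with --files that references the copied keytab by its file name; and the JAAS file passed through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. The "_kafka" suffix for the keytab copy, the paths, and the class name are hypothetical.

```python
import glob
import shlex
import shutil
from pathlib import PurePosixPath
from typing import List

# Sub-directories under the Spark client directory that hold the Kafka JARs.
KAFKA_JAR_DIRS = {"0.8": "jars/streamingClient", "0.10": "jars/streamingClient010"}


def kafka_jar_dir(spark_client_dir: str, kafka_version: str) -> str:
    """Directory holding the Kafka JARs for the given Kafka version."""
    return str(PurePosixPath(spark_client_dir) / KAFKA_JAR_DIRS[kafka_version])


def list_kafka_jars(spark_client_dir: str, kafka_version: str) -> List[str]:
    """All JAR files in that directory; individual file names are not assumed."""
    return sorted(glob.glob(f"{kafka_jar_dir(spark_client_dir, kafka_version)}/*.jar"))


def keytab_copy_name(keytab: str, suffix: str = "_kafka") -> str:
    """Name for the keytab copy, e.g. user.keytab -> user_kafka.keytab (hypothetical suffix)."""
    p = PurePosixPath(keytab)
    return str(p.with_name(f"{p.stem}{suffix}{p.suffix}"))


def copy_keytab(keytab: str, suffix: str = "_kafka") -> str:
    """Copy the keytab so --files and --keytab upload differently named files."""
    target = keytab_copy_name(keytab, suffix)
    shutil.copyfile(keytab, target)
    return target


def build_spark_submit(app_jar: str, main_class: str, kafka_jars: List[str],
                       jaas_conf: str, keytab: str, principal: str) -> List[str]:
    """Assemble spark-submit arguments addressing the three symptoms."""
    kafka_keytab = keytab_copy_name(keytab)  # the copy that jaas.conf must reference
    java_opt = f"-Djava.security.auth.login.config=./{PurePosixPath(jaas_conf).name}"
    return [
        "spark-submit",
        "--master", "yarn", "--deploy-mode", "cluster",  # assumed deploy mode
        "--class", main_class,
        "--jars", ",".join(kafka_jars),                  # Symptom 1: Kafka JARs
        "--files", f"{jaas_conf},{kafka_keytab}",        # Symptom 2: JAAS file + keytab copy
        "--conf", f"spark.driver.extraJavaOptions={java_opt}",
        "--conf", f"spark.executor.extraJavaOptions={java_opt}",
        "--keytab", keytab, "--principal", principal,    # Symptom 3: renewable login
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own client directory and files.
    jars = list_kafka_jars("/opt/client/Spark/spark", "0.10")
    cmd = build_spark_submit("app.jar", "com.example.KafkaStreamingApp", jars,
                             "/home/user/jaas.conf", "/home/user/user.keytab",
                             "user@EXAMPLE.COM")
    print(shlex.join(cmd))
```

Before submitting, run copy_keytab on the original keytab so that the copied file actually exists; the original goes to --keytab and the copy goes to --files, which keeps the two uploads distinct.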