Worker Logs Are Empty After the Storm Topology Is Submitted
Symptom
After a topology is submitted remotely from Eclipse, its details cannot be viewed on the Storm web UI, the Worker nodes hosting the Spouts and Bolts of each topology keep changing, and the Worker log is empty.
Possible Causes
The Worker process fails to start, so Nimbus reallocates its tasks and starts the Worker process on another Supervisor, where it fails again. The Worker process therefore keeps restarting, the Worker node keeps changing, and the Worker log stays empty. The Worker process startup failure may have either of the following causes:
- The submitted JAR package contains a storm.yaml file.
Storm allows only one storm.yaml file on the classpath and throws an exception if it finds more than one. The Storm client and Eclipse use different classpath configurations: the client automatically adds the user's JAR package to its classpath, which already contains the client's own storm.yaml file, so two storm.yaml files end up on the classpath. Submitting the topology with the Storm client exposes this conflict.
- Worker process initialization takes longer than the Worker startup timeout configured for the Storm cluster, so the Worker process is killed and its tasks are reallocated.
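To check for the first cause before submitting, you can list the storm.yaml entries bundled in the topology JAR. The Python sketch below is illustrative only: the function and script names are hypothetical, it assumes the JAR is a standard ZIP archive, and it inspects only the JAR itself rather than reproducing Storm's own classpath scan.

```python
import sys
import zipfile


def bundled_storm_yaml(jar_file):
    """Return every entry in the JAR whose file name is storm.yaml.

    jar_file may be a path or a binary file-like object. A root-level
    "storm.yaml" entry is the one a class loader returns for the resource
    name "storm.yaml", so it is the entry that collides with the copy in
    the Storm client's conf directory.
    """
    with zipfile.ZipFile(jar_file) as jar:
        return [
            name
            for name in jar.namelist()
            if name.rsplit("/", 1)[-1] == "storm.yaml"
        ]


def main(argv):
    # Usage (hypothetical script name): python check_storm_yaml.py stormDemo.jar
    if len(argv) != 2:
        print("usage: check_storm_yaml.py <topology.jar>")
        return 2
    entries = bundled_storm_yaml(argv[1])
    for name in entries:
        print("bundled:", name)
    # A non-zero exit code lets a build script stop before submission.
    return 1 if "storm.yaml" in entries else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A root-level entry is the direct match for the `stormDemo.jar!/storm.yaml` location reported by Storm; nested copies are listed only so they can be reviewed.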
Troubleshooting Process
- Use the Storm client to submit the topology and check whether storm.yaml is duplicated.
- Repack the JAR file and submit the topology again.
- Modify the Worker startup timeout parameter in the Storm cluster.
Procedure
- If the Worker log is empty after the topology is submitted remotely from Eclipse, submit the JAR package of the topology with the Storm client and check the message that is displayed.
For example, if the JAR package bundles a storm.yaml file while the client configuration directory also provides one, two storm.yaml files exist on the classpath and the following information is displayed:
Exception in thread "main" java.lang.ExceptionInInitializerError
    at com.XXX.example.WordCountTopology.createConf(WordCountTopology.java:132)
Caused by: java.lang.RuntimeException: Found multiple storm.yaml resources. You're probably bundling the Storm jars with your topology jar. [jar:file:/XXX/streaming-0.9.2/bin/stormDemo.jar!/storm.yaml, file:/XXX/Streaming/streaming-0.9.2/conf/storm.yaml]
    at backtype.storm.utils.Utils.findAndReadConfigFile(Utils.java:151)
- Repackage the JAR file. Ensure that the package contains neither a storm.yaml file nor any JAR packages related to log4j or slf4j-log4j.
- Remotely submit the new JAR package from the IDE (for example, Eclipse or IntelliJ IDEA).
- Check whether the topology details and Worker logs can be viewed on the web UI. If they still cannot be viewed, go to the next step.
- Modify the Worker startup timeout parameters of the Storm service on the cluster management page (see Related Information for the parameter descriptions), save the modification, and restart the Storm service. The configuration page is located as follows:
- MRS Manager: Log in to MRS Manager and choose Services > Storm > Configuration.
- FusionInsight Manager: Log in to FusionInsight Manager and choose Cluster > Services > Storm > Configurations.
- Submit the JAR package again.
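The repackaging step can be scripted. The Python sketch below is a minimal example, assuming the topology JAR nests its dependencies as JAR files (for example lib/log4j-1.2.17.jar); the exclusion prefixes, function names, and script name are hypothetical and should be adjusted to your build. If your build shades dependency classes directly into the JAR instead, excluding entries by JAR file name will not remove them.

```python
import sys
import zipfile

# Hypothetical file-name prefixes for the logging JARs named in the procedure.
EXCLUDED_JAR_PREFIXES = ("log4j", "slf4j-log4j")


def is_excluded(name):
    """Decide whether a JAR entry should be dropped when repackaging."""
    base = name.rsplit("/", 1)[-1]
    if base == "storm.yaml":
        return True
    return base.endswith(".jar") and base.startswith(EXCLUDED_JAR_PREFIXES)


def repack_jar(src, dst):
    """Copy src to dst without the excluded entries; return the skipped names."""
    skipped = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if is_excluded(info.filename):
                skipped.append(info.filename)
                continue
            # Reusing the original ZipInfo keeps the entry name, timestamp,
            # and compression method; entry order is preserved as well.
            zout.writestr(info, zin.read(info))
    return skipped


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python repack.py stormDemo.jar stormDemo-clean.jar
    for name in repack_jar(sys.argv[1], sys.argv[2]):
        print("removed:", name)
```

Copying entry by entry, instead of rebuilding the JAR from scratch, keeps META-INF/MANIFEST.MF and every other entry byte-for-byte identical apart from the removed files.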
Related Information
- nimbus.task.launch.secs and supervisor.worker.start.timeout.secs set the topology startup timeout tolerance of Nimbus and the supervisor, respectively. The value of nimbus.task.launch.secs must be greater than or equal to that of supervisor.worker.start.timeout.secs, and it is recommended that it be equal or only slightly greater. A much larger value makes Nimbus wait longer before reallocating failed tasks, which lowers task reallocation efficiency.
- nimbus.task.launch.secs: If Nimbus does not receive a heartbeat from a topology task within this period, it reallocates the topology to another supervisor and updates the task information in ZooKeeper. Each supervisor reads the task information from ZooKeeper and compares it with the topologies it has started. If a topology no longer belongs to a supervisor, that supervisor deletes the metadata of the topology, that is, the /srv/Bigdata/streaming_data/stormdir/supervisor/stormdist/{worker-id} directory.
- supervisor.worker.start.timeout.secs: If the supervisor receives no heartbeat from a worker it has started within this period, it stops the worker and waits for the worker to be rescheduled. Increase this value when the service takes a long time to start, so that the worker has enough time to start successfully.
If supervisor.worker.start.timeout.secs is greater than nimbus.task.launch.secs, a worker may still be starting within the tolerance period of the supervisor when Nimbus decides that startup has timed out and allocates the task to another host. The background thread of the supervisor then detects that the tasks are inconsistent and deletes the metadata of the topology. As a result, when the worker tries to read stormconf.ser during startup, the file no longer exists and a FileNotFoundException is thrown.
- nimbus.task.timeout.secs and supervisor.worker.timeout.secs set how long Nimbus and the supervisor, respectively, tolerate missing heartbeats while a topology is running. The value of nimbus.task.timeout.secs must be equal to or slightly greater than that of supervisor.worker.timeout.secs.
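The two ordering rules above can be checked mechanically. The Python sketch below is a simplified model written for illustration: it flags each pair where the Nimbus timeout is smaller than its supervisor counterpart and, for a given time until a worker's first heartbeat, predicts which of the startup outcomes described above occurs. It does not read the real cluster configuration; the function names and sample values are hypothetical and are not MRS defaults.

```python
# Simplified model of the timeout pairs described above. Only the parameter
# names come from Storm; the logic is an illustrative approximation.
TIMEOUT_PAIRS = [
    # (Nimbus-side key, supervisor-side key)
    ("nimbus.task.launch.secs", "supervisor.worker.start.timeout.secs"),
    ("nimbus.task.timeout.secs", "supervisor.worker.timeout.secs"),
]


def check_timeouts(conf):
    """Return one message per pair whose Nimbus value is the smaller one."""
    problems = []
    for nimbus_key, supervisor_key in TIMEOUT_PAIRS:
        nimbus_val = int(conf[nimbus_key])
        supervisor_val = int(conf[supervisor_key])
        if nimbus_val < supervisor_val:
            problems.append(
                f"{nimbus_key}={nimbus_val} is less than "
                f"{supervisor_key}={supervisor_val}"
            )
    return problems


def startup_outcome(first_heartbeat_secs, conf):
    """Predict what happens when a worker's first heartbeat arrives late."""
    nimbus_limit = int(conf["nimbus.task.launch.secs"])
    supervisor_limit = int(conf["supervisor.worker.start.timeout.secs"])
    if first_heartbeat_secs <= min(nimbus_limit, supervisor_limit):
        return "started"
    if nimbus_limit < supervisor_limit:
        # Nimbus gives up first: the task is reallocated, the supervisor
        # deletes the stormdist metadata, and the still-starting worker
        # fails with FileNotFoundException on stormconf.ser.
        return "reassigned by Nimbus"
    # The supervisor timeout expires first (or both expire together):
    # the supervisor stops the worker and waits for rescheduling.
    return "killed by supervisor"
```

For example, with nimbus.task.launch.secs set to 120 and supervisor.worker.start.timeout.secs set to 300, `check_timeouts` reports the first pair, and a worker whose first heartbeat arrives after 200 seconds falls into the reassignment case that leads to the FileNotFoundException.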