What Should I Do If a Hudi Job Stays in Booting Status for a Long Time and Then Fails with a "Read timed out" Error in the Log?
Symptom
A Hudi job stays in Booting status for a long time and then fails. The job log contains the error message "Read timed out".
Troubleshooting
- Check whether the JDBCServer of the MRS cluster is in multi-instance or multi-tenant mode.
- Check whether jobs of other tenants run normally.
- Create a script, select a direct connection, run a Spark SQL statement, and check whether a timeout error is reported (the database list may also fail to load). If a timeout error occurs, the JDBCServer of the MRS cluster is most likely faulty.
- If only one tenant cannot execute Spark SQL statements, that tenant's queue resources may be insufficient. Open the Yarn web UI, search for the tenant queue, and check the Yarn task of Spark2x-JDBCServer2x. If the Yarn task cannot be found, or its state is ACCEPTED, the task cannot start because resources are insufficient. Open the Yarn Scheduler page and check the queue resources, paying attention to the following parameters:
  - Used Resources: memory and number of CPU cores in use
  - Max Resources: maximum memory and maximum number of CPU cores available in the queue
  - Used Application Master Resources: AM resources in use
  - Max Application Master Resources: maximum AM resources available in the queue
  Compare the used values with the maximum values to determine which resource is insufficient and is causing the Yarn task exception.
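The task and queue checks in the last troubleshooting item can be scripted once the data has been collected. The following Python sketch is not part of MRS or DataArts Studio. It assumes the application list was fetched from the Yarn ResourceManager REST endpoint `/ws/v1/cluster/apps` (filtered with its `queue` query parameter) and that the four queue parameters were read off the Yarn Scheduler page. The dictionary keys, the exact application name match, and the resource request of the pending application are hypothetical values you supply.

```python
from typing import Dict, List

# Assumed application name of the JDBCServer task as shown in Yarn.
JDBC_APP_NAME = "Spark2x-JDBCServer2x"


def jdbcserver_states(apps_json: Dict) -> List[str]:
    """Return the state of every JDBCServer application in an RM apps response.

    apps_json is assumed to be the parsed JSON body of
    GET /ws/v1/cluster/apps?queue=<tenant-queue>.
    An empty list means the Yarn task cannot be found.
    """
    # The ResourceManager returns {"apps": null} when no application matches.
    apps = (apps_json.get("apps") or {}).get("app", [])
    return [app.get("state", "UNKNOWN")
            for app in apps
            if JDBC_APP_NAME in app.get("name", "")]


def short_resources(used: Dict[str, int], maximum: Dict[str, int],
                    request: Dict[str, int]) -> List[str]:
    """Return the resources for which used + request would exceed the maximum.

    Hypothetical keys: "memory_mb" and "vcores" (Used/Max Resources),
    "am_memory_mb" and "am_vcores" (Used/Max Application Master Resources).
    """
    return sorted(name for name, limit in maximum.items()
                  if used.get(name, 0) + request.get(name, 0) > limit)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real cluster.
    apps = {"apps": {"app": [{"name": "Spark2x-JDBCServer2x", "state": "ACCEPTED"}]}}
    used = {"memory_mb": 20480, "vcores": 10, "am_memory_mb": 4096, "am_vcores": 4}
    maximum = {"memory_mb": 40960, "vcores": 20, "am_memory_mb": 4096, "am_vcores": 8}
    request = {"memory_mb": 2048, "vcores": 1, "am_memory_mb": 2048, "am_vcores": 1}
    print(jdbcserver_states(apps))                  # ['ACCEPTED']
    print(short_resources(used, maximum, request))  # ['am_memory_mb']
```

The `request` dictionary stands for what the pending application master would need. Because the AM container also counts toward the queue's overall usage, it appears under both the overall and the AM keys. In this example only the AM memory limit would be exceeded, which is consistent with a task that stays in the ACCEPTED state even though the queue still has free memory overall.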
Solution
Add resources to the tenant queue, or stop other Yarn tasks in the queue to release resources.