Updated on 2023-02-06 GMT+08:00

What Should I Do If a Hudi Job Stays in Booting Status for a Long Time and then Fails and the "Read timed out" Error Is Contained in the Log?

Symptom

A Hudi job stays in the Booting state for a long time and then fails. The job log contains the error message "Read timed out".

Troubleshooting

  1. Check whether the JDBCServer of the MRS cluster is in multi-instance or multi-tenant mode.
    • If it is in multi-instance mode, go to 3.
    • If it is in multi-tenant mode, go to 2.
  2. Check whether the jobs of other tenants are normal.
    • If Spark SQL fails to be executed for the jobs of all tenants, go to 3.
    • Otherwise, go to 4.
  3. Create a script, select a direct connection, run a Spark SQL statement, and check whether a timeout error is reported (the database list may also fail to be displayed). If a timeout error is reported, the JDBCServer of the MRS cluster is most likely faulty.
  4. If only a single tenant cannot execute Spark SQL statements, the queue resources of that tenant may be insufficient. Open Yarn, search for the tenant queue, and check the Yarn task of Spark2x-JDBCServer2x. If the Yarn task cannot be found, or its state is ACCEPTED, the Yarn task cannot start because resources are insufficient. Open the Yarn scheduler page and check the queue resources. Pay attention to the following parameters:

    Used Resources: used memory and number of used CPU cores

    Max Resources: maximum memory and maximum number of CPU cores available in the queue

    Used Application Master Resources: ApplicationMaster (AM) resources currently in use

    Max Application Master Resources: maximum AM resources available in the queue

    Compare each used value with its maximum to determine which resource is insufficient and is preventing the Yarn task from running.
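
The branching in steps 1 to 4 can be summarized as a small decision function. This is only an illustrative sketch in Python: the function names next_step and yarn_task_blocked and their parameters are hypothetical and are not part of any MRS or CDM API. The inputs are the observations you collect manually while following the steps.

```python
from typing import Optional


def next_step(mode: str, all_tenants_fail: Optional[bool] = None) -> int:
    """Return the number of the troubleshooting step to perform next.

    mode: "multi-instance" or "multi-tenant" (JDBCServer mode, step 1).
    all_tenants_fail: result of step 2; True if Spark SQL fails for the
        jobs of all tenants, False otherwise, None if not checked yet.
    """
    if mode == "multi-instance":
        return 3
    if mode == "multi-tenant":
        if all_tenants_fail is None:
            return 2  # the jobs of other tenants still need to be checked
        return 3 if all_tenants_fail else 4
    raise ValueError(f"unknown JDBCServer mode: {mode!r}")


def yarn_task_blocked(task_state: Optional[str]) -> bool:
    """Step 4: the Yarn task cannot start if it is missing or still ACCEPTED.

    task_state: state shown in Yarn for the Spark2x-JDBCServer2x task,
        or None if the task cannot be found in the tenant queue.
    """
    return task_state is None or task_state.upper() == "ACCEPTED"
```

For example, next_step("multi-tenant", all_tenants_fail=False) points to step 4, and yarn_task_blocked("ACCEPTED") indicates that the queue resources should be checked.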

Solution

Add queue resources or stop other Yarn tasks to release resources.
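
To decide which queue resource to add, the four parameters listed under Troubleshooting can be compared pairwise. The following Python sketch shows one possible way to do this with values copied by hand from the Yarn scheduler page. The class QueueResources, the function exhausted_resources, and the 95% threshold are assumptions made for illustration; they do not come from Yarn or MRS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueResources:
    """Values read manually from the Yarn scheduler page (hypothetical layout)."""
    used_memory_mb: int      # Used Resources: memory
    used_vcores: int         # Used Resources: CPU cores
    max_memory_mb: int       # Max Resources: memory
    max_vcores: int          # Max Resources: CPU cores
    used_am_memory_mb: int   # Used Application Master Resources: memory
    used_am_vcores: int      # Used Application Master Resources: CPU cores
    max_am_memory_mb: int    # Max Application Master Resources: memory
    max_am_vcores: int       # Max Application Master Resources: CPU cores


def _is_exhausted(used: int, maximum: int, threshold: float) -> bool:
    # A maximum of zero or less means the queue has no capacity at all.
    if maximum <= 0:
        return True
    return used / maximum >= threshold


def exhausted_resources(q: QueueResources, threshold: float = 0.95) -> List[str]:
    """Return the labels of resources whose usage has reached the threshold.

    threshold is an assumed cutoff (95% of the maximum), not a Yarn setting.
    """
    checks = [
        ("queue memory", q.used_memory_mb, q.max_memory_mb),
        ("queue CPU cores", q.used_vcores, q.max_vcores),
        ("AM memory", q.used_am_memory_mb, q.max_am_memory_mb),
        ("AM CPU cores", q.used_am_vcores, q.max_am_vcores),
    ]
    return [label for label, used, maximum in checks
            if _is_exhausted(used, maximum, threshold)]
```

If the returned list contains only AM entries, the queue as a whole still has capacity but the share reserved for ApplicationMasters is used up, so adding AM resources or stopping other Yarn tasks in the queue are the options to consider.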