Updated on 2024-10-11 GMT+08:00

Spark Task Execution Failure

Symptom

  • Symptom 1: An executor reports an out-of-memory (OOM) error.
  • Symptom 2: The details of the failed task give the failure cause as "lost task xxx."

Cause Analysis

  • Symptom 1: The data volume is too large for the executor memory, or too many tasks are running concurrently on the same executor.
  • Symptom 2: Some tasks fail. When the error is reported, identify the node on which the lost task was running. The error is generally caused by the abnormal exit of that task.
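
To identify the node on which a lost task was running, the driver log can be scanned for "Lost task" entries. The following is a minimal sketch, not an official tool. It assumes the log lines have one of the two shapes shown in the comments (the exact shape varies by Spark version), and the sample lines, host names, and the find_lost_tasks helper are hypothetical.

```python
import re

# Assumed log shapes (adjust the pattern if your Spark version differs):
#   Lost task 3.0 in stage 1.0 (TID 7, node-1, executor 2): <reason>
#   Lost task 3.0 in stage 1.0 (TID 7) (node-1 executor 2): <reason>
LOST_TASK = re.compile(
    r"Lost task (?P<task>[\d.]+) in stage (?P<stage>[\d.]+) "
    r"\(TID (?P<tid>\d+)\)?,? \(?(?P<host>[^\s,()]+),? "
    r"executor (?P<executor>[^\s)]+)\): (?P<reason>.*)",
    re.IGNORECASE,
)


def find_lost_tasks(log_lines):
    """Return one dict per 'Lost task' line, flagging OOM-related reasons."""
    found = []
    for line in log_lines:
        match = LOST_TASK.search(line)
        if match:
            info = match.groupdict()
            # An OOM reason points back to the symptom 1 causes.
            info["oom"] = "OutOfMemoryError" in info["reason"]
            found.append(info)
    return found


# Hypothetical sample lines, for illustration only.
sample = [
    "WARN TaskSetManager: Lost task 3.0 in stage 1.0 (TID 7, node-1, executor 2): "
    "java.lang.OutOfMemoryError: Java heap space",
    "WARN TaskSetManager: Lost task 5.0 in stage 1.0 (TID 9) (node-4 executor 6): "
    "ExecutorLostFailure (executor 6 exited)",
    "INFO DAGScheduler: Job 0 finished",
]

for hit in find_lost_tasks(sample):
    kind = "OOM" if hit["oom"] else "other"
    print(hit["host"], "executor", hit["executor"], kind)
```

The host and executor ID reported for each entry indicate which node's task log to inspect.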

Procedure

  • Symptom 1:
    • If the data volume is too large, increase the executor memory by specifying a larger value with --executor-memory.
    • If too many tasks are running at the same time, check the number of vcores specified by --executor-cores and reduce it if necessary.
  • Symptom 2: Locate the cause in the log of the corresponding task. If an OOM error occurred, apply the solutions for symptom 1.
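
The two options named for symptom 1 can be set explicitly when the job is submitted. The following is a minimal sketch that assembles such a spark-submit command line. The build_spark_submit helper, the application name my_job.py, and the values 8g and 2 are hypothetical placeholders, not recommended settings.

```python
import shlex


def build_spark_submit(app, executor_memory="4g", executor_cores=2, app_args=()):
    """Assemble a spark-submit argument list with explicit executor sizing."""
    cmd = [
        "spark-submit",
        "--executor-memory", executor_memory,    # memory available to each executor
        "--executor-cores", str(executor_cores), # vcores per executor
        app,
    ]
    cmd.extend(app_args)
    return cmd


# Hypothetical values: more memory per executor, fewer concurrent tasks.
cmd = build_spark_submit("my_job.py", executor_memory="8g", executor_cores=2)
print(shlex.join(cmd))
# prints: spark-submit --executor-memory 8g --executor-cores 2 my_job.py
```

Lowering --executor-cores reduces the number of tasks that share the memory of one executor, while raising --executor-memory gives each executor more memory. Suitable values depend on the data volume and on the resources available in the cluster.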