
Failed to View Spark Task Logs

Symptom

  • Symptom 1: A user fails to view logs while a task is running.
  • Symptom 2: A user fails to view logs after a task is complete.

Cause Analysis

  • Symptom 1: The MapReduce component is abnormal.
  • Symptom 2:
    • The JobHistory service of Spark is abnormal.
    • The logs are too large, and NodeManager times out during log aggregation.
    • The permission on the HDFS log storage directory (/tmp/logs/Username/logs by default) is abnormal.
    • Logs have been deleted. By default, Spark JobHistory stores event logs for seven days (specified by spark.history.fs.cleaner.maxAge), and MapReduce stores task logs for 15 days (specified by mapreduce.jobhistory.max-age-ms). See the configuration sketch after this list.
    • If the task cannot be found on the Yarn page, it may have been cleared by Yarn. By default, Yarn retains 10,000 completed applications (specified by yarn.resourcemanager.max-completed-applications).
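
For reference, the retention settings mentioned above live in the standard Spark and Hadoop configuration files. The following sketch restates the defaults described in this section; file locations and exact values depend on your cluster, so treat it as illustrative rather than authoritative.

  # spark-defaults.conf: event log retention for Spark JobHistory (7 days)
  spark.history.fs.cleaner.enabled  true
  spark.history.fs.cleaner.maxAge   7d

  <!-- mapred-site.xml: MapReduce task log retention (15 days, in milliseconds) -->
  <property>
    <name>mapreduce.jobhistory.max-age-ms</name>
    <value>1296000000</value>
  </property>

  <!-- yarn-site.xml: number of completed applications Yarn retains -->
  <property>
    <name>yarn.resourcemanager.max-completed-applications</name>
    <value>10000</value>
  </property>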

Procedure

  • Symptom 1: Check whether the MapReduce component is running properly. If it is abnormal, restart it. If the fault persists, check the JobHistoryServer log file on the backend node.
  • Symptom 2: Perform the following checks in sequence (command-line sketches for these checks follow the list):
    1. Check whether JobHistory of Spark is running properly.
    2. On the app details page of Yarn, check whether the log file is too large. If log aggregation failed, Log Aggregation Status is displayed as Failed or Timeout.
    3. Check whether the permission on the HDFS log directory (/tmp/logs/Username/logs by default) is normal.
    4. Check whether the files for the corresponding appid exist. In MRS 3.x or later, event log files are stored in the hdfs://hacluster/spark2xJobHistory2x directory; in versions earlier than MRS 3.x, they are stored in the hdfs://hacluster/sparkJobHistory directory. The task run logs are stored in the hdfs://hacluster/tmp/logs/Username/logs directory.
    5. Check whether the appid or the current job ID falls outside the range of history records that Yarn retains (see yarn.resourcemanager.max-completed-applications in Cause Analysis).
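
For step 1, a quick liveness check is to query the Spark JobHistory REST API. This is a minimal sketch: the host is a placeholder, and port 18080 is the open-source default, so both may differ on an MRS deployment.

  # Query the history server REST API; an HTTP 200 response with a JSON
  # list of applications indicates that the service is responding.
  curl "http://<JobHistory_host>:18080/api/v1/applications?limit=1"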
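
For step 2, the aggregation result can also be read from the command line. A sketch, assuming a Hadoop version whose application report includes the Log Aggregation Status field; <appid> is a placeholder.

  # Print the application report and look for the Log Aggregation Status
  # field (for example SUCCEEDED, FAILED, or TIME_OUT).
  yarn application -status <appid>

  # If aggregation succeeded, the aggregated logs can be fetched directly.
  yarn logs -applicationId <appid>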
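
For steps 3 and 4, the directory permissions and the presence of the log files can be verified with standard HDFS commands. A sketch, assuming an MRS 3.x cluster; on earlier versions substitute sparkJobHistory for spark2xJobHistory2x, and replace <username> and <appid> with real values.

  # Step 3: verify the owner and permissions of the aggregated-log directory.
  hdfs dfs -ls /tmp/logs/<username>/logs

  # Step 4: check that the event log file for the application exists.
  hdfs dfs -ls hdfs://hacluster/spark2xJobHistory2x | grep <appid>

  # Step 4: check that the task run logs for the application exist.
  hdfs dfs -ls hdfs://hacluster/tmp/logs/<username>/logs/<appid>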
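
For step 5, you can confirm whether the application is still among the completed applications retained by the ResourceManager (capped by yarn.resourcemanager.max-completed-applications). A sketch; <appid> is a placeholder.

  # List completed applications and check whether the appid is still present.
  # If it is missing, it has likely been cleared from the Yarn history.
  yarn application -list -appStates FINISHED,FAILED,KILLED | grep <appid>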