Updated on 2024-12-11 GMT+08:00

Configuring the Archiving and Clearing Mechanism for MapReduce Task Logs

Scenario

Job and task logs are generated during execution of a MapReduce application.

  • Job logs are generated by MRApplicationMaster. They record the start time and running time of the job and of each task, Counter values, and other information. After HistoryServer parses the job logs, they can be used to view job execution details.
  • A task log records the log information generated by each task running in a container. By default, task logs are stored only on the local disk of each NodeManager. After the log aggregation function is enabled, the NodeManager merges local task logs and writes them into HDFS after job execution completes.

The job logs and task logs of MapReduce are stored on HDFS (when the log aggregation function is enabled). On a cluster that runs a large number of computing tasks, if no mechanism for periodically archiving and deleting log files is configured, the log files occupy a large amount of HDFS storage space and increase the cluster load.

Log archiving is implemented by Hadoop Archives. The number of concurrent archiving tasks (that is, the number of Map tasks) started by Hadoop Archives depends on the total size of the log files to be archived. The formula is as follows: Number of concurrent archiving tasks = Total size of log files to be archived / Size of archive files.
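The formula above can be illustrated with a minimal sketch. The function name and the sample sizes are hypothetical, and rounding a fractional result up to a whole task is an assumption, because the formula itself does not say how fractions are handled.

```python
def concurrent_archive_tasks(total_log_bytes: int, archive_file_bytes: int) -> int:
    """Estimate the number of concurrent archiving (Map) tasks.

    Implements: total size of log files to be archived / size of archive files.
    """
    if archive_file_bytes <= 0:
        raise ValueError("archive_file_bytes must be positive")
    if total_log_bytes <= 0:
        return 0
    # Assumption: a fractional quotient is rounded up to a whole task.
    return (total_log_bytes + archive_file_bytes - 1) // archive_file_bytes


GIB = 1024 ** 3

# Hypothetical example: 10 GiB of logs packed into 1 GiB archive files.
print(concurrent_archive_tasks(10 * GIB, 1 * GIB))  # 10
```

The larger the backlog of unarchived logs, the more Map tasks a single archiving run starts, which is one reason to archive regularly rather than let logs accumulate.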

Configuration

Go to the All Configurations page of the MapReduce service. For details, see Modifying Cluster Service Configuration Parameters.

Enter the parameter name in the search box, change the parameter value, and save the configuration. On the Dashboard tab page of the MapReduce service, choose More > Synchronize Configuration. After the synchronization is complete, restart the MapReduce service.

  • Job log parameters:
    Table 1 Parameter description

    | Parameter | Description | Default Value |
    | --- | --- | --- |
    | mapreduce.jobhistory.cleaner.enable | Whether to enable the job log file deletion function. | true |
    | mapreduce.jobhistory.cleaner.interval-ms | Interval, in milliseconds, at which log file cleanup is started. Only log files that have been retained longer than the time specified by mapreduce.jobhistory.max-age-ms can be deleted. | 86,400,000 ms (1 day) |
    | mapreduce.jobhistory.max-age-ms | Retention period of log files, in milliseconds. Log files retained longer than this period are deleted. | 1,296,000,000 ms (15 days) |

  • Task log parameters:
    Table 2 Parameter description

    | Parameter | Description | Default Value |
    | --- | --- | --- |
    | yarn.log-aggregation.archive.files.minimum | Minimum number of MapReduce job log files required for archiving. The archiving task starts when the number of files in the yarn.nodemanager.remote-app-log-dir folder is greater than or equal to the value of this parameter. This parameter applies to MRS 3.x. | 5,000 |
    | yarn.log-aggregation.archive-check-interval-seconds | Interval, in seconds, at which MapReduce job log archiving is checked. Log files are archived only when the number of log files reaches the value of yarn.log-aggregation.archive.files.minimum. The archiving function is disabled when this parameter is set to 0 or -1. This parameter applies to MRS 3.x. | -1 |
    | yarn.log-aggregation.retain-seconds | Retention period, in seconds, of the MapReduce job logs on HDFS. The value -1 indicates that log files are stored permanently. | 1,296,000 (15 days) |
    | yarn.log-aggregation.retain-check-interval-seconds | Interval, in seconds, at which the MapReduce job log deletion task checks for expired logs. If this parameter is set to -1, the check interval is one tenth of the log retention period. | 86,400 (1 day) |

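    If logs occupy too much HDFS storage space, shorten the retention periods: reduce mapreduce.jobhistory.max-age-ms for job logs and yarn.log-aggregation.retain-seconds for task logs. The yarn.log-aggregation.retain-check-interval-seconds configuration item controls how often expired task logs are checked for and deleted.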
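The retention rules described by the parameters above can be sketched as follows. This is an illustration of the documented semantics only, not the actual Hadoop implementation: the function names are hypothetical, the default arguments are the default values listed above, and the strict "older than" comparison for task logs is an assumption.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
SECONDS_PER_DAY = 24 * 60 * 60


def job_log_expired(age_ms: int, max_age_ms: int = 1_296_000_000) -> bool:
    """A job log can be deleted once it is older than mapreduce.jobhistory.max-age-ms."""
    return age_ms > max_age_ms


def task_log_expired(age_seconds: int, retain_seconds: int = 1_296_000) -> bool:
    """A task log expires once it is older than yarn.log-aggregation.retain-seconds."""
    if retain_seconds == -1:
        # Per the parameter description: -1 means logs are stored permanently.
        return False
    # Assumption: strictly older than the retention period counts as expired.
    return age_seconds > retain_seconds


def effective_check_interval(retain_seconds: int,
                             check_interval_seconds: int = 86_400) -> float:
    """Resolve yarn.log-aggregation.retain-check-interval-seconds.

    Per the parameter description, -1 means one tenth of the retention period.
    """
    if check_interval_seconds == -1:
        return retain_seconds / 10
    return check_interval_seconds


# The default retention periods both correspond to 15 days.
print(1_296_000_000 / MS_PER_DAY)   # 15.0
print(1_296_000 / SECONDS_PER_DAY)  # 15.0

# A 16-day-old log is past both default retention periods.
print(job_log_expired(16 * MS_PER_DAY))        # True
print(task_log_expired(16 * SECONDS_PER_DAY))  # True

# With the check interval set to -1 and the default retention period,
# the deletion check runs every 129,600 seconds (1.5 days).
print(effective_check_interval(1_296_000, -1))  # 129600.0
```

Because an expired log is removed only at the next check, a log can remain on HDFS for up to one check interval beyond its retention period.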