Configuring Container Log Aggregation

Scenario

Yarn provides a container log aggregation function that collects the logs generated by containers on each node and stores them in HDFS, which frees local disk space. Logs can be collected in either of the following ways:

After the application completes, collect all container logs to HDFS in a single pass.
While the application is running, periodically collect the log segments generated by containers and save them to HDFS.

Configuring Container Log Aggregation

Log in to FusionInsight Manager.

For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
Choose Cluster > Services > Yarn > Configurations > All Configurations.

Search for the following parameters and change their values as required.

**Table 1** Parameter description
Parameter	Description	Default Value
yarn.log-aggregation-enable	Whether to enable container log aggregation. If this parameter is set to true, logs are collected to the HDFS directory. To view historical logs on the web UI, you are advised to set this parameter to true. If this parameter is set to false, the function is disabled and logs are not collected to HDFS. After the value is changed to false and takes effect, logs generated before the change can no longer be obtained on the web UI. After changing the parameter value, restart the Yarn service for the setting to take effect.	true
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds	Interval at which NodeManager periodically collects logs, in seconds. If this parameter is set to -1 or 0, periodic log collection is disabled and logs are collected once, after the application completes. The minimum collection interval is 3,600 seconds: if this parameter is set to a value greater than 0 and less than 3,600, the collection interval is 3,600 seconds. The value must be greater than or equal to -1.	-1
yarn.nodemanager.disk-health-checker.log-dirs.max-disk-utilization-per-disk-percentage	Maximum percentage of the YARN disk quota that the container log directory can occupy on each disk. The value ranges from -1 to 100. When the space occupied by the container log directory exceeds this threshold, the periodic log collection service starts an extra collection outside the regular interval to release local disk space. Only applications with periodic log collection enabled can trigger this threshold-based collection. If this parameter is set to a value smaller than -1 or greater than 100, it is forcibly reset to 25. The value -1 disables the disk capacity check for the container log directory. Percentage of the disk space available to the container log directory = Percentage of the disk space available to Yarn (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage) x Percentage of the Yarn disk space available to the container log directory (yarn.nodemanager.disk-health-checker.log-dirs.max-disk-utilization-per-disk-percentage).	25
yarn.nodemanager.remote-app-log-dir-suffix	Name of the HDFS folder in which container logs are stored. The complete path for storing container logs is {yarn.nodemanager.remote-app-log-dir}/${user}/{yarn.nodemanager.remote-app-log-dir-suffix}, where {yarn.nodemanager.remote-app-log-dir} is the value of the yarn.nodemanager.remote-app-log-dir parameter, ${user} is the username used for running the task, and {yarn.nodemanager.remote-app-log-dir-suffix} is the value of this parameter.	logs
yarn.nodemanager.log-aggregator.on-fail.remain-log-in-sec	Duration for retaining container logs on the local host after log collection fails, in seconds. If this parameter is set to 0, local logs are deleted immediately. If this parameter is set to a positive number, local logs are retained for that period.	604800
yarn.nodemanager.log-aggregation.queue-user.enable	Whether the log aggregation path contains the Queue User, which is the real user who submits a Hive job. This parameter applies to Hive jobs and is disabled by default. (This parameter is supported only in MRS 3.3.1 and later versions.) The options are as follows. true: The log aggregation path contains the Queue User. false: The log aggregation path does not contain the Queue User. You can also add this parameter as a custom parameter on FusionInsight Manager to achieve the same effect. Choose Cluster > Services > Yarn and click All Configurations. On the displayed page, choose NodeManager(Role) > Customization, add this parameter in the nodemanager.yarn-site.customized.configs area, and set the parameter value to true.	false
yarn.nodemanager.remote-app-log-dir	Log aggregation path for YARN jobs. (This parameter is supported only in MRS 3.3.1 and later versions.) To collect logs to a directory on multiple NameServices, use the following path format: hdfs://hacluster,ns1/tmp/logs. To collect logs to a NameService for a specified user, use the following path format, in which the default aggregation path must come first: hdfs://hacluster/tmp/logs;[username]hdfs://logcluster/tmp/logs. To collect logs to a NameService for a specified user and Queue User, use the following path format, in which the default aggregation path must also come first: hdfs://hacluster/tmp/logs;[username:queue user]hdfs://hacluster/tmp/logs.	/tmp/logs
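
The value rules in Table 1 for the collection interval and the log directory threshold can be easier to follow as code. The following Python sketch only restates those rules; the function names are hypothetical, and the value 90 used in the example for yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage is an illustrative assumption, not a value taken from this document.

```python
from typing import Optional

MIN_ROLL_INTERVAL_SECONDS = 3600  # minimum collection interval stated in Table 1
DEFAULT_LOG_DIR_THRESHOLD = 25.0  # value the threshold is reset to when out of range


def effective_roll_interval(configured: int) -> Optional[int]:
    """Return the collection interval in effect, or None if periodic collection is off."""
    if configured < -1:
        raise ValueError("the value must be greater than or equal to -1")
    if configured in (-1, 0):
        return None  # logs are collected once, after the application completes
    # Values greater than 0 and less than 3,600 are raised to 3,600 seconds.
    return max(configured, MIN_ROLL_INTERVAL_SECONDS)


def effective_log_dir_threshold(configured: float) -> float:
    """Apply the reset rule: values below -1 or above 100 become 25."""
    if configured < -1 or configured > 100:
        return DEFAULT_LOG_DIR_THRESHOLD
    return float(configured)


def log_dir_share_of_disk(yarn_pct: float, log_dir_pct: float) -> Optional[float]:
    """Percentage of a whole disk available to the container log directory.

    Returns None when the check is disabled (threshold of -1).
    """
    threshold = effective_log_dir_threshold(log_dir_pct)
    if threshold == -1:
        return None
    return yarn_pct * threshold / 100.0


if __name__ == "__main__":
    print(effective_roll_interval(600))     # 3600
    print(log_dir_share_of_disk(90, 25))    # 22.5
```

With these example values, a Yarn disk quota of 90% and a log directory threshold of 25% would trigger an extra collection once container logs occupy more than 22.5% of a disk.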

Modify and save the configuration. On the Dashboard tab page, choose More > Synchronize Configuration. After the synchronization is complete, restart the YARN service.
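
To check where aggregated logs will land after the change, the path rule given for yarn.nodemanager.remote-app-log-dir-suffix in Table 1 can be sketched as follows. The function name and the username alice are hypothetical, and the sketch covers only a single plain path; it does not handle the multi-NameService or Queue User formats.

```python
def aggregated_log_dir(remote_app_log_dir: str, user: str, suffix: str = "logs") -> str:
    """Build {remote-app-log-dir}/${user}/{remote-app-log-dir-suffix}.

    Only the simple single-path form is handled; the per-user and
    Queue User formats from Table 1 are out of scope for this sketch.
    """
    return "/".join([remote_app_log_dir.rstrip("/"), user, suffix.strip("/")])


if __name__ == "__main__":
    # Default values from Table 1 with a hypothetical user named "alice".
    print(aggregated_log_dir("/tmp/logs", "alice"))  # /tmp/logs/alice/logs
```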

Configuring the Container Log Retention Duration

Log in to FusionInsight Manager.
Choose Cluster > Services > Mapreduce > Configurations > All Configurations.

Search for the following parameters and change their values as required.

**Table 2** Parameter description
Parameter	Description	Default Value
yarn.log-aggregation.retain-seconds	Duration for retaining aggregated logs, in seconds. If this parameter is set to -1, container logs are retained permanently in HDFS. If this parameter is set to 0 or a positive integer, container logs are stored for that period and deleted after it expires. A short period may increase the load on the NameNode, so you are advised to set this parameter to a proper value.	1296000
yarn.log-aggregation.retain-check-interval-seconds	Interval at which container logs stored in HDFS are scanned, in seconds. If this parameter is set to -1 or 0, the interval is one tenth of the period specified by yarn.log-aggregation.retain-seconds. If yarn.log-aggregation.retain-seconds is set to 0, this parameter cannot be set to 0 or -1. If this parameter is set to a positive number, container logs in HDFS are scanned at that interval. A short interval may increase the load on the NameNode, so you are advised to set this parameter to a proper value.	86400
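
The interaction between the two parameters in Table 2 can be sketched in Python as follows. The function name is hypothetical. Two behaviors are assumptions of this sketch rather than statements from the table: returning None when logs are retained permanently, and rejecting check interval values below -1.

```python
from typing import Optional


def effective_check_interval(retain_seconds: int, check_interval_seconds: int) -> Optional[int]:
    """Return the scan interval in seconds implied by the two Table 2 parameters."""
    if retain_seconds == -1:
        # Logs are retained permanently; this sketch assumes no scan is needed.
        return None
    if check_interval_seconds < -1:
        # Values below -1 are not described in Table 2; this sketch rejects them.
        raise ValueError("check interval below -1 is not covered by this sketch")
    if check_interval_seconds in (-1, 0):
        if retain_seconds == 0:
            raise ValueError(
                "the check interval cannot be 0 or -1 when the retention period is 0"
            )
        # One tenth of the retention period.
        return retain_seconds // 10
    return check_interval_seconds


if __name__ == "__main__":
    print(effective_check_interval(1296000, 86400))  # 86400 (both defaults)
    print(effective_check_interval(1296000, -1))     # 129600
```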

Save the modified configuration. Restart the services or instances whose configuration has expired for the configuration to take effect.

Configuring the Rolling Output of MapReduce Application Log Files

The periodic log collection function applies only to MapReduce applications, and it requires rolling output of log files to be configured for them. Table 3 describes the relevant parameters, which are stored in the Client installation path/Yarn/config/mapred-site.xml configuration file on the MapReduce client node.

Log in to FusionInsight Manager.

For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
Choose Cluster > Services > Yarn > Configurations > All Configurations.

Search for the following parameters and change their values as required.

**Table 3** Configuring rolling output of MapReduce application log files
Parameter	Description	Default Value
mapreduce.task.userlog.limit.kb	Maximum size of a single task log file of the MapReduce application, in KB. The value must be greater than or equal to 0. When the maximum size of the log file has been reached, another log file is generated. The value 0 indicates that the size of the log file is not limited.	51200
yarn.app.mapreduce.task.container.log.backups	Maximum number of task log backup files that can be retained for the MapReduce application when ContainerRollingLogAppender (CRLA) is used. By default, ContainerLogAppender (CLA) is used and container logs are not rolled. When both mapreduce.task.userlog.limit.kb and yarn.app.mapreduce.task.container.log.backups are greater than 0, CRLA is enabled. If this parameter is set to 0, rolling output is disabled. The value ranges from 0 to 999.	10
yarn.app.mapreduce.am.container.log.limit.kb	Maximum size of a single ApplicationMaster log file of the MapReduce application, in KB. When the maximum size of the log file has been reached, another log file is generated. The value must be greater than or equal to 0. The value 0 indicates that the size of a single ApplicationMaster log file is not limited.	51200
yarn.app.mapreduce.am.container.log.backups	Number of ApplicationMaster log backup files when CRLA is used. By default, CLA is used and container logs are not rolled. When both yarn.app.mapreduce.am.container.log.limit.kb and yarn.app.mapreduce.am.container.log.backups are greater than 0, CRLA is enabled for the ApplicationMaster. If this parameter is set to 0, rolling output is disabled. The value ranges from 0 to 999.	20
yarn.app.mapreduce.shuffle.log.backups	Maximum number of shuffle log backup files that can be retained for the MapReduce application. When both yarn.app.mapreduce.shuffle.log.limit.kb and yarn.app.mapreduce.shuffle.log.backups are greater than 0, syslog.shuffle uses CRLA. If this parameter is set to 0, rolling output is disabled. The value ranges from 0 to 999.	10
yarn.app.mapreduce.shuffle.log.limit.kb	Maximum size of a single shuffle log file of a MapReduce application, in KB. When the maximum size of the log file has been reached, another log file is generated. The value must be greater than or equal to 0. The value 0 indicates that the size of a single shuffle log file is not limited.	51200
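
The rule in Table 3 that decides whether rolling output is active (both the size limit and the backup count must be greater than 0) can be sketched in Python as follows. The function names are hypothetical. The size estimate additionally assumes that the active log file and each backup file are capped at the configured limit, which this document does not state explicitly.

```python
from typing import Optional


def appender_for(limit_kb: int, backups: int) -> str:
    """Return "CRLA" when rolling output is enabled, otherwise "CLA"."""
    if limit_kb < 0:
        raise ValueError("the size limit must be greater than or equal to 0")
    if not 0 <= backups <= 999:
        raise ValueError("the backup count must range from 0 to 999")
    return "CRLA" if limit_kb > 0 and backups > 0 else "CLA"


def max_rolled_log_kb(limit_kb: int, backups: int) -> Optional[int]:
    """Rough upper bound on disk usage for one log under CRLA, in KB.

    Assumption of this sketch: one active file plus `backups` backup files,
    each capped at `limit_kb`. Returns None when rolling is disabled.
    """
    if appender_for(limit_kb, backups) != "CRLA":
        return None
    return limit_kb * (backups + 1)


if __name__ == "__main__":
    # Task log defaults from Table 3: 51200 KB limit, 10 backups.
    print(appender_for(51200, 10))       # CRLA
    print(max_rolled_log_kb(51200, 10))  # 563200
```

Under that assumption, the default task log settings would bound a single task's rolled log at about 550 MB (563,200 KB).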