Updated on 2024-06-12 GMT+08:00

Introduction to Training Job Logs

Overview

Training logs record the runtime process and exception information of training jobs and are the primary source for troubleshooting. Anything your code writes to standard output or standard error is displayed in the training logs. If a ModelArts training job fails or behaves unexpectedly, view the logs first. In most scenarios, the error information reported there is enough to locate the issue.
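Because standard output and standard error are what end up in the training logs, a training script only needs to write to those streams for its messages to be captured. The following is a minimal sketch using Python's standard logging module. The names build_logger and run_epoch, and the dummy loss computation, are hypothetical and used only for illustration; they are not part of ModelArts.

```python
import logging
import sys


def build_logger(name: str = "train") -> logging.Logger:
    """Return a logger that writes INFO and above to standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_epoch(epoch: int, logger: logging.Logger) -> float:
    """Hypothetical stand-in for one training epoch; returns a dummy loss."""
    loss = 1.0 / (epoch + 1)
    logger.info("epoch=%d loss=%.4f", epoch, loss)
    return loss


if __name__ == "__main__":
    log = build_logger()
    try:
        for epoch in range(3):
            run_epoch(epoch, log)
    except Exception:
        # Record the traceback through the logger, then re-raise so the
        # process exits with an error and Python also prints the
        # traceback to standard error.
        log.exception("training failed")
        raise
```

Logging one line per epoch and recording the traceback before re-raising keeps both the normal progress and the failure cause visible in the same log.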

Retention Period

Logs are classified into the following types based on the retention period:

  • Real-time logs: generated while a training job is running. You can view them on the ModelArts training job details page.
  • Historical logs: available on the ModelArts training job details page after a training job is complete. ModelArts automatically stores these logs for 30 days.
  • Permanent logs: dumped to your OBS bucket. You can set the OBS dump path when creating a training job. For CPU- or GPU-based training jobs, you must manually enable Persistent Log Saving.
    Figure 1 Enabling Persistent Log Saving

Real-time logs and historical logs are identical in content. For CPU- or GPU-based training jobs, permanent logs are also identical to the real-time and historical logs.

Related Chapters