Updated on 2024-06-12 GMT+08:00

TensorBoard Visualization Jobs

ModelArts supports TensorBoard for visualizing training jobs. TensorBoard is the visualization toolkit of TensorFlow. It provides the visualization functions and tools required for machine learning experiments.

TensorBoard displays the TensorFlow computational graph as it runs, the trend of each metric over time, and the data used in training.

Prerequisites

When you write a training script, add code for collecting summary records so that a summary file is generated in the training output.

For details about how to add summary-collection code to a TensorFlow-powered training script, see the TensorFlow official website.
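
The following is a minimal sketch of such summary-collection code, assuming TensorFlow 2.x. The function name write_demo_summaries, the metric name loss, and the placeholder loss values are illustrative only. They are not part of ModelArts or of any specific training script.

```python
def write_demo_summaries(logdir, steps=100):
    """Write a decaying 'loss' scalar to logdir as TensorBoard summary data."""
    # Imported lazily so that this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    # The writer creates an event file (events.out.tfevents.*) under logdir.
    writer = tf.summary.create_file_writer(logdir)
    with writer.as_default():
        for step in range(steps):
            # Placeholder metric; a real script would log its actual training loss.
            loss = 1.0 / (step + 1)
            tf.summary.scalar("loss", loss, step=step)
    writer.flush()
    return logdir
```

For example, calling write_demo_summaries("/home/ma-user/work/summary") in a notebook with TensorFlow installed should produce an event file in that directory, which can then be used as the log directory for TensorBoard.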

Precautions

  • TensorBoard visualization for training jobs supports only CPU and GPU flavors, with images based on TensorFlow 2.1 or on PyTorch 1.4, 1.8, or later. Select images and flavors based on the site requirements.

Step 1 Create a Development Environment and Access It Online

On the ModelArts management console, choose DevEnviron > Notebook to access the new-version notebook page, and create an instance using a TensorFlow or PyTorch image. After the instance is created, click Open in the Operation column of the instance to access it online.

Step 2 Upload the Summary Data

Summary data is required for using TensorBoard visualization functions in DevEnviron.

You can upload the summary data to the /home/ma-user/work/ directory in the development environment or store it in the OBS parallel file system.

  • For details about how to upload the summary data to the notebook path /home/ma-user/work/, see Uploading Files to JupyterLab.
  • If you want the notebook development environment to mount the OBS parallel file system directory and read the summary data, upload the summary file generated during model training to the OBS parallel file system. When TensorBoard is started in a notebook instance, the notebook instance automatically mounts the OBS parallel file system directory and reads the summary data.
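
Before you start TensorBoard, you may want to confirm that the uploaded directory actually contains summary data. The following is a minimal sketch that uses only the Python standard library. It relies on the fact that TensorBoard event files contain tfevents in their names. The helper name find_event_files is hypothetical, and the sketch applies to local paths only, not to obs:// paths.

```python
from pathlib import Path


def find_event_files(base_dir):
    """Return all TensorBoard event files found under base_dir, sorted by path."""
    root = Path(base_dir).expanduser()
    if not root.is_dir():
        # A missing directory simply means that no summary data was found.
        return []
    # Event files are named like events.out.tfevents.<timestamp>.<hostname>.
    return sorted(p for p in root.rglob("*tfevents*") if p.is_file())
```

For example, find_event_files("/home/ma-user/work/") returns an empty list if no summary file has been uploaded yet.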

Step 3 Start TensorBoard

Choose one of the following methods to start TensorBoard in JupyterLab.

Figure 1 Starting TensorBoard in JupyterLab

You can upgrade TensorBoard to any version except 2.4.0. After the upgrade, only Method 1 starts the upgraded TensorBoard. The other methods still start TensorBoard 2.1.1.

Method 1

  1. Click to go to the JupyterLab development environment. The .ipynb file is automatically created.
  2. Enter the following command in the dialog box:
    %reload_ext ma_tensorboard
    %ma_tensorboard --port {PORT} --logdir {BASE_DIR}

    Parameters:

    • --port {PORT}: web service port for visualization. The default value is 8080. If port 8080 is occupied, specify another port in the range 1 to 65535.
    • --logdir {BASE_DIR}: storage path of the summary data, which can be either of the following:
      • Local path of the development environment: ./work/xxx (relative path) or /home/ma-user/work/xxx (absolute path)
      • Path of the OBS parallel file system: obs://xxx/
    For example:
    # If the summary data is stored in /home/ma-user/work/ of the development environment, run the following command:
    %ma_tensorboard --port {PORT} --logdir /home/ma-user/work/xxx
    or
    # If the summary data is stored in the OBS parallel file system, run the following command. The development environment automatically mounts the storage path of the OBS parallel file system and reads the data.
    %ma_tensorboard --port {PORT} --logdir obs://xxx/
    Figure 2 TensorBoard page (1)
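
The port and path rules above can be sketched as two small helpers. This is an illustration under stated assumptions, not part of the ma_tensorboard extension: the names is_port_free and build_ma_tensorboard_command are hypothetical, and the port check only tests whether a TCP socket can bind on 127.0.0.1, which may differ from the interface that TensorBoard actually uses.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # The port is in use, or binding to it is not permitted.
            return False
    return True


def build_ma_tensorboard_command(logdir, port=8080):
    """Build the %ma_tensorboard magic line for a log directory and port."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be in the range 1 to 65535")
    if not logdir:
        raise ValueError("logdir must not be empty")
    return f"%ma_tensorboard --port {port} --logdir {logdir}"
```

For example, if is_port_free(8080) returns False, you can pass another port to build_ma_tensorboard_command and paste the resulting line into the notebook cell.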

Method 2

Click to go to the TensorBoard page.

The directory /home/ma-user/work/ is read by default.

All project log names are displayed in the Runs area on the left. Select the target project there to view its logs.

Figure 3 TensorBoard page (2)

Method 3

  1. Choose View > Activate Command Palette, enter TensorBoard in the search box, and click Create a new TensorBoard.
    Figure 4 Creating a TensorBoard instance
  2. Enter the path of the summary data you want to view or the storage path of the OBS parallel file system.
    • Local path of the development environment: ./summary (relative path) or /home/ma-user/work/summary (absolute path)
    • Path of the OBS parallel file system bucket: obs://xxx/
    Figure 5 Entering the summary data path
    Figure 6 TensorBoard page (3)

Method 4

Click and run the following command. (With this method, the TensorBoard UI is not displayed.)

tensorboard --logdir ./log 
Figure 7 Opening TensorBoard through Terminal

Step 4 View Visualized Data on the Training Dashboard

The training dashboard is the main view for TensorBoard visualization. It supports scalar visualization, image visualization, and computational graph visualization.

For more functions, see Get started with TensorBoard.

Related Operations

To stop a TensorBoard instance, perform the following steps:

  • Method 1: Enter the following command in the .ipynb file window of JupyterLab. (Obtain the PID from the startup screen or by running the command ps -ef | grep tensorboard.)
    !kill PID
  • Method 2: Click . The TensorBoard instance management page is displayed, which shows all started TensorBoard instances. Click SHUT DOWN next to an instance to stop it.
    Figure 8 Clicking SHUT DOWN to stop an instance
  • Method 3: Click in the following figure to close all started TensorBoard instances.
    Figure 9 Stopping all started TensorBoard instances
  • Method 4 (not recommended): Close the TensorBoard window in JupyterLab. This closes only the visualization window. The instance keeps running in the backend.
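
For Method 1, the PID lookup can be sketched as follows. This is a minimal example that parses the text printed by ps -ef, assuming the usual column layout (UID PID PPID C STIME TTY TIME CMD). The helper name tensorboard_pids is hypothetical, and the sketch only lists PIDs. It does not stop any process.

```python
def tensorboard_pids(ps_output):
    """Extract the PIDs of TensorBoard processes from `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # Skips the header row and malformed lines.
        command = fields[7]
        # Ignore the grep process itself when the output came from
        # `ps -ef | grep tensorboard`.
        if "tensorboard" in command and "grep" not in command:
            pids.append(int(fields[1]))
    return pids
```

Each returned PID can then be used in place of PID in the !kill command.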