Updated on 2025-06-05 GMT+08:00

Routine O&M

Monitoring SFS Turbo Capacity

To prevent your services from being affected when the file system runs out of capacity, create an alarm rule on Cloud Eye and configure a threshold to monitor the file system capacity. When the file system storage usage exceeds the alarm threshold, the system sends email and message notifications to the O&M personnel. When you receive such a notification, free up file system storage space, set a shorter cold data eviction duration, or expand the file system capacity. For details, see SFS Turbo Metrics and Creating an Alarm Rule.
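The threshold logic can also be spot-checked from a client where the file system is mounted. The following is a minimal sketch, not a replacement for a Cloud Eye alarm rule: the function names and the 80% default threshold are assumptions chosen for illustration, and the usage figures come from the operating system rather than from Cloud Eye metrics.

```python
import shutil


def usage_percent(total_bytes: int, used_bytes: int) -> float:
    """Return used capacity as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def exceeds_threshold(mount_point: str, threshold_percent: float = 80.0) -> bool:
    """Return True if usage at mount_point is above threshold_percent.

    The 80% default is an illustrative assumption, not a recommended value.
    """
    usage = shutil.disk_usage(mount_point)  # reports total, used, free in bytes
    return usage_percent(usage.total, usage.used) > threshold_percent
```

For example, calling exceeds_threshold with the local mount path of the file system returns True once usage passes the chosen threshold, at which point the same remedies apply: free up space, shorten the cold data eviction duration, or expand capacity.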

Monitoring SFS Turbo Performance

You can monitor the performance and usage of the SFS Turbo file system on Cloud Eye. When the storage read/write bandwidth no longer meets your AI training needs, for example, when checkpoint saves and loads take longer or dataset loading slows down training, you can expand the file system performance to reduce the data loading time. For details, see SFS Turbo Metrics and Creating an Alarm Rule.
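To judge from the client side whether checkpoint writes are slowing down, you can time a test write and compare the result over time. The following is a rough sketch under stated assumptions: the function name and the 64 MiB default are hypothetical, the result reflects a single sequential write from one client, and it does not correspond to any specific Cloud Eye metric.

```python
import os
import tempfile
import time


def measure_write_throughput(directory: str, size_mib: int = 64) -> float:
    """Write size_mib MiB to a temporary file in directory and return MiB/s."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zero bytes
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(size_mib):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has left the client cache
        elapsed = time.perf_counter() - start
    # The temporary file is deleted automatically when the block exits.
    return size_mib / elapsed
```

Pass the directory where the file system is mounted. A sustained drop in the returned value across runs is a hint to check the Cloud Eye metrics before deciding whether to expand performance.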

Adjusting the SFS Turbo Data Eviction Policy

For details, see Configuring the SFS Turbo Data Eviction Policy.

Expanding the Capacity and Performance of an SFS Turbo File System

If the SFS Turbo file system has insufficient storage space, you can expand its capacity. For details, see Expanding Capacity.

SFS Turbo HPC file systems provide a set amount of bandwidth for each TB of capacity. If the file system performance is insufficient, you can increase the bandwidth by expanding the file system capacity.
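Because bandwidth grows with capacity, the capacity needed to reach a target bandwidth can be estimated with simple arithmetic. The following is a sketch that assumes bandwidth scales linearly per TB and that capacity is expanded in whole TB. The per-TB rate is a parameter you must take from the specifications of your file system type; the values used in the example are placeholders, not published figures.

```python
import math


def capacity_for_bandwidth(target_mb_s: float, mb_s_per_tb: float) -> int:
    """Return the smallest whole-TB capacity whose bandwidth meets the target.

    Assumes bandwidth = capacity_tb * mb_s_per_tb (linear scaling).
    """
    if target_mb_s <= 0 or mb_s_per_tb <= 0:
        raise ValueError("target_mb_s and mb_s_per_tb must be positive")
    return math.ceil(target_mb_s / mb_s_per_tb)


# Placeholder values: a 1,000 MB/s target at a hypothetical 250 MB/s per TB.
print(capacity_for_bandwidth(1000, 250))  # 4
```

If the file system is already larger than the estimate, the bottleneck is likely elsewhere, for example in client network bandwidth.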

Monitoring OBS Performance

You can monitor the performance of the associated OBS bucket on Cloud Eye. File import and export speeds are limited by the maximum read and write bandwidth of OBS. The default maximum bandwidth is 16 Gbit/s. To configure a higher OBS read/write bandwidth, submit a service ticket to contact technical support.
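The OBS bandwidth cap gives a lower bound on how long an import or export can take. The following sketch converts the 16 Gbit/s default into a best-case transfer time. The function name is hypothetical, sizes are in decimal gigabytes, and real transfers will take longer because of protocol overhead, small files, and other traffic sharing the bandwidth.

```python
def min_transfer_seconds(data_gb: float, bandwidth_gbit_s: float = 16.0) -> float:
    """Return the best-case time in seconds to move data_gb gigabytes.

    1 byte = 8 bits, so data_gb gigabytes equals data_gb * 8 gigabits.
    """
    if data_gb < 0 or bandwidth_gbit_s <= 0:
        raise ValueError("data_gb must be >= 0 and bandwidth_gbit_s must be > 0")
    return data_gb * 8 / bandwidth_gbit_s


# A 1,000 GB dataset at the default 16 Gbit/s (2 GB/s) needs at least 500 s.
print(min_transfer_seconds(1000))  # 500.0
```

If the best-case time is already too long for your workload, that is a reason to request a higher OBS bandwidth.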