Updated on 2025-08-12 GMT+08:00

Parallel File System Overview

What Is a Parallel File System?

Parallel File System (PFS), a sub-product of Object Storage Service (OBS), is a high-performance file system designed for big data scenarios in which OBS serves as the unified data lake storage. It offers millisecond-level access latency, TB/s-level bandwidth, and millions of IOPS, along with high compatibility, performance, scalability, and reliability.

Unlike an object bucket, where a directory is only a key prefix, a parallel file system stores each directory in an access path as an independent entity. For example, in /dir01/dir02/example.txt, example.txt is a file, and /dir01/ and /dir01/dir02/ are directories. Because the directory structure is hierarchical, you can rename a directory in a single operation instead of listing and modifying every file that shares its prefix. This makes the data organization of a parallel file system essentially the same as that of Hadoop Distributed File System (HDFS), so big data analysis frameworks that use HDFS as their data access layer can access data in a parallel file system through the OBSFileSystem plug-in (OBSA-HDFS). For details, see About PFS.
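To make the rename difference concrete, here is a minimal Python sketch contrasting the two layouts. It models a flat object namespace as a set of keys and a hierarchical file system as nested dictionaries. Both models and the function names rename_prefix_flat and rename_dir_tree are hypothetical illustrations, not OBS or OBSA-HDFS APIs.

```python
def rename_prefix_flat(keys: set, old_prefix: str, new_prefix: str) -> tuple:
    """Flat namespace (hypothetical model): a "directory" is only a key prefix,
    so every key under it must be listed and rewritten one by one.
    Prefixes should end with "/" so that "dir02/" does not match "dir02x/"."""
    matched = {k for k in keys if k.startswith(old_prefix)}
    renamed = {new_prefix + k[len(old_prefix):] for k in matched}
    # Return the new key set and the number of keys that had to be rewritten.
    return (keys - matched) | renamed, len(matched)


def rename_dir_tree(tree: dict, parent_path: list, old_name: str, new_name: str) -> int:
    """Hierarchical namespace (hypothetical model): a directory is its own node,
    so renaming it relinks one entry no matter how many files it contains."""
    node = tree
    for part in parent_path:
        node = node[part]
    node[new_name] = node.pop(old_name)
    return 1  # a single metadata operation


if __name__ == "__main__":
    flat = {"dir01/dir02/example.txt", "dir01/dir02/a.txt", "dir01/b.txt"}
    flat, ops = rename_prefix_flat(flat, "dir01/dir02/", "dir01/dir03/")
    print(sorted(flat), ops)

    tree = {"dir01": {"dir02": {"example.txt": None, "a.txt": None}, "b.txt": None}}
    ops = rename_dir_tree(tree, ["dir01"], "dir02", "dir03")
    print(tree, ops)
```

In the flat model the cost grows with the number of files under the prefix; in the hierarchical model it stays at one operation, which is the property that lets a parallel file system behave like HDFS for directory renames.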

Performance

Each time a volume created from a parallel file system is mounted to a workload, a resident process runs in the backend for that volume. If a workload uses many parallel file system volumes, or reads and writes large amounts of data on them, these resident processes consume a significant amount of memory. Table 1 lists the memory used in some scenarios. To ensure that the workload runs normally, limit the number of parallel file system volumes according to the requested memory. For example, a workload that requests 4 GiB of memory can use no more than 4 parallel file system volumes.

Table 1 Memory used by a resident process per parallel file system

Test Item                                              Used Memory (MiB)
Long-term stable operation                             About 50
Concurrent write to a 10-MB file from two processes    About 110
Concurrent write to a 10-MB file from four processes   About 220
Write to a 100-GB file from a single process           About 300
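The sizing rule can be sketched as a quick budget check. This Python sketch assumes that the 4 GiB / 4 volumes example scales linearly to about one parallel file system volume per GiB of requested memory, and it uses the approximate Table 1 figures to estimate how much memory the resident processes alone might take. The function and scenario names are hypothetical, and the real footprint depends on the actual I/O pattern.

```python
import math

# Approximate resident-process memory per volume, in MiB, taken from Table 1.
# These are rough observed values, not guaranteed limits.
RESIDENT_MIB = {
    "stable": 50,              # long-term stable operation
    "write_10mb_2proc": 110,   # two processes writing a 10-MB file concurrently
    "write_10mb_4proc": 220,   # four processes writing a 10-MB file concurrently
    "write_100gb_1proc": 300,  # one process writing a 100-GB file
}


def max_pfs_volumes(requested_memory_gib: float) -> int:
    """Assumed rule: about one PFS volume per GiB of requested memory
    (extrapolated from the 4 GiB -> 4 volumes example)."""
    return math.floor(requested_memory_gib)


def resident_overhead_mib(scenarios: list) -> int:
    """Sum the estimated resident-process memory, one volume per scenario."""
    return sum(RESIDENT_MIB[s] for s in scenarios)


def check_budget(requested_memory_gib: float, scenarios: list) -> dict:
    """Compare the planned volumes against the assumed rule and estimate
    how much of the requested memory remains for the application itself."""
    volumes = len(scenarios)
    limit = max_pfs_volumes(requested_memory_gib)
    overhead = resident_overhead_mib(scenarios)
    return {
        "volumes": volumes,
        "max_volumes": limit,
        "within_rule": volumes <= limit,
        "resident_overhead_mib": overhead,
        "left_for_app_mib": requested_memory_gib * 1024 - overhead,
    }


if __name__ == "__main__":
    # A workload requesting 4 GiB with four volumes that each write a large file.
    print(check_budget(4, ["write_100gb_1proc"] * 4))
```

In this example the four resident processes alone would take about 1200 MiB of the 4 GiB request, which is why the number of volumes should be kept in proportion to the requested memory.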

Prerequisites

Before using a parallel file system for persistent data storage, ensure that you have configured a VPC endpoint for accessing OBS. Otherwise, the volume created from the parallel file system may fail to be mounted. To avoid repeating this step each time you create a volume from an existing parallel file system, you are advised to create all the VPC endpoints required for OBS at once. For details about VPC Endpoint, see What Is VPC Endpoint? For details about how to create a VPC endpoint, see Purchasing VPC Endpoints. To obtain the name of each VPC endpoint for OBS, submit a service ticket or contact OBS O&M personnel.

Scenarios

Only existing parallel file systems can be mounted to workloads as volumes. To do so, create a persistent volume (PV) from an existing parallel file system, and then mount the PV to a workload through a persistent volume claim (PVC). For details, see Creating a Volume from an Existing Parallel File System.

Process Flowchart

Figure 1 Process of using a parallel file system to create a volume

Billing

For billing details, see OBS Billing.