Updated on 2024-12-23 GMT+08:00

High-performance Computing

Context

A high-performance computing (HPC) system or environment is made up of a single computer system with many CPUs, or a cluster of multiple computers. It can process large amounts of data and perform computations that would be difficult for PCs. HPC offers very high floating-point performance and suits compute-intensive and data-intensive fields, such as industrial design, bioscience, energy exploration, image rendering, and heterogeneous computing. Different scenarios place different requirements on the file system:

  • Industrial design: In automobile manufacturing, CAE and CAD simulation software is widely used. While the software runs, compute nodes communicate with each other closely, which requires a file system that provides high bandwidth and low latency.
  • Bioscience: The file system should have high bandwidth and large storage, and be easy to expand.
    • Bioinformatics: To sequence, stitch, and compare genes.
    • Molecular dynamics: To simulate the changes of proteins at molecular and atomic levels.
    • New drug R&D: To complete high-throughput screening (HTS) to shorten the R&D cycle and reduce the investment.
  • Energy exploration: Field operations, geologic prospecting, geological data processing and interpretation, and identification of oil and gas reservoirs all require the file system to provide large memory and high bandwidth.
  • Image rendering: Image processing, 3D rendering, and frequent processing of small files require the file system to provide high read/write performance, large capacity, and high bandwidth.
  • Heterogeneous computing: Compute elements may have different instruction set architectures, requiring the file system to provide high bandwidth and low latency.

SFS Turbo is a shared storage service based on file systems. It features high-speed data sharing, dynamic storage tiering, and on-demand, smooth, online capacity expansion. These features enable SFS Turbo to meet the demanding requirements of HPC for storage capacity, throughput, IOPS, and latency.

A biological company needs to run a large volume of gene sequencing using software. However, because of tedious steps, slow deployment, complex processes, and low efficiency, its self-built clusters struggle to keep pace with business development. The situation improved after the company adopted professional HPC service process management software. With the massive compute and storage resources of the cloud platform, initial investment and O&M costs are greatly reduced, service rollout time is shortened, and efficiency is improved.

Configuration Process

  1. Prepare the DNA sequencing files to be uploaded.
  2. Log in to the SFS Turbo console and create a file system to store the DNA sequencing files.
  3. Log in to the cloud servers that function as the head node and compute node, and mount the file system on each of them.
  4. On the head node, upload the files to the file system.
  5. On the compute node, edit the files.
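Before uploading or editing files, it can help to confirm that the file system is actually mounted on each node. The following is a minimal Python sketch of such a check, not an official SFS Turbo tool. It assumes a Linux node that lists mounts in /proc/mounts; the function names and the mount point /mnt/sfs_turbo in the usage note are hypothetical.

```python
from typing import Optional


def find_nfs_mount(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the NFS source mounted at mount_point, or None if there is none.

    mounts_text is expected in /proc/mounts format:
    <source> <mount point> <fs type> <options> <dump> <pass>
    """
    # Normalize a trailing slash, but keep "/" itself intact.
    wanted = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target, fstype = fields[0], fields[1], fields[2]
        # Accept both "nfs" and "nfs4" file system types.
        if target == wanted and fstype.startswith("nfs"):
            return source
    return None


def mounted_source(mount_point: str, mounts_file: str = "/proc/mounts") -> Optional[str]:
    """Read the local mount table (Linux only) and look up mount_point."""
    with open(mounts_file, encoding="utf-8") as f:
        return find_nfs_mount(f.read(), mount_point)
```

On the head node or a compute node, calling mounted_source("/mnt/sfs_turbo") would return the mounted NFS source if the mount succeeded, or None otherwise.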

Prerequisites

  • A VPC has been created.
  • Cloud servers that function as the head node and compute node have been created and reside in that VPC. For details about how to upload on-premises gene sequencing files to SFS Turbo, see Migrating Data to SFS Turbo Using Direct Connect.
  • SFS Turbo has been enabled.

Example Configuration

  1. Log in to the SFS Turbo console.
  2. In the upper right corner of the page, click Create File System.
  3. On the Create File System page, configure parameters as instructed.
  4. After the configuration is complete, click Create Now.

    To mount the file system on Linux ECSs, see Mounting an NFS File System to Linux ECSs as Root.

  5. Log in to the head node and upload the files to the file system.
  6. Start gene sequencing. The compute node reads the gene sequencing files from the mounted file system for calculation.
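The upload in step 5 can be done with any file copy tool once the file system is mounted. As an illustration only, the Python sketch below copies every regular file from a local directory into a directory on the mounted file system and compares SHA-256 checksums afterward. The function names and the paths in the usage note are hypothetical, and the check is basic: it reads each copy back through the same client, so it does not replace an end-to-end integrity check.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_files(source_dir: Path, target_dir: Path) -> List[str]:
    """Copy regular files from source_dir to target_dir and verify each copy.

    Subdirectories are skipped. Returns the copied file names in sorted order.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue
        dst = target_dir / src.name
        shutil.copy2(src, dst)  # copies content and file metadata
        if sha256_of(src) != sha256_of(dst):
            raise IOError(f"checksum mismatch for {src.name}")
        copied.append(src.name)
    return copied
```

For example, upload_files(Path("/data/reads"), Path("/mnt/sfs_turbo/reads")) on the head node would copy the sequencing files to the shared file system, where the compute node can then read them from its own mount.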