Updated on 2025-05-22 GMT+08:00

COST08-03 Decoupling Storage and Compute

  • Risk level

    Medium

  • Key strategies

    In traditional big data solutions, compute and storage are deployed together: adding disks requires adding compute nodes as well, which wastes resources. Storage-compute decoupling is a data processing technique that separates data storage from data processing (compute), allowing each to be optimized and scaled independently. It improves data processing efficiency, reduces costs, and meets the requirements of large-scale data storage and analysis.
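    In practice, decoupling usually means pointing the big data engine at an object storage bucket rather than HDFS. As a minimal sketch, assuming the Hadoop S3A connector is used to reach an S3-compatible object store (the endpoint, bucket name, and credentials below are placeholders, not values from this document):

    ```xml
    <!-- core-site.xml: route Hadoop/Spark I/O to object storage via S3A.
         Endpoint and credentials are placeholders. -->
    <configuration>
      <property>
        <name>fs.s3a.endpoint</name>
        <value>object-store.example-region.example.com</value>
      </property>
      <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_SECRET_KEY</value>
      </property>
    </configuration>
    ```

    With this in place, jobs read and write paths such as `s3a://my-bucket/logs/` instead of `hdfs://...`, so compute clusters can be resized or torn down without touching the data.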

    For example, in log analysis for a shopping guide website, storage capacity must be expanded frequently, while compute demand remains relatively stable and compute resources sit underutilized. Conversely, in an Internet recommendation service, storage capacity grows steadily and linearly while compute requirements fluctuate sharply, peaking at dozens of times the trough level. The result is low compute utilization and little flexibility in resource allocation. Using object storage instead of HDFS and local disks separates storage from compute, so each resource is used on demand; this prevents unnecessary resource binding and can reduce costs by about 30%.
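    The cost effect can be illustrated with a simple model (all prices and workload figures below are hypothetical, not taken from this document): when compute and storage are bundled into fixed-ratio nodes, growth in one dimension forces over-purchase of the other; with decoupling, each dimension is bought at exactly its own demand.

    ```python
    # Hypothetical cost model comparing coupled vs. decoupled provisioning.
    # All unit prices and demand numbers are illustrative assumptions.

    COMPUTE_UNIT_COST = 100   # cost per compute node per month
    STORAGE_UNIT_COST = 20    # cost per storage unit per month

    # Monthly demand: storage grows linearly, compute stays roughly flat
    storage_demand = [10, 20, 30, 40, 50, 60]   # storage units needed
    compute_demand = [4, 4, 5, 4, 4, 5]         # compute nodes needed

    def coupled_cost(storage, compute):
        # Coupled: each node bundles 1 compute node with 10 storage units,
        # so whichever dimension is larger forces extra purchase of the other.
        nodes = max(-(-storage // 10), compute)  # ceiling division for storage
        return nodes * (COMPUTE_UNIT_COST + 10 * STORAGE_UNIT_COST)

    def decoupled_cost(storage, compute):
        # Decoupled: buy exactly the storage and compute that are needed.
        return compute * COMPUTE_UNIT_COST + storage * STORAGE_UNIT_COST

    coupled = sum(coupled_cost(s, c) for s, c in zip(storage_demand, compute_demand))
    decoupled = sum(decoupled_cost(s, c) for s, c in zip(storage_demand, compute_demand))
    savings = 1 - decoupled / coupled
    print(f"coupled={coupled}, decoupled={decoupled}, savings={savings:.0%}")
    ```

    Even in this toy model, decoupling avoids paying for idle compute when storage grows (and idle storage when compute grows), which is the mechanism behind the cost reduction described above.
    
    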