Updated on 2024-12-10 GMT+08:00

MemArtsCC Basic Principles

MemArtsCC is a distributed cache system deployed on compute nodes. Compute tasks run on virtual machines (VMs) in a compute cluster, while their data is stored in a remote cluster that hosts Object Storage Service (OBS). Because access to the remote OBS is comparatively slow, compute tasks on VMs often have to wait for data. A high-speed cache is needed to bridge this data access gap between the compute cluster and OBS. MemArtsCC fills this role as a distributed client cache: it is deployed on the VMs of the compute cluster and intelligently prefetches data from OBS to accelerate compute tasks.

Figure 1 MemArtsCC structure

Table 1 MemArtsCC architecture

| Name | Description |
| --- | --- |
| CC SDK | SDK used by OBSA, a Hadoop client plug-in on the FS client, to access objects on the OBS server. |
| ShardView | Global cluster view. ShardView locates the physical node that holds a specified file shard key. |
| CacheCore | Data read, fragment query, data prefetch, and cache eviction. |
| LocalStore | Reads and writes cached data on local SSDs. |
| RemoteStore | Interface for accessing the OBS server; also controls prefetch bandwidth. |
| Cluster Manager (CM) | Cluster view management: static and dynamic view updates to keep views consistent, and active node election for high service reliability. |
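
The division of responsibilities in Table 1 can be illustrated with a minimal Python sketch of a read path: locate the node for a shard key, try the local cache, and fall back to remote storage on a miss. This is only an illustration under stated assumptions, not the MemArtsCC implementation. The class names mirror the component names, but every method, the hash-modulo placement, the LRU eviction policy, the in-memory dictionaries standing in for SSDs and OBS, and the `file#0` shard key format are hypothetical.

```python
import hashlib
from collections import OrderedDict


class ShardView:
    """Hypothetical cluster view: maps a shard key to a node name."""

    def __init__(self, nodes):
        self.nodes = sorted(nodes)

    def locate(self, shard_key):
        # Assumption: plain hash-modulo placement. The real placement
        # policy is not described in this document.
        digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]


class LocalStore:
    """Hypothetical per-node cache; a dict stands in for the local SSD."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.shards = OrderedDict()

    def get(self, key):
        if key not in self.shards:
            return None
        self.shards.move_to_end(key)  # mark as most recently used
        return self.shards[key]

    def put(self, key, data):
        self.shards[key] = data
        self.shards.move_to_end(key)
        # Assumption: LRU eviction once the capacity is exceeded.
        while len(self.shards) > self.capacity:
            self.shards.popitem(last=False)


class RemoteStore:
    """Hypothetical stand-in for OBS: a dict plus a fetch counter."""

    def __init__(self, objects):
        self.objects = objects
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return self.objects[key]


class CacheCore:
    """Hypothetical read path tying the three components together."""

    def __init__(self, view, remote, capacity_per_node):
        self.view = view
        self.remote = remote
        self.stores = {n: LocalStore(capacity_per_node) for n in view.nodes}

    def read(self, key):
        store = self.stores[self.view.locate(key)]
        data = store.get(key)
        if data is None:
            # Cache miss: fetch from remote storage and keep a local copy.
            data = self.remote.fetch(key)
            store.put(key, data)
        return data


if __name__ == "__main__":
    remote = RemoteStore({"file#0": b"first shard", "file#1": b"second shard"})
    core = CacheCore(ShardView(["node-1", "node-2"]), remote, capacity_per_node=1)
    core.read("file#0")
    core.read("file#0")
    print(remote.fetches)  # the second read is served from the local cache
```

In this sketch, reading the same shard twice contacts the remote store only once, which is the waiting time that a client-side cache is meant to remove. Prefetching, bandwidth control, and view consistency are omitted.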