MemArtsCC Basic Principles

Updated on 2024-12-10 GMT+08:00

MemArtsCC is a distributed client-side cache system deployed on compute nodes. Compute tasks run on virtual machines (VMs) in a compute cluster, while their data is stored in a remote cluster that hosts Object Storage Service (OBS). Because access to the remote OBS is comparatively slow, compute tasks on the VMs often have to wait for data. A high-speed cache is needed to bridge this data access gap between the compute cluster and OBS. MemArtsCC fills that role: it runs on the VMs of the compute cluster and intelligently prefetches data from OBS to accelerate compute tasks.
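
The idea of a local cache that serves repeated reads and prefetches upcoming data can be illustrated with a minimal sketch. This is not the MemArtsCC implementation: the class name `ReadThroughCache`, the `fetch_remote` callable, the LRU eviction policy, and the fixed sequential prefetch are all assumptions chosen for illustration, and a real system would prefetch asynchronously.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical read-through cache with LRU eviction and sequential prefetch."""

    def __init__(self, fetch_remote, capacity=4, prefetch_depth=1):
        # fetch_remote(path, shard_index) -> bytes stands in for a slow remote read.
        self.fetch_remote = fetch_remote
        self.capacity = capacity
        self.prefetch_depth = prefetch_depth
        self.store = OrderedDict()  # (path, shard_index) -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def _load(self, key):
        """Fetch one shard from the remote store and cache it, evicting the oldest entries."""
        data = self.fetch_remote(*key)
        self.store[key] = data
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)
        return data

    def read(self, path, shard_index):
        """Return the shard from cache if present, otherwise fetch it remotely."""
        key = (path, shard_index)
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)
            data = self.store[key]
        else:
            self.misses += 1
            data = self._load(key)
        # Assumed policy: warm the cache with the shards that follow the one just read.
        for offset in range(1, self.prefetch_depth + 1):
            next_key = (path, shard_index + offset)
            if next_key not in self.store:
                self._load(next_key)
        return data
```

With this sketch, a sequential reader pays the remote latency only for the first shard of a file; each later shard is already in the local store by the time it is requested.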

Figure 1 MemArtsCC structure

**Table 1** MemArtsCC architecture
Name	Description
CC SDK	SDK used by OBSA, a Hadoop client plug-in on the FS client, to access OBS server objects.
ShardView	Global cluster view. ShardView locates the physical node that holds a specified file shard key.
CacheCore	Data read, fragment query, data prefetch, and cache eviction
LocalStore	Reads and writes cached data on local SSDs
RemoteStore	Interface for accessing the OBS server and prefetch bandwidth control
Cluster Manager (CM)	Cluster view management: static and dynamic view update for view consistency, and active node election for high service reliability
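
To make the role of ShardView more concrete, the following sketch maps a file shard key to a node. The source does not describe how ShardView performs this lookup; consistent hashing is used here only as a common illustrative technique, and the names `ShardViewSketch`, `locate`, and the `path:shard_index` key format are hypothetical.

```python
import bisect
import hashlib


class ShardViewSketch:
    """Hypothetical shard-to-node lookup built on a consistent hash ring."""

    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Place several virtual points per node on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

    def locate(self, path, shard_index):
        """Return the node responsible for the given file shard (assumed key format)."""
        key = f"{path}:{shard_index}"
        # The first ring point clockwise from the key's hash owns the key.
        pos = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[pos][1]
```

Because every client computes the same mapping from the same node list, any client can find the node caching a given shard without a central lookup, provided the cluster view is kept consistent. In the table above, keeping the view consistent is the job of Cluster Manager.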