Updated on 2024-11-29 GMT+08:00

MemArtsCC Basic Principles

MemArtsCC is a distributed caching service designed for architectures with decoupled storage and compute. It is lightweight and is deployed in the compute cluster. It prefetches data from remote object storage and serves that data at high speed, which accelerates compute tasks.

MemArtsCC shards objects stored in remote object storage (OBS) and creates indexes for the shards, which greatly improves the performance of reading cached data. It uses ZooKeeper to keep service discovery lightweight and to provide ultra-high availability. The lifecycle of sharded data is managed with the least recently used (LRU) algorithm.
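The sharding and LRU-based lifecycle described above can be sketched as follows. This is a minimal illustration, not the MemArtsCC implementation: the shard size, the `(object_key, shard_index)` cache key, the byte-based capacity limit, and the names `shard_ranges` and `LruShardCache` are all assumptions made for the example.

```python
from collections import OrderedDict

SHARD_SIZE = 4  # bytes; hypothetical and deliberately tiny for illustration


def shard_ranges(object_size, shard_size=SHARD_SIZE):
    """Return (shard_index, start, end) byte ranges that cover an object."""
    return [
        (index, start, min(start + shard_size, object_size))
        for index, start in enumerate(range(0, object_size, shard_size))
    ]


class LruShardCache:
    """Stores shards by (object_key, shard_index); evicts least recently used."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._shards = OrderedDict()  # oldest entry first

    def get(self, object_key, shard_index):
        key = (object_key, shard_index)
        data = self._shards.get(key)
        if data is not None:
            self._shards.move_to_end(key)  # mark as most recently used
        return data

    def put(self, object_key, shard_index, data):
        key = (object_key, shard_index)
        if key in self._shards:
            # Replace an existing shard without double-counting its size.
            self.used_bytes -= len(self._shards.pop(key))
        self._shards[key] = data
        self.used_bytes += len(data)
        # Evict from the least recently used end until the cache fits.
        while self.used_bytes > self.capacity_bytes and self._shards:
            _, evicted = self._shards.popitem(last=False)
            self.used_bytes -= len(evicted)
```

In this sketch, reading a shard refreshes its position, so shards that are read often stay cached while cold shards are evicted first when the capacity is exceeded.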

Main Features

  • The decentralized architecture enables all instances to provide the same service capabilities.
  • The lightweight design keeps resource usage extremely low.
  • MemArtsCC is decoupled from applications, so it is transparent to them and can be used without adaptation.
  • MemArtsCC ensures high availability in case of node failures.

MemArtsCC Structure

MemArtsCC instances have two roles: CCSideCar and CCWorker.

In an architecture with decoupled storage and compute, the data of computing and analytics applications such as Spark and Hive is stored in OBS. In a MemArtsCC cluster, a service instance is called a worker. Workers cache some or all of the object data in OBS on local persistent storage (SSD/HDD). When an application reads an object through the MemArtsCC SDK, it uses the shard index to read the sharded data from a specific worker. On a cache hit, the worker returns the shards. On a cache miss, the application reads the data directly from OBS, and the worker asynchronously loads the missed shards to local storage for subsequent reads.
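The read path above can be sketched as follows. This is a simplified model under stated assumptions, not the MemArtsCC SDK API: `FakeObs`, `Worker`, `pick_worker`, and `read_object` are hypothetical names, the shard size is arbitrary, and choosing a worker by hashing the shard key is only one possible way to map a shard to "a specific worker". The asynchronous load is modeled as a pending list that is drained by an explicit `load_pending()` call; a real worker would do this in the background.

```python
import hashlib

SHARD_SIZE = 4  # bytes; hypothetical and deliberately tiny for illustration


class FakeObs:
    """In-memory stand-in for remote object storage."""

    def __init__(self, objects):
        self.objects = objects

    def read_range(self, key, start, end):
        return self.objects[key][start:end]


class Worker:
    """Caches shards locally and records misses for later loading."""

    def __init__(self, obs):
        self.obs = obs
        self.cache = {}    # (object_key, shard_index) -> bytes
        self.pending = []  # missed shards waiting to be loaded

    def read_shard(self, key, index):
        shard = self.cache.get((key, index))
        if shard is None and (key, index) not in self.pending:
            self.pending.append((key, index))  # schedule a later load
        return shard

    def load_pending(self):
        # Stand-in for the asynchronous load of missed shards.
        for key, index in self.pending:
            start = index * SHARD_SIZE
            self.cache[(key, index)] = self.obs.read_range(
                key, start, start + SHARD_SIZE
            )
        self.pending.clear()


def pick_worker(workers, key, index):
    """Map a shard to one worker with a stable hash (an assumption)."""
    digest = hashlib.sha256(f"{key}#{index}".encode()).digest()
    return workers[int.from_bytes(digest[:8], "big") % len(workers)]


def read_object(workers, obs, key, size):
    """Read an object shard by shard; return (data, number_of_cache_hits)."""
    data = bytearray()
    hits = 0
    for index, start in enumerate(range(0, size, SHARD_SIZE)):
        end = min(start + SHARD_SIZE, size)
        shard = pick_worker(workers, key, index).read_shard(key, index)
        if shard is None:
            shard = obs.read_range(key, start, end)  # miss: read from OBS
        else:
            hits += 1
        data += shard
    return bytes(data), hits
```

In this model, the first read of an object is served entirely from OBS and leaves its shards queued on the workers. After each worker runs `load_pending()`, a second read of the same object is served from the worker caches.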

Figure 1 MemArtsCC structure

Table 1 Structure

Name | Description
MemArtsCC SDK | SDK used by OBSA, a Hadoop client plug-in on the FS client, to access objects on the OBS server.
CCSideCar | Management plane service that monitors MemArtsCC, collects data, delivers configurations, and starts and stops the service.
CCWorker | Data plane service that reads, writes, stores, and deletes the data cached by MemArtsCC.