Updated on 2024-11-29 GMT+08:00

Integrating MemArtsCC into Spark Tasks

Scenario

MemArtsCC stores hotspot data in compute clusters to reduce the bandwidth required on the OBS server side. Because hotspot data is served from the local storage of MemArtsCC, it does not need to be accessed across networks, which improves the data read efficiency of Spark. This topic describes how to integrate MemArtsCC into Spark tasks in a system where storage and compute are decoupled.

Prerequisites

  • The Guardian service is running properly, and storage-compute decoupling has been enabled.
  • Spark has been connected to OBS.

Modifying Spark Configurations

  1. Log in to FusionInsight Manager and choose Cluster > Services > Spark. Click Configurations, click All Configurations, and click SparkResource(Role) > OBS.
  2. Set fs.obs.readahead.policy to memArtsCC.
  3. Click Save. In the displayed dialog box, click OK to save the configuration. Click Dashboard and choose More > Service Rolling Restart to restart the Spark service.
  4. Download and install the Spark service client again.
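After the client is reinstalled, it can be useful to confirm that the new value actually reached the client configuration. The following is a minimal sketch, assuming the setting is written to a Hadoop-style XML configuration file on the client (the exact file name and path are not stated in this topic and depend on your installation). The helper names read_hadoop_property and readahead_policy_is_memartscc and the SAMPLE content are hypothetical and are used only for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, key):
    """Return the value of `key` from Hadoop-style configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        if name.strip() == key:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def readahead_policy_is_memartscc(xml_text):
    """Check that fs.obs.readahead.policy is set to memArtsCC."""
    return read_hadoop_property(xml_text, "fs.obs.readahead.policy") == "memArtsCC"


# Hypothetical sample content; on a real client, read the text of the
# configuration file from the client installation directory instead.
SAMPLE = """<configuration>
  <property>
    <name>fs.obs.readahead.policy</name>
    <value>memArtsCC</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(readahead_policy_is_memartscc(SAMPLE))
```

To use it against a real client, pass the text of the configuration file to readahead_policy_is_memartscc instead of SAMPLE.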

Verifying the Configuration

  1. Log in to FusionInsight Manager and choose Cluster > Services > MemArtsCC > Chart > Capacity.
  2. View and record the number of shards in the cluster.
  3. Log in to the Spark client node, create a table whose Location is an OBS path, and query the table.
  4. Repeat steps 1 and 2. If the number of shards in the cluster is greater than the number recorded in step 2, the interconnection is successful.
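The verification steps above can be sketched as follows. This is an illustration under stated assumptions, not the official procedure: the table name, the column layout, the parquet format, and the obs://example-bucket path are hypothetical, and the INSERT statement is added only so that the query has data to read. The generated statements are standard Spark SQL and could be run from spark-sql or spark-beeline on the client node. The shard counts must still be read manually from the MemArtsCC Capacity chart.

```python
def build_verification_sql(table, obs_path):
    """Build Spark SQL statements that create and query a table located on OBS."""
    if not obs_path.startswith("obs://"):
        raise ValueError("Location must be an OBS path starting with obs://")
    return [
        # Table whose Location is an OBS path (schema and format are assumptions).
        f"CREATE TABLE IF NOT EXISTS {table} (id INT, name STRING) "
        f"USING parquet LOCATION '{obs_path}'",
        # Assumption: write a few rows so the query below reads data from OBS.
        f"INSERT INTO {table} VALUES (1, 'a'), (2, 'b')",
        # Query the table so that the data is read through MemArtsCC.
        f"SELECT * FROM {table}",
    ]


def interconnection_succeeded(shards_before, shards_after):
    """Return True if the shard count grew after the table was queried."""
    return shards_after > shards_before


if __name__ == "__main__":
    # Hypothetical table name and bucket path.
    for stmt in build_verification_sql("memarts_test", "obs://example-bucket/memarts_test"):
        print(stmt + ";")
```

Record the shard count before running the statements and again afterwards, then pass both values to interconnection_succeeded.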