Configuring the Connection Between Hive and MemArtsCC
Scenario
MemArtsCC caches hotspot data on the compute cluster, which reduces the bandwidth required on the OBS server. Because hotspot data is served from MemArtsCC local storage instead of being fetched across the network, Hive reads data more efficiently. This topic describes how to connect Hive to MemArtsCC in a system where storage and compute are decoupled.
Prerequisites
- The Guardian service is running properly, and storage-compute decoupling has been configured.
- Hive has been connected to OBS.
Modifying Hive Configurations
- Log in to FusionInsight Manager and choose Cluster > Services > Hive. Click Configurations, click All Configurations, and then choose Hive(Service) > OBS.
- Set fs.obs.readahead.policy to memArtsCC.
- Click Save. In the displayed dialog box, click OK to save the configuration. Click Dashboard and choose More > Service Rolling Restart to restart the Hive service.
Verifying the Configuration
- Log in to FusionInsight Manager and choose Cluster > Services > MemArtsCC > Chart > Capacity.
- View and record the number of shards in the cluster.
- Log in to the Hive client node, use Beeline to create a table, and ensure that Location is an OBS path.
Run the following statement in Beeline to execute MapReduce tasks:
select count(*) from tablename;
- Repeat the first two steps. If the cluster now has more shards than the number recorded in the second step, the interconnection is successful.
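The table creation and query steps above can be sketched in Beeline as follows. This is a minimal illustration only: the table name memarts_demo, its columns, and the bucket path obs://example-bucket/... are hypothetical placeholders, and the INSERT statement is an assumption added so that the query has data to read from OBS.

```sql
-- Hypothetical table and OBS path; replace them with your own values.
CREATE TABLE IF NOT EXISTS memarts_demo (
  id   INT,
  name STRING
)
LOCATION 'obs://example-bucket/user/hive/warehouse/memarts_demo';

-- Assumption: load a few rows so the query below actually reads data from OBS.
INSERT INTO memarts_demo VALUES (1, 'a'), (2, 'b');

-- Confirm that the Location field of the table points to the OBS path.
DESCRIBE FORMATTED memarts_demo;

-- Read the table; this is the statement from the verification step.
SELECT COUNT(*) FROM memarts_demo;
```

After the query finishes, check the shard count on the MemArtsCC Capacity chart again and compare it with the value recorded earlier.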