Configuring HDFS Hedged Read
This section applies only to MRS 3.3.1 and later versions.
Scenario
In standard HDFS, a client that reads data first asks the NameNode which DataNodes hold the data block, and then connects to one of those DataNodes for the transfer. If that DataNode responds slowly or is faulty, the client must wait before it tries another replica, which adds read latency. Enabling hedged read reduces this latency and improves HDFS read reliability in high-latency network environments. It provides the following benefits:
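- Low read latency: If the first DataNode does not return data within a configured threshold, the client reads the same data block from another replica in parallel and uses whichever response arrives first.
- Adaptive to network changes: When the network is unstable or some nodes perform poorly, the client can still read data efficiently because it does not have to wait for the slow node.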
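The following Java sketch illustrates the general hedged-request pattern described above; it is not the HDFS client implementation. The class HedgedReadSketch, the method hedgedRead, and the simulated replica reads are hypothetical. The thresholdMillis argument plays a role similar to dfs.client.hedged.read.threshold.millis, and the executor passed in loosely corresponds to the thread pool sized by dfs.client.hedged.read.threadpool.size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedReadSketch {

    /**
     * Hypothetical hedged read: start reading from the first replica; if no result
     * arrives within thresholdMillis, also start reading from the next replica, and
     * return whichever read succeeds first. A failed read falls through to the next replica.
     */
    public static <T> T hedgedRead(List<Callable<T>> replicaReads, long thresholdMillis,
                                   ExecutorService pool) throws InterruptedException, ExecutionException {
        if (replicaReads.isEmpty()) {
            throw new IllegalArgumentException("no replicas to read from");
        }
        CompletionService<T> completions = new ExecutorCompletionService<>(pool);
        List<Future<T>> started = new ArrayList<>();
        int next = 0;
        int outstanding = 0;
        ExecutionException lastFailure = null;
        try {
            started.add(completions.submit(replicaReads.get(next++)));
            outstanding++;
            while (outstanding > 0) {
                // While spare replicas remain, wait only up to the threshold; otherwise wait indefinitely.
                Future<T> done = next < replicaReads.size()
                        ? completions.poll(thresholdMillis, TimeUnit.MILLISECONDS)
                        : completions.take();
                if (done == null) {
                    // Threshold elapsed without a result: hedge against the next replica.
                    started.add(completions.submit(replicaReads.get(next++)));
                    outstanding++;
                    continue;
                }
                outstanding--;
                try {
                    return done.get(); // first successful read wins
                } catch (ExecutionException e) {
                    lastFailure = e;
                    if (next < replicaReads.size()) {
                        // This replica failed: try the next one immediately.
                        started.add(completions.submit(replicaReads.get(next++)));
                        outstanding++;
                    }
                }
            }
            throw lastFailure; // every replica read failed
        } finally {
            // Cancel reads that are still running so they stop consuming resources.
            for (Future<T> f : started) {
                f.cancel(true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Simulated replica reads: the first DataNode is slow, the second is fast.
            Callable<String> slowReplica = () -> { Thread.sleep(2000); return "data from replica 1"; };
            Callable<String> fastReplica = () -> { Thread.sleep(50); return "data from replica 2"; };
            String result = hedgedRead(List.of(slowReplica, fastReplica), 100, pool);
            System.out.println(result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In the main method, the first simulated replica is deliberately slow, so once the 100 ms threshold passes the sketch starts a second read and returns the faster replica's data. Cancelling the losing reads in the finally block limits the extra work, although each hedge still costs an additional connection, which is the resource trade-off discussed under Impact on the System.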
Impact on the System
- Hedged read increases network load and CPU usage because more connections and requests must be processed. Decide whether to enable it based on the hardware and workloads of your live network. For example, if hedged read is enabled in a system that uses the default three replicas, set the component memory to at least three times its current value.
- When disk I/O load is high (above 50% during peak hours), enabling hedged read may slightly degrade disk performance.
Procedure
- Log in to FusionInsight Manager.
- Choose Cluster > Services > HDFS, click the Configurations tab, and then click All Configurations.
- Search for hdfs.hdfs-site.customized.configs and add the custom parameters listed in the following table. Set the parameters based on the site requirements.
Parameter | Description | Value Range
--- | --- | ---
dfs.client.hedged.read.threshold.millis | Time, in milliseconds, that the client waits for the first byte of data from the first DataNode before it starts a hedged read against another replica. | Greater than or equal to 0
dfs.client.hedged.read.threadpool.size | Size of the thread pool used for hedged reads. If this parameter is set to 0, hedged read is disabled. | Greater than or equal to 0
- Save the settings.
- On the Instance page of HDFS, select all DataNode instances, choose More > Instance Rolling Restart, and wait until the rolling restart is complete.
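As a sketch of how the two parameters in the table relate, the hypothetical Java class below checks the documented value ranges (both must be greater than or equal to 0), treats a thread pool size of 0 as disabling hedged read, and prints the name/value pairs that would be entered as custom parameters. The class name, its methods, and the example values 500 and 10 are illustrative assumptions, not MRS defaults or recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the two hedged read custom parameters. */
public final class HedgedReadSettings {
    static final String THRESHOLD_KEY = "dfs.client.hedged.read.threshold.millis";
    static final String POOL_SIZE_KEY = "dfs.client.hedged.read.threadpool.size";

    private final long thresholdMillis;
    private final int threadPoolSize;

    public HedgedReadSettings(long thresholdMillis, int threadPoolSize) {
        // Both parameters must be greater than or equal to 0.
        if (thresholdMillis < 0) {
            throw new IllegalArgumentException(THRESHOLD_KEY + " must be >= 0");
        }
        if (threadPoolSize < 0) {
            throw new IllegalArgumentException(POOL_SIZE_KEY + " must be >= 0");
        }
        this.thresholdMillis = thresholdMillis;
        this.threadPoolSize = threadPoolSize;
    }

    /** Hedged read is disabled when the thread pool size is 0. */
    public boolean isEnabled() {
        return threadPoolSize > 0;
    }

    /** Name/value pairs to add under hdfs.hdfs-site.customized.configs, in a stable order. */
    public Map<String, String> toCustomParameters() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put(THRESHOLD_KEY, Long.toString(thresholdMillis));
        params.put(POOL_SIZE_KEY, Integer.toString(threadPoolSize));
        return params;
    }

    public static void main(String[] args) {
        // Example values are illustrative assumptions only.
        HedgedReadSettings settings = new HedgedReadSettings(500, 10);
        System.out.println("Hedged read enabled: " + settings.isEnabled());
        settings.toCustomParameters().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```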