Improving HDFS Write Performance
Scenario
HDFS write performance directly affects the efficiency of the entire system. Improving it shortens data write time and makes data processing and system response more efficient. It also enables the HDFS cluster to better meet its service requirements.
Notes and Constraints
This section applies to MRS 3.x or later.
Procedure
- Log in to FusionInsight Manager.
For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
- Choose Cluster > Services > HDFS > Configurations > All Configurations.
- Search for the following parameters and change their values as required.
Table 1 Parameters for improving HDFS write performance
Parameter
Description
Default Value
dfs.datanode.drop.cache.behind.reads
Whether the DataNode automatically discards data from the cache after that data has been transferred to the client.
- true: The cached data is discarded. This parameter must be configured on the DataNode.
Set it to true if the same data is read only a few times, so that the cache remains available for other operations.
- false: The cached data is retained. Set it to false if the same data is read many times, to improve read speed.
This parameter is optional for improving write performance. Configure it as needed.
false
dfs.client-write-packet-size
Size of each data packet that the client sends when writing data, in bytes.
When the HDFS client writes data to a DataNode, it splits the data into multiple packets and sends them over the network to the DataNode for storage. This parameter specifies the size of each packet and can be set for each job.
- Larger packets reduce the number of transmissions and improve bandwidth utilization and write performance, but may increase the delay of each transmission.
- Smaller packets have a lower transmission delay but increase the number of transmissions. They suit delay-sensitive scenarios.
On a 10-Gigabit network, you can increase the value of this parameter to improve transmission throughput.
262144
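The trade-off described for dfs.client-write-packet-size can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: it assumes a 128 MB block size purely for illustration, ignores packet headers and checksums, and uses a hypothetical helper named packets_needed. It only shows how the number of packets per block shrinks as the packet size grows.

```python
def packets_needed(total_bytes: int, packet_size: int) -> int:
    """Return how many packets of at most packet_size bytes carry total_bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    # Integer ceiling division: a partially filled last packet still counts.
    return -(-total_bytes // packet_size)


# Assumed block size, for illustration only; check your cluster's actual value.
BLOCK_BYTES = 128 * 1024 * 1024

# Candidate packet sizes in bytes; 262144 is the default listed in Table 1.
for packet_size in (65536, 262144, 1048576):
    count = packets_needed(BLOCK_BYTES, packet_size)
    print(f"packet size {packet_size:>8} bytes -> {count} packets per block")
```

Fewer packets per block means fewer transmissions, at the cost of more data in flight per transmission, which is why larger values tend to suit high-bandwidth networks.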
- Save the settings. Restart the expired service or instance for the configuration to take effect.
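After the restart, it can be useful to confirm which value a client actually sees. The sketch below is one possible check, assuming the hdfs command is on the PATH of a node with an installed client. It wraps the standard hdfs getconf -confKey command, which reports the value from the configuration files visible to that client, so the client configuration may need to be refreshed first. The function names are hypothetical.

```python
import subprocess


def getconf_command(key: str) -> list:
    """Build the 'hdfs getconf' command line that prints one configuration value."""
    return ["hdfs", "getconf", "-confKey", key]


def read_effective_value(key: str, run=subprocess.run) -> str:
    """Run 'hdfs getconf -confKey <key>' and return its output without whitespace.

    The 'run' argument exists only so the call can be replaced in tests;
    by default the real command is executed.
    """
    result = run(getconf_command(key), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, read_effective_value("dfs.client-write-packet-size") returns the packet size as a string on a node where the client is configured.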