Updated on 2024-12-11 GMT+08:00

Improving HDFS Write Performance

Scenario

Improve HDFS write performance by adjusting HDFS configuration parameters.

This section applies to MRS 3.x or later.

Procedure

Navigation path for setting parameters:

On FusionInsight Manager, choose Cluster > Services > HDFS, click Configurations, and then click All Configurations. Enter a parameter name in the search box to locate it.

Table 1 Parameters for improving HDFS write performance

Parameter

Description

Default Value

dfs.datanode.drop.cache.behind.reads

Specifies whether a DataNode automatically drops data from the cache after that data has been transferred to the client.

  • true: The cached data is discarded. This parameter must be configured on the DataNode.

    You are advised to set it to true if the same data is read only a few times, so that the cache remains available for other operations.

  • false: The cached data is retained. You are advised to set it to false if the same data is read many times, to improve read speed.

NOTE:

This parameter is optional for improving write performance. You can configure it as needed.

false

dfs.client-write-packet-size

Specifies the size, in bytes, of a client write packet. When an HDFS client writes data to a DataNode, it buffers the data until a full packet has accumulated and then sends the packet over the network. The value can be set separately for each job.

On a 10-Gigabit network, you can increase the value of this parameter to improve transmission throughput.

262144
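
The effect of dfs.client-write-packet-size can be illustrated with a rough calculation. The following Python sketch is illustrative only and is not part of the product. It assumes a 128 MiB block size, which this section does not state, and it treats the whole packet as payload, ignoring the packet header and checksum bytes. The helper names, the value 524288, and the file paths are hypothetical. The second helper builds an hdfs dfs command that passes the parameter through the generic -D option, which is one possible way to set the value for a single job.

```python
import shlex

# Assumption: 128 MiB block size; check dfs.blocksize on your cluster.
BLOCK_SIZE = 128 * 1024 * 1024


def packets_per_block(packet_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate number of packets needed to send one block.

    Simplification: the whole packet is counted as payload, so the
    real count is slightly higher once headers and checksums are included.
    """
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-block_size // packet_size)


def put_command(packet_size: int, src: str, dst: str) -> str:
    """Build an `hdfs dfs -put` command that overrides the packet size for one job."""
    args = [
        "hdfs", "dfs",
        "-D", f"dfs.client-write-packet-size={packet_size}",
        "-put", src, dst,
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # 524288 is an illustrative value, not a recommendation.
    for size in (65536, 262144, 524288):
        print(f"{size:>7} bytes -> {packets_per_block(size)} packets per block")
    # Hypothetical source and destination paths.
    print(put_command(524288, "/tmp/data.csv", "/user/demo/data.csv"))
```

Under these assumptions, a larger packet size means fewer packets per block and therefore less per-packet overhead. Whether this improves throughput in practice depends on the network and the workload, so test any new value before applying it widely.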