HDFS Application Development Suggestions

Notes for reading and writing HDFS files

HDFS does not support random writes; files are written sequentially.

Data can be appended only to the end of an HDFS file.

Only data files stored in HDFS support append; edit.log and metadata files do not. To use the append function, set dfs.support.append to true in hdfs-site.xml.

dfs.support.append is disabled by default in open-source versions but enabled by default in FusionInsight versions.
This is a server-side parameter. Enable it if you need the append function.
If HDFS does not suit your access pattern, store the data in another system, such as HBase.
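
The append setting described above is a standard property entry in hdfs-site.xml. The fragment below is a minimal sketch showing only that property; the rest of the file, and how the change is rolled out to the servers, depends on your deployment.

```xml
<!-- hdfs-site.xml on the server side: enable append support -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```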

HDFS is not suitable for storing a large number of small files

HDFS is not suitable for storing a large number of small files. The NameNode holds the metadata of every file in memory, so many small files consume excessive NameNode memory.
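
To make the memory cost concrete, the following rough sketch compares the same volume of data stored as small files and as block-sized files. It assumes a commonly cited rule of thumb of about 150 bytes of NameNode heap per namespace object (one object per file plus one per block) and a 128 MiB block size; neither figure comes from this guide, and the class and method names are illustrative only.

```java
// Rough NameNode heap estimate. All constants are assumptions, not measured values.
class NameNodeMemoryEstimate {
    // Assumed rule of thumb: about 150 bytes of heap per namespace object.
    static final long BYTES_PER_OBJECT = 150L;

    // Number of blocks needed to hold one file of the given size (ceiling division).
    static long blocksPerFile(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Estimated heap: one object per file plus one object per block of that file.
    static long estimateHeapBytes(long fileCount, long fileSizeBytes, long blockSizeBytes) {
        long objects = fileCount * (1 + blocksPerFile(fileSizeBytes, blockSizeBytes));
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long blockSize = 128 * mib; // assumed block size

        // The same total volume (10,240,000 MiB) stored two ways.
        long smallFiles = estimateHeapBytes(10_240_000L, mib, blockSize);
        long largeFiles = estimateHeapBytes(80_000L, 128 * mib, blockSize);

        System.out.println("1 MiB files:   " + smallFiles + " bytes of NameNode heap");
        System.out.println("128 MiB files: " + largeFiles + " bytes of NameNode heap");
    }
}
```

Under these assumptions, the small-file layout needs over a hundred times more NameNode heap for the same data, which is why merging small files before storing them is usually worthwhile.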

Keep three replicas of HDFS data

Three replicas are enough for data stored on DataNodes. More replicas improve data safety but reduce system efficiency. When a node fails, the data that was on it is re-replicated to other nodes.

Periodic HDFS image backup

The system backs up NameNode data periodically when the image backup parameter fs.namenode.image.backup.enable is set to true.

Provide operations to ensure data reliability

When you invoke the write function, the HDFS client does not write the data to HDFS immediately but caches it in client memory. If the client fails, for example because of a power-off, the cached data is lost. For data that requires high reliability, invoke hflush after writing to flush the data to HDFS.
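
The buffering behavior described above can be illustrated without a cluster. The sketch below is an analogy that uses only java.io, not the HDFS client: a BufferedOutputStream stands in for the client-side cache and a ByteArrayOutputStream stands in for HDFS. In a real application the corresponding step is invoking hflush on the HDFS output stream; the class and method names here are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Analogy only: shows that written data stays in a client buffer until it is flushed.
class FlushAnalogy {
    // Returns {bytes in storage before flush, bytes in storage after flush}.
    // The record must be smaller than the 8 KiB buffer for the data to be held back.
    static int[] writeThenFlush(byte[] record) {
        ByteArrayOutputStream storage = new ByteArrayOutputStream(); // stands in for HDFS
        try (BufferedOutputStream client = new BufferedOutputStream(storage, 8192)) {
            client.write(record);
            int before = storage.size(); // data is still cached on the client side
            client.flush();              // plays the role of hflush in this analogy
            int after = storage.size();  // data has now reached storage
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = writeThenFlush("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("before flush: " + sizes[0] + " bytes, after flush: " + sizes[1] + " bytes");
    }
}
```

If the process stopped between the write and the flush, the storage side would never see the record, which is the failure mode the explicit flush is meant to avoid.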
