Updated on 2024-08-16 GMT+08:00

Common Concepts of HDFS Application Development

DataNode

A DataNode stores the data blocks of each file and periodically reports its storage status to the NameNode.
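
To make block-level storage concrete, the following minimal sketch computes how many blocks a file of a given size would occupy. It is an illustration only: `BlockMath` is a hypothetical helper and not part of any HDFS API, and the 128 MB block size is an assumption (the real value comes from the cluster's `dfs.blocksize` setting, which this document does not state).

```java
// Hypothetical helper for illustration; not part of the HDFS API.
final class BlockMath {
    // Assumed block size of 128 MB; a real cluster defines this in dfs.blocksize.
    static final long ASSUMED_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold fileSize bytes (ceiling division).
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size in bytes of the final block, which may be smaller than blockSize.
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }
}
```

Under these assumptions, a 300 MB file occupies three blocks: two full 128 MB blocks and a final 44 MB block.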

NameNode

A NameNode manages the namespace, directory structure, and metadata of a file system, and provides a backup mechanism.

  • Active NameNode: The active NameNode manages the namespace, directory structure, and metadata of the file system, and records the mapping between data blocks and the files they belong to.
  • Standby NameNode: The standby NameNode synchronizes data with the active NameNode and takes over its services if the active NameNode becomes abnormal.

JournalNode

A JournalNode synchronizes metadata between the active and standby NameNodes in a High Availability (HA) cluster.

ZKFC

A ZKFC (ZooKeeper Failover Controller) must be deployed for each NameNode. It monitors the NameNode status and writes the status information to ZooKeeper. ZKFC is also authorized to select the active NameNode.

Colocation

Colocation stores associated data, or data on which associated operations are performed, on the same storage node. HDFS Colocation places the files to be associated on the same DataNode so that associated operations can read all of the data from that node, which reduces network bandwidth consumption.
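
The placement idea can be sketched with a toy model. The code below does not call any HDFS API: `ColocationSketch`, the "locator ID" naming, and the node names are hypothetical and exist only to show that files sharing a locator end up on the same node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of colocation for illustration; it does not call any HDFS API.
final class ColocationSketch {
    private final List<String> dataNodes;
    private final Map<String, String> nodeByLocator = new HashMap<>();
    private final Map<String, String> nodeByFile = new HashMap<>();

    ColocationSketch(List<String> dataNodes) {
        if (dataNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one DataNode is required");
        }
        this.dataNodes = new ArrayList<>(dataNodes);
    }

    // Places a file; every file that shares a locator ID lands on the same node.
    String place(String filePath, String locatorId) {
        String node = nodeByLocator.get(locatorId);
        if (node == null) {
            // First use of this locator: assign the next node in round-robin order.
            node = dataNodes.get(nodeByLocator.size() % dataNodes.size());
            nodeByLocator.put(locatorId, node);
        }
        nodeByFile.put(filePath, node);
        return node;
    }

    // True when both files were placed on the same node, so an operation
    // that reads both of them does not need to fetch data from another node.
    boolean colocated(String fileA, String fileB) {
        String nodeA = nodeByFile.get(fileA);
        return nodeA != null && nodeA.equals(nodeByFile.get(fileB));
    }
}
```

In this model, placing `/orders/2024.dat` and `/customers/2024.dat` with the same locator ID puts both on one node, whereas a file placed with a different locator ID may land elsewhere.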

Client

HDFS clients include the Java API, C API, shell, HTTP REST API, and web UI.

  • Java API

    Provides application APIs for HDFS. To develop HDFS applications in Java, follow the instructions in HDFS Java APIs.

  • C API

    Provides application APIs for HDFS. To develop HDFS applications in C, follow the instructions in HDFS C APIs.

  • Shell

    Provides shell commands to perform operations on HDFS. For details, see HDFS Shell Commands.

  • HTTP REST API

    Provides APIs, in addition to the shell, Java APIs, and C APIs, for monitoring HDFS status. For details, see HDFS HTTP REST APIs.

  • Web UI

    Provides a visualized management web page.
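
As a rough illustration of the HTTP REST API client, the sketch below builds a request URI, assuming the REST interface follows the Apache WebHDFS layout (`/webhdfs/v1/<path>?op=<operation>`). The host name, port, and paths are placeholders, and `WebHdfsUri` is a hypothetical helper that uses only JDK classes; check HDFS HTTP REST APIs for the addresses and operations that apply to your cluster.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Builds a WebHDFS-style request URI. Host, port, and path are placeholders.
final class WebHdfsUri {
    static URI build(String host, int port, String hdfsPath, String op) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument constructor percent-encodes illegal
            // characters (for example, spaces) in the path.
            return new URI("http", null, host, port,
                    "/webhdfs/v1" + hdfsPath, "op=" + op, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid URI components", e);
        }
    }
}
```

For example, `WebHdfsUri.build("namenode.example.com", 9870, "/tmp/data", "LISTSTATUS")` yields `http://namenode.example.com:9870/webhdfs/v1/tmp/data?op=LISTSTATUS`. On a security-enabled cluster the request would also need to be authenticated, which this sketch does not cover.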

Keytab file

A keytab file is a key file that stores a user's Kerberos principal and the corresponding encrypted keys. Applications use the keytab file to authenticate their API calls to the MRS Hadoop components.
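
Hadoop applications commonly perform this login through `UserGroupInformation.loginUserFromKeytab(principal, keytabPath)`. The sketch below does not use Hadoop classes; it only shows, with JDK classes, the kind of Kerberos login options that a keytab-based login relies on. The principal and keytab path are hypothetical placeholders, and `KeytabJaasSketch` is an illustrative helper rather than part of any MRS sample code.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;

// Illustrative helper: describes a keytab-based Kerberos login as a JAAS entry.
final class KeytabJaasSketch {
    static AppConfigurationEntry keytabEntry(String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");     // read the key from a keytab, not a password
        options.put("keyTab", keytabPath);    // placeholder path to the keytab file
        options.put("principal", principal);  // placeholder Kerberos principal
        options.put("storeKey", "true");      // keep the key in the Subject after login
        options.put("doNotPrompt", "true");   // never fall back to interactive prompts
        return new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options);
    }
}
```

Building the entry does not contact a KDC or read the keytab; an actual login would pass it to a `javax.security.auth.login.Configuration` used by a `LoginContext`.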