Updated on 2022-11-18 GMT+08:00

Basic Concepts

DataNode

A DataNode is used to store data blocks of each file and periodically report the storage status to the NameNode.
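
As a rough illustration of how a file maps onto the data blocks that DataNodes store, the following sketch computes the number of blocks for a given file size. It assumes a block size of 128 MiB, which is the Apache Hadoop default for dfs.blocksize; the value configured in your cluster may differ. The class and method names are hypothetical and are not part of any HDFS API.

```java
// Toy illustration only: shows how a file size maps onto fixed-size blocks.
// BlockSplit and its methods are hypothetical names, not HDFS classes.
class BlockSplit {
    // Assumption: 128 MiB, the Apache Hadoop default for dfs.blocksize.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold fileSize bytes (ceiling division).
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size in bytes of the final block, which may be shorter than blockSize.
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MiB file
        System.out.println("blocks: " + blockCount(fileSize, DEFAULT_BLOCK_SIZE));
        System.out.println("last block bytes: " + lastBlockSize(fileSize, DEFAULT_BLOCK_SIZE));
    }
}
```

Under these assumptions, a 300 MiB file occupies three blocks: two full blocks and one 44 MiB block.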

NameNode

A NameNode is used to manage the namespace, directory structure, and metadata information of a file system and provide the backup mechanism. NameNodes are classified into the following two types:

  • Active NameNode: manages the file system namespace, maintains the directory structure tree and metadata information of a file system, and records the relationship between each data block and the file to which the data block belongs.
  • Standby NameNode: keeps its data synchronized with the data in the active NameNode and takes over services from the active NameNode if the active NameNode becomes abnormal.

JournalNode

A JournalNode synchronizes metadata between the active and standby NameNodes in a High Availability (HA) cluster.

ZKFC

A ZKFC (ZKFailoverController) must be deployed for each NameNode. It monitors the NameNode status and writes the status information to ZooKeeper. ZKFC is also authorized to select the active NameNode.
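
The monitoring and failover behavior described above can be sketched with a small in-memory model. This is not the real ZKFC implementation and it does not talk to ZooKeeper: the FailoverSketch class, its methods, and the NameNode names are all hypothetical, and the "first healthy node wins" rule is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: an in-memory stand-in for the status that ZKFC records
// in ZooKeeper. All names here are hypothetical.
class FailoverSketch {
    enum State { ACTIVE, STANDBY }

    // Health status per NameNode, kept in registration order.
    private final Map<String, Boolean> healthy = new LinkedHashMap<>();
    private String active; // null when no healthy NameNode is available

    // Registers a NameNode; the first one registered becomes active.
    void register(String nameNode) {
        healthy.put(nameNode, true);
        if (active == null) {
            active = nameNode;
        }
    }

    // Records the result of a health check and fails over if needed.
    void reportHealth(String nameNode, boolean isHealthy) {
        if (!healthy.containsKey(nameNode)) {
            throw new IllegalArgumentException("unknown NameNode: " + nameNode);
        }
        healthy.put(nameNode, isHealthy);
        if (!isHealthy && nameNode.equals(active)) {
            active = firstHealthy(); // the standby takes over
        } else if (isHealthy && active == null) {
            active = nameNode; // recover when nothing was active
        }
    }

    // Simplifying assumption: the first healthy node in registration order wins.
    private String firstHealthy() {
        for (Map.Entry<String, Boolean> entry : healthy.entrySet()) {
            if (entry.getValue()) {
                return entry.getKey();
            }
        }
        return null;
    }

    String active() {
        return active;
    }

    State stateOf(String nameNode) {
        return nameNode.equals(active) ? State.ACTIVE : State.STANDBY;
    }

    public static void main(String[] args) {
        FailoverSketch cluster = new FailoverSketch();
        cluster.register("nn1");
        cluster.register("nn2");
        cluster.reportHealth("nn1", false); // the active NameNode becomes abnormal
        System.out.println("active: " + cluster.active());
    }
}
```

In this model a recovered NameNode rejoins as standby rather than reclaiming the active role, so a failover is not immediately followed by a second switch.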

Colocation

Colocation stores associated data, or data that will be associated, on the same storage node. HDFS Colocation places files that will be associated on the same DataNode so that associated operations can read the data from that node. This greatly reduces network bandwidth consumption.
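
The placement idea can be sketched as a mapping from a group of related files to a single node. The following is a toy model, not the HDFS Colocation API: the ColocationSketch class, the group names, and the round-robin choice of node for each new group are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: files in the same group are placed on the same node.
// All names are hypothetical and do not reflect the HDFS Colocation API.
class ColocationSketch {
    private final List<String> dataNodes;
    private final Map<String, String> groupToNode = new HashMap<>();
    private final Map<String, String> fileToNode = new HashMap<>();
    private int nextNode = 0;

    ColocationSketch(List<String> dataNodes) {
        if (dataNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one DataNode is required");
        }
        this.dataNodes = new ArrayList<>(dataNodes);
    }

    // Places a file on the node assigned to its group and returns that node.
    String place(String group, String file) {
        String node = groupToNode.get(group);
        if (node == null) {
            // Assumption: a new group gets the next node in round-robin order.
            node = dataNodes.get(nextNode % dataNodes.size());
            nextNode++;
            groupToNode.put(group, node);
        }
        fileToNode.put(file, node);
        return node;
    }

    // Returns the node that holds the file, or null if the file is unknown.
    String nodeOf(String file) {
        return fileToNode.get(file);
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        nodes.add("dn1");
        nodes.add("dn2");
        ColocationSketch placement = new ColocationSketch(nodes);
        // Two files that will be joined later share one group and one node.
        System.out.println(placement.place("orders", "/data/orders_2022.csv"));
        System.out.println(placement.place("orders", "/data/order_items_2022.csv"));
    }
}
```

Because both files resolve to the same node, an operation that associates them can read both locally instead of pulling one of them across the network.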

Client

HDFS can be accessed through the Java application programming interface (API), C API, Shell, HTTP REST API, and web user interface (WebUI). For details, see Common API Introduction and Shell Command Introduction.

  • Java API

    Provides an application interface for HDFS. This guide describes how to use the Java API to develop HDFS applications.

  • C API

    Provides an application interface for HDFS. This guide describes how to use the C API to develop HDFS applications.

  • Shell

    Provides shell commands for performing operations on HDFS.

  • HTTP REST API

    Provides interfaces in addition to the Shell, Java API, and C API. You can use these interfaces to monitor the HDFS status.

  • WebUI

    Provides a visualized management web page.
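
As an illustration of the HTTP REST access path, the following sketch builds a request URL. It assumes that the REST interface follows the Apache Hadoop WebHDFS layout (/webhdfs/v1/<path>?op=<operation>). The host name and port 9870 are placeholders, the WebHdfsUrl class is hypothetical, and a security-enabled cluster may require HTTPS and authentication that this sketch does not cover.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: builds a WebHDFS-style URL. It does not send any request.
// WebHdfsUrl is a hypothetical helper, not part of any HDFS client library.
class WebHdfsUrl {
    // Builds http://<host>:<port>/webhdfs/v1<hdfsPath>?op=<op>.
    static URI build(String host, int port, String hdfsPath, String op) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("hdfsPath must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument constructor quotes illegal characters in the path.
            return new URI("http", null, host, port, "/webhdfs/v1" + hdfsPath, "op=" + op, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build URL", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host and port; replace them with the values of your cluster.
        System.out.println(build("namenode.example.com", 9870, "/tmp/input", "LISTSTATUS"));
    }
}
```

The resulting URL can be passed to any HTTP client. LISTSTATUS and GETFILESTATUS are standard WebHDFS read operations for listing a directory and querying the status of a path.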

keytab file

A keytab file is a key file that stores user information. Applications use this file for API authentication on MRS.