Basic Concepts
DataNode
A DataNode stores the data blocks of each file and periodically reports its storage status to the NameNode.
NameNode
A NameNode manages the namespace, directory structure, and metadata of the file system and provides a backup mechanism. NameNodes are classified into the following two types:
- Active NameNode: manages the file system namespace, maintains the directory structure tree and metadata of the file system, and records the mapping between each data block and the file to which it belongs.
- Standby NameNode: keeps its data synchronized with that of the active NameNode and takes over services if the active NameNode becomes abnormal.
JournalNode
A JournalNode synchronizes metadata between the active and standby NameNodes in the High Availability (HA) cluster.
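In standard Apache Hadoop HA deployments, the NameNodes are pointed at the JournalNode quorum through the shared edits directory in hdfs-site.xml. A minimal sketch, in which the JournalNode hosts (jn1/jn2/jn3) and the nameservice ID (mycluster) are placeholder values:

```xml
<!-- hdfs-site.xml: shared edits log backed by a JournalNode quorum.
     jn1/jn2/jn3 and the nameservice ID "mycluster" are placeholders. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```

The active NameNode writes each edit to a majority of the JournalNodes, and the standby NameNode replays those edits to stay synchronized.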
ZKFC
A ZKFC (ZooKeeper Failover Controller) process must be deployed for each NameNode. It monitors the NameNode status, writes status information to ZooKeeper, and participates in electing the active NameNode.
Colocation
Colocation stores associated data, or data to be associated, on the same storage node. HDFS Colocation stores files to be associated on the same DataNode so that the files can be read from a single node during associated operations, greatly reducing network bandwidth consumption.
SmallFS
The background small-file merging feature of SmallFS automatically detects small files in the system based on a file size threshold, merges them during idle hours, and stores their metadata in a third-party key-value (KV) system to reduce the NameNode load. SmallFS also provides a FileSystem interface through which users can access the merged small files transparently.
The FileSystem interface provides rich file operation functions and is almost the same as that of the Hadoop Distributed File System (HDFS).
Client
HDFS can be accessed through the Java application programming interface (API), C API, shell commands, HTTP REST API, and web user interface (WebUI).
For details, see Common API Introduction and Shell Command Introduction.
- Java API
Provides an application interface for the HDFS. This guide describes how to use the Java API to develop HDFS applications.
- C API
Provides an application interface for the HDFS. This guide describes how to use the C API to develop HDFS applications.
- Shell
Provides a command-line interface for performing file operations on HDFS.
- HTTP REST API
Interfaces in addition to the shell, Java API, and C API. You can use these interfaces to monitor HDFS status.
- WebUI
Provides a web-based user interface for viewing HDFS status.
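The HTTP REST interface listed above is exposed by WebHDFS, where each operation is an HTTP call against a URL of the form `http://<namenode>:<port>/webhdfs/v1/<path>?op=...`. A minimal sketch in Python; the host, port, and HDFS path below are placeholders for a real cluster:

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

# List a directory (placeholder NameNode host/port and HDFS path).
url = webhdfs_url("namenode.example.com", 9870, "/user/alice", "LISTSTATUS")
print(url)
# To issue the request against a live cluster:
#   import urllib.request, json
#   reply = json.load(urllib.request.urlopen(url))
#   files = reply["FileStatuses"]["FileStatus"]
```

Because WebHDFS is plain HTTP, the same URLs can also be used with curl or any HTTP client to monitor HDFS status.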