
Introduction to HDFS

The Hadoop Distributed File System (HDFS) is a distributed file system with high fault tolerance. HDFS provides high-throughput data access and is suited to the processing of large data sets.

HDFS applies to the following application scenarios:

  • Processing of massive data sets (at the TB or PB level and above).
  • Scenarios that require high throughput.
  • Scenarios that require high reliability.
  • Scenarios that require good scalability.

Introduction to HDFS Interface

HDFS applications can be developed in Java. For details about the APIs, see Java API Introduction.
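
Below is a minimal sketch of this development pattern using the standard org.apache.hadoop.fs.FileSystem API. The NameNode address and the file path are placeholders; in a real cluster the configuration is normally loaded from the core-site.xml and hdfs-site.xml files delivered with the client.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode address; normally taken from core-site.xml.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

            try (FileSystem fs = FileSystem.get(conf)) {
                Path file = new Path("/tmp/hdfs-example.txt");

                // Write a small text file, overwriting any existing one.
                try (FSDataOutputStream out = fs.create(file, true)) {
                    out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
                }

                // Read the file back.
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                    System.out.println(in.readLine());
                }
            }
        }
    }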

Basic Concepts

  • Colocation

    Colocation stores associated data, or data to be associated, on the same storage node. HDFS Colocation places files to be associated on the same DataNode, so that the data can be obtained from a single node during associated operations. This greatly reduces network bandwidth consumption. A sketch for checking block placement is provided after this list.

  • Client

    HDFS can be accessed through the Java application programming interface (API), the C API, the shell, the HTTP REST API, and the web user interface (WebUI). For details, see HDFS Common API Introduction and HDFS Shell Command Introduction.

    • Java API

      Provides a Java application interface for HDFS. This guide describes how to use the Java API to develop HDFS applications.

    • C API

      Provides a C application interface for HDFS. This guide describes how to use the C API to develop HDFS applications.

    • Shell

      Provides shell commands, such as hdfs dfs -ls and hdfs dfs -put, to perform operations on HDFS.

    • HTTP REST API

      Provides interfaces in addition to the shell, Java API, and C API. You can use these interfaces to monitor HDFS status; a REST monitoring sketch is provided after this list.

    • WebUI

      Provides a visualized management web page.

  • keytab file

    The keytab file is a key file that stores user authentication information. Applications use the keytab file for API authentication on FusionInsight MRS. A keytab login sketch is provided after this list.
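
Colocation itself is configured through a product-specific client, but block placement can be inspected with the standard HDFS API alone. The sketch below collects the DataNode hosts that store the blocks of two files and prints the hosts the files share; the file paths are placeholders chosen for illustration.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ColocationCheck {
        public static void main(String[] args) throws Exception {
            try (FileSystem fs = FileSystem.get(new Configuration())) {
                // Placeholder paths for two files expected to be colocated.
                Set<String> hostsA = blockHosts(fs, new Path("/data/orders.dat"));
                Set<String> hostsB = blockHosts(fs, new Path("/data/order_items.dat"));

                hostsA.retainAll(hostsB);
                System.out.println("DataNodes shared by both files: " + hostsA);
            }
        }

        // Collect the DataNode hosts that store any block of the given file.
        private static Set<String> blockHosts(FileSystem fs, Path path) throws Exception {
            FileStatus status = fs.getFileStatus(path);
            Set<String> hosts = new HashSet<>();
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                hosts.addAll(Arrays.asList(loc.getHosts()));
            }
            return hosts;
        }
    }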
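
As a minimal monitoring sketch, the following queries the NameNode's WebHDFS REST endpoint for the status of a path. The host name is a placeholder, and the HTTP port (9870 by default in Hadoop 3, 50070 in Hadoop 2) may differ in your cluster; on a secured cluster the request would also need authentication.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class WebHdfsStatus {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode HTTP address and path.
            String url = "http://namenode-host:9870/webhdfs/v1/tmp?op=GETFILESTATUS";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            // The NameNode answers with a JSON FileStatus object.
            System.out.println(response.body());
        }
    }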
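
A minimal keytab login sketch using Hadoop's UserGroupInformation class is shown below. The principal name and keytab path are placeholders; the security settings are normally taken from the core-site.xml delivered with the cluster client.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLogin {
        public static void main(String[] args) throws Exception {
            // Enable Kerberos authentication; in a real cluster this setting
            // comes from the client's core-site.xml.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Placeholder principal and keytab path.
            UserGroupInformation.loginUserFromKeytab(
                    "developuser@HADOOP.COM", "/opt/client/user.keytab");

            System.out.println("Logged in as: "
                    + UserGroupInformation.getLoginUser().getUserName());
        }
    }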