
Introduction to HDFS

The Hadoop Distributed File System (HDFS) is a distributed file system with high fault tolerance. HDFS provides high-throughput data access and is suited to the processing of large data sets.

HDFS applies to the following application scenarios:

  • Processing of massive data sets (at the TB or PB level and above).
  • Scenarios that require high throughput.
  • Scenarios that require high reliability.
  • Scenarios that require good scalability.

Introduction to HDFS Interface

HDFS applications can be developed in Java. For details about the APIs, see Java API Introduction.
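
Below is a minimal sketch of this development pattern using the standard org.apache.hadoop.fs.FileSystem API. The NameNode address and the file path are placeholders; in a real cluster the configuration is normally loaded from the core-site.xml and hdfs-site.xml files delivered with the client.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode address; normally taken from core-site.xml.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

            try (FileSystem fs = FileSystem.get(conf)) {
                Path file = new Path("/tmp/hdfs-example.txt");

                // Write a small text file, overwriting any existing one.
                try (FSDataOutputStream out = fs.create(file, true)) {
                    out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
                }

                // Read the file back.
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                    System.out.println(in.readLine());
                }
            }
        }
    }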

Basic Concepts

  • Colocation

    Colocation stores associated data, or data to be associated, on the same storage node. HDFS Colocation places files to be associated on the same DataNode, so that the data can be obtained from a single node during associated operations. This greatly reduces network bandwidth consumption. A sketch for checking block placement is provided after this list.

  • Client

    HDFS can be accessed through the Java application programming interface (API), the C API, the shell, the HTTP REST API, and the web user interface (WebUI). For details, see HDFS Common API Introduction and HDFS Shell Command Introduction.

    • Java API

      Provides a Java application interface for HDFS. This guide describes how to use the Java API to develop HDFS applications.

    • C API

      Provides a C application interface for HDFS. This guide describes how to use the C API to develop HDFS applications.

    • Shell

      Provides shell commands, such as hdfs dfs -ls and hdfs dfs -put, to perform operations on HDFS.

    • HTTP REST API

      Provides interfaces in addition to the shell, Java API, and C API. You can use these interfaces to monitor HDFS status; a REST monitoring sketch is provided after this list.

    • WebUI

      Provides a visualized management web page.

  • keytab file

    The keytab file is a key file that stores user authentication information. Applications use the keytab file for API authentication on FusionInsight MRS. A keytab login sketch is provided after this list.
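
Colocation itself is configured through a product-specific client, but block placement can be inspected with the standard HDFS API alone. The sketch below collects the DataNode hosts that store the blocks of two files and prints the hosts the files share; the file paths are placeholders chosen for illustration.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ColocationCheck {
        public static void main(String[] args) throws Exception {
            try (FileSystem fs = FileSystem.get(new Configuration())) {
                // Placeholder paths for two files expected to be colocated.
                Set<String> hostsA = blockHosts(fs, new Path("/data/orders.dat"));
                Set<String> hostsB = blockHosts(fs, new Path("/data/order_items.dat"));

                hostsA.retainAll(hostsB);
                System.out.println("DataNodes shared by both files: " + hostsA);
            }
        }

        // Collect the DataNode hosts that store any block of the given file.
        private static Set<String> blockHosts(FileSystem fs, Path path) throws Exception {
            FileStatus status = fs.getFileStatus(path);
            Set<String> hosts = new HashSet<>();
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                hosts.addAll(Arrays.asList(loc.getHosts()));
            }
            return hosts;
        }
    }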
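
As a minimal monitoring sketch, the following queries the NameNode's WebHDFS REST endpoint for the status of a path. The host name is a placeholder, and the HTTP port (9870 by default in Hadoop 3, 50070 in Hadoop 2) may differ in your cluster; on a secured cluster the request would also need authentication.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class WebHdfsStatus {
        public static void main(String[] args) throws Exception {
            // Placeholder NameNode HTTP address and path.
            String url = "http://namenode-host:9870/webhdfs/v1/tmp?op=GETFILESTATUS";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            // The NameNode answers with a JSON FileStatus object.
            System.out.println(response.body());
        }
    }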
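
A minimal keytab login sketch using Hadoop's UserGroupInformation class is shown below. The principal name and keytab path are placeholders; the security settings are normally taken from the core-site.xml delivered with the cluster client.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLogin {
        public static void main(String[] args) throws Exception {
            // Enable Kerberos authentication; in a real cluster this setting
            // comes from the client's core-site.xml.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Placeholder principal and keytab path.
            UserGroupInformation.loginUserFromKeytab(
                    "developuser@HADOOP.COM", "/opt/client/user.keytab");

            System.out.println("Logged in as: "
                    + UserGroupInformation.getLoginUser().getUserName());
        }
    }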