Updated on 2023-04-10 GMT+08:00

Java API

For details about Hadoop distributed file system (HDFS) APIs, see http://hadoop.apache.org/docs/r3.1.1/api/index.html.

HDFS Common API

Common HDFS Java classes are as follows:

  • FileSystem: the core class of client applications. For details about common APIs, see Table 1.
  • FileStatus: records the status of files and directories. For details about common APIs, see Table 2.
  • DFSColocationAdmin: API used to manage colocation group information. For details about common APIs, see Table 3.
  • DFSColocationClient: API used to manage colocation files. For details about common APIs, see Table 4.
    • The system stores only the mapping between nodes and locator IDs, not the mapping between files and locator IDs. When a file is created using a Colocation interface, it is created on the node that corresponds to the given locator ID. File creation and writing must be performed using Colocation interfaces.
    • After the file is written, subsequent operations on it can use other open-source interfaces in addition to Colocation interfaces.
    • The DFSColocationClient class inherits from the open-source DistributedFileSystem class and contains common file operation functions. If a user creates a Colocation file with the DFSColocationClient class, the user is advised to use the functions of this class for file operations as well.
Table 1 Common FileSystem APIs

API

Description

public static FileSystem get(Configuration conf)

FileSystem is the abstract class that the Hadoop class library provides for client applications. A concrete instance can be obtained only through the get method. The get method has multiple overloads; this one is commonly used.

public FSDataOutputStream create(Path f)

This API is used to create files in the HDFS. f indicates a complete file path.

public void copyFromLocalFile(Path src, Path dst)

This API is used to upload local files to a specified directory in the HDFS. src and dst indicate complete file paths.

public boolean mkdirs(Path f)

This API is used to create folders in the HDFS. f indicates a complete folder path.

public abstract boolean rename(Path src, Path dst)

This API is used to rename a specified HDFS file. src and dst indicate complete file paths.

public abstract boolean delete(Path f, boolean recursive)

This API is used to delete a specified HDFS file or directory. f indicates the complete path to be deleted, and recursive specifies whether the contents of a directory are deleted recursively.

public boolean exists(Path f)

This API is used to check whether a specified HDFS file or directory exists. f indicates a complete file path.

public FileStatus getFileStatus(Path f)

This API is used to obtain the FileStatus object of a file or folder. The FileStatus object records status information about the file or folder, including its modification time and path.

public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)

This API is used to query the block locations of a specified file in an HDFS cluster. file indicates the FileStatus object of the file, and start and len specify the byte range within the file for which block locations are returned.

public FSDataInputStream open(Path f)

This API is used to open the input stream of a specified file in the HDFS and read the file using the API provided by the FSDataInputStream class. f indicates a complete file path.

public FSDataOutputStream create(Path f, boolean overwrite)

This API is used to create the output stream of a specified file in the HDFS and write the file using the API provided by the FSDataOutputStream class. f indicates a complete file path. If overwrite is true, the file is overwritten if it exists; if overwrite is false, an error is reported if the file exists.

public FSDataOutputStream append(Path f)

This API is used to open the output stream of a specified existing file in the HDFS and append data to the file using the API provided by the FSDataOutputStream class. f indicates a complete file path.
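The FileSystem calls in Table 1 can be combined as in the following minimal sketch. It assumes the Hadoop client libraries are on the classpath and that the Configuration resolves to the target file system (with no cluster configuration present, Hadoop falls back to the local file system). The class name HdfsBasicOps, the method roundTrip, and the path /tmp/hdfs-basic-ops-demo are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsBasicOps {

    /** Creates a directory, writes a file, renames it, reads it back, and deletes the directory. */
    public static String roundTrip(FileSystem fs, Path dir, String text) throws IOException {
        Path file = new Path(dir, "example.txt");
        Path renamed = new Path(dir, "example-renamed.txt");

        if (!fs.mkdirs(dir)) {
            throw new IOException("Could not create directory " + dir);
        }

        // overwrite = true replaces the file if it already exists.
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write(payload);
        }

        if (!fs.rename(file, renamed)) {
            throw new IOException("Could not rename " + file + " to " + renamed);
        }
        if (fs.exists(file) || !fs.exists(renamed)) {
            throw new IOException("Unexpected state after rename");
        }

        // Read back exactly as many bytes as were written.
        byte[] buffer = new byte[payload.length];
        try (FSDataInputStream in = fs.open(renamed)) {
            in.readFully(buffer);
        }

        // recursive = true removes the directory together with its contents.
        fs.delete(dir, true);
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Picks up core-site.xml and hdfs-site.xml if they are on the classpath.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Hypothetical demo path.
            System.out.println(roundTrip(fs, new Path("/tmp/hdfs-basic-ops-demo"), "hello HDFS"));
        }
    }
}
```

The append method is left out of the sketch because not every FileSystem implementation supports it.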

Table 2 Common FileStatus APIs

API

Description

public long getModificationTime()

This API is used to query the modification time of a specified HDFS file.

public Path getPath()

This API is used to obtain the path of the file or directory that the FileStatus object describes.
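As an illustration of Table 2, the sketch below obtains a FileStatus through FileSystem.getFileStatus, prints its path and modification time, and passes the same object to getFileBlockLocations. It assumes the Hadoop client libraries are on the classpath; the class name FileStatusDemo and the method printStatus are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Date;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class FileStatusDemo {

    /** Prints basic status information for a path and returns the FileStatus object. */
    public static FileStatus printStatus(FileSystem fs, Path p) throws IOException {
        FileStatus status = fs.getFileStatus(p);

        // getPath() returns the path this status describes.
        System.out.println("path:     " + status.getPath());
        // getModificationTime() returns milliseconds since the epoch.
        System.out.println("modified: " + new Date(status.getModificationTime()));

        if (status.isFile()) {
            // Ask for the blocks that cover the whole file: bytes [0, length).
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("block offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + Arrays.toString(block.getHosts()));
            }
        }
        return status;
    }
}
```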

Table 3 Common DFSColocationAdmin APIs

API

Description

public Map<String, List<DatanodeInfo>> createColocationGroup(String groupId,String file)

This API is used to create a group based on the locatorIds information in the file. file indicates the file path.

public Map<String, List<DatanodeInfo>> createColocationGroup(String groupId,List<String> locators)

This API is used to create a group based on the locatorIds information in an in-memory list.

public void deleteColocationGroup(String groupId)

This API is used to delete a group.

public List<String> listColocationGroups()

This API is used to return the IDs of all Colocation groups. The returned list of group IDs is sorted by creation time.

public List<DatanodeInfo> getNodesForLocator(String groupId, String locatorId)

This API is used to obtain the list of all nodes in the locator.

Table 4 Common DFSColocationClient APIs

API

Description

public FSDataOutputStream create(Path f, boolean overwrite, String groupId,String locatorId)

This API is used to create an FSDataOutputStream in colocation mode so that users can write to the file f.

f is the HDFS path.

overwrite indicates whether an existing file can be overwritten.

The groupId and locatorId that the user specifies for the file must already exist.

public FSDataOutputStream create(final Path f, final FsPermission permission, final EnumSet<CreateFlag> cflags, final int bufferSize, final short replication, final long blockSize, final Progressable progress, final ChecksumOpt checksumOpt, final String groupId, final String locatorId)

The function of this API is the same as that of FSDataOutputStream create(Path f, boolean overwrite, String groupId, String locatorId), except that users can also customize the checksum options.

public void close()

This API is used to close the connection.

Table 5 HDFS client WebHdfsFileSystem API

API

Description

public RemoteIterator<FileStatus> listStatusIterator(final Path)

This API fetches information about child files and folders through multiple requests using a remote iterator, which prevents the user interface from becoming slow when a large amount of child information has to be fetched.
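A minimal sketch of iterating over a directory with listStatusIterator follows. The method is declared on the FileSystem base class, so the same loop applies to whichever implementation the path resolves to; whether entries are actually fetched in batches depends on that implementation. The sketch assumes the Hadoop client libraries are on the classpath; the class name ListChildrenDemo and the method countChildren are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper class, not part of Hadoop.
public class ListChildrenDemo {

    /** Prints each direct child of dir and returns how many children were seen. */
    public static int countChildren(FileSystem fs, Path dir) throws IOException {
        // Entries are pulled on demand instead of being loaded into one large array.
        RemoteIterator<FileStatus> children = fs.listStatusIterator(dir);
        int count = 0;
        while (children.hasNext()) {
            FileStatus child = children.next();
            System.out.println((child.isDirectory() ? "d " : "- ") + child.getPath());
            count++;
        }
        return count;
    }
}
```

Unlike java.util.Iterator, the hasNext and next methods of RemoteIterator can throw IOException, so the loop has to sit in a method that declares or handles it.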

Glob path pattern-based APIs to get LocatedFileStatus and open a file from FileStatus

The following APIs are added to DistributedFileSystem to get the FileStatus with block locations and to open a file from a FileStatus object. These APIs reduce the number of RPC calls from the client to the NameNode.

Table 6 FileSystem APIs

Interface

Description

public LocatedFileStatus[] globLocatedStatus(Path, PathFilter, boolean) throws IOException

Returns an array of LocatedFileStatus objects whose path names match the path pattern and are accepted by the path filter.

public FSDataInputStream open(FileStatus stat) throws IOException

If stat is an instance of LocatedFileStatusHdfs that already has the location information, the InputStream is created without contacting the NameNode.