Updated on 2022-08-16 GMT+08:00

Java API

For details about Hadoop distributed file system (HDFS) APIs, see:

http://hadoop.apache.org/docs/r3.1.1/api/index.html

HDFS Common API

Common HDFS Java classes are as follows:

  • FileSystem: the core class of client applications. For details about common APIs, see Table 1.
  • FileStatus: records the status of files and directories. For details about common APIs, see Table 2.
  • DFSColocationAdmin: API used to manage colocation group information. For details about common APIs, see Table 3.
  • DFSColocationClient: API used to manage colocation files. For details about common APIs, see Table 4.
    • The system reserves only the mapping between nodes and locator IDs, but does not reserve the mapping between files and locator IDs. When a file is created using a Colocation interface, the file is created on the node that corresponds to a locator ID. File creation and writing must be performed using Colocation interfaces.
    • After the file is written, subsequent operations on the file can use other open-source interfaces in addition to Colocation interfaces.
    • The DFSColocationClient class inherits from the open-source DistributedFileSystem class and contains common file operation functions. If a user uses the DFSColocationClient class to create a Colocation file, the user is advised to use the functions of this class in file operations.
Table 1 Common FileSystem APIs

API

Description

public static FileSystem get(Configuration conf)

FileSystem is the API class provided for users in the Hadoop class library. FileSystem is an abstract class, so a concrete instance can be obtained only through the get method. The get method has multiple overloaded versions and is commonly used.

public FSDataOutputStream create(Path f)

This API is used to create files in the HDFS. f indicates a complete file path.

public void copyFromLocalFile(Path src, Path dst)

This API is used to upload local files to a specified directory in the HDFS. src and dst indicate complete file paths.

public boolean mkdirs(Path f)

This API is used to create folders in the HDFS. f indicates a complete folder path.

public abstract boolean rename(Path src, Path dst)

This API is used to rename a specified HDFS file. src and dst indicate complete file paths.

public abstract boolean delete(Path f, boolean recursive)

This API is used to delete a specified HDFS file. f indicates the complete path of the file to be deleted, and recursive specifies recursive deletion.

public boolean exists(Path f)

This API is used to check whether a specified HDFS file exists. f indicates a complete file path.

public FileStatus getFileStatus(Path f)

This API is used to obtain the FileStatus object of a file or folder. The FileStatus object records status information of the file or folder, including the modification time and file path.

public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)

This API is used to query the block locations of a specified file in an HDFS cluster. file indicates the FileStatus object of the file, and start and len specify the byte range (offset and length) to query.

public FSDataInputStream open(Path f)

This API is used to open the input stream of a specified file in the HDFS and read the file using the API provided by the FSDataInputStream class. f indicates a complete file path.

public FSDataOutputStream create(Path f, boolean overwrite)

This API is used to create the output stream of a specified file in the HDFS and write the file using the API provided by the FSDataOutputStream class. f indicates a complete file path. If overwrite is true, the file is overwritten if it exists; if overwrite is false, an error is reported if the file exists.

public FSDataOutputStream append(Path f)

This API is used to open the output stream of an existing file in the HDFS and append data to the file using the API provided by the FSDataOutputStream class. f indicates a complete file path.
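
The calls in Table 1 are usually chained together. The following is a minimal sketch rather than a reference implementation. It assumes that the Hadoop client libraries are on the classpath and, so that it can run without a cluster, it points fs.defaultFS at the local file system (file:///). The class name HdfsBasicsSketch, the demo directory, and the file names are illustrative and are not part of the API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class used only for illustration.
class HdfsBasicsSketch {

    /** Creates a directory, writes a file, renames it, reads it back, and cleans up. */
    static String roundTrip(Configuration conf, String baseDir, String text) throws IOException {
        // FileSystem is abstract; get() returns the implementation selected by fs.defaultFS.
        // get() may return a cached, shared instance; closing it is acceptable in a standalone sketch.
        try (FileSystem fs = FileSystem.get(conf)) {
            Path dir = new Path(baseDir, "demo");
            Path draft = new Path(dir, "draft.txt");
            Path finalFile = new Path(dir, "final.txt");

            fs.mkdirs(dir);

            // overwrite=true replaces the file if it already exists.
            try (FSDataOutputStream out = fs.create(draft, true)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }

            if (!fs.rename(draft, finalFile) || !fs.exists(finalFile)) {
                throw new IOException("rename failed: " + draft);
            }

            // Size the buffer from the file length recorded in FileStatus.
            byte[] buf = new byte[(int) fs.getFileStatus(finalFile).getLen()];
            try (FSDataInputStream in = fs.open(finalFile)) {
                in.readFully(buf);
            }

            // recursive=true removes the directory together with its contents.
            fs.delete(dir, true);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: the local file system stands in for an HDFS cluster.
        conf.set("fs.defaultFS", "file:///");
        System.out.println(roundTrip(conf, System.getProperty("java.io.tmpdir"), "hello hdfs"));
    }
}
```

On a real cluster, the same code would normally pick up fs.defaultFS from the cluster's core-site.xml instead of setting it by hand.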

Table 2 Common FileStatus APIs

API

Description

public long getModificationTime()

This API is used to query the modification time of a specified HDFS file.

public Path getPath()

This API is used to query the path of the file or directory that the FileStatus object describes.
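
A FileStatus object is usually obtained from FileSystem.getFileStatus and can then be passed to getFileBlockLocations. The sketch below shows one possible way to combine the two; it assumes the Hadoop client libraries are on the classpath and uses the local file system (file:///) in place of a cluster. The class name FileStatusSketch, the method describe, and the output format are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class used only for illustration.
class FileStatusSketch {

    /** Builds a one-line summary of a file from its FileStatus and block locations. */
    static String describe(FileSystem fs, Path file) throws IOException {
        FileStatus status = fs.getFileStatus(file);
        StringBuilder sb = new StringBuilder();
        sb.append("path=").append(status.getPath())
          .append(" modified=").append(status.getModificationTime())
          .append(" length=").append(status.getLen());

        // Query the blocks that cover the whole file: offset 0, length = file size.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            sb.append(" block@").append(block.getOffset())
              .append(" hosts=").append(String.join(",", block.getHosts()));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: the local file system stands in for an HDFS cluster.
        conf.set("fs.defaultFS", "file:///");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path(System.getProperty("java.io.tmpdir"), "filestatus-sketch.txt");
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeBytes("hello");
            }
            System.out.println(describe(fs, file));
            fs.delete(file, false);
        }
    }
}
```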

Table 3 Common DFSColocationAdmin APIs

API

Description

public Map<String, List<DatanodeInfo>> createColocationGroup(String groupId,String file)

This API is used to create a group based on the locatorIds information in the file. file indicates the file path.

public Map<String, List<DatanodeInfo>> createColocationGroup(String groupId,List<String> locators)

This API is used to create a group based on the locatorIds information in an in-memory list.

public void deleteColocationGroup(String groupId)

This API is used to delete a group.

public List<String> listColocationGroups()

This API is used to return all group information of Colocation. The returned list of group IDs is sorted by creation time.

public List<DatanodeInfo> getNodesForLocator(String groupId, String locatorId)

This API is used to obtain the list of all nodes in the locator.

Table 4 Common DFSColocationClient APIs

API

Description

public FSDataOutputStream create(Path f, boolean overwrite, String groupId,String locatorId)

This API is used to create an FSDataOutputStream in colocation mode to allow users to write to the file f.

f is the HDFS path.

overwrite indicates whether an existing file can be overwritten.

The groupId and locatorId that the user specifies for the file must already exist.

public FSDataOutputStream create(final Path f, final FsPermission permission, final EnumSet<CreateFlag> cflags, final int bufferSize, final short replication, final long blockSize, final Progressable progress, final ChecksumOpt checksumOpt, final String groupId, final String locatorId)

The function of this API is the same as that of FSDataOutputStream create(Path f, boolean overwrite, String groupId, String locatorId), except that users are allowed to customize the checksum options.

public void close()

This API is used to close the connection.

Table 5 HDFS client WebHdfsFileSystem API

API

Description

public RemoteIterator<FileStatus> listStatusIterator(final Path)

This API fetches information about child files and folders through multiple requests using a remote iterator, which prevents the user interface from becoming slow when a large amount of child information needs to be fetched.
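
The iterator returned by listStatusIterator is consumed with hasNext() and next(), both of which can throw IOException. The following sketch shows that loop. It assumes the Hadoop client libraries are on the classpath and, so that it runs without a cluster, it lists a local directory (file:///) instead of a WebHDFS path; with WebHDFS, the FileSystem would instead be obtained for a webhdfs:// URI. The class name ListingSketch and the file names are illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper class used only for illustration.
class ListingSketch {

    /** Collects the names of the direct children of dir by walking the remote iterator. */
    static List<String> childNames(FileSystem fs, Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
        // Entries are pulled on demand, so a large directory does not have to be
        // materialized as one array the way listStatus(Path) returns it.
        while (it.hasNext()) {
            names.add(it.next().getPath().getName());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: the local file system stands in for a WebHDFS endpoint.
        conf.set("fs.defaultFS", "file:///");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path dir = new Path(System.getProperty("java.io.tmpdir"), "listing-sketch");
            fs.mkdirs(dir);
            fs.create(new Path(dir, "a.txt"), true).close();
            fs.create(new Path(dir, "b.txt"), true).close();
            System.out.println(childNames(fs, dir));
            fs.delete(dir, true);
        }
    }
}
```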

SmallFS Common API

Table 6 describes the common interfaces of SmallFileSystem, the Java class of SmallFS.

Table 6 Description of Class SmallFileSystem Common Interfaces

Interface

Description

public void close()

Closes the connection after use.

public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path src,Path dst) throws IOException

This interface is used to upload the local file to the given location of the SmallFileSystem. src and dst indicate complete file paths.

public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst) throws IOException

This interface is used to upload multiple local files to the given location of the SmallFileSystem. srcs and dst indicate complete file paths.

public void copyToLocalFile(boolean delSrc, Path src, Path dst, boolean useRawLocalFileSystem) throws IOException

This interface is used to download specified files of the SmallFileSystem to the local folder. src and dst indicate complete file paths.

public FSDataOutputStream create(Path path, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException

This interface is used to create files in the given path in the SmallFileSystem.

public FileSystem[] getChildFileSystems()

This interface is used to obtain all sub-file systems of the SmallFileSystem.

public long getDefaultBlockSize()

This interface is used to obtain the default block size of the SmallFileSystem.

public short getDefaultReplication()

This interface is used to obtain the default replica count of the SmallFileSystem.

public BlockLocation[] getFileBlockLocations(Path path, long start, long len) throws IOException

This interface is used to obtain the block location of the given file path.

public Path getHomeDirectory()

This interface is used to obtain the home directory of the current user.

public String getScheme()

This interface is used to obtain the URI scheme of the SmallFileSystem.

public FsServerDefaults getServerDefaults() throws IOException

This interface is used to obtain the default configuration of the SmallFileSystem.

public void initialize(URI name, Configuration conf) throws IOException

This interface is used to initialize the SmallFileSystem.

public void setOwner(Path path, String username, String groupname) throws IOException

This interface is used to set the owner of the given path (file or directory).

The parameters username and groupname cannot both be null.

NOTE:

The merged files do not support this interface.

public void setPermission(Path p, FsPermission permission) throws IOException

This interface is used to set the permission of the given path (file or directory).

NOTE:

The merged files do not support this interface.

public boolean setReplication(Path path, short replication) throws IOException

This interface is used to set the replica count of the given path (file or directory).

NOTE:

The merged files do not support this interface.

public void setTimes(Path path, long mtime, long atime) throws IOException

This interface is used to set the modification time and access time of the given path (file or directory).

NOTE:

The merged files do not support this interface.

public boolean delete(Path path, boolean recursive) throws IOException

This interface is used to delete the given path (file or directory) in the SmallFileSystem.

public FileStatus getFileStatus(Path path) throws IOException

This interface is used to obtain the FileStatus object of the given path in the SmallFileSystem. The object records status information of the file or directory, such as the modification time and path.

public URI getUri()

This interface is used to return the default URI of the SmallFileSystem.

public Path getWorkingDirectory()

This interface is used to obtain the current working directory for the SmallFileSystem.

public FileStatus[] listStatus(Path path) throws IOException

This interface is used to list the statuses of files/directories in the given path if the path is a directory.

public boolean mkdirs(Path path, FsPermission permission) throws IOException

This interface is used to create a folder, with the given permission, at the given path.

NOTE:

The merged files do not support this interface.

public FSDataInputStream open(Path path, int bufferSize) throws IOException

This interface is used to open the input stream of the specified file in the SmallFileSystem and read the file through the interface provided by the FSDataInputStream class. path indicates a complete file path.

public boolean rename(Path src, Path dst) throws IOException

This interface is used to rename the specified files of the SmallFileSystem. src and dst indicate complete file paths.

public void setWorkingDirectory(Path path)

This method is overridden in the SmallFileSystem, and the working directory cannot be changed to another directory.

public Configuration getConf()

This interface is used to obtain the configuration of the SmallFileSystem.

public Path getInitialWorkingDirectory()

This interface is used to obtain the initial working directory of the SmallFileSystem.

public FsStatus getStatus(Path path) throws IOException

This interface is used to return a status object describing the usage and capacity of the file system. If the file system has multiple partitions, the usage and capacity of the partition pointed to by the specified path are reflected.

public FSDataOutputStream createNonRecursive(Path path, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException

This interface is used to open an FSDataOutputStream at the indicated path with write-progress reporting. It behaves like create(), except that it fails if the parent directory does not exist.

NOTE:

This interface is not recommended. Use the create interface instead.

public FSDataOutputStream append(Path path) throws IOException

This interface is used to append content to the file at the specified path in the SmallFileSystem.

NOTE:

The merged files do not support this interface.

public boolean truncate(Path path, long newLength) throws IOException

This interface is used to truncate the file at the specified path in the SmallFileSystem to the length given by newLength.

NOTE:

The merged files do not support this interface.

public FsServerDefaults getServerDefaults(Path path) throws IOException

This interface is used to obtain the FsServerDefaults object of the target file system for the specified path. The object records information such as the block size, replica count, and trash retention interval.

public long getUsed() throws IOException

This interface is used to return the total size of all files in the file system.

public long getDefaultBlockSize(Path path)

This interface is used to obtain the default block size of the specified file path.

public short getDefaultReplication(Path path)

This interface is used to obtain the default replica count of the specified file path.

Glob Path Pattern-Based APIs for Obtaining LocatedFileStatus and Opening Files from FileStatus

The following APIs are added to DistributedFileSystem to obtain the FileStatus with block locations and to open a file from a FileStatus object. These APIs reduce the number of RPC calls from the client to the NameNode.

Table 7 FileSystem APIs

Interface

Description

public LocatedFileStatus[] globLocatedStatus(Path, PathFilter, boolean) throws IOException

Returns an array of LocatedFileStatus objects whose path names match pathPattern and are accepted by the given path filter.

public FSDataInputStream open(FileStatus stat) throws IOException

If stat is an instance of LocatedFileStatusHdfs that already has the location information, the InputStream is created without contacting the NameNode.