Updated on 2023-04-10 GMT+08:00

Initializing the HDFS

Function

Hadoop Distributed File System (HDFS) initialization is a prerequisite for using the application programming interfaces (APIs) provided by HDFS. The initialization process is as follows:

  1. Load the HDFS service configuration file.
  2. Instantiate a FileSystem.

Configuration File Description

Table 1 lists the configuration files used to log in to HDFS. These files have already been imported into the conf directory of the hadoop-examples project.

Table 1 Configuration files

File             Function
core-site.xml    Configures HDFS parameters.
hdfs-site.xml    Configures HDFS parameters.

  • The log4j.properties file under the conf directory can be configured based on your needs.

Example Code

The following are code snippets. For the complete code, see the HdfsExample class in com.huawei.bigdata.hdfs.examples.

The initialization code is the same whether the application runs on Linux or Windows. The example code is as follows:
 // Initialization: load the configuration files
 confLoad();

 // Creating a sample project
 HdfsExample hdfs_examples = new HdfsExample("/user/hdfs-examples", "test.txt");
 /**
  * Loads the HDFS configuration files.
  * If the application runs on Linux, change the paths of core-site.xml and
  * hdfs-site.xml to the absolute paths of the client files on the Linux host.
  */
 private static void confLoad() throws IOException {
   conf = new Configuration();
   // conf file
   conf.addResource(new Path(PATH_TO_HDFS_SITE_XML));
   conf.addResource(new Path(PATH_TO_CORE_SITE_XML));
   // conf.addResource(new Path(PATH_TO_SMALL_SITE_XML));
 }

 /**
  * Creates the sample object and instantiates the FileSystem.
  */
 public HdfsExample(String path, String fileName) throws IOException  {
   this.DEST_PATH = path;
   this.FILE_NAME = fileName;
   instanceBuild();
 }
 private void instanceBuild() throws IOException {
   fSystem = FileSystem.get(conf);
 }
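
Because the configuration paths must be absolute on Linux, it can help to validate them before calling confLoad(). The following is a minimal sketch that uses only the Java standard library; the class name ConfPathCheck, the method findProblems, and the /opt/client/... paths are hypothetical and are not part of the hadoop-examples project. Depending on the Hadoop version, a missing resource may not be reported when the configuration is loaded, so an early check like this may make path mistakes easier to spot.

```java
import java.nio.file.Files;
import java.nio.file.Path;   // java.nio Path, not org.apache.hadoop.fs.Path
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ConfPathCheck {

    /** Returns a list of problems; an empty list means every path is an absolute, readable file. */
    static List<String> findProblems(String... confPaths) {
        List<String> problems = new ArrayList<>();
        for (String p : confPaths) {
            Path path = Paths.get(p);
            if (!path.isAbsolute()) {
                problems.add("Not an absolute path: " + p);
            } else if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
                problems.add("Missing or unreadable: " + p);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical client locations; substitute the real paths of your client files.
        List<String> problems = findProblems(
                "/opt/client/HDFS/hadoop/etc/hadoop/core-site.xml",
                "/opt/client/HDFS/hadoop/etc/hadoop/hdfs-site.xml");
        problems.forEach(System.err::println);
    }
}
```

If the returned list is empty, the same path strings can then be passed to conf.addResource(new Path(...)) as in confLoad().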
  • (Optional) Specify the user who runs the example code. To run example code related to the Colocation operation, the user must be a member of the supergroup group. The user can be specified in either of the following ways:

    Add the environment variable HADOOP_USER_NAME: For details about the operation in a Windows-based environment, see 1 in section Compiling and Running an Application. For details in a Linux-based environment, see 4 in section Compiling and Running an Application with the Client Installed or 4 in section Compiling and Running an Application with the Client Not Installed.

    Modify the code: If HADOOP_USER_NAME is not specified, change "USER" in the code to the actual user name:
    System.setProperty("HADOOP_USER_NAME", USER);
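
The two ways of specifying the user can be combined so that the environment variable, when present, takes precedence and the code-level setting acts as a fallback. The following is a minimal sketch under that assumption; the class UserNameSetup, the method resolveUser, and the value "hdfs" are hypothetical placeholders. The property would need to be set before the FileSystem is instantiated.

```java
final class UserNameSetup {

    // Hypothetical placeholder; replace with the actual user name.
    private static final String USER = "hdfs";

    /**
     * Returns the user name expected to take effect. If the environment value is
     * absent or empty, sets the HADOOP_USER_NAME system property to the fallback.
     */
    static String resolveUser(String envValue, String fallback) {
        if (envValue != null && !envValue.isEmpty()) {
            return envValue; // environment variable already set; leave the property alone
        }
        System.setProperty("HADOOP_USER_NAME", fallback);
        return fallback;
    }

    public static void main(String[] args) {
        String user = resolveUser(System.getenv("HADOOP_USER_NAME"), USER);
        System.out.println("Running example code as: " + user);
    }
}
```

Passing the environment value in as a parameter keeps the method easy to exercise without changing the process environment.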