Updated on 2024-10-23 GMT+08:00

HDFS HTTP REST APIs

Function Description

Users can use the application programming interface (API) of Representational State Transfer (REST) to create, read and write, append, and delete files. For details of the REST API, see the following official guidelines:

http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html.

Preparing Running Environment

  1. Install the client. Install the client on the node. For example, install the client in the /opt/client directory. See details in "Installing the Client."

    1. Prepare files testFile and testFileAppend and write content 'Hello, webhdfs user!' and 'Welcome back to webhdfs!'. Run the following command to prepare testFile and testFileAppend files:

      touch testFile

      vi testFile

      Write 'Hello, webhdfs user!", save the files, and exit.

      touch testFileAppend

      vi testFileAppend

      Write 'Welcome back to webhdfs!', save the files, and exit.

  2. In normal mode, only the HTTP service is supported. Log in to the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HDFS > Configurations > All Configurations. Type dfs.http.policy in the research box, select HTTP_ONLY, click Save Configuration, and select Restart the affected services or instances. Click OK to restart the HDFS service.

    HTTP_ONLY is selected by default.

Procedure

  1. Log in to the FusionInsight Manager portal, click Cluster > Name of the desired cluster > Services, and then select HDFS. The HDFS page is displayed.

    Because webhdfs is accessed through HTTP, you need to obtain the IP address of the active NameNode and the HTTP port.

    1. Click Instances, view the host name and IP address of the active NameNode.
    2. Click Configurations, search namenode.http.port in the search box (9870).

  2. Create a directory by referring to the following link:

    http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Make_a_Directory

    Click the link, Figure 1 is displayed:

    Figure 1 Example code of creating a directory

    Go to the /opt/client directory, the installation directory of the client, and create the huawei directory.

    1. Run the following command to check whether the huawei directory exists in the current path.

      hdfs dfs -ls /

      The running results are as follows:

      linux1:/opt/client # hdfs dfs -ls /
      16/04/22 16:10:02 INFO hdfs.PeerCache: SocketCache disabled.
      Found 7 items
      -rw-r--r--   3 hdfs   supergroup          0 2016-04-20 18:03 /PRE_CREATE_DIR.SUCCESS
      drwxr-x---   - flume  hadoop              0 2016-04-20 18:02 /flume
      drwx------   - hbase  hadoop              0 2016-04-22 15:19 /hbase
      drwxrwxrwx   - mapred hadoop              0 2016-04-20 18:02 /mr-history
      drwxrwxrwx   - spark  supergroup          0 2016-04-22 15:19 /sparkJobHistory
      drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 14:51 /tmp
      drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 14:50 /user

      The huawei directory does not exist in the current path.

    2. Run the command in Figure 1 that is named with huawei. Replace the <HOST> and <PORT>in the command with the host name or IP address and port number that are obtained in 1. Type the huawei as the directory in the <PATH>.

      <HOST> can be replaced by the host name or IP address. It is noted that the port of HTTP is different from the port of HTTPS.

      • Run the following command to access HTTP:
        curl -i -X PUT --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei?user.name=test&op=MKDIRS"

        In the command, <HOST> is replaced by linux1 and <PORT> is replaced by 9870.

        The test in the preceding command is the user who performs the operation. The user must confirm with the administrator for the permission.

      • The running result is displayed as follows:
        HTTP/1.1 200 OK
        Cache-Control: no-cache
        Expires: Thu, 14 Jul 2016 08:04:39 GMT
        Date: Thu, 14 Jul 2016 08:04:39 GMT
        Pragma: no-cache
        Expires: Thu, 14 Jul 2016 08:04:39 GMT
        Date: Thu, 14 Jul 2016 08:04:39 GMT
        Pragma: no-cache
        Content-Type: application/json
        X-FRAME-OPTIONS: SAMEORIGIN
        Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468519479514&s=/j/J+ZnVrN7NSz1yKnB2JVIwkj0="; Path=/; Expires=Thu, 14-Jul-2016 18:04:39 GMT; HttpOnly
        Transfer-Encoding: chunked
        {"boolean":true}

        If {"boolean":true} returns, the huawei directory is successfully created.

    3. Run the following command to check the huawei directory in the path.
      linux1:/opt/client # hdfs dfs -ls /
      16/04/22 16:14:25 INFO hdfs.PeerCache: SocketCache disabled.
      Found 8 items
      -rw-r--r--   3 hdfs   supergroup          0 2016-04-20 18:03 /PRE_CREATE_DIR.SUCCESS
      drwxr-x---   - flume  hadoop              0 2016-04-20 18:02 /flume
      drwx------   - hbase  hadoop              0 2016-04-22 15:19 /hbase
      drwxr-xr-x   - hdfs  supergroup          0 2016-04-22 16:13 /huawei
      drwxrwxrwx   - mapred hadoop              0 2016-04-20 18:02 /mr-history
      drwxrwxrwx   - spark  supergroup          0 2016-04-22 16:12 /sparkJobHistory
      drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 14:51 /tmp
      drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 16:10 /user

  3. Create a command of the upload request to obtain the information about Location where the DataNode IP address is written in.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -i -X PUT --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=CREATE"
    • The running result is displayed as follows:
      HTTP/1.1 307 TEMPORARY_REDIRECT
      Cache-Control: no-cache
      Expires: Thu, 14 Jul 2016 08:53:07 GMT
      Date: Thu, 14 Jul 2016 08:53:07 GMT
      Pragma: no-cache
      Expires: Thu, 14 Jul 2016 08:53:07 GMT
      Date: Thu, 14 Jul 2016 08:53:07 GMT
      Pragma: no-cache
      Content-Type: application/octet-stream
      X-FRAME-OPTIONS: SAMEORIGIN
      Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468522387880&s=OIksfRJvEkh/Out9y2Ot2FvrxWk="; Path=/; Expires=Thu, 14-Jul-2016 18:53:07 GMT; HttpOnly
      Location: 
      http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE&user.name=hdfs&namenoderpcaddress=hacluster&createflag=&createparent=true&overwrite=false
      Content-Length: 0

  4. According to the Location information, create the testHdfs file in the /huawei/testHdfs file on the HDFS and upload the content in the local testFile file into the testHdfs file.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -i -X PUT -T testFile --negotiate -u: "http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE&user.name=test&namenoderpcaddress=hacluster&createflag=&createparent=true&overwrite=false"
    • The running result is displayed as follows:
      HTTP/1.1 100 Continue
      HTTP/1.1 201 Created
      Location: hdfs://hacluster/testHdfs
      Content-Length: 0
      Connection: close

  5. Go to the /huawei/testHdfs directory and read the content of testHdfs file.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -L --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs??user.name=test&op=OPEN"
    • The running result is displayed as follows:
      Hello, webhdfs user!

  6. Create a command of the upload request to obtain the information about Location where the DataNode IP address of testHdfs file is written in.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -i -X POST --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs??user.name=test&op=APPEND"
    • The running result is displayed as follows:
      HTTP/1.1 307 TEMPORARY_REDIRECT
      Cache-Control: no-cache
      Expires: Thu, 14 Jul 2016 09:18:30 GMT
      Date: Thu, 14 Jul 2016 09:18:30 GMT
      Pragma: no-cache
      Expires: Thu, 14 Jul 2016 09:18:30 GMT
      Date: Thu, 14 Jul 2016 09:18:30 GMT
      Pragma: no-cache
      Content-Type: application/octet-stream
      X-FRAME-OPTIONS: SAMEORIGIN
      Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468523910234&s=JGK+6M6PsVMFdAw2cgIHaKU1kBM="; Path=/; Expires=Thu, 14-Jul-2016 19:18:30 GMT; HttpOnly
      Location: 
      http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=APPEND&user.name=hdfs&namenoderpcaddress=hacluster
      Content-Length: 0

  7. According to the Location information,add the content in the local testFileAppend file to the testHdfs file that is in the /huawei/testHdfs directory of HDFS.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -i -X POST -T testFileAppend --negotiate -u: "http://linux1:25010/webhdfs/v1/huawei/testHdfs?user.name=test&op=APPEND&namenoderpcaddress=hacluster"
    • The running result is displayed as follows:
      HTTP/1.1 100 Continue
      HTTP/1.1 200 OK
      Content-Length: 0
      Connection: close

  8. Go to the /huawei/testHdfs directory and read all content in the testHdfs file.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -L --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=OPEN"
    • The running result is displayed as follows:
      Hello, webhdfs user!
      Welcome back to webhdfs!

  9. List details of all directory and file information in the huawei directory of the HDFS.

    LISTSTATUS will return all child files and folders information in a single request.
    • Run the following command to access HTTP.
      linux1:/opt/client # curl --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=LISTSTATUS"
    • The result is displayed as follows:
      {"FileStatuses":{"FileStatus":[
      {"accessTime":1462425245595,"blockSize":134217728,"childrenNum":0,"fileId":17680,"group":"supergroup","length":70,"modificationTime":1462426678379,"owner":"test","pathSuffix":"","permission":"755","replication":3,"storagePolicy":0,"type":"FILE"}
      ]}}
    LISTSTATUS along with size and startafter param will help in fetching the child files and folders information through multiple request, thus avoiding the user interface from becoming slow when there are millions of child information to be fetched.
    • Run the following command to access HTTP.
      linux1:/opt/client # curl --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/?user.name=test&op=LISTSTATUS&startafter=sparkJobHistory&size=1"
    • The result is displayed as follows:
      {"FileStatuses":{"FileStatus":[
      {"accessTime":1462425245595,"blockSize":134217728,"childrenNum":0,"fileId":17680,"group":"supergroup","length":70,"modificationTime":1462426678379,"owner":"test","pathSuffix":"testHdfs","permission":"755","replication":3,"storagePolicy":0,"type":"FILE"}
      ]}}

  10. Delete thetestHdfs file that is in the /huawei/testHdfs directory of HDFS.

    • Run the following command to access HTTP:
      linux1:/opt/client # curl -i -X DELETE  --negotiate -u: "http://linux1:25002/webhdfs/v1/huawei/testHdfs?user.name=test&op=DELETE"
    • The running result is displayed as follows:
      HTTP/1.1 200 OK
      Cache-Control: no-cache
      Expires: Thu, 14 Jul 2016 10:27:44 GMT
      Date: Thu, 14 Jul 2016 10:27:44 GMT
      Pragma: no-cache
      Expires: Thu, 14 Jul 2016 10:27:44 GMT
      Date: Thu, 14 Jul 2016 10:27:44 GMT
      Pragma: no-cache
      Content-Type: application/json
      X-FRAME-OPTIONS: SAMEORIGIN
      Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468528064220&s=HrvUEd72+V5L4GwCLC/sG3xTI0o="; Path=/; Expires=Thu, 14-Jul-2016 20:27:44 GMT; HttpOnly
      Transfer-Encoding: chunked
      {"boolean":true}

The Key Management Server (KMS) uses the HTTP REST API to provide key management services for external systems. For details about the API, see

http://hadoop.apache.org/docs/r3.1.1/hadoop-kms/index.html.

As REST API reference has done security hardening to prevent script injection attack. Through REST API reference, it cannot create directory and file name which contain those key words "<script ", "<iframe", "<frame", "javascript:".