Updated on 2022-11-18 GMT+08:00

HTTP REST API

Function Description

You can use the Representational State Transfer (REST) application programming interface (API) to create, read, write, append to, and delete files. For details about the REST API, see the official WebHDFS documentation:

http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

Preparing Running Environment

Install the client on the node, for example, in the /opt/client directory. For details, see "Installing the Client."
1. Prepare two files, testFile and testFileAppend, containing 'Hello, webhdfs user!' and 'Welcome back to webhdfs!' respectively. Run the following commands to create them:
  touch testFile
  
  vi testFile
  
  Write 'Hello, webhdfs user!', save the file, and exit.
  
  touch testFileAppend
  
  vi testFileAppend
  
  Write 'Welcome back to webhdfs!', save the file, and exit.
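If you prefer not to open an editor, the two files can also be prepared non-interactively. The following is a minimal sketch that assumes a POSIX shell and writes both files into the current directory. Using printf here is a choice of this example, not part of the original procedure.

```shell
# Create the two sample files without opening an editor.
# printf '%s\n' writes the text followed by a single newline.
printf '%s\n' 'Hello, webhdfs user!' > testFile
printf '%s\n' 'Welcome back to webhdfs!' > testFileAppend

# Show what was written so the content can be checked by eye.
cat testFile testFileAppend
```

Both files end with a newline, so the appended text starts on its own line when the file is read back later.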
In normal mode, only the HTTP service is supported. Log in to FusionInsight Manager and choose Cluster > Name of the desired cluster > Services > HDFS > Configurations > All Configurations. Enter dfs.http.policy in the search box, select HTTP_ONLY, click Save Configuration, and select Restart the affected services or instances. Click OK to restart the HDFS service.

HTTP_ONLY is selected by default.

Procedure

Log in to FusionInsight Manager, choose Cluster > Name of the desired cluster > Services, and then select HDFS. The HDFS page is displayed.

Because WebHDFS is accessed through HTTP, you need to obtain the IP address of the active NameNode and the HTTP port.
1. Click Instances and view the host name and IP address of the active NameNode.
2. Click Configurations and search for namenode.http.port in the search box to obtain the HTTP port (9870 in this example).
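Once the host and port are known, every request in this procedure follows the same URL pattern. The sketch below shows one way to assemble such a URL in a POSIX shell. The function name webhdfs_url and the variables NN_HOST and NN_PORT are hypothetical names introduced for illustration, and the helper does not URL-encode its arguments.

```shell
# Hypothetical helper: build a WebHDFS URL from its parts.
# Arguments: host, port, HDFS path (starting with /), user name, operation.
webhdfs_url() {
    printf 'http://%s:%s/webhdfs/v1%s?user.name=%s&op=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example values taken from this guide; replace them with your own.
NN_HOST=linux1
NN_PORT=9870

webhdfs_url "$NN_HOST" "$NN_PORT" /huawei test MKDIRS
```

The printed URL can be passed to curl in double quotes so that the shell does not interpret the & character.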

Create a directory by referring to the following link:

http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Make_a_Directory

Click the link. The page shown in Figure 1 is displayed.

Figure 1 Example code of creating a directory

Go to the /opt/client directory, the installation directory of the client, and create the huawei directory.

Run the following command to check whether the huawei directory exists in the current path.

hdfs dfs -ls /

The running results are as follows:

linux1:/opt/client # hdfs dfs -ls /
16/04/22 16:10:02 INFO hdfs.PeerCache: SocketCache disabled.
Found 7 items
-rw-r--r--   3 hdfs   supergroup          0 2016-04-20 18:03 /PRE_CREATE_DIR.SUCCESS
drwxr-x---   - flume  hadoop              0 2016-04-20 18:02 /flume
drwx------   - hbase  hadoop              0 2016-04-22 15:19 /hbase
drwxrwxrwx   - mapred hadoop              0 2016-04-20 18:02 /mr-history
drwxrwxrwx   - spark  supergroup          0 2016-04-22 15:19 /sparkJobHistory
drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 14:51 /tmp
drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 14:50 /user

The huawei directory does not exist in the current path.

Run the command shown in Figure 1 to create the huawei directory. Replace <HOST> and <PORT> in the command with the host name or IP address and the port number obtained in 1, and enter huawei as the directory in <PATH>.

<HOST> can be either the host name or the IP address. Note that the HTTP port is different from the HTTPS port.
- Run the following command to access HTTP:
```
curl -i -X PUT --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei?user.name=test&op=MKDIRS"
```
  In the command, <HOST> is replaced by linux1 and <PORT> is replaced by 9870.
  
  In the preceding command, test is the user who performs the operation. Confirm with the administrator that this user has the required permission.
- The running result is displayed as follows:
```
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Thu, 14 Jul 2016 08:04:39 GMT
Date: Thu, 14 Jul 2016 08:04:39 GMT
Pragma: no-cache
Expires: Thu, 14 Jul 2016 08:04:39 GMT
Date: Thu, 14 Jul 2016 08:04:39 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468519479514&s=/j/J+ZnVrN7NSz1yKnB2JVIwkj0="; Path=/; Expires=Thu, 14-Jul-2016 18:04:39 GMT; HttpOnly
Transfer-Encoding: chunked
{"boolean":true}
```
  If {"boolean":true} is returned, the huawei directory is successfully created.
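In a script, the same success check can be made on the response body. The following is a minimal sketch that assumes the body has already been captured in a variable (for example, by running curl without -i). The function name is_webhdfs_true is hypothetical, and the check is a plain string match that expects the compact form shown in the output above.

```shell
# Hypothetical helper: succeed only if the response body reports success.
is_webhdfs_true() {
    case "$1" in
        *'"boolean":true'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample body copied from the output above.
body='{"boolean":true}'
if is_webhdfs_true "$body"; then
    echo "operation succeeded"
else
    echo "operation failed"
fi
```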

Run the following command to check the huawei directory in the path.

linux1:/opt/client # hdfs dfs -ls /
16/04/22 16:14:25 INFO hdfs.PeerCache: SocketCache disabled.
Found 8 items
-rw-r--r--   3 hdfs   supergroup          0 2016-04-20 18:03 /PRE_CREATE_DIR.SUCCESS
drwxr-x---   - flume  hadoop              0 2016-04-20 18:02 /flume
drwx------   - hbase  hadoop              0 2016-04-22 15:19 /hbase
drwxr-xr-x   - hdfs  supergroup          0 2016-04-22 16:13 /huawei
drwxrwxrwx   - mapred hadoop              0 2016-04-20 18:02 /mr-history
drwxrwxrwx   - spark  supergroup          0 2016-04-22 16:12 /sparkJobHistory
drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 14:51 /tmp
drwxrwxrwx   - hdfs   hadoop              0 2016-04-22 16:10 /user

Send an upload request to obtain the Location information, which contains the address of the DataNode that the file will be written to.

Run the following command to access HTTP:

linux1:/opt/client # curl -i -X PUT --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=CREATE"

The running result is displayed as follows:

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Thu, 14 Jul 2016 08:53:07 GMT
Date: Thu, 14 Jul 2016 08:53:07 GMT
Pragma: no-cache
Expires: Thu, 14 Jul 2016 08:53:07 GMT
Date: Thu, 14 Jul 2016 08:53:07 GMT
Pragma: no-cache
Content-Type: application/octet-stream
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468522387880&s=OIksfRJvEkh/Out9y2Ot2FvrxWk="; Path=/; Expires=Thu, 14-Jul-2016 18:53:07 GMT; HttpOnly
Location: http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE&user.name=hdfs&namenoderpcaddress=hacluster&createflag=&createparent=true&overwrite=false
Content-Length: 0
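The redirect URL in the Location header is what the next request must use. As a sketch of how it could be picked out of the response headers in a POSIX shell, the hypothetical function extract_location below reads headers on standard input and prints the first Location value. It assumes the header name and its value are on the same line, as in a raw HTTP response.

```shell
# Hypothetical helper: print the value of the Location header from
# HTTP response headers read on standard input.
extract_location() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Sample headers, shortened from the output above.
printf 'HTTP/1.1 307 TEMPORARY_REDIRECT\r\nLocation: http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE\r\nContent-Length: 0\r\n\r\n' | extract_location
```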

Using the URL returned in the Location information, create the testHdfs file in the /huawei directory of HDFS and upload the content of the local testFile file to it.

Run the following command to access HTTP:

linux1:/opt/client # curl -i -X PUT -T testFile --negotiate -u: "http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE&user.name=test&namenoderpcaddress=hacluster&createflag=&createparent=true&overwrite=false"

The running result is displayed as follows:

HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Location: hdfs://hacluster/testHdfs
Content-Length: 0
Connection: close
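Because curl -i prints the interim 100 Continue response before the final one, a script should look at the last status line rather than the first. The sketch below shows one way to do that in a POSIX shell; the function name final_status is hypothetical.

```shell
# Hypothetical helper: print the status code of the last HTTP status line
# found on standard input (for example 201 after an interim 100 Continue).
final_status() {
    tr -d '\r' | awk '/^HTTP\// { code = $2 } END { print code }'
}

# Sample response copied from the output above.
printf 'HTTP/1.1 100 Continue\r\nHTTP/1.1 201 Created\r\nLocation: hdfs://hacluster/testHdfs\r\nContent-Length: 0\r\nConnection: close\r\n' | final_status
```

For the sample response, the function prints 201, which indicates that the file was created.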

Read the content of the /huawei/testHdfs file.
- Run the following command to access HTTP:
```
linux1:/opt/client # curl -L --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=OPEN"
```
- The running result is displayed as follows:
```
Hello, webhdfs user!
```

Send an append request to obtain the Location information, which contains the address of the DataNode that stores the testHdfs file.

Run the following command to access HTTP:

linux1:/opt/client # curl -i -X POST --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=APPEND"

The running result is displayed as follows:

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Thu, 14 Jul 2016 09:18:30 GMT
Date: Thu, 14 Jul 2016 09:18:30 GMT
Pragma: no-cache
Expires: Thu, 14 Jul 2016 09:18:30 GMT
Date: Thu, 14 Jul 2016 09:18:30 GMT
Pragma: no-cache
Content-Type: application/octet-stream
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468523910234&s=JGK+6M6PsVMFdAw2cgIHaKU1kBM="; Path=/; Expires=Thu, 14-Jul-2016 19:18:30 GMT; HttpOnly
Location: http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=APPEND&user.name=hdfs&namenoderpcaddress=hacluster
Content-Length: 0

Using the URL returned in the Location information, append the content of the local testFileAppend file to the testHdfs file in the /huawei directory of HDFS.

Run the following command to access HTTP:

linux1:/opt/client # curl -i -X POST -T testFileAppend --negotiate -u: "http://10-120-180-170:25010/webhdfs/v1/huawei/testHdfs?user.name=test&op=APPEND&namenoderpcaddress=hacluster"

The running result is displayed as follows:

HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Content-Length: 0
Connection: close

Read all content of the /huawei/testHdfs file.

Run the following command to access HTTP:

linux1:/opt/client # curl -L --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=OPEN"

The running result is displayed as follows:

Hello, webhdfs user!
Welcome back to webhdfs!

List the details of the directories and files in the huawei directory of HDFS.

LISTSTATUS returns information about all child files and folders in a single request.

Run the following command to access HTTP.

linux1:/opt/client # curl --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=LISTSTATUS"

The result is displayed as follows:

{"FileStatuses":{"FileStatus":[
{"accessTime":1462425245595,"blockSize":134217728,"childrenNum":0,"fileId":17680,"group":"supergroup","length":70,"modificationTime":1462426678379,"owner":"test","pathSuffix":"","permission":"755","replication":3,"storagePolicy":0,"type":"FILE"}
]}}

LISTSTATUS with the size and startafter parameters fetches information about child files and folders across multiple requests. This keeps the user interface responsive when a directory has millions of children.

Run the following command to access HTTP.

linux1:/opt/client # curl --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/?user.name=test&op=LISTSTATUS&startafter=sparkJobHistory&size=1"

The result is displayed as follows:

{"FileStatuses":{"FileStatus":[
{"accessTime":1462425245595,"blockSize":134217728,"childrenNum":0,"fileId":17680,"group":"supergroup","length":70,"modificationTime":1462426678379,"owner":"test","pathSuffix":"testHdfs","permission":"755","replication":3,"storagePolicy":0,"type":"FILE"}
]}}
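To page through a large directory, the last name returned by one request becomes the startafter value of the next. The following is a rough sketch of that step in a POSIX shell. The function name list_path_suffixes is hypothetical, and the text-based extraction assumes that names contain no double quotes or braces; a JSON-aware tool is a safer choice where one is available.

```shell
# Hypothetical helper: print every pathSuffix value in a LISTSTATUS response.
# Splitting on "{" puts each FileStatus object on its own line.
list_path_suffixes() {
    tr '{' '\n' | sed -n 's/.*"pathSuffix":"\([^"]*\)".*/\1/p'
}

# Sample response, shortened from the output above.
response='{"FileStatuses":{"FileStatus":[{"length":70,"owner":"test","pathSuffix":"testHdfs","type":"FILE"}]}}'

# The last name on this page is the starting point for the next page.
last=$(printf '%s\n' "$response" | list_path_suffixes | tail -n 1)
echo "last entry: $last"

# Next page request, using the startafter and size parameters shown above.
echo "http://linux1:9870/webhdfs/v1/huawei/?user.name=test&op=LISTSTATUS&startafter=${last}&size=1"
```

Repeating the request until a response contains no entries walks the whole directory one page at a time.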

Delete the testHdfs file in the /huawei directory of HDFS.

Run the following command to access HTTP:

linux1:/opt/client # curl -i -X DELETE --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=DELETE"

The running result is displayed as follows:

HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Thu, 14 Jul 2016 10:27:44 GMT
Date: Thu, 14 Jul 2016 10:27:44 GMT
Pragma: no-cache
Expires: Thu, 14 Jul 2016 10:27:44 GMT
Date: Thu, 14 Jul 2016 10:27:44 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468528064220&s=HrvUEd72+V5L4GwCLC/sG3xTI0o="; Path=/; Expires=Thu, 14-Jul-2016 20:27:44 GMT; HttpOnly
Transfer-Encoding: chunked
{"boolean":true}

The Key Management Server (KMS) uses the HTTP REST API to provide key management services for external systems. For details about the API, see:

http://hadoop.apache.org/docs/r3.1.1/hadoop-kms/index.html

The REST API has been security-hardened to prevent script injection attacks. Directories and files whose names contain the keywords "<script ", "<iframe", "<frame", or "javascript:" cannot be created through the REST API.
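A client can check a name against these keywords before sending a request. The sketch below is one possible pre-check in a POSIX shell; the function name is_name_allowed is hypothetical, and the match is case-sensitive because this guide does not state how the server treats other capitalizations.

```shell
# Hypothetical helper: fail if a name contains one of the rejected keywords.
# The match is case-sensitive; server-side behavior for other capitalizations
# is not stated in this guide.
is_name_allowed() {
    case "$1" in
        *'<script '* | *'<iframe'* | *'<frame'* | *'javascript:'*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in '/huawei/testHdfs' '/huawei/javascript:alert'; do
    if is_name_allowed "$name"; then
        echo "allowed:  $name"
    else
        echo "rejected: $name"
    fi
done
```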