HDFS HTTP REST APIs
Function Description
Users can use the application programming interface (API) of Representational State Transfer (REST) to create, read and write, append, and delete files. For details of the REST API, see the following official guidelines:
http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html.
Preparing Running Environment
- Install the client. Install the client on the node. For example, install the client in the /opt/client directory. See details in "Installing the Client."
- Prepare files testFile and testFileAppend and write content 'Hello, webhdfs user!' and 'Welcome back to webhdfs!'. Run the following command to prepare testFile and testFileAppend files:
touch testFile
vi testFile
Write 'Hello, webhdfs user!", save the files, and exit.
touch testFileAppend
vi testFileAppend
Write 'Welcome back to webhdfs!', save the files, and exit.
- Prepare files testFile and testFileAppend and write content 'Hello, webhdfs user!' and 'Welcome back to webhdfs!'. Run the following command to prepare testFile and testFileAppend files:
- In normal mode, only the HTTP service is supported. Log in to the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HDFS > Configurations > All Configurations. Type dfs.http.policy in the research box, select HTTP_ONLY, click Save Configuration, and select Restart the affected services or instances. Click OK to restart the HDFS service.
HTTP_ONLY is selected by default.
Procedure
- Log in to the FusionInsight Manager portal, click Cluster > Name of the desired cluster > Services, and then select HDFS. The HDFS page is displayed.
Because webhdfs is accessed through HTTP, you need to obtain the IP address of the active NameNode and the HTTP port.
- Click Instances, view the host name and IP address of the active NameNode.
- Click Configurations, search namenode.http.port in the search box (9870).
- Create a directory by referring to the following link:
http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Make_a_Directory
Click the link, Figure 1 is displayed:
Go to the /opt/client directory, the installation directory of the client, and create the huawei directory.
- Run the following command to check whether the huawei directory exists in the current path.
hdfs dfs -ls /
The running results are as follows:
linux1:/opt/client # hdfs dfs -ls / 16/04/22 16:10:02 INFO hdfs.PeerCache: SocketCache disabled. Found 7 items -rw-r--r-- 3 hdfs supergroup 0 2016-04-20 18:03 /PRE_CREATE_DIR.SUCCESS drwxr-x--- - flume hadoop 0 2016-04-20 18:02 /flume drwx------ - hbase hadoop 0 2016-04-22 15:19 /hbase drwxrwxrwx - mapred hadoop 0 2016-04-20 18:02 /mr-history drwxrwxrwx - spark supergroup 0 2016-04-22 15:19 /sparkJobHistory drwxrwxrwx - hdfs hadoop 0 2016-04-22 14:51 /tmp drwxrwxrwx - hdfs hadoop 0 2016-04-22 14:50 /user
The huawei directory does not exist in the current path.
- Run the command in Figure 1 that is named with huawei. Replace the <HOST> and <PORT>in the command with the host name or IP address and port number that are obtained in 1. Type the huawei as the directory in the <PATH>.
<HOST> can be replaced by the host name or IP address. It is noted that the port of HTTP is different from the port of HTTPS.
- Run the following command to access HTTP:
curl -i -X PUT --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei?user.name=test&op=MKDIRS"
In the command, <HOST> is replaced by linux1 and <PORT> is replaced by 9870.
The test in the preceding command is the user who performs the operation. The user must confirm with the administrator for the permission.
- The running result is displayed as follows:
HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 14 Jul 2016 08:04:39 GMT Date: Thu, 14 Jul 2016 08:04:39 GMT Pragma: no-cache Expires: Thu, 14 Jul 2016 08:04:39 GMT Date: Thu, 14 Jul 2016 08:04:39 GMT Pragma: no-cache Content-Type: application/json X-FRAME-OPTIONS: SAMEORIGIN Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468519479514&s=/j/J+ZnVrN7NSz1yKnB2JVIwkj0="; Path=/; Expires=Thu, 14-Jul-2016 18:04:39 GMT; HttpOnly Transfer-Encoding: chunked {"boolean":true}
If {"boolean":true} returns, the huawei directory is successfully created.
- Run the following command to access HTTP:
- Run the following command to check the huawei directory in the path.
linux1:/opt/client # hdfs dfs -ls / 16/04/22 16:14:25 INFO hdfs.PeerCache: SocketCache disabled. Found 8 items -rw-r--r-- 3 hdfs supergroup 0 2016-04-20 18:03 /PRE_CREATE_DIR.SUCCESS drwxr-x--- - flume hadoop 0 2016-04-20 18:02 /flume drwx------ - hbase hadoop 0 2016-04-22 15:19 /hbase drwxr-xr-x - hdfs supergroup 0 2016-04-22 16:13 /huawei drwxrwxrwx - mapred hadoop 0 2016-04-20 18:02 /mr-history drwxrwxrwx - spark supergroup 0 2016-04-22 16:12 /sparkJobHistory drwxrwxrwx - hdfs hadoop 0 2016-04-22 14:51 /tmp drwxrwxrwx - hdfs hadoop 0 2016-04-22 16:10 /user
- Run the following command to check whether the huawei directory exists in the current path.
- Create a command of the upload request to obtain the information about Location where the DataNode IP address is written in.
- Run the following command to access HTTP:
linux1:/opt/client # curl -i -X PUT --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=CREATE"
- The running result is displayed as follows:
HTTP/1.1 307 TEMPORARY_REDIRECT Cache-Control: no-cache Expires: Thu, 14 Jul 2016 08:53:07 GMT Date: Thu, 14 Jul 2016 08:53:07 GMT Pragma: no-cache Expires: Thu, 14 Jul 2016 08:53:07 GMT Date: Thu, 14 Jul 2016 08:53:07 GMT Pragma: no-cache Content-Type: application/octet-stream X-FRAME-OPTIONS: SAMEORIGIN Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468522387880&s=OIksfRJvEkh/Out9y2Ot2FvrxWk="; Path=/; Expires=Thu, 14-Jul-2016 18:53:07 GMT; HttpOnly Location: http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE&user.name=hdfs&namenoderpcaddress=hacluster&createflag=&createparent=true&overwrite=false Content-Length: 0
- Run the following command to access HTTP:
- According to the Location information, create the testHdfs file in the /huawei/testHdfs file on the HDFS and upload the content in the local testFile file into the testHdfs file.
- Run the following command to access HTTP:
linux1:/opt/client # curl -i -X PUT -T testFile --negotiate -u: "http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=CREATE&user.name=test&namenoderpcaddress=hacluster&createflag=&createparent=true&overwrite=false"
- The running result is displayed as follows:
HTTP/1.1 100 Continue HTTP/1.1 201 Created Location: hdfs://hacluster/testHdfs Content-Length: 0 Connection: close
- Run the following command to access HTTP:
- Go to the /huawei/testHdfs directory and read the content of testHdfs file.
- Run the following command to access HTTP:
linux1:/opt/client # curl -L --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs??user.name=test&op=OPEN"
- The running result is displayed as follows:
Hello, webhdfs user!
- Run the following command to access HTTP:
- Create a command of the upload request to obtain the information about Location where the DataNode IP address of testHdfs file is written in.
- Run the following command to access HTTP:
linux1:/opt/client # curl -i -X POST --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs??user.name=test&op=APPEND"
- The running result is displayed as follows:
HTTP/1.1 307 TEMPORARY_REDIRECT Cache-Control: no-cache Expires: Thu, 14 Jul 2016 09:18:30 GMT Date: Thu, 14 Jul 2016 09:18:30 GMT Pragma: no-cache Expires: Thu, 14 Jul 2016 09:18:30 GMT Date: Thu, 14 Jul 2016 09:18:30 GMT Pragma: no-cache Content-Type: application/octet-stream X-FRAME-OPTIONS: SAMEORIGIN Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468523910234&s=JGK+6M6PsVMFdAw2cgIHaKU1kBM="; Path=/; Expires=Thu, 14-Jul-2016 19:18:30 GMT; HttpOnly Location: http://10-120-180-170:25010/webhdfs/v1/testHdfs?op=APPEND&user.name=hdfs&namenoderpcaddress=hacluster Content-Length: 0
- Run the following command to access HTTP:
- According to the Location information,add the content in the local testFileAppend file to the testHdfs file that is in the /huawei/testHdfs directory of HDFS.
- Run the following command to access HTTP:
linux1:/opt/client # curl -i -X POST -T testFileAppend --negotiate -u: "http://linux1:25010/webhdfs/v1/huawei/testHdfs?user.name=test&op=APPEND&namenoderpcaddress=hacluster"
- The running result is displayed as follows:
HTTP/1.1 100 Continue HTTP/1.1 200 OK Content-Length: 0 Connection: close
- Run the following command to access HTTP:
- Go to the /huawei/testHdfs directory and read all content in the testHdfs file.
- Run the following command to access HTTP:
linux1:/opt/client # curl -L --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=OPEN"
- The running result is displayed as follows:
Hello, webhdfs user! Welcome back to webhdfs!
- Run the following command to access HTTP:
- List details of all directory and file information in the huawei directory of the HDFS.
LISTSTATUS will return all child files and folders information in a single request.
- Run the following command to access HTTP.
linux1:/opt/client # curl --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/testHdfs?user.name=test&op=LISTSTATUS"
- The result is displayed as follows:
{"FileStatuses":{"FileStatus":[ {"accessTime":1462425245595,"blockSize":134217728,"childrenNum":0,"fileId":17680,"group":"supergroup","length":70,"modificationTime":1462426678379,"owner":"test","pathSuffix":"","permission":"755","replication":3,"storagePolicy":0,"type":"FILE"} ]}}
LISTSTATUS along with size and startafter param will help in fetching the child files and folders information through multiple request, thus avoiding the user interface from becoming slow when there are millions of child information to be fetched.- Run the following command to access HTTP.
linux1:/opt/client # curl --negotiate -u: "http://linux1:9870/webhdfs/v1/huawei/?user.name=test&op=LISTSTATUS&startafter=sparkJobHistory&size=1"
- The result is displayed as follows:
{"FileStatuses":{"FileStatus":[ {"accessTime":1462425245595,"blockSize":134217728,"childrenNum":0,"fileId":17680,"group":"supergroup","length":70,"modificationTime":1462426678379,"owner":"test","pathSuffix":"testHdfs","permission":"755","replication":3,"storagePolicy":0,"type":"FILE"} ]}}
- Run the following command to access HTTP.
- Delete thetestHdfs file that is in the /huawei/testHdfs directory of HDFS.
- Run the following command to access HTTP:
linux1:/opt/client # curl -i -X DELETE --negotiate -u: "http://linux1:25002/webhdfs/v1/huawei/testHdfs?user.name=test&op=DELETE"
- The running result is displayed as follows:
HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 14 Jul 2016 10:27:44 GMT Date: Thu, 14 Jul 2016 10:27:44 GMT Pragma: no-cache Expires: Thu, 14 Jul 2016 10:27:44 GMT Date: Thu, 14 Jul 2016 10:27:44 GMT Pragma: no-cache Content-Type: application/json X-FRAME-OPTIONS: SAMEORIGIN Set-Cookie: hadoop.auth="u=hdfs&p=hdfs&t=simple&e=1468528064220&s=HrvUEd72+V5L4GwCLC/sG3xTI0o="; Path=/; Expires=Thu, 14-Jul-2016 20:27:44 GMT; HttpOnly Transfer-Encoding: chunked {"boolean":true}
- Run the following command to access HTTP:
The Key Management Server (KMS) uses the HTTP REST API to provide key management services for external systems. For details about the API, see
http://hadoop.apache.org/docs/r3.1.1/hadoop-kms/index.html.
As REST API reference has done security hardening to prevent script injection attack. Through REST API reference, it cannot create directory and file name which contain those key words "<script ", "<iframe", "<frame", "javascript:".
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot