Cloudera Releases a Tool for Accessing HDFS over HTTP
Hoop is a new tool from Cloudera, an Apache Hadoop contributor. Through a REST API, Hoop provides HTTP access to the Hadoop Distributed File System (HDFS).
Hoop runs as a service separate from the Hadoop NameNode. For example, on UNIX:
- $ curl "http://hoopbar:14000?op=homedir&user.name=babu"
- $ curl "http://hoopbar:14000/user/babu/hello.txt?user.name=babu"
- $ curl -X POST "http://hoopbar:14000/user/babu/data.txt?op=create" --data-binary @mydata.txt --header "content-type: application/octet-stream"
Hoop is released under the Apache License 2.0 and can be used to exchange data between clusters running different versions of Hadoop, or to access data behind a firewall.

Hoop is a complete rewrite of the Hadoop HDFS Proxy. According to Cloudera, it offers the following advantages:
● Support for all HDFS operations (read, write)
● JSON format for status data (file status, operation status, error messages)
● Kerberos HTTP SPNEGO client/server authentication and pseudo authentication out of the box (using Alfredo)
● Hadoop proxy-user support
● Tools such as DistCP can run on either cluster (Translated by 李智)
from:
http://cloud.csdn.net/a/20110722/302086.html
http://www.readwriteweb.com/hack/2011/07/access-hadoop-hdfs-over-http.php
What is Hoop?
Hoop provides access to all Hadoop Distributed File System (HDFS) operations (read and write) over HTTP/S.
Hoop can be used to:
- Access HDFS using HTTP REST.
- Transfer data between clusters running different versions of Hadoop (thereby overcoming RPC versioning issues).
- Access data in an HDFS cluster behind a firewall. The Hoop server acts as a gateway and is the only system that is allowed to go through the firewall (see the sketch after this list).
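To make the gateway picture concrete (the port and rule below are illustrative assumptions, not taken from the Hoop documentation): only the Hoop server's HTTP port needs to be opened at the firewall, while the NameNode and DataNode ports stay unreachable from outside. A minimal iptables sketch:
# Illustrative firewall rule: allow only the Hoop gateway's HTTP port
# (14000 in the examples below); NameNode RPC and DataNode transfer
# ports remain blocked from outside.
$ iptables -A INPUT -p tcp --dport 14000 -j ACCEPT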
Hoop has a Hoop client and a Hoop server component:
- The Hoop server component is a REST HTTP gateway to HDFS supporting all file system operations. It can be accessed using standard HTTP tools (e.g. curl and wget), HTTP libraries from different programming languages (e.g. Perl, JavaScript), as well as using the Hoop client. The Hoop server component is a standard Java web application and has been implemented using Jersey (JAX-RS).
- The Hoop client component is an implementation of the Hadoop FileSystem client that allows using the familiar Hadoop FileSystem API to access HDFS data through a Hoop server. Both access paths are sketched below.
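As a quick illustration of the server-side path, the home-directory request shown later in this post can equally be issued with wget instead of curl:
$ wget -qO- "http://hoopbar:14000?op=homedir&user.name=babu"
And a minimal sketch of the client-side path, assuming the Hoop client jar is on the Hadoop classpath and registered for a hoop:// URI scheme (the scheme name is an illustrative assumption, not confirmed by this post); the familiar hadoop fs commands can then target a Hoop server:
# The hoop:// scheme below is an assumption for illustration.
# List a directory through the Hoop gateway rather than the NameNode RPC port.
$ hadoop fs -ls hoop://hoopbar:14000/user/babu
# Copy a local file into HDFS via the Hoop server.
$ hadoop fs -put mydata.txt hoop://hoopbar:14000/user/babu/data.txt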
Hoop and Hadoop HDFS Proxy
Hoop server is a full rewrite of Hadoop HDFS Proxy. Although it is similar to Hadoop HDFS Proxy (it runs in a servlet container, provides a REST API, and supports pluggable authentication and authorization), Hoop server addresses many of Hadoop HDFS Proxy's shortcomings by providing:
- Support for all HDFS operations (read, write, status).
- Cleaner HTTP REST API.
- JSON format for status data (files status, operations status, error messages).
- Kerberos HTTP SPNEGO client/server authentication and pseudo authentication out of the box (using Alfredo).
- Hadoop proxy-user support.
- Tools such as DistCP can run on either cluster (see the sketch after this list).
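As a sketch of that last point (the cluster addresses are made up for illustration): DistCP can be launched on the destination cluster and pull data from the source cluster through its Hoop gateway, avoiding the RPC version mismatch between the two clusters:
# Run on the destination cluster; the source is read over HTTP via its Hoop gateway.
# hoop://old-cluster:14000 is an illustrative address, and the hoop:// scheme
# is an assumption, as noted above.
$ hadoop distcp hoop://old-cluster:14000/user/babu/logs /user/babu/logs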
Accessing HDFS files via Hoop using the Unix ‘curl’ command
Assuming Hoop is running on http://hoopbar:14000, the following examples show how the Unix ‘curl’ command can be used to access data in HDFS via Hoop using pseudo authentication.
Getting the home directory:
$ curl -i "http://hoopbar:14000?op=homedir&user.name=babu"
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{"homeDir":"http:\/\/hoopbar:14000\/user\/babu"}
$
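When the response is consumed from a script rather than read by eye, the JSON payload can be parsed with any JSON-aware tool. For example, assuming a Python interpreter is available on the client (which this post does not require):
$ curl -s "http://hoopbar:14000?op=homedir&user.name=babu" | python -c 'import json,sys; print(json.load(sys.stdin)["homeDir"])'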
Reading a file:
$ curl -i "http://hoopbar:14000?/user/babu/hello.txt&user.name=babu"
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Hello World!
$
Writing a file:
$ curl -i -X POST "http://hoopbar:14000/user/babu/data.txt?op=create" --data-binary @mydata.txt --header "content-type: application/octet-stream"
HTTP/1.1 200 OK
Location: http://hoopbar:14000/user/babu/data.txt
Content-Type: application/json
Content-Length: 0
$
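To confirm the upload, the new file can be read straight back with the same read pattern used above:
$ curl "http://hoopbar:14000/user/babu/data.txt?user.name=babu"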
Listing the contents of a directory:
$ curl -i "http://hoopbar:14000?/user/babu?op=list&user.name=babu"
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
[
  {
    "path" : "http:\/\/hoopbar:14000\/user\/babu\/data.txt",
    "isDir" : false,
    "len" : 966,
    "owner" : "babu",
    "group" : "supergroup",
    "permission" : "-rw-r--r--",
    "accessTime" : 1310671662423,
    "modificationTime" : 1310671662423,
    "blockSize" : 67108864,
    "replication" : 3
  }
]
$
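The listing is likewise easy to post-process; for example, printing just the path and length of each entry (assuming Python 3 on the client):
$ curl -s "http://hoopbar:14000/user/babu?op=list&user.name=babu" | python3 -c 'import json,sys
for e in json.load(sys.stdin): print(e["path"], e["len"])'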
See http://cloudera.github.com/hoop for more details about the Hoop HTTP REST API.
Getting Hoop
Hoop is distributed with an Apache License 2.0.
The source code is available at http://github.com/cloudera/hoop.
Instructions on how to build, install, and configure the Hoop server, along with the rest of the documentation, are available at http://cloudera.github.com/hoop.
Contributing Hoop to Apache Hadoop
The goal is to contribute Hoop to Apache Hadoop as the next generation of Hadoop HDFS Proxy. We are just waiting on the Mavenization of Hadoop Common and Hadoop HDFS, which will make integration easier.
from: http://www.cloudera.com/blog/2011/07/hoop-hadoop-hdfs-over-http/