Kerberos and delegation tokens security with WebHDFS
By David WORMS
Jul 25, 2013
- Categories
- Cyber Security
- Tags
- HTTP
- HDFS
- Big Data
- Kerberos
WebHDFS is an HTTP REST server bundled with the latest version of Hadoop. What interests me in this article is digging into its security, namely the Kerberos and delegation token functionalities. I will cover its usage from both the command line and a programming language perspective.
Don’t crawl the web looking for a command to start it: it is already available as part of the namenode HTTP interface, by default on port 50070.
Let’s review how a URL is built. Considering a namenode running on an “nn” host and the default port of 50070, all the URLs start with http://nn:50070. The URL path is then prefixed by “/webhdfs/v1”, a version marker that guarantees WebHDFS clients can talk to clusters running different Hadoop versions. The remainder of the URL path is the HDFS path pointing to a file or a directory. Among the URL query parameters, the “op” parameter gives the type of operation to execute, for example “LISTSTATUS” to list the content of a directory.
Speaking of URLs, here’s an interesting side note. There is no support for HTTPS at the moment. Going through the WebHDFS Jira umbrella issue, I found no mention of implementing it. Maybe this is because the use of Kerberos prevents the transmission of passwords in clear text, or maybe we should just create one? A solution is to place a secured HTTP proxy server in front of WebHDFS.
So a basic URL to list the content of the directory “/user/test” is:
curl -s "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS"
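The URL structure described above can be sketched as a small shell function. This is only an illustration: the function name `webhdfs_url` and its argument order are hypothetical, while the host, port, prefix and operation come from the example above.

```shell
# Hypothetical helper: assemble a WebHDFS URL from its parts.
# Usage: webhdfs_url HOST PORT HDFS_PATH OPERATION
webhdfs_url() {
  # "/webhdfs/v1" is the fixed prefix, the HDFS path follows it,
  # and the operation goes into the "op" query parameter.
  printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$1" "$2" "$3" "$4"
}

# Prints the same URL as the curl example above.
webhdfs_url nn 50070 /user/test LISTSTATUS
```

The output can be passed straight to curl, for example `curl -s "$(webhdfs_url nn 50070 /user/test LISTSTATUS)"`.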
Question: how do we secure this request? WebHDFS offers two solutions. The example code below obtains the Kerberos ticket from a keytab instead of a password.
The first uses Kerberos to send the request. curl knows how to do this with the --negotiate option. Here’s an example:
kinit -kt /etc/security/keytabs/test.headless.keytab test && {
curl -s --negotiate -u : "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS"
kdestroy
}
The second obtains a delegation token using a Kerberos request and then uses the token to send the request. Said differently, it uses the first method to get the token and then just passes the token in the URLs. In this example, we obtain the token and then destroy our Kerberos ticket to show that the ticket plays no role in the final request. In that final request, we add the “delegation” parameter to the URL.
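kinit -kt /etc/security/keytabs/test.headless.keytab test && {
# Request a delegation token while the Kerberos ticket is still valid
token=$(curl -s --negotiate -u : "http://nn:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN")
# Extract the "urlString" value from the JSON response (-P requires GNU grep)
token=$(echo "$token" | grep -Po 'urlString":"\K[^"]*')
kdestroy
# The ticket is gone: the token alone authenticates this request
curl -s "http://nn:50070/webhdfs/v1/user/test?delegation=${token}&op=LISTSTATUS"
}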
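The token extraction relies on `grep -P`, which is specific to GNU grep. On systems without it, a sed-based variant may do the same job. The sketch below is an assumption-laden illustration: the function name `extract_token` is hypothetical and the sample response is made up, shaped like the JSON body returned by GETDELEGATIONTOKEN.

```shell
# Hypothetical helper: read a GETDELEGATIONTOKEN JSON response on stdin
# and print the value of its "urlString" field.
extract_token() {
  sed -n 's/.*"urlString"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up response body; a real token is a long opaque string.
sample='{"Token":{"urlString":"EXAMPLE-TOKEN-VALUE"}}'
token=$(printf '%s\n' "$sample" | extract_token)
echo "$token"
```

In the script above, the curl output of the GETDELEGATIONTOKEN request could be piped into `extract_token` in place of the `grep -Po` call.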