Adaltas

Kerberos and delegation tokens security with WebHDFS

WebHDFS is an HTTP REST server bundled with the latest version of Hadoop. In this article, I want to dig into its security features, namely Kerberos and delegation tokens. I will cover their usage from the command line and from a programming language perspective.

Don’t crawl the web looking for a command to start it. It is already available as part of the namenode HTTP interface, by default on port 50070.

Let’s review how a URL is built. Considering a namenode running on an “nn” host and the default port of 50070, all the URLs start with “http://nn:50070”. The URL path is then prefixed with “/webhdfs/v1”, a version marker which guarantees that WebHDFS clients can talk to clusters running different Hadoop versions. The remainder of the URL path is the HDFS path pointing to a file or a directory. Among the URL query parameters, the “op” parameter names the operation to execute, for example “LISTSTATUS” to list the content of a directory.
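
The URL layout described above can be sketched as a small shell helper. This is only an illustration under my own assumptions: the function name “webhdfs_url” and its argument order are hypothetical and not part of WebHDFS, and the HDFS path is not URL-encoded.

```shell
#!/bin/sh
# Hypothetical helper (not part of WebHDFS): assemble a WebHDFS URL.
# Usage: webhdfs_url HOST PORT HDFS_PATH OP
# The body runs in a subshell "( ... )" so its variables do not leak.
webhdfs_url() (
  host=$1
  port=$2
  path=$3   # absolute HDFS path, e.g. /user/test (not URL-encoded here)
  op=$4     # WebHDFS operation, e.g. LISTSTATUS
  printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$host" "$port" "$path" "$op"
)
```

It could then be combined with curl, for example: curl -s "$(webhdfs_url nn 50070 /user/test LISTSTATUS)"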

Speaking of URLs, here’s an interesting side note. There is no support for HTTPS at the moment. Going through the WebHDFS umbrella issue in Jira, there is no mention of implementing it. Maybe this is because the usage of Kerberos prevents the transmission of passwords in clear text, or maybe we should just create one? A solution is to place a secured HTTP proxy server in front of WebHDFS.

So a basic URL to list the content of the directory “/user/test” is:

curl -s "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS"

Question: how do we secure this request? WebHDFS proposes two solutions. The code examples below obtain the Kerberos ticket from a keytab instead of a password.

The first uses Kerberos to send the request. curl knows how to do this with the “--negotiate” option. Here’s an example:

kinit -kt /etc/security/keytabs/test.headless.keytab test && {
  curl -s --negotiate -u : "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS"
  kdestroy
}

The second obtains a delegation token using a Kerberos request and uses the token to send the request. Said differently, it uses the first method to get the token and then just passes the token in the URLs. In this example, we obtain the token and then destroy our Kerberos ticket to show that the ticket is no longer needed. In the final request, we add the “delegation” parameter to the URL.

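kinit -kt /etc/security/keytabs/test.headless.keytab test && {
  # Request a delegation token, authenticating with the Kerberos ticket
  token=$(curl -s --negotiate -u : "http://nn:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN")
  # Extract the "urlString" value from the JSON response (-P requires GNU grep)
  token=$(echo "$token" | grep -Po 'urlString":"\K[^"]*')
  # Destroy the ticket: the token alone authenticates the next request
  kdestroy
  curl -s "http://nn:50070/webhdfs/v1/user/test?delegation=${token}&op=LISTSTATUS"
}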
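
The token extraction above relies on the “-P” option, which is specific to GNU grep. As a hedged alternative, here is a minimal sketch using sed instead. The function name “extract_token” is hypothetical, the sample response is made up for illustration and is not a real token, and the sketch assumes the response carries the token in a “urlString” field on a single line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: read a GETDELEGATIONTOKEN JSON response on stdin
# and print the value of its "urlString" field (prints nothing if absent).
extract_token() {
  sed -n 's/.*"urlString"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up sample response, for illustration only (not a real token).
sample='{"Token":{"urlString":"abc123_-XYZ"}}'
token=$(printf '%s\n' "$sample" | extract_token)

# The token is then passed through the "delegation" query parameter.
url="http://nn:50070/webhdfs/v1/user/test?delegation=${token}&op=LISTSTATUS"
```

In a real session, the sample would be replaced by the output of the GETDELEGATIONTOKEN request, and the resulting URL passed to curl without any Kerberos option.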
