Managing authorizations with Apache Sentry
By Axel JACQIN
Jul 24, 2017
- Categories
- Data Governance
- Tags
- Hue
- Database
- LDAP
- Nikita
- Sentry
- Ansible
- CDH
- Deployment
Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster.
In this article, we show how we use Apache Sentry at Adaltas. For this demonstration, we picked a use case that we faced at one of our customers: Pages Jaunes. For the sake of privacy, the data displayed in this article is fake.
The use case
Pages Jaunes is the biggest French web-announcer and sells its audience to its customers. To convince prospects to join the Pages Jaunes website, sales representatives have to present the outcomes those prospects can expect. To that end, the data scientists at Pages Jaunes have to predict, as accurately as possible, the plausible audience of each customer. So we had to give the marketing team access to the existing Data Lake.
Let’s call these data scientists marketing_analysts. Our objective is to give marketing_analysts read access to the database dw_audience and full access to a sandbox database that we are going to call dw_marketing_analysts.
Configuration:
- CDH 5.8 on CentOS 6
- Hue 3.10
First step: Create missing Unix groups and users
The first thing to do is to create the appropriate Unix group and Unix users on all hosts of your cluster.
If your cluster is connected to an LDAP directory, just add the new entries there. If not, rather than creating the users one by one on each host, you can deploy them with a deployment tool such as Nikita or Ansible.
Let’s create the group grp_marketing_analysts, an application user usr_marketing_analysts, then two data scientists: John Doe and Marcelus Wallace. Below, we use the command ryba exec, which itself relies on Nikita to distribute SSH commands:
./bin/ryba exec 'sudo groupadd grp_marketing_analysts'
./bin/ryba exec 'sudo adduser -g grp_marketing_analysts usr_marketing_analysts'
./bin/ryba exec 'sudo adduser -G grp_marketing_analysts jdoe'
./bin/ryba exec 'sudo adduser -G grp_marketing_analysts mwallace'
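Once the accounts are created, it can be worth checking group membership on a host before going further. The following is a minimal sketch, not part of the original procedure: the helper name `in_group` is hypothetical, and it assumes a standard `id` command on the host.

```shell
#!/bin/sh
# in_group USER GROUP: succeed when USER belongs to GROUP
# (primary or supplementary). Hypothetical helper, not part of ryba.
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Check the three accounts from the article on the current host.
for u in usr_marketing_analysts jdoe mwallace; do
  if in_group "$u" grp_marketing_analysts; then
    echo "$u: ok"
  else
    echo "$u: missing or not in grp_marketing_analysts"
  fi
done
```

The script only reads local account information, so it is safe to run repeatedly, for instance through the same ryba exec mechanism.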
Second step: Create missing database dw_marketing_analysts
First, as the HDFS superuser, create the HDFS directory that will store the database and set the right permissions. From an edge node:
sudo -u hdfs hdfs dfs -mkdir -p /user/usr_marketing_analysts/warehouse/dw_marketing_analysts
sudo -u hdfs hdfs dfs -chown -R usr_marketing_analysts:grp_marketing_analysts /user/usr_marketing_analysts
Depending on your policy, set a quota on this directory:
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 100g /user/usr_marketing_analysts
Then set an ACL to allow hive and impala users to write into these directories:
sudo -u hdfs hdfs dfs -setfacl -R -m user:impala:rwx /user/usr_marketing_analysts/warehouse
sudo -u hdfs hdfs dfs -setfacl -R -m user:hive:rwx /user/usr_marketing_analysts/warehouse
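Note that a recursive `-setfacl -m` only touches paths that already exist. If directories created later should inherit the same access, HDFS default ACL entries (`default:user:NAME:rwx`) are one possible complement. The sketch below is a dry run under that assumption: the helper `acl_cmds` is hypothetical and only prints the commands so they can be reviewed before being executed.

```shell
#!/bin/sh
# acl_cmds PATH USER...: print, without executing, the access and default
# ACL commands for each service user. Hypothetical helper for illustration.
acl_cmds() {
  target=$1
  shift
  for svc in "$@"; do
    # Access ACL: applies to the directories and files that exist today.
    echo "sudo -u hdfs hdfs dfs -setfacl -R -m user:$svc:rwx $target"
    # Default ACL: inherited by children created afterwards.
    echo "sudo -u hdfs hdfs dfs -setfacl -R -m default:user:$svc:rwx $target"
  done
}

acl_cmds /user/usr_marketing_analysts/warehouse hive impala
```

The resulting entries can then be inspected with `hdfs dfs -getfacl` on the warehouse directory.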
Create the database as the Hive/Impala superuser, pointing it to the previous directory:
sudo -u hive hive -e "CREATE DATABASE dw_marketing_analysts LOCATION '/user/usr_marketing_analysts/warehouse/dw_marketing_analysts'"
Third step: Set up privileges with Sentry through the Hue web UI
Create the group and the users in Hue. This part is pretty straightforward thanks to the Hue web UI.
Go to the Security > Hive Tables panel and click on Roles on the left side. Now create a new role, named after the group according to your policies.
You have to specify Hive privileges and HDFS privileges.
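The Hive privileges set through the Hue UI can also be expressed as Sentry SQL statements. The following is a sketch of an equivalent, not the procedure we used: it assumes the role is named after the group, the file name `sentry_grants.sql` is hypothetical, and the HiveServer2 JDBC URL is left for you to fill in. The script only generates the SQL file, it does not execute it.

```shell
#!/bin/sh
# Role name follows the article's convention: same name as the Unix group.
role=grp_marketing_analysts
sql_file=${TMPDIR:-/tmp}/sentry_grants.sql   # hypothetical file name

cat > "$sql_file" <<EOF
CREATE ROLE $role;
GRANT ROLE $role TO GROUP grp_marketing_analysts;
-- Read-only access to the audience database
GRANT SELECT ON DATABASE dw_audience TO ROLE $role;
-- Full access to the sandbox database
GRANT ALL ON DATABASE dw_marketing_analysts TO ROLE $role;
SHOW GRANT ROLE $role;
EOF

echo "Review $sql_file, then run it as a Sentry admin, for example:"
echo "  beeline -u '<your HiveServer2 JDBC URL>' -f $sql_file"
```

Keeping the statements in a file makes the privileges reviewable and repeatable across environments, which the UI alone does not provide.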
Here we have set up privileges for named users; you can apply the exact same process for application users.
We have also set up privileges at the database level, but you can apply finer-grained authorizations on tables or columns. For more information on privileges and their hierarchies, please visit the Sentry documentation.
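To illustrate the finer-grained case, here is a small sketch that prints table-level and column-level grants in Sentry's SQL syntax. The table `visits` and the column `page_id` are hypothetical names, as are the helpers `table_grant` and `column_grant`; column-level privileges are limited to SELECT and depend on your Sentry version.

```shell
#!/bin/sh
# table_grant DB TABLE ROLE: print a SELECT grant on a whole table.
table_grant() {
  printf 'USE %s;\nGRANT SELECT ON TABLE %s TO ROLE %s;\n' "$1" "$2" "$3"
}

# column_grant DB TABLE COLUMN ROLE: print a SELECT grant on one column.
column_grant() {
  printf 'USE %s;\nGRANT SELECT(%s) ON TABLE %s TO ROLE %s;\n' "$1" "$3" "$2" "$4"
}

# Hypothetical table and column names, for illustration only.
table_grant dw_audience visits grp_marketing_analysts
column_grant dw_audience visits page_id grp_marketing_analysts
```

The printed statements can be reviewed and then submitted through beeline or the Hue query editor by a user with Sentry admin rights.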