Getting Started with Sentry in Hive

Apache Sentry (incubating) is a highly modular system for providing fine-grained role based authorization to both data and metadata stored on an Apache Hadoop cluster. It currently works out of the box with Apache Hive and Cloudera Impala. In this blog post, you will learn how to use Sentry with Hive.

Sentry uses a policy provider to define the access control to Hive. Sentry currently ships with a file-based policy provider, see below for an example. A single global policy file can be used to control access to an entire HiveServer2 instance, and multiple dependent per database policy files can be linked to the global one. Lets look at the structure of policy file with an example.

Global policy file:

[groups]
admin_group = admin_role
dep1_admin = uri_role

[roles]
admin_role = server=server1
uri_role = hdfs:///ha-nn-uri/data

[databases]
db1 = hdfs://ha-nn-uri/user/hive/sentry/db1.ini

Per db policy file: (at hdfs://ha-nn-uri/user/hive/sentry/db1.ini)

[groups]
dep1_admin = db1_admin_role
dep1_analyst = db1_read_role

[roles]
db1_admin_role = server=server1->db=db1
db1_read_role = server=server1->db=db1->table=*->action=select

As you can see above, there are usually three sections in the global policy file:

A [groups] section that provides group-to-role mapping
A [roles] section that provides role-to-privileges mapping
A [databases] (optional) section that provides database-to-per-database policy file mapping. This allows for maintaining per-database privileges separately.

Sentry provides authorization through a hook in HiveServer2. When a user makes a connection to HiveServer2, it authenticates the connecting user and persists the user information for the session. For the subsequent operations that user performs, Sentry authorizes the operation by mapping the user to the groups he/she belongs to and determining whether the group(s) have necessary privileges on the relevant objects.

Hive security landscape with Sentry

Next, lets look at how Sentry fits into the security landscape of Hive. The below infographic shows how different authentication and authorization pieces fit together.

Here are the main points to take away:

Sentry requires that HiveServer2 be configured to use strong authentication. HiveServer2 supports Kerberos as well as LDAP (and AD) authentication mechanisms.
At the Sentry authorization level, there are two supported forms of user-group mappings:

HadoopGroup mapping, which uses the underlying Hadoop groups

Hadoop groups in turn support Shell-based mapping as well as LDAP group mapping. Please note that in case of Sentry with Hive, the mapping of users to groups is performed on the HiveServer2 host

LocalGroups, where the users and groups can be defined locally in the policy file using [users] section (for testing purposes only)

Demo

In this demo, we will be using Kerberos authentication for HiveServer2 with HadoopGroups as the Sentry group provider, which by default uses Shell mapping. We briefly go over Sentry and see how to configure and use it in this configuration. (Note: Cloudera Manager 4.7 and CDH 4.4 are shown here; for future versions, the steps will be similar.)

Conclusion

Sentry brings in fine-grained authorization support for both data and metadata in a Hadoop cluster. It is already being used in production systems to secure the data and provide fine-grained access to its users. It is also integrated with the version of Hive shipping in CDH (upstream contribution is pending), Cloudera Impala, and Cloudera Search. Also, here is a short demo if you are interested in using it with Hue.