Security in Apache Hadoop Ozone - 1

Apache Hadoop Ozone is a highly scalable distributed object store for big data applications [1]. This blog post provides an overview of Ozone security and the details required to set up a secure Ozone cluster.

The Ozone security architecture is described in detail in [2].

Authentication, Authorization and Auditing are three basic tenets of security. The Ozone security design borrows heavily from Apache Hadoop security. Having said that, there are areas where Ozone security differs from Hadoop. Let’s take a closer look at what exactly this means.

Authentication

Similar to Hadoop, Ozone allows Kerberos-based authentication. So one way to set up identities for all the daemons and clients is to create Kerberos keytabs and configure them like any other service in Hadoop.

Below are some important properties for configuring Kerberos security.

hdds.scm.kerberos.principal=scm/scm@EXAMPLE.COM

hdds.scm.kerberos.keytab.file=/etc/security/keytabs/scm.keytab

ozone.om.kerberos.principal=om/om@EXAMPLE.COM

ozone.om.kerberos.keytab.file=/etc/security/keytabs/om.keytab

hdds.scm.http.kerberos.principal=HTTP/scm@EXAMPLE.COM

hdds.scm.http.kerberos.keytab=/etc/security/keytabs/HTTP.keytab

ozone.om.http.kerberos.principal=HTTP/om@EXAMPLE.COM

ozone.om.http.kerberos.keytab=/etc/security/keytabs/HTTP.keytab
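As a rough illustration, the properties above could be rendered into the name/value XML layout that Hadoop-style site files use. This is only a sketch: the helper name render_site_xml is hypothetical, and it assumes the values end up in a file such as ozone-site.xml.

```python
import xml.etree.ElementTree as ET

# Property names and values copied from the list above.
KERBEROS_PROPERTIES = {
    "hdds.scm.kerberos.principal": "scm/scm@EXAMPLE.COM",
    "hdds.scm.kerberos.keytab.file": "/etc/security/keytabs/scm.keytab",
    "ozone.om.kerberos.principal": "om/om@EXAMPLE.COM",
    "ozone.om.kerberos.keytab.file": "/etc/security/keytabs/om.keytab",
    "hdds.scm.http.kerberos.principal": "HTTP/scm@EXAMPLE.COM",
    "hdds.scm.http.kerberos.keytab": "/etc/security/keytabs/HTTP.keytab",
    "ozone.om.http.kerberos.principal": "HTTP/om@EXAMPLE.COM",
    "ozone.om.http.kerberos.keytab": "/etc/security/keytabs/HTTP.keytab",
}


def render_site_xml(properties):
    """Render a dict of properties as <configuration><property>... XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(KERBEROS_PROPERTIES))
```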

Certificates

Apart from Kerberos and tokens, Ozone uses certificate-based authentication for Ozone service components. To enable this, SCM (StorageContainerManager) bootstraps itself as a Certificate Authority when security is enabled. This allows all daemons inside Ozone to have an SCM-signed certificate. Below is a brief description of the steps involved:

  1. Datanodes and OzoneManagers submit a CSR (certificate signing request) to SCM.
  2. SCM verifies the identity of the DN (Datanode) or OM via Kerberos and generates a certificate.
  3. OM and DN use this certificate to prove their identities.
  4. Datanodes use the OzoneManager certificate to validate block tokens. This is possible because both OzoneManager and Datanodes trust SCM-signed certificates.
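The trust chain in the steps above can be sketched with textbook RSA signatures. This is a toy model, not how Ozone implements certificates: the classes ToyScmCa and Certificate, the key sizes, and the token format are all hypothetical, and the keys are far too small to be secure. It only shows why a Datanode that trusts SCM can validate something signed by OM.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by every toy key


@dataclass(frozen=True)
class KeyPair:
    n: int  # public modulus
    d: int  # private exponent


def make_keypair(p: int, q: int) -> KeyPair:
    """Textbook RSA key from two distinct primes (toy-sized, NOT secure)."""
    return KeyPair(n=p * q, d=pow(E, -1, (p - 1) * (q - 1)))


def _digest(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, key: KeyPair) -> int:
    return pow(_digest(message, key.n), key.d, key.n)


def verify(message: bytes, signature: int, n: int) -> bool:
    return pow(signature, E, n) == _digest(message, n)


@dataclass(frozen=True)
class Certificate:
    subject: str
    public_n: int
    ca_signature: int

    def payload(self) -> bytes:
        # The bytes the CA signs: who the subject is and which key it owns.
        return f"{self.subject}|{self.public_n}".encode()


class ToyScmCa:
    """Hypothetical stand-in for SCM acting as the Certificate Authority."""

    def __init__(self) -> None:
        self._key = make_keypair(2**127 - 1, 2**107 - 1)  # Mersenne primes
        self.public_n = self._key.n

    def sign_csr(self, subject: str, public_n: int, kerberos_verified: bool) -> Certificate:
        # Step 2: the requester must already be authenticated via Kerberos.
        if not kerberos_verified:
            raise PermissionError("CSR rejected: identity not verified")
        unsigned = Certificate(subject, public_n, 0)
        return Certificate(subject, public_n, sign(unsigned.payload(), self._key))


if __name__ == "__main__":
    scm = ToyScmCa()
    om_key = make_keypair(2**89 - 1, 2**61 - 1)
    om_cert = scm.sign_csr("om", om_key.n, kerberos_verified=True)  # steps 1-2
    # Step 3: anyone holding SCM's public key can check the OM certificate.
    print(verify(om_cert.payload(), om_cert.ca_signature, scm.public_n))
    # Step 4: OM signs a block token; a Datanode checks it with the OM certificate.
    token = b"blockId=42;mode=READ"
    print(verify(token, sign(token, om_key), om_cert.public_n))
```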

Tokens

Tokens are widely used in distributed systems as a means to achieve lightweight authentication without compromising on security. The main motivation for using tokens inside Ozone is to prevent unauthorized access while keeping the protocol lightweight and without sharing secrets over the wire. Ozone utilizes three types of tokens:

Delegation token

Once a client establishes its identity via Kerberos, it can request a delegation token from OzoneManager. The client can use this token to prove its identity until the token expires. Like Hadoop delegation tokens, an Ozone delegation token has three important fields:

Renewer: User responsible for renewing the token.

Issue date: Time at which token was issued.

Max date: Time after which token can’t be renewed.

Token operations such as get, renew and cancel can only be performed over a Kerberos-authenticated connection. Clients can use a delegation token to establish a connection with OzoneManager and perform any file system or object store operation, such as listing the objects in a bucket or creating a volume.
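How the three fields interact on renewal can be sketched as follows. This is a simplified model, not OzoneManager's code: the owner and expiry_date fields, the renew_interval parameter, and the function names are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class DelegationToken:
    owner: str          # assumed field: the user the token was issued to
    renewer: str        # user responsible for renewing the token
    issue_date: float   # time at which the token was issued
    max_date: float     # time after which the token can't be renewed
    expiry_date: float  # assumed field: current expiry, extended by renewals


def issue(owner: str, renewer: str, now: float,
          renew_interval: float, max_lifetime: float) -> DelegationToken:
    max_date = now + max_lifetime
    return DelegationToken(owner, renewer, now, max_date,
                           min(now + renew_interval, max_date))


def renew(token: DelegationToken, caller: str, now: float,
          renew_interval: float) -> float:
    """Extend the expiry by one interval, never past max_date."""
    if caller != token.renewer:
        raise PermissionError(f"{caller} is not the renewer of this token")
    if now > token.expiry_date or now >= token.max_date:
        raise ValueError("token has expired and can no longer be renewed")
    token.expiry_date = min(now + renew_interval, token.max_date)
    return token.expiry_date


if __name__ == "__main__":
    token = issue("alice", "yarn", now=0, renew_interval=10, max_lifetime=25)
    print(renew(token, "yarn", now=5, renew_interval=10))
```

The point of the max date is visible in renew: each renewal pushes the expiry forward by one interval, but the result is capped, so a token cannot be kept alive indefinitely.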

Block Tokens

Block tokens are similar to delegation tokens in the sense that they are signed by OzoneManager, but that is where the similarity ends. OM (OzoneManager) creates block tokens when a client request involves interaction with Datanodes. Unlike delegation tokens, there is no client API to request block tokens. Instead, they are embedded directly in the responses to client requests, so clients do not need to fetch them explicitly from OzoneManager; the Ozone client handles them implicitly. Datanodes validate the block token on every client connection. The sequence diagram below shows the steps involved in block token usage.

image2.png

S3Token

Like block tokens, S3Tokens are handled transparently for clients. An S3Token is signed with the S3 secret created by the client, and S3Gateway creates one for every S3 client request. To create an S3Token, the user must have an S3 secret. The sequence diagram below shows the steps involved in S3 secret and S3Token usage.


image3.jpg

image1.png
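The idea of signing with an S3 secret without sending it over the wire can be sketched with HMAC. The sketch assumes an AWS Signature Version 4 style key derivation, which the text above does not spell out; the function names, the sample secret, and the string-to-sign are hypothetical.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive a request signing key from the S3 secret (SigV4-style chain)."""
    k_date = _hmac(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_request(secret: str, string_to_sign: str, date: str, region: str) -> str:
    """Client side: sign the request summary; only the signature is sent."""
    key = signing_key(secret, date, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def server_validates(stored_secret: str, string_to_sign: str, signature: str,
                     date: str, region: str) -> bool:
    """Server side: recompute the signature from the stored secret and compare."""
    expected = sign_request(stored_secret, string_to_sign, date, region)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    sig = sign_request("s3cr3t", "sample-string-to-sign", "20240101", "us-east-1")
    print(server_validates("s3cr3t", "sample-string-to-sign", sig, "20240101", "us-east-1"))
```

Only a party that knows the secret can produce a matching signature, so the server can authenticate the request by recomputation alone.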

Authorization

Ozone provides a pluggable API to control authorization of all client-related operations. The default implementation allows every request, so it is clearly not meant for production environments. To configure a more fine-grained policy, one may configure the Ranger plugin for Ozone. Since authorization is a pluggable module, users can also implement their own custom authorization policy and configure it using [ozone.acl.authorizer.class].
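The pluggable design can be pictured roughly as below. Everything in this sketch is hypothetical: the AccessAuthorizer protocol, its check_access signature, the two implementations, and the short class names used as config values are illustrative stand-ins, not Ozone's actual Java API.

```python
from typing import Dict, Protocol


class AccessAuthorizer(Protocol):
    """Hypothetical stand-in for the pluggable authorizer interface."""

    def check_access(self, user: str, resource: str, action: str) -> bool: ...


class AllowAllAuthorizer:
    """Mirrors the default behaviour described above: every request passes."""

    def check_access(self, user: str, resource: str, action: str) -> bool:
        return True


class OwnerWriteAuthorizer:
    """Example custom policy: anyone may read, only a volume's owner may write."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self._owners = owners

    def check_access(self, user: str, resource: str, action: str) -> bool:
        if action == "READ":
            return True
        volume = "/" + resource.strip("/").split("/")[0]
        return self._owners.get(volume) == user


# Maps the configured class name to a factory (illustrative names only).
AUTHORIZERS = {
    "AllowAllAuthorizer": lambda: AllowAllAuthorizer(),
    "OwnerWriteAuthorizer": lambda: OwnerWriteAuthorizer({"/vol1": "alice"}),
}


def load_authorizer(conf: Dict[str, str]) -> AccessAuthorizer:
    name = conf.get("ozone.acl.authorizer.class", "AllowAllAuthorizer")
    return AUTHORIZERS[name]()


if __name__ == "__main__":
    authorizer = load_authorizer({"ozone.acl.authorizer.class": "OwnerWriteAuthorizer"})
    print(authorizer.check_access("bob", "/vol1/bucket1/key1", "WRITE"))
```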

Audit

Ozone provides the ability to audit all read and write operations on OM, SCM and Datanodes. Ozone audit leverages the Marker feature, which lets users selectively audit only READ or WRITE operations with a simple config change, without restarting the service(s).

To enable/disable audit of READ operations, set filter.read.onMatch to NEUTRAL or DENY respectively. Similarly, the audit of WRITE operations can be controlled using filter.write.onMatch.
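The effect of these two settings can be modelled in a few lines. This sketch only mirrors the behaviour described above (DENY drops matching events, NEUTRAL lets them through); the helper names are hypothetical and the parser handles just the two keys mentioned here.

```python
def parse_on_match(properties_text: str) -> dict:
    """Collect the onMatch value for the READ and WRITE marker filters."""
    settings = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("filter.read.onMatch", "filter.write.onMatch"):
            marker = key.split(".")[1].upper()  # "read" -> "READ"
            settings[marker] = value.upper()
    return settings


def is_audited(marker: str, on_match: dict) -> bool:
    """An event is dropped only when its marker's filter is set to DENY."""
    return on_match.get(marker.upper(), "NEUTRAL") != "DENY"


if __name__ == "__main__":
    config = parse_on_match("filter.read.onMatch=DENY\nfilter.write.onMatch=NEUTRAL\n")
    print(is_audited("READ", config), is_audited("WRITE", config))
```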

Generating audit logs is only half the job, so Ozone also provides AuditParser, a SQLite-based command-line utility to parse and query audit logs with predefined templates (e.g. top 5 commands) and options for custom queries. Once the log file has been loaded into AuditParser, one can simply run a template as shown below:

ozone auditparser template top5cmds

Similarly, users can also execute a custom query using:

ozone auditparser query "select * from audit where level=='FATAL'"
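To give a feel for what such queries do, here is a small sketch against an in-memory SQLite database. The table name audit and the level column come from the query above; the other columns, the sample rows, and the SQL behind top_commands are assumptions made for illustration, not the tool's actual schema or template.

```python
import sqlite3

# Hypothetical sample rows: (level, user, op, result)
SAMPLE_ROWS = [
    ("INFO", "alice", "CREATE_VOLUME", "SUCCESS"),
    ("INFO", "alice", "CREATE_BUCKET", "SUCCESS"),
    ("INFO", "bob", "CREATE_BUCKET", "SUCCESS"),
    ("INFO", "bob", "ALLOCATE_KEY", "SUCCESS"),
    ("INFO", "bob", "ALLOCATE_KEY", "SUCCESS"),
    ("ERROR", "carol", "ALLOCATE_KEY", "FAILURE"),
    ("FATAL", "carol", "DELETE_KEY", "FAILURE"),
]


def build_sample_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audit (level TEXT, user TEXT, op TEXT, result TEXT)")
    conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def top_commands(conn: sqlite3.Connection, limit: int = 5):
    """Most frequent operations, similar in spirit to a 'top 5 commands' template."""
    return conn.execute(
        "SELECT op, COUNT(*) AS cnt FROM audit "
        "GROUP BY op ORDER BY cnt DESC, op LIMIT ?",
        (limit,),
    ).fetchall()


def fatal_events(conn: sqlite3.Connection):
    """Same filter as the custom query shown above."""
    return conn.execute("SELECT * FROM audit WHERE level = 'FATAL'").fetchall()


if __name__ == "__main__":
    db = build_sample_db()
    print(top_commands(db))
    print(fatal_events(db))
```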

How to enable security in Ozone?

To turn on Ozone security, set “ozone.security.enabled” to true. Below is a list of important properties for Ozone security:

ozone.security.enabled

True if security is enabled for Ozone. When this property is true, hadoop.security.authentication should be Kerberos.

hdds.scm.kerberos.principal

The SCM service principal. Example: scm/_HOST@REALM.COM

hdds.scm.kerberos.keytab.file

The keytab file used by the SCM daemon to log in as its service principal.

ozone.om.kerberos.principal

The OzoneManager service principal. Example: om/_HOST@REALM.COM

ozone.om.kerberos.keytab.file

The keytab file used by the OM daemon to log in as its service principal.

hdds.scm.http.kerberos.principal

The SCM HTTP server service principal.

hdds.scm.http.kerberos.keytab

The keytab file used by the SCM HTTP server to log in as its service principal.

ozone.om.http.kerberos.principal

The OzoneManager HTTP server service principal.

ozone.om.http.kerberos.keytab

The keytab file used by the OM HTTP server to log in as its service principal.
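Before starting a secure cluster, a quick sanity check over these properties can catch obvious omissions. The sketch below is a hypothetical helper, not part of Ozone; treating all eight Kerberos properties as mandatory is an assumption made for the example.

```python
# Property names taken from the list above.
REQUIRED_WHEN_SECURE = [
    "hdds.scm.kerberos.principal",
    "hdds.scm.kerberos.keytab.file",
    "ozone.om.kerberos.principal",
    "ozone.om.kerberos.keytab.file",
    "hdds.scm.http.kerberos.principal",
    "hdds.scm.http.kerberos.keytab",
    "ozone.om.http.kerberos.principal",
    "ozone.om.http.kerberos.keytab",
]


def check_security_config(conf: dict) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []
    if conf.get("ozone.security.enabled", "false").lower() != "true":
        return problems  # security is off, so nothing to check
    if conf.get("hadoop.security.authentication", "").lower() != "kerberos":
        problems.append("hadoop.security.authentication should be kerberos")
    for name in REQUIRED_WHEN_SECURE:
        if not conf.get(name):
            problems.append(f"missing property: {name}")
    return problems


if __name__ == "__main__":
    sample = {
        "ozone.security.enabled": "true",
        "hadoop.security.authentication": "simple",
        "hdds.scm.kerberos.principal": "scm/_HOST@REALM.COM",
    }
    for problem in check_security_config(sample):
        print(problem)
```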

References:

  1. Apache Hadoop Ozone website: https://hadoop.apache.org/ozone/
  2. HDDS security design document
  3. https://issues.apache.org/jira/browse/HDDS-4