Apache Ranger response to incorrect analyst report on Cloud data security

Introduction

A recent industry analyst report by GigaOm, sponsored by Immuta, compares Apache Ranger to Immuta and paints an incorrect picture of the complexity of using Apache Ranger. We believe the report contains a number of errors and inconsistencies. Unfortunately, the Apache Ranger Project Management Committee (PMC) was not contacted by the analyst firm during preparation of the report.

We have attempted to contact the authors and members of the research team several times, requesting the opportunity to review the inaccuracies and have them corrected. Despite our many attempts to rectify the misinformation, no one from the analyst firm responded.

For the benefit of existing and potential users of Apache Ranger, it is important for the Apache Ranger PMC to respond to this report with facts.

Use cases

Let us now go through the scenarios covered in the report, and see how the numbers reported change with appropriate use of Apache Ranger to address the requirements.

Scenario 1b: Mask All PII Data

The report lists 2 policy changes in Immuta vs 5 in Apache Ranger. In fact, only one Apache Ranger policy would be needed to address this requirement.
This shows the author's lack of understanding of the Apache Ranger policy model. The series of allow/deny/deny-exception steps listed applies only to an access policy, not to a masking policy. Also, in access policies, allow/deny/deny-exception can be replaced by a switch named denyAllElse, as shown in the image below.
With the use of user groups or roles, a time-tested best practice followed universally by access control systems, this requirement can be met by a single Apache Ranger policy, as shown below.
Masking policy:

Access policy:

Scenario 1c: Allow Email Domains Through the Masking Policy

The report lists 2 policy changes in Immuta vs 5 in Apache Ranger. As in the previous scenario, only one Apache Ranger masking policy would be needed to address this requirement.
Claim: Apache Ranger does not have a regular expression masking policy.
Truth: Instead of building a virtualization layer that can introduce significant complexity and performance penalties, Apache Ranger uses the native capabilities of the data processing application to perform masking and filtering. Since such applications support regular expressions, it is simple to create a custom expression to suit your needs, for example for email addresses, account numbers, or credit card numbers, and importantly without having to involve the security software vendor.
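As a rough illustration of the kind of regular expression such a custom masking expression might rely on, the Python sketch below hides the local part of an email address and lets the domain through. The function name mask_email_local_part and the "xxxx" replacement are assumptions made for this example; in a real deployment the equivalent expression would be written with the regular-expression function of the underlying data processing engine, not in Python.

```python
import re

# Hypothetical pattern: everything before the '@' (the local part).
_LOCAL_PART = re.compile(r"^[^@]+(?=@)")

def mask_email_local_part(value: str) -> str:
    """Replace the local part of an email address with a fixed mask,
    leaving the domain visible. Values without '@' are returned unchanged."""
    return _LOCAL_PART.sub("xxxx", value)

print(mask_email_local_part("jane.doe@example.com"))
```

The same pattern idea carries over to account or credit card numbers by changing the regular expression, which is the point of leaving the expression to the engine.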

Scenario 1d: Add Two Users Access to All PII Data

The report lists 1 policy change in Immuta vs 4 in Apache Ranger. However, the following statement from the report suggests that each user must be updated in the Immuta UI to add the necessary attributes. Wouldn't the number of steps be as large as the number of users?

Added the AuthorizedSensitiveData > All attribute to each user in the Immuta UI.

The report counts 4 policy changes in Apache Ranger, while the only change needed is to add the users (2 or 200 users!) to a group or role. No policy changes are needed if time-tested best practices are followed - by referencing groups or roles in policies instead of individual users.
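The following toy model, which is not Apache Ranger's API, sketches why referencing a group keeps the policy itself untouched when users are added. The names Policy, group_members, is_allowed and pii_readers are all hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """Toy policy that references a group name, never individual users."""
    resource: str
    allowed_group: str

# Hypothetical group store; in practice membership comes from the identity system.
group_members = {"pii_readers": {"alice"}}

policy = Policy(resource="customer.pii_columns", allowed_group="pii_readers")

def is_allowed(user: str, policy: Policy) -> bool:
    """A user is allowed if they are a member of the group the policy names."""
    return user in group_members.get(policy.allowed_group, set())

# Granting access to two more users touches only group membership;
# the policy object is frozen and is never modified.
group_members["pii_readers"].update({"bob", "carol"})
```

Whether 2 or 200 users are added, the only thing that changes is the membership set.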

Scenario 2a: Share Data With Managers

The report lists 1 policy change in Immuta vs 101 in Apache Ranger. With the use of lookup tables, which is a common practice in enterprises, the requirement can be met with a single row-filter policy in Apache Ranger:

ss_store_sk in (select store_id from store_authorization where user_name=current_user())
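To see how a lookup table drives this filter, here is a small sketch that runs the expression against an in-memory SQLite database. The table name store_sales, the sample rows and the manager names are assumptions made for the example, and because SQLite has no current_user(), a stand-in function is registered for the demo. In Apache Ranger the filter would be evaluated by the data processing engine itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales (ss_store_sk INTEGER, ss_net_paid REAL);
    CREATE TABLE store_authorization (user_name TEXT, store_id INTEGER);
    INSERT INTO store_sales VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO store_authorization VALUES ('mgr_a', 1), ('mgr_a', 2), ('mgr_b', 3);
""")

# Stand-in for the engine's current_user(); demo only.
session = {"user": ""}
conn.create_function("current_user", 0, lambda: session["user"])

# The row-filter expression from the text, unchanged.
ROW_FILTER = ("ss_store_sk in (select store_id from store_authorization "
              "where user_name=current_user())")

def visible_stores(user: str) -> list:
    """Return the store keys the given user can see through the row filter."""
    session["user"] = user
    sql = f"SELECT ss_store_sk FROM store_sales WHERE {ROW_FILTER} ORDER BY ss_store_sk"
    return [row[0] for row in conn.execute(sql).fetchall()]

print(visible_stores("mgr_a"), visible_stores("mgr_b"))
```

One expression serves every manager, because the per-user differences live in the store_authorization rows rather than in the policy.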

Scenario 2b: Merging Groups

The report lists 0 policy changes in Immuta vs 1 in Apache Ranger. This is the same as the previous scenario, where the author chose not to follow the common practice of using lookup tables. With the use of a lookup table, as detailed above, no policy changes will be needed in Apache Ranger.

Scenario 2c: Share Additional Data With Managers

The report lists 0 policy changes in Immuta vs 102 in Apache Ranger. Once again, with the use of a lookup table, only 2 policies would be required in Apache Ranger:

table store:
s_store_sk in (select store_id from store_authorization where user_name=current_user())

table store_returns:
sr_store_sk in (select store_id from store_authorization where user_name=current_user())

Scenario 2d: Reorganize Managers Into Regions

The report lists 0 policy changes in Immuta vs 40 in Apache Ranger. As in the previous scenarios, with the use of a lookup table no policy changes will be needed in Apache Ranger.
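The sketch below illustrates the "no policy changes" point for a reorganization: only rows in the lookup table are updated, and the filter expression stays byte-for-byte the same. It uses in-memory SQLite with made-up stores and manager names, and a stand-in for current_user(), all of which are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (s_store_sk INTEGER);
    CREATE TABLE store_authorization (user_name TEXT, store_id INTEGER);
    INSERT INTO store VALUES (1), (2), (3);
    INSERT INTO store_authorization VALUES ('mgr_a', 1), ('mgr_a', 2), ('mgr_b', 3);
""")

# Stand-in for the engine's current_user(); demo only.
session = {"user": ""}
conn.create_function("current_user", 0, lambda: session["user"])

# Row filter for table store, as given earlier in the text.
ROW_FILTER = ("s_store_sk in (select store_id from store_authorization "
              "where user_name=current_user())")

def visible(user: str) -> list:
    session["user"] = user
    sql = f"SELECT s_store_sk FROM store WHERE {ROW_FILTER} ORDER BY s_store_sk"
    return [row[0] for row in conn.execute(sql).fetchall()]

before = visible("mgr_b")
# Reorganization: store 2 moves from mgr_a to mgr_b. Only data changes.
conn.execute("UPDATE store_authorization SET user_name = 'mgr_b' WHERE store_id = 2")
after = visible("mgr_b")
print(before, after)
```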

Scenario 2e: Restrict Data Access to Specific Countries

The report lists 1 policy change in Immuta vs 71 in Apache Ranger. With the use of a lookup table, only one row-filter policy is needed in Apache Ranger.

Scenario 2f: Grant New User Group Access to All Rows by Default

The report lists 0 policy changes in Immuta vs 30 in Apache Ranger. With the use of a lookup table, no additional policy would be needed in Apache Ranger.

Scenario 2g: Apply Policies to a Derived Data Mart

The report lists 0 policy changes in Immuta vs 140 in Apache Ranger for the addition of 15 tables. With Apache Ranger, new tables can either be added to existing policies, or new policies can be created. This will require 15 policy updates in Apache Ranger - not 140 as claimed by the author. Also, the report provides no details on the changes to be made in Immuta (other than ‘0 policy changes’).

Scenario 3a: "AND" logic policy

The report says "unable to meet requirement" in Apache Ranger - which is incorrect. The author does suggest a good approach to meet this requirement in Apache Ranger - by creating a role whose members are the users who belong to both groups, and referencing this role in policies. The point that Apache Ranger does not directly support policies based on a user belonging to multiple groups is correct; however, this can easily be addressed with a custom condition extension. If there is enough interest from the user community, an enhancement to support this condition out of the box would be considered.
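The role-based approach amounts to a set intersection over group memberships. A minimal sketch, with hypothetical group names and users and no connection to Apache Ranger's actual role management API:

```python
# Hypothetical group memberships, for illustration only.
groups = {
    "finance": {"alice", "bob", "carol"},
    "emea": {"bob", "carol", "dave"},
}

def build_role(*group_names: str) -> set:
    """Members of the role: users who belong to every one of the named groups."""
    member_sets = [groups[name] for name in group_names]
    return set.intersection(*member_sets)

# Role that expresses "finance AND emea"; policies would reference this role.
finance_emea_role = build_role("finance", "emea")
print(sorted(finance_emea_role))
```

The role has to be kept in sync when group membership changes, which is the trade-off compared with a condition evaluated at access time.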

Scenario 3b: Conditional Policies

The report says "unable to meet requirement" in Apache Ranger - which is incorrect. As mentioned earlier, Apache Ranger leverages expressions supported by the underlying data processing engine for masking and row-filtering. The requirement can easily be met with the following expression in the masking policy:

CASE WHEN (extract(year FROM current_date()) - birth_year) > 16 THEN {col} ELSE NULL END
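To try out the logic of this conditional mask, the sketch below evaluates an equivalent expression in an in-memory SQLite database. SQLite lacks extract(year FROM current_date()), so strftime('%Y', 'now') is swapped in; the customer table, its rows and the name column are assumptions made for the example.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, birth_year INTEGER)")
this_year = date.today().year
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("adult", 1950), ("child", this_year - 5)],
)

# SQLite spelling of the masking expression. {col} is substituted with the
# masked column's name, as the placeholder is in the expression above.
MASK = ("CASE WHEN (CAST(strftime('%Y', 'now') AS INTEGER) - birth_year) > 16 "
        "THEN {col} ELSE NULL END")

sql = f"SELECT {MASK.format(col='name')} FROM customer ORDER BY birth_year"
masked = [row[0] for row in conn.execute(sql).fetchall()]
print(masked)
```

Rows for people older than 16 come back unchanged, and all other rows return NULL for the masked column, without any view being created.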

There is no need to create views as suggested in the report.

Scenario 3c: Minimization Policies

As mentioned in the report, Apache Ranger doesn't support policies to limit the number of records accessed. If there is enough interest from the user community, this enhancement would be considered.

Scenario 3d: De-Identification Policies

The report says “unable to meet requirement” in Apache Ranger - which is incorrect. While Apache Ranger doesn’t talk about k-anonymity directly, the requirements can be implemented using Apache Ranger data masking policies - by setting up appropriate masking expressions for columns:

for columns that require a NULL value to be returned, set up a mask policy with type as MASK_NULL
for columns that require a constant value, set up a mask policy with type as CONSTANT and specify the desired value - like “NONE”
for columns that require a ‘generalized’ value based on the existing value of the column, use custom expressions as shown below. This does require analyzing the table to arrive at generalized values:
CASE WHEN {col} < 20 THEN 16
WHEN {col} BETWEEN 20 AND 29 THEN 26
WHEN {col} BETWEEN 30 AND 39 THEN 36
WHEN {col} BETWEEN 40 AND 49 THEN 46
WHEN {col} BETWEEN 50 AND 59 THEN 56
WHEN {col} BETWEEN 60 AND 69 THEN 66
WHEN {col} BETWEEN 70 AND 79 THEN 76
WHEN {col} BETWEEN 80 AND 89 THEN 86
WHEN {col} BETWEEN 90 AND 99 THEN 96
ELSE 106
END
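As a sketch of the table analysis mentioned above, the Python below mirrors the CASE expression and reports the size of the smallest bucket after generalization, which is the k that the chosen buckets achieve for that column. The function names and the sample ages are made up for the illustration; a real check would run against the actual table and consider all quasi-identifier columns together.

```python
from collections import Counter

def generalize_age(age: int) -> int:
    """Python mirror of the CASE expression: map an age to its bucket value."""
    if age < 20:
        return 16
    if age <= 99:
        return (age // 10) * 10 + 6  # 20-29 -> 26, 30-39 -> 36, ...
    return 106

def smallest_group(ages: list) -> int:
    """Size of the smallest bucket after generalization."""
    return min(Counter(generalize_age(a) for a in ages).values())

sample_ages = [18, 19, 23, 27, 29, 34, 35]  # made-up data
print(smallest_group(sample_ages))
```

If the smallest group falls below the required k, the bucket boundaries need to be widened before the expression is placed in the masking policy.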

What the report doesn't talk about

It is important to take note of what the report doesn’t talk about. For example:

Extensibility: Apache Ranger’s open policy model and plugin architecture enable extending access control to other applications, including custom applications within an enterprise.

Wider acceptance: Apache Ranger is used by major cloud vendors like AWS, Azure, and GCP, and support is available from seasoned industry experts who continue to contribute to Apache Ranger and extend its reach.

Performance: The Apache Ranger policy engine is highly optimized for performance, which results in only a very small overhead (mostly around 1 millisecond) to authorize accesses; and importantly, there are no overheads in the data access path.

Apache Ranger features like security zones that allow different sets of policies to be applied to data in landing, staging, temp, production zones. A security zone can consist of resources across applications, for example: S3 buckets/paths, Solr collections, Snowflake tables, Presto catalogs/schemas/tables, Trino catalogs/schemas/tables, Apache Kafka topics, Synapse database/schemas/tables.