Purnima Kuchikulla & Sharada Ramesh
Privacera Ranger cluster-based policies are the latest innovation in governing Databricks analytics environments.
What Are Cluster-Based Policies?
A cluster-based policy enables dynamic security at the cluster level. Starting with Privacera version 3.5, this feature is integrated directly into our product. A cluster policy limits the ability to create clusters based on a set of rules, and those rules are expressed as limitations on the attributes used for the clusters. A policy can also be restricted to specific users and groups. You can use cluster policies to limit users to creating clusters with prescribed settings, to simplify the user interface so that more users can start their own clusters, and to control costs by capping the maximum cost of a cluster, all through a centralized governance interface. Privacera leverages the built-in advanced security settings in Databricks to further harden the cluster for Python and SQL workloads.
Creating Cluster Policies with Privacera Ranger
You can set up a Databricks cluster policy that allows clusters to be created only by specific users and groups. The cluster attributes can also be controlled through this policy. The screenshot below shows a sample cluster policy. Administrators start by naming the policy. The users and groups come from Databricks. For this example, we select Emily as the user. Emily can create Databricks clusters using only the IAM roles that administrators have explicitly authorized. IAM roles are directly associated with instance profiles. This way, we can allow Emily to create Databricks clusters that let her interact with data and services in predefined ways, based on the IAM roles she is allowed to use.
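As a rough illustration of the IAM restriction described above, the sketch below builds the fragment of a Databricks cluster policy definition that limits a cluster to a set of instance profiles. The ARNs and the helper name `instance_profile_rule` are hypothetical, and the post does not show how Privacera renders its own policy into this format, so treat this as an assumption about the underlying Databricks definition rather than Privacera's implementation.

```python
import json

# Hypothetical instance profile ARNs; substitute the profiles that your
# administrators have actually authorized.
ALLOWED_INSTANCE_PROFILES = [
    "arn:aws:iam::123456789012:instance-profile/analytics-read-only",
    "arn:aws:iam::123456789012:instance-profile/analytics-read-write",
]


def instance_profile_rule(allowed_arns):
    """Return a policy fragment limiting clusters to the given instance profiles."""
    if not allowed_arns:
        raise ValueError("at least one instance profile ARN is required")
    return {
        # Attribute path used by Databricks cluster policies on AWS.
        "aws_attributes.instance_profile_arn": {
            "type": "allowlist",
            "values": list(allowed_arns),
        }
    }


policy_fragment = instance_profile_rule(ALLOWED_INSTANCE_PROFILES)
print(json.dumps(policy_fragment, indent=2))
```

A user such as Emily could then pick only one of the listed instance profiles when creating a cluster under this policy.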
If you review the policy, you can see that all the default properties to enable Privacera have been added. For a high-concurrency Databricks cluster, administrators can limit the cluster to run only SQL or Python. Additional properties, such as the allowed Databricks runtime versions, the instance types, and the maximum number of workers (to control cost), can also be added to the cluster policies.
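The properties mentioned above could be expressed in a Databricks policy definition roughly as follows. This is a minimal sketch: the runtime version, instance types, and worker limit are illustrative values, `build_policy_definition` is a hypothetical helper, and the Privacera-specific Spark properties are omitted because the post does not list them. The language restriction assumes the Databricks Spark setting `spark.databricks.repl.allowedLanguages`.

```python
import json


def build_policy_definition(runtime_versions, instance_types, max_workers, languages):
    """Assemble a Databricks-style cluster policy definition as a dict."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return {
        # Pin the languages allowed on the cluster; users cannot change a fixed value.
        "spark_conf.spark.databricks.repl.allowedLanguages": {
            "type": "fixed",
            "value": ",".join(languages),
        },
        # Only the listed runtime versions may be selected.
        "spark_version": {"type": "allowlist", "values": list(runtime_versions)},
        # Only the listed instance types may be selected.
        "node_type_id": {"type": "allowlist", "values": list(instance_types)},
        # Cap autoscaling to keep cost under control.
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
    }


policy = build_policy_definition(
    runtime_versions=["7.3.x-scala2.12"],       # illustrative
    instance_types=["i3.xlarge", "i3.2xlarge"],  # illustrative
    max_workers=8,                               # illustrative
    languages=["python", "sql"],
)
print(json.dumps(policy, indent=2))
```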
In the Databricks workspace, Emily is not an admin and would not normally be able to create a cluster. With cluster policy functionality, however, Emily can create a cluster within the limitations specified in the policy.
If you refer to the screenshot below, you can see that the Apache Spark properties that are part of the Privacera policy are in sync. Moreover, modifications to these properties are not allowed. This feature lets your users create clusters, but in a controlled, secure, and cost-efficient manner.
Databricks Cluster Policy Use Cases
As we saw earlier, most cluster attributes can be controlled through cluster policies. The following constraints can be enforced in these policies, as noted in the Databricks documentation, Managing Cluster Policies:
- Fixed value with disabled control element
- Fixed value with control hidden in the UI (value is visible in the JSON view)
- Attribute value limited to a set of values (either allow list or block list)
- Attribute value matching a given regex
- Numeric attribute limited to a certain range
- Default value used by the UI with control enabled
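To make the constraint types above concrete, here is a small, simplified checker written against the rule types used in Databricks policy definitions (`fixed`, `allowlist`, `blocklist`, `regex`, `range`, `unlimited`). The function names and sample rules are hypothetical, and the real enforcement happens inside Databricks; this sketch only mirrors the semantics, and it assumes that regex rules must match the whole value.

```python
import re


def check_value(rule, value):
    """Return True if `value` satisfies a single policy rule."""
    kind = rule["type"]
    if kind == "fixed":
        # Fixed value; "hidden": True additionally hides the control in the UI.
        return value == rule["value"]
    if kind == "allowlist":
        return value in rule["values"]
    if kind == "blocklist":
        return value not in rule["values"]
    if kind == "regex":
        # Assumption: the pattern is anchored to the full value.
        return re.fullmatch(rule["pattern"], str(value)) is not None
    if kind == "range":
        low = rule.get("minValue", float("-inf"))
        high = rule.get("maxValue", float("inf"))
        return low <= value <= high
    if kind == "unlimited":
        # No restriction; a "defaultValue" only pre-fills the UI control.
        return True
    raise ValueError(f"unknown rule type: {kind}")


def violations(rules, cluster):
    """List the attributes in `cluster` that break a rule (absent attributes are skipped)."""
    return [attr for attr, rule in rules.items()
            if attr in cluster and not check_value(rule, cluster[attr])]


sample_rules = {
    "autotermination_minutes": {"type": "fixed", "value": 30, "hidden": True},
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
    "spark_version": {"type": "regex", "pattern": r"7\.[0-9]+\.x-scala.*"},
    "num_workers": {"type": "range", "minValue": 1, "maxValue": 10},
    "custom_tags.team": {"type": "unlimited", "defaultValue": "analytics"},
}

request = {"node_type_id": "p3.16xlarge", "spark_version": "7.3.x-scala2.12", "num_workers": 50}
print(violations(sample_rules, request))
```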
There are also a few synthetic attributes that are supported by cluster policies:
- To help with individual cluster costs, the “max DBU-hour” metric can be used. This represents the maximum DBUs (Databricks Units) a cluster can use on an hourly basis
- Limits can also be placed on the source that creates the cluster. These sources include jobs service (job clusters), clusters UI, and clusters REST API (all-purpose clusters)
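The two synthetic attributes could be sketched as policy rules like this. The attribute names `dbus_per_hour` and `cluster_type` and the values `all-purpose` and `job` reflect our understanding of the Databricks policy definition format; the helper name and the limit of 50 DBUs per hour are purely illustrative.

```python
import json

# Cluster sources this sketch distinguishes: all-purpose clusters (created
# through the clusters UI or REST API) and job clusters (created by the jobs service).
VALID_CLUSTER_TYPES = ("all-purpose", "job")


def cost_and_source_rules(max_dbus_per_hour, cluster_type):
    """Return policy rules capping hourly DBU usage and fixing the cluster source."""
    if max_dbus_per_hour <= 0:
        raise ValueError("max_dbus_per_hour must be positive")
    if cluster_type not in VALID_CLUSTER_TYPES:
        raise ValueError(f"cluster_type must be one of {VALID_CLUSTER_TYPES}")
    return {
        "dbus_per_hour": {"type": "range", "maxValue": max_dbus_per_hour},
        "cluster_type": {"type": "fixed", "value": cluster_type},
    }


rules = cost_and_source_rules(50, "job")  # illustrative limit
print(json.dumps(rules, indent=2))
```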
However, there are a few cluster attributes that cannot be controlled by cluster policies. These include libraries, the number of clusters created per user, and cluster permissions (ACLs). These are managed by a separate API.
Why Use Privacera To Create Cluster Policies?
Privacera helps enterprises centralize their authorization policies for both data access and data processing. With this feature, we have extended our centralized policy management console to cover cluster policies as well.