Privacera Expands Support for Databricks on Google Cloud


Yesterday, our partner Databricks announced the public preview of Databricks on Google Cloud Platform (GCP). Today, we are thrilled to announce that Privacera’s GCP support now extends to Databricks Spark clusters, in addition to our existing support for Google Cloud Storage, BigQuery, Bigtable, and Dataproc.

Recognizing the importance of Databricks as a leading Lakehouse platform, Privacera now offers robust access controls for the service across all three leading public clouds: Amazon Web Services (AWS), Azure, and GCP. This matters especially now that many organizations are adopting multi-cloud approaches and data services. Privacera answers this need by providing single-pane visibility across cloud services and applications, reducing the burden on data infrastructure teams, and enabling secure data democratization across enterprises.

Because Privacera offers the same experience across three of the largest cloud platforms, customers have more flexibility to migrate from one cloud to another or to run their big data and data science workloads in the cloud of their choice. With this flexibility, our customers never have to compromise on the privacy, compliance, or security of their data.

How Privacera Platform Supports Databricks on Google Cloud Platform

Privacera supports fine-grained data access control for Databricks Spark clusters running in High Concurrency mode with Python, R, and SQL. The Privacera Ranger plugin for Spark runs within the Databricks cluster and provides access control for all user requests. The plugin’s architecture is similar to that of the Ranger plugins for Apache Hive, Apache HBase, HDFS, and Apache Kafka: it includes a library that is loaded at startup and runs within the Spark driver, ensuring that all data access is authorized through the Ranger plugin.

Privacera-Databricks Authorization Architecture

To extend the native access control for Databricks clusters, the Privacera platform provides a plugin model based on Apache Ranger. These plugins are lightweight, distributed agents that act as gatekeepers for access to various cloud resources. The Ranger plugin is embedded within the Spark driver. When a user executes a SQL query or reads a file from cloud storage such as Google Cloud Storage, the request is received by the Spark driver, which parses it and generates a logical plan to process the query. The Ranger plugin then performs an authorization check against the resources the user is requesting. If the user has the required permissions, the plugin lets the Databricks cluster take over processing of the query.
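The authorization flow described above can be illustrated with a minimal Python sketch. This is not Privacera or Apache Ranger code: the `Policy` class, the dict-based logical plan, and the `authorize` function are all hypothetical names invented for illustration, and a real plugin works against Spark's own plan objects and Ranger's policy engine. The sketch only shows the idea of collecting the resources a plan touches and checking each one before execution is allowed to proceed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which users may perform which actions on a resource."""
    resource: str          # e.g. a table name or a storage path
    users: frozenset
    actions: frozenset


def resources_in_plan(plan: dict) -> list:
    """Collect every resource named in a simplified, dict-based logical plan."""
    found = list(plan.get("resources", []))
    for child in plan.get("children", []):
        found.extend(resources_in_plan(child))
    return found


def authorize(user: str, action: str, plan: dict, policies: list) -> bool:
    """Allow the request only if every resource in the plan is permitted."""
    for resource in resources_in_plan(plan):
        allowed = any(
            p.resource == resource and user in p.users and action in p.actions
            for p in policies
        )
        if not allowed:
            return False  # deny: the query is not handed over for execution
    return True


# Illustrative policies and plan (all names are made up).
policies = [
    Policy("sales.orders", frozenset({"alice"}), frozenset({"select"})),
    Policy("gs://example-bucket/raw/", frozenset({"alice", "bob"}), frozenset({"read"})),
]
plan = {
    "op": "Project",
    "children": [{"op": "Scan", "resources": ["sales.orders"]}],
}

print(authorize("alice", "select", plan, policies))  # True
print(authorize("bob", "select", plan, policies))    # False
```

In this sketch a single unauthorized resource anywhere in the plan denies the whole request, which mirrors the gatekeeper role described for the plugin.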

The Ranger plugin is installed in Databricks using init scripts provided by Privacera. These init scripts can be deployed globally for all clusters or locally for individual clusters. The init script downloads the appropriate Ranger and Privacera libraries and enables the Ranger plugin for the cluster. Databricks calls the init scripts automatically when the cluster is started for the first time to ensure that security is enabled for the lifetime of the cluster.

Key Features & Benefits for Databricks Users 

Privacera’s advanced security, privacy, and governance capabilities integrate with the unified approach Databricks provides for data analysis. Together they give Databricks users a solution that controls data access, ensures consistent access policies across all systems and applications, and provides the transparency enterprises need to comply with stringent industry and privacy regulations. Privacera provides Databricks users with:

Automated sensitive data discovery – Privacera automatically connects to the cloud and on-premises storage services and databases that serve as the storage layer for Databricks deployments, including Google Cloud Storage. Once connected to the storage environment (including Delta tables), Privacera performs an initial scan of data at rest and then continuously scans new data in near real time as it enters the environment. When scanning, Privacera uses one of three methods to identify and tag sensitive data, depending on the use case and data type: pattern matching, machine learning models, or dictionary (lookup) tables.
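Two of the methods named above, pattern matching and dictionary lookup, can be sketched in a few lines of Python. This is an illustration only and not Privacera's scanner: the `PATTERNS` and `DICTIONARY` tables, the tag names, and the `tag_value` and `tag_columns` helpers are all hypothetical, and the regular expressions are deliberately simple.

```python
import re

# Hypothetical patterns and lookup terms, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
DICTIONARY = {
    "DIAGNOSIS": {"diabetes", "asthma"},
}


def tag_value(value: str) -> set:
    """Return the tags whose pattern or dictionary term matches the value."""
    tags = {name for name, pattern in PATTERNS.items() if pattern.search(value)}
    lowered = value.lower()
    for name, terms in DICTIONARY.items():
        if any(term in lowered for term in terms):
            tags.add(name)
    return tags


def tag_columns(rows: list) -> dict:
    """Return, per column, the union of tags found in any sampled row."""
    result = {}
    for row in rows:
        for column, value in row.items():
            result.setdefault(column, set()).update(tag_value(str(value)))
    return result


sample = [
    {"contact": "jane@example.com", "notes": "History of asthma", "id": "123-45-6789"},
    {"contact": "n/a", "notes": "No issues", "id": "987-65-4321"},
]
print(tag_columns(sample))
```

Tagging at the column level, rather than per value, is one possible design choice: it lets an access policy refer to a tag instead of to individual column names.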

Fine-grained access control – Privacera empowers administrators to implement access control policies at the column, row, and file levels. This includes the ability to dynamically mask or redact data in columns and to filter table rows based on user attributes or conditions, enabling Databricks users to share authorized data across multiple users while complying with privacy and security mandates.
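As a rough illustration of dynamic masking and row filtering driven by user attributes, consider the Python sketch below. It is a simplified stand-in, not how Privacera enforces policies inside Spark: the `mask` and `apply_policies` functions, the `pii_reader` role, and the `region` attribute are assumptions made up for this example.

```python
def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with 'X'."""
    if len(value) <= visible:
        return "X" * len(value)
    return "X" * (len(value) - visible) + value[-visible:]


def apply_policies(rows: list, user: dict) -> list:
    """Filter rows by the user's regions, then mask the card number
    unless the user holds the (hypothetical) pii_reader role."""
    visible_rows = [r for r in rows if r["region"] in user["regions"]]
    result = []
    for row in visible_rows:
        out = dict(row)  # copy so the source data is never modified
        if "pii_reader" not in user["roles"]:
            out["card_number"] = mask(out["card_number"])
        result.append(out)
    return result


rows = [
    {"customer": "A", "region": "EU", "card_number": "4111111111111111"},
    {"customer": "B", "region": "US", "card_number": "5500000000000004"},
]
analyst = {"roles": {"analyst"}, "regions": {"EU"}}

print(apply_policies(rows, analyst))
# [{'customer': 'A', 'region': 'EU', 'card_number': 'XXXXXXXXXXXX1111'}]
```

The same table yields different results for different users without any copy of the data being created, which is the point of applying masks and filters at query time.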

Single-pane visibility – Privacera provides a centralized platform to ensure fine-grained access controls are applied across multiple cloud services and administered from a single location. With this single-pane view, administrators no longer need to navigate multiple, disparate interfaces to govern users’ access. With Privacera’s granular control, more users get rapid access to the data they are authorized to use, and administrators can dynamically create, enforce, and scale policies across all their data sources.

Encryption and Masking/Filtering – Privacera’s Encryption Gateway (PEG) supports encryption and decryption of data at rest and in motion. With PEG, Databricks users can securely migrate encrypted data from on-premises data lakes to the cloud and safeguard it against breaches in the cloud until it is ready to be decrypted for analytical purposes.

Privacera Encryption Architecture

Comprehensive visibility into data access and usage – With Privacera’s rich auditing and reporting capabilities, Databricks users have complete visibility of their sensitive data, including: 

  • Real-time monitoring and logging
  • Detailed audit trails of data usage, including access, policy changes, precise queries executed, and more
  • Precise visibility into which users accessed which data sources and for what purposes, so that compliance with industry and privacy regulations such as GDPR, CCPA, HIPAA, and LGPD can be demonstrated at any time
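To make the idea of an audit trail concrete, here is a small Python sketch that summarizes who accessed which resource. The record layout (`user`, `resource`, `action`, `result`) and the `access_report` helper are hypothetical and do not reflect Privacera's actual audit schema.

```python
from collections import defaultdict

# Hypothetical audit records; real audit events carry many more fields.
audit_log = [
    {"user": "alice", "resource": "sales.orders", "action": "select", "result": "ALLOWED"},
    {"user": "bob", "resource": "sales.orders", "action": "select", "result": "DENIED"},
    {"user": "alice", "resource": "hr.salaries", "action": "select", "result": "ALLOWED"},
]


def access_report(records: list, result: str = "ALLOWED") -> dict:
    """Group users by resource for all records with the given result."""
    report = defaultdict(set)
    for rec in records:
        if rec["result"] == result:
            report[rec["resource"]].add(rec["user"])
    return {resource: sorted(users) for resource, users in report.items()}


print(access_report(audit_log))
# {'sales.orders': ['alice'], 'hr.salaries': ['alice']}
```

Passing `result="DENIED"` to the same function lists rejected attempts, which is often the first thing a compliance review asks for.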

With Privacera’s extended support for GCP and its advanced capabilities, Databricks teams get the best of both worlds: industry-leading governance and security, plus the flexibility to deploy on the cloud of their choice.

To learn more about how Privacera supports Databricks on Google Cloud, contact us.

