Amazon Web Services (AWS) is a juggernaut in the public cloud market. There are many reasons for its popularity, an important one being how easy and straightforward it is to store and query data in Amazon S3, AWS’s object storage service.
Working with our customers, we know that many use Amazon S3 as a landing zone for their data, a practice sometimes referred to as a data lake, which they then use to provide users access to the S3 buckets and files through various means. For example, EMR Hive is often used for processing and querying data stored in table form in S3. Databricks, based on Apache Spark, is another popular mechanism for accessing and querying S3 data.
Of course, any time you make data available to users, you need to implement data governance and security controls. While S3 and other AWS services provide solid security basics, such as the user authentication discussed in our previous blog, providing fine-grained access control to S3 data with AWS’s native capabilities is not a simple process. But worry not: this is what Privacera provides.
An Architecture Fully Compatible with AWS Services
Privacera uses plug-ins to provide fine-grained access control at the file, row, and column level. A plug-in architecture with native integrations into the data sources has a lightweight footprint that is easy to layer into complex storage and compute systems. Because the plug-ins are built natively for the source systems, they don’t introduce added complexity, dependencies, or overhead. They authorize requests quickly enough to support thousands of users simultaneously accessing and querying data in production environments at petabyte scale.
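To make the plug-in idea more concrete, here is a minimal sketch of a file-level authorization check that runs inside the query engine's own process. It is an illustration only, not Privacera's implementation or API: the `PathPolicy` and `AuthorizerPlugin` names, the group names, and the `s3://example-lake/` bucket are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathPolicy:
    """Hypothetical policy: a group may perform some actions under an S3 prefix."""
    group: str
    path_prefix: str
    actions: frozenset


class AuthorizerPlugin:
    """Hypothetical in-process plug-in that answers allow/deny for each request."""

    def __init__(self, policies):
        self._policies = list(policies)

    def is_allowed(self, groups, action, path):
        # Allow the request if any policy matches the user's group,
        # the requested action, and the prefix of the requested path.
        return any(
            policy.group in groups
            and action in policy.actions
            and path.startswith(policy.path_prefix)
            for policy in self._policies
        )


# Example policies (made-up groups and bucket name).
plugin = AuthorizerPlugin([
    PathPolicy("analysts", "s3://example-lake/sales/", frozenset({"read"})),
    PathPolicy("engineers", "s3://example-lake/", frozenset({"read", "write"})),
])

analyst_reads_sales = plugin.is_allowed(
    {"analysts"}, "read", "s3://example-lake/sales/2020/q1.csv")
analyst_reads_hr = plugin.is_allowed(
    {"analysts"}, "read", "s3://example-lake/hr/salaries.csv")
engineer_writes_hr = plugin.is_allowed(
    {"engineers"}, "write", "s3://example-lake/hr/salaries.csv")
```

Because the check in this sketch is a local function call against policies already held in memory, it adds no extra network hop per query, which is the general reason an in-process plug-in can stay lightweight.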
Rapidly Enable Data Access Control, Governance, and Security
Using plug-ins for access control lets IT and data platform teams control access to data stored in Amazon S3 without writing a single line of code, while staying transparent to the end user and not impacting query performance. With PrivaceraCloud, the industry’s first SaaS-based access control solution, data admins can easily set up access control on existing or new AWS clusters. Take EMR Hive, for example. After only a handful of steps to configure the native plug-in for EMR Hive clusters, data admins can build fine-grained access control policies, grant or remove access rights, and obtain a holistic audit trail of when data is accessed or moved. In the following 4-minute demo, you will see our product in action and how easy it is to install and enforce access control on a new EMR Hive cluster featuring the Hue interface and the Glue Metastore.
PrivaceraCloud provides the same level of simplicity and convenience when configuring Databricks on AWS for access control. Moreover, when part of a dataset is deemed off-limits to certain data consumers, those consumers can now be given access to just the data they are allowed to see, while the off-limits data remains off-limits. In the following PrivaceraCloud demo, we demonstrate not only the steps to configure the plug-in for a Databricks cluster on AWS, but also column-level data masking and row-level filtering, which optimize the utilization and sharing of data without exposing sensitive information.
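For readers who have not worked with these two techniques, the following is a small sketch of what row-level filtering and column-level masking do to a query result. It is not how PrivaceraCloud or Databricks implements them; the `apply_policy` and `mask` functions, the masking rule (keep the last four characters), and the sample records are assumptions chosen for illustration.

```python
def mask(value):
    """Replace all but the last four characters with 'x' (illustrative rule)."""
    text = str(value)
    return "x" * max(len(text) - 4, 0) + text[-4:]


def apply_policy(rows, row_filter, masked_columns):
    """Return only the rows a user may see, with sensitive columns masked.

    rows           -- list of dicts, one per record
    row_filter     -- function(row) -> bool; True means the row is visible
    masked_columns -- set of column names whose values must be masked
    """
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level filtering: drop rows the user may not see
        visible.append({
            column: mask(value) if column in masked_columns else value
            for column, value in row.items()
        })
    return visible


# Made-up sample data.
customers = [
    {"name": "Ana", "region": "EU", "ssn": "123-45-6789"},
    {"name": "Bo", "region": "US", "ssn": "987-65-4321"},
]

# A hypothetical EU analyst sees only EU rows, with the ssn column masked.
eu_analyst_view = apply_policy(
    customers,
    row_filter=lambda row: row["region"] == "EU",
    masked_columns={"ssn"},
)
```

In this sketch the analyst still gets a usable result from the same table: the US record is filtered out and the `ssn` value is masked, while the source data itself is left unchanged.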
Authorization and access controls are important because they ensure that the right users can access and query the data that IT or data platform teams have authorized them to use. With PrivaceraCloud, you can deploy fine-grained access control in minutes and obtain the guardrails and data protection that liberate your data. The comprehensive set of data access governance and security capabilities that Privacera delivers frees your data scientists and analysts to innovate with trusted, faster access to data, without compromising data privacy or compliance.
Learn more about Privacera here, or contact us to schedule a call to discuss how we can help your organization meet its dual mandate of balancing data democratization with security to maximize business insights while ensuring privacy and compliance.