Securing Amazon S3, EMR, and Databricks on AWS in Minutes with SaaS-based Access Control

Amazon Web Services (AWS) is a juggernaut in the public cloud market. There are many reasons for its popularity, an important one being how easy and straightforward it is to store and query data in Amazon S3, AWS’s object storage service.

Working with our customers, we know that many use Amazon S3 as a landing zone for their data, a practice often referred to as building a data lake, and then give users access to the S3 buckets and files through various means. For example, EMR Hive is often used for processing and querying data stored in table form in S3, and this EMR/S3 pairing is very common. Databricks, based on Apache Spark, is another popular mechanism for accessing and querying S3 data.

This is why Amazon S3 access control and S3 security are critical. Read on to learn why so many organizations trust us with their storage service needs.

Of course, before you make data available to users, you need to implement data governance and security controls to protect it. While S3 and other AWS services provide solid security basics, such as the user authentication discussed in our previous blog, implementing fine-grained access control for S3 data with AWS’s native capabilities alone is not a simple process. But worry not: comprehensive S3 security is what Privacera provides.

An Architecture Fully Compatible with AWS Services

Privacera uses plug-ins to provide fine-grained access control and S3 security at the file, row, and column level. A plug-in architecture featuring native integrations with the data sources provides a lightweight footprint that is easy to layer into complex storage and compute systems. Because the plug-ins are natively built for the source systems, they don’t introduce added complexity, dependencies, or overhead. They authorize requests quickly enough to support thousands of users simultaneously accessing and querying data in production environments at petabyte scale.

Rapidly Enable Data Access Control, Governance, and Security

The use of plug-ins for access control enables IT and data platform teams to control access to data stored in Amazon S3 without writing a single line of code, while staying transparent to the end user and not impacting query performance. With PrivaceraCloud, the industry’s first SaaS-based access control solution, data admins can easily set up access control on existing or new AWS clusters. Take EMR Hive, for example: by configuring the native plug-in for EMR Hive clusters in only a handful of steps, data admins can build fine-grained EMR/S3 access control policies, grant or remove access rights, and obtain a holistic audit trail of when data is accessed or moved. In the following 4-minute demo, you will see our product in action and how easy it is to install and enforce EMR security and access control on a new EMR Hive cluster featuring the Hue interface and Glue Metastore.
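To make the idea of granting access, removing access, and auditing every decision more concrete, here is a minimal Python sketch. It is purely illustrative and is not PrivaceraCloud's API or policy format: the `Authorizer` class, its method names, and the `s3://example-bucket/...` paths are all hypothetical, and the prefix matching is a deliberately simple stand-in for a real policy engine.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Authorizer:
    """Hypothetical in-memory authorizer; not a real Privacera or AWS interface."""

    # Each grant is (user, resource_prefix, action).
    grants: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Each audit record is (user, resource, action, allowed).
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, user: str, prefix: str, action: str) -> None:
        self.grants.add((user, prefix, action))

    def revoke(self, user: str, prefix: str, action: str) -> None:
        self.grants.discard((user, prefix, action))

    def is_allowed(self, user: str, resource: str, action: str) -> bool:
        # Allow only if some grant matches the user, the action,
        # and a prefix of the requested resource path.
        allowed = any(
            u == user and a == action and resource.startswith(p)
            for (u, p, a) in self.grants
        )
        # Record every decision, allowed or denied, in the audit trail.
        self.audit_log.append((user, resource, action, allowed))
        return allowed


authz = Authorizer()
authz.grant("analyst", "s3://example-bucket/sales/", "read")
can_read = authz.is_allowed("analyst", "s3://example-bucket/sales/2021.csv", "read")
can_write = authz.is_allowed("analyst", "s3://example-bucket/sales/2021.csv", "write")
authz.revoke("analyst", "s3://example-bucket/sales/", "read")
can_read_after_revoke = authz.is_allowed(
    "analyst", "s3://example-bucket/sales/2021.csv", "read"
)
```

In this sketch the read is allowed, the write is denied, and the read is denied again once the grant is revoked; all three decisions end up in `audit_log`.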

PrivaceraCloud provides the same level of simplicity and convenience when configuring Databricks on AWS for access control. Moreover, data consumers who were previously kept away from a dataset because part of it was off-limits can now be given access to just the data they are allowed to see, while the off-limits data remains off-limits. In the following PrivaceraCloud demo, we demonstrate not only the steps to configure the plug-in for a Databricks cluster on AWS, but also column-level data masking and row-level filtering, which let you share and use more of your data without exposing sensitive information.
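For readers unfamiliar with the two techniques, the following Python sketch shows what row-level filtering and column-level masking mean in general terms. It is an illustration under our own assumptions, not how the PrivaceraCloud plug-in is implemented: the `apply_policy` and `mask_last4` helpers and the sample `customers` records are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def mask_last4(value: object) -> str:
    """Hide all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def apply_policy(
    rows: List[Row],
    row_filter: Callable[[Row], bool],
    column_masks: Optional[Dict[str, Callable[[object], object]]] = None,
) -> List[Row]:
    """Return only the rows the filter admits, with masked columns rewritten."""
    masks = column_masks or {}
    visible: List[Row] = []
    for row in rows:
        # Row-level filtering: skip rows this consumer may not see.
        if not row_filter(row):
            continue
        # Column-level masking: build a new row, leaving the source data untouched.
        visible.append(
            {col: masks[col](val) if col in masks else val for col, val in row.items()}
        )
    return visible


# Hypothetical sample data, for illustration only.
customers = [
    {"name": "Ana", "region": "EU", "ssn": "123-45-6789"},
    {"name": "Bo", "region": "US", "ssn": "987-65-4321"},
]
eu_view = apply_policy(customers, lambda r: r["region"] == "EU", {"ssn": mask_last4})
print(eu_view)  # [{'name': 'Ana', 'region': 'EU', 'ssn': '*******6789'}]
```

The consumer sees only the EU row, with the sensitive column partially hidden, while the underlying records are unchanged.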

Closing Remarks

Authorization and access controls are important because they ensure that users can access and query only the data that the IT or data platform teams have authorized for them. With PrivaceraCloud, you can deploy fine-grained S3 security and access control in minutes and obtain the proper guardrails and data protection that liberate your data. The comprehensive set of data access, monitoring, governance, and security capabilities that Privacera delivers sets your data scientists and analysts free to innovate with trusted and faster access to data, without compromising data privacy and compliance.

Learn more about data access governance requirements for data science. To see how we can help your organization meet its dual mandate of balancing data democratization with security, maximizing business insights while ensuring privacy and compliance, request your demo.
