By: Rajiv Dholakia, Privacera VP of Product, and Matt Fuller, Starburst Co-Founder and VP of Product
Data mesh is an emerging concept that promotes data democratization by enabling users across the enterprise to access datasets owned by any part of the organization. This allows more business units to monetize data and drive business transformation. Coined by Zhamak Dehghani, the term “data mesh” continues to gain popularity in the enterprise as a new approach to building a modern data architecture that addresses today’s challenges of becoming truly data-driven.
At the crux of data mesh lies a critical element: governance. Modern business initiatives drive the need to integrate data across domains to provide a comprehensive, cross-functional view of data. However, shifting to a data-driven approach like data mesh can be a complex and difficult process. It involves migrating legacy systems; prioritizing data management; balancing the mandate to make data accessible to data analysts and data scientists with the need to ensure it is used responsibly; and meeting stringent industry and data sovereignty regulations.
Secure data democratization is the foundation on which data mesh rests, and it cannot be achieved without a scalable, effective governance framework. Effective governance removes the technical complexity of managing individual domains while keeping data consistent across them. Most importantly, an effective governance framework ensures that data is controlled and secured at a granular level. When these controls are applied consistently across data sources, they provide detailed visibility into how data is accessed and used. This eases compliance by removing uncertainty and making it possible to test and audit controls comprehensively.
Data mesh overview
Dehghani describes data mesh as a paradigm shift that respects the ubiquitous nature of data across a distributed architecture, enabled by a shared, self-service infrastructure. The data mesh platform is an intentionally designed, distributed data architecture, under centralized governance and standardization for interoperability, enabled by a shared and harmonized self-serve data infrastructure. In this platform, distributed data products are oriented around domains and owned by cross-functional teams who use a common infrastructure to host, prep, and serve their data assets.
“Data mesh from 30,000 feet,” courtesy of Zhamak Dehghani
As enterprises continue to collect and store vast amounts of data, their assets and processes grow increasingly complex. Decentralizing data ownership and putting responsibility into the hands of the people most familiar with each domain therefore makes sense. However, decentralized ownership can create risk if controls are lacking. The compliance and regulatory landscape continues to evolve across geographies, and regulations like GDPR, LGPD, CCPA, and others have put significant pressure on enterprises to ensure that their internal and external data use respects the boundaries of customer privacy. Failure to do so can damage an enterprise’s reputation and incur significant fines, and in some cases it can even threaten the enterprise’s continued operation.
Starburst’s recently announced Starburst Stargate helps bridge the gap to the data mesh paradigm by enabling cross-cloud analytics on globally distributed data. It works alongside data access governance partners like Privacera so that regulatory compliance is not sacrificed in exchange for the rapid access to analytics that drives business initiatives.
Privacera’s platform, built specifically to define and administer access control policies across multi-cloud data sources from a centralized interface, helps data teams address the problems that can arise from the complexity of a data mesh. Our centralized data access governance platform simplifies managing and consistently enforcing data access controls across the global analytics that Starburst Stargate enables, without impacting query performance and while providing the high scalability that is vital to a data mesh.
Privacera’s Apache Ranger plugin architecture and PolicySync capabilities enable data administrators to:
- Control and enforce fine-grained permissions and access policies natively within platforms like Starburst and its connectors like Stargate.
- Achieve optimized performance across data mesh domains, high scalability and availability, and no single point of failure in the data architecture.
- Scale governance strategy without hindering operational performance, as data volume and client requests increase over time.
- Simplify access policy management by delegating responsibilities to data stewards or data owners who are more familiar with specific data or policies, using Privacera’s Delegate Admin functionality. This offers both centralized policy control and scalability across data teams.
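To make the delegation idea concrete, here is a minimal, hypothetical Python sketch of domain-scoped policy administration. It is not Privacera’s Delegate Admin API; the names `Policy`, `PolicyStore`, `can_administer`, `add_policy`, and `is_allowed`, and the domain/resource strings, are all illustrative assumptions. Central admins may manage any policy, a data steward may manage only policies for the domains delegated to them, and access itself is default-deny.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    domain: str                 # owning data domain (hypothetical), e.g. "sales"
    resource: str               # hypothetical resource name, e.g. "sales.orders"
    allowed_groups: frozenset   # groups granted read access to the resource


@dataclass
class PolicyStore:
    central_admins: set                          # users who may manage every domain
    stewards: dict                               # user -> set of delegated domains
    policies: list = field(default_factory=list)

    def can_administer(self, user: str, domain: str) -> bool:
        # Central admins manage everything; stewards only their delegated domains.
        return user in self.central_admins or domain in self.stewards.get(user, set())

    def add_policy(self, user: str, policy: Policy) -> None:
        if not self.can_administer(user, policy.domain):
            raise PermissionError(f"{user} cannot manage policies for {policy.domain!r}")
        self.policies.append(policy)

    def is_allowed(self, groups: set, resource: str) -> bool:
        # Default deny: access only if some policy on the resource grants one of the groups.
        return any(p.resource == resource and p.allowed_groups & groups
                   for p in self.policies)


# Usage: "alice" stewards the sales domain only (illustrative data).
store = PolicyStore(central_admins={"admin"}, stewards={"alice": {"sales"}})
store.add_policy("alice", Policy("sales", "sales.orders", frozenset({"analysts"})))

try:
    store.add_policy("alice", Policy("hr", "hr.salaries", frozenset({"analysts"})))
    hr_rejected = False
except PermissionError:
    hr_rejected = True
```

The design keeps one central store (centralized control) while authorization to edit is partitioned by domain (scalability across teams), which mirrors the trade-off the bullet describes.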
What does this look like in the real world?
Let’s look at the use case from Starburst’s blog. One of their customers has data stored in AWS East, AWS Frankfurt, AWS Paris, Azure Central US, and two on-prem data centers. Its analysts and data scientists need to derive insights and train models based on data in all of these regions. Until now, the customer had been replicating data so that analysts could get everything they needed in one place.
Stargate enables that customer to link catalogs and data sources supported by one Starburst cluster to other catalogs and data sources in remote Starburst clusters. Privacera’s automated data discovery inventories everything connected to the Starburst cluster and applies a basic set of tags for personally identifiable information (PII) out of the box. Privacera’s discovery engine can also find sensitive data in locations that data stewards can’t realistically check by hand (e.g., thousands of files, new tables created by analysts), even for enterprise-specific classifications (e.g., “everything in this system is constrained by GDPR” or “any identifier that matches this list is a customer account number”). Privacera’s tag-based policies then automatically protect sensitive data elements against unauthorized access. These policies extend beyond Starburst, protecting data even when it is accessed outside the distributed query environment, such as through BI queries on data warehouses or direct access to cloud storage like AWS S3, Azure ADLS, or Google Cloud Storage.
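The general pattern behind discovery plus tag-based policies can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Privacera’s discovery engine or policy API: the classification rules (`CLASSIFIERS`), the customer account list, the tag names, and the group names are all hypothetical, and real discovery typically samples data and uses far richer techniques than whole-column regex matching. Columns are tagged by rule, and policies are written against tags rather than against individual tables, so a newly discovered column inherits protection automatically.

```python
import re

# Hypothetical classification rules: tag -> regex every sample value must match.
CLASSIFIERS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical enterprise-specific rule: "any identifier that matches this list
# is a customer account number."
CUSTOMER_ACCOUNT_IDS = {"ACCT-1001", "ACCT-1002"}

# Hypothetical tag-based policy: tag -> groups allowed to read columns with that tag.
TAG_POLICIES = {
    "EMAIL": {"privacy_officers"},
    "US_SSN": {"privacy_officers"},
    "CUSTOMER_ACCOUNT": {"privacy_officers", "support"},
}


def classify_column(samples):
    """Return the set of tags whose rule matches every non-empty sample value."""
    values = [v for v in samples if v]
    tags = set()
    if not values:
        return tags
    for tag, pattern in CLASSIFIERS.items():
        if all(pattern.match(v) for v in values):
            tags.add(tag)
    if all(v in CUSTOMER_ACCOUNT_IDS for v in values):
        tags.add("CUSTOMER_ACCOUNT")
    return tags


def readable_columns(table, user_groups):
    """table maps column name -> sample values.

    Untagged columns are readable by everyone; a tagged column is readable only
    if the user belongs to an allowed group for every tag on that column.
    """
    allowed = []
    for column, samples in table.items():
        tags = classify_column(samples)
        if all(TAG_POLICIES.get(tag, set()) & user_groups for tag in tags):
            allowed.append(column)
    return allowed


# Usage with made-up sample data.
table = {
    "order_id": ["17", "18"],
    "email": ["alice@example.com", "bob@example.org"],
    "account": ["ACCT-1001", "ACCT-1002"],
}
analyst_view = readable_columns(table, {"analysts"})
support_view = readable_columns(table, {"support"})
```

Because the policy is keyed on tags, the same rule could in principle be enforced by any engine that knows the tags (a query engine, a warehouse, or a storage layer), which is the property the paragraph above relies on when it says protection extends beyond Starburst.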
What’s coming next
Concepts like Dehghani’s data mesh are not necessarily new; they are a way to cut through complexity as the data industry moves to the open cloud. Governance is the stitch that holds the fabric of data mesh together: it simplifies the management of data access controls, drives self-service analytics, and helps enterprises complete the journey to becoming data-driven. Most importantly, it ensures that our customers’ trust in our technology is always maintained.
Stay tuned as we work closely with Starburst to share more information about Privacera’s integration with Starburst Stargate.