Announcing the private preview of Privacera AI Governance (PAIG), the industry’s most complete AI data security governance solution.
The launch of ChatGPT in November 2022 set off a massive wave of interest in generative AI. According to Statista, ChatGPT reached its first million users in five days, a feat that took Instagram 2.5 months and Netflix 3.5 years to achieve. The latest numbers show more than 100 million ChatGPT users.
Generative AI and large language models (LLMs) have the potential to revolutionize enterprise operations by improving communication, automating tasks, enhancing decision-making processes, and delivering personalized experiences. They enable enterprises to leverage the power of language processing to gain a competitive edge, optimize workflows, and provide better services to customers and stakeholders.
Concerns around AI
Alongside these epic potential benefits, generative AI LLMs have also brought a wave of negative impacts. Most important is the rapid realization that the training data going into these models can open up unprecedented and massive privacy and security risks.
Many corporations, Samsung among them, have banned ChatGPT access from corporate devices and networks in an effort to mitigate the risks of sensitive data being loaded into the models. What quickly became clear is that, with enough prompting, a generative AI application will divulge any piece of information, sensitive or not, that was loaded into the model at any time.
Generative AI differs from the machine learning (ML) models in use today. LLMs can process huge volumes of data, and enterprises looking to leverage them often run central models with access to all of their data. This introduces risk when data sets with different compliance and privacy requirements are mixed together.
Here are the broad concerns around governance and privacy:
Bias: Including personal data attributes, such as race, sex, age, and ethnicity, can lead models to produce outcomes based on these protected categories, resulting in biased or unethical decisions.
Intellectual property: LLMs ingest large amounts of data, which can infringe on trade secrets, proprietary information, or rules governing confidential data.
Privacy: Training models requires vast amounts of data, which can contain personally identifiable information (PII) that must be safeguarded. If training data contains sensitive, classified, or private information, precautions must be taken to prevent breaches and non-compliant use.
Governance and security: Numerous internal and external governance and security rules determine what data can be accessed or shared, both inside and outside the organization, and LLMs may infringe on them. There is also concern that the models themselves can be attacked by third parties and manipulated into leaking information or causing damage to the company.
Emergence of data security governance for AI
There are several major data security and privacy considerations for AI, and generative AI specifically. The sections below describe the top considerations and the safeguards they require.
Protect training data used to build models
Training generative AI models requires massive troves of data. Until now, these models have largely learned from internet data, which in itself raises privacy problems. But extending and fine-tuning the models through transfer learning on your own business's data is where you ultimately get company-specific value. If that data contains sensitive, classified, or private information, the model will learn from it and use it to respond to questions.
- Safeguard required: Continuous scanning, classifying, and tagging of sensitive data as you move and load it into models for training. Based on your security and access policies, controls such as masking, encryption, or removal of these data elements need to be applied before the data is used in the model, as sketched below.
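To make this concrete, here is a minimal sketch of pre-training data sanitization, assuming a simple regex-based scanner. The classification rules, tags, and masking policy are illustrative stand-ins, not PAIG's built-in classifiers.

```python
# A minimal sketch of scanning, tagging, and masking training data before it
# reaches the model. Patterns and tags below are illustrative assumptions.
import re

# Hypothetical classification rules: tag name -> pattern for a sensitive element.
CLASSIFIERS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(record: str) -> set[str]:
    """Return the set of sensitivity tags detected in a training record."""
    return {tag for tag, pattern in CLASSIFIERS.items() if pattern.search(record)}

def mask(record: str) -> str:
    """Replace detected sensitive elements with their tag before training."""
    for tag, pattern in CLASSIFIERS.items():
        record = pattern.sub(f"<{tag}>", record)
    return record

raw_records = [
    "Contact jane.doe@example.com about the Q3 forecast.",
    "Employee 123-45-6789 was promoted to director.",
]

# Scan, tag, and mask each record as it is loaded into the training pipeline.
training_records = []
for record in raw_records:
    tags = classify(record)
    training_records.append(mask(record) if tags else record)

print(training_records)
```

In practice this kind of scan runs continuously as new data flows into the training pipeline, and the chosen control (masking, encryption, or removal) follows your access policies rather than a hard-coded substitution.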
Access control linked to what a user can or should see
Applications accessing the model to generate answers or responses need to start with the right level of identity authentication. Then both coarse-grained and fine-grained access controls need to be enforced on the model and on its responses to ensure sensitive or private data is not divulged. For example, even if the model was trained on employee salary information, the application using the model should not display salary information to anyone without the proper security role.
- Safeguard required: Coarse-grained access control to determine who can access a model, and fine-grained access control applied at the data-item level to model output, with the appropriate masking, encryption, and redaction for the role or attributes of the person asking the question (see the sketch below).
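The sketch below layers both checks around a model call, assuming a simple role-to-permission mapping. The roles, the salary pattern, and the placeholder model response are illustrative only.

```python
# A minimal sketch of coarse-grained plus fine-grained access control on model
# output. Roles, permissions, and the "model call" are illustrative assumptions.
import re

ROLE_PERMISSIONS = {
    "hr_admin": {"salary"},   # may see salary figures
    "employee": set(),        # may not
}

SALARY_PATTERN = re.compile(r"\$\d[\d,]*")

def authorize(user_role: str) -> bool:
    """Coarse-grained check: only known roles may query the model at all."""
    return user_role in ROLE_PERMISSIONS

def apply_fine_grained_controls(response: str, user_role: str) -> str:
    """Fine-grained check: redact salary figures unless the role permits them."""
    if "salary" not in ROLE_PERMISSIONS.get(user_role, set()):
        return SALARY_PATTERN.sub("[REDACTED]", response)
    return response

def answer(question: str, user_role: str) -> str:
    if not authorize(user_role):
        return "Access denied: your role may not query this model."
    # Placeholder for the real model call.
    model_response = "Jane Doe's salary is $185,000."
    return apply_fine_grained_controls(model_response, user_role)

print(answer("What does Jane Doe earn?", "employee"))   # salary redacted
print(answer("What does Jane Doe earn?", "hr_admin"))   # salary visible
```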
Filtering what questions a person can ask
Pre-filtering questions based on the user's security and privacy settings provides an additional safeguard that extends the access controls described above.
- Safeguard required: The system should immediately reject a question that contains sensitive data elements, returning an error indicating the question is not allowed, as in the sketch below.
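Here is a minimal sketch of prompt pre-filtering, assuming a deny list of patterns evaluated before the prompt ever reaches the model. The patterns and error message are illustrative, not PAIG's actual behavior.

```python
# A minimal sketch of pre-filtering prompts before they reach the model.
# The deny-list patterns below are illustrative assumptions.
import re

DENIED_PATTERNS = [
    re.compile(r"\bsalary\b", re.IGNORECASE),      # topic this user may not ask about
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # prompt already contains an SSN
]

def pre_filter(prompt: str):
    """Return an error message if the prompt is not allowed, else None."""
    for pattern in DENIED_PATTERNS:
        if pattern.search(prompt):
            return "This question is not allowed under your data access policy."
    return None

def handle(prompt: str) -> str:
    error = pre_filter(prompt)
    if error:
        return error
    return "...forward the prompt to the model..."

print(handle("What is our CFO's salary?"))  # rejected before the model is called
```

In a real deployment the deny decision would be driven by the user's identity and policy settings rather than a static list.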
Introducing Privacera AI Governance
Privacera AI Governance (PAIG) is the industry's first comprehensive AI data security governance solution, bringing together data security governance for relational data, unstructured data, and AI model training and access. PAIG is powered by Privacera's Unified Data Security Platform, which has set the gold standard for modern data, analytics, and now AI estates.
Core capabilities:
- High-performance AI-driven data governance and security for AI use cases: Builds on the existing strengths of Privacera by combining purpose-built AI and LLMs to drive dynamic security, privacy, and access governance.
- Real-time data discovery, data classification, and tagging: Training data for generative AI models and embeddings is continuously scanned for sensitive data attributes, which are then tagged. More than 160 classifications and rules are pre-built, and organizations can expand on them based on their own requirements.
- Data access controls, data masking, and data encryption: Based on real-time discovery and data tagging, data-level controls are established to redact, de-identify, mask, encrypt, or even remove sensitive data, including data that could introduce vulnerabilities in the pre-training data pipeline.
- Allow/deny prompts or responses based on governance policies: Real-time scanning of user inputs and queries for sensitive data elements and applying appropriate privacy controls based on user identity and data access permissions.
- Redact/de-identify sensitive data in prompts and responses: Real-time scanning for sensitive data elements in model responses and applying appropriate controls based on user identity and data access permissions.
- AI-powered auditing and monitoring: Continuous monitoring and collection of model usage and user behavior across large language models to power analytics on usage, security, and risk patterns. A simplified sketch of how these capabilities can fit together in a single request path follows this list.
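As a rough illustration only, the hypothetical pipeline below combines prompt screening, response redaction, and audit logging in one request path. The function names, sensitive-data pattern, and placeholder model call are assumptions made for the sketch and do not reflect PAIG's actual API.

```python
# A hypothetical request pipeline combining prompt screening, response
# redaction, and audit logging. All names and rules are illustrative.
import datetime
import json
import re

AUDIT_LOG = []
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative: treat US SSNs as sensitive

def audit(user: str, event: str, detail: str) -> None:
    """Append an audit record for later usage, security, and risk analytics."""
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "event": event,
        "detail": detail,
    })

def governed_completion(user: str, prompt: str) -> str:
    """Screen the prompt, call the model, redact the response, and audit each step."""
    if SSN.search(prompt):
        audit(user, "prompt_denied", prompt)
        return "This question is not allowed under your data access policy."
    audit(user, "prompt_allowed", prompt)
    model_response = "Record 123-45-6789 matches the audit flag."  # placeholder model call
    redacted = SSN.sub("[REDACTED]", model_response)
    audit(user, "response_returned", redacted)
    return redacted

print(governed_completion("analyst_1", "Which record matches the audit flag?"))
print(json.dumps(AUDIT_LOG, indent=2))
```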
Benefits
Privacera is the leading provider of unified data security governance solutions to streamline data access and security management across diverse data estates and consumption patterns. With PAIG, organizations tap into Privacera’s proven history of building massively scalable data and access security on diverse data estates.
PAIG is built on Privacera's Unified Data Security Platform, meaning you not only get a common security administration and monitoring platform across all your data, but the same access policies, roles, and controls are applied consistently to your AI models.
Getting started
PAIG is currently in private preview. Interested customers can contact their account representative to schedule a demo or apply for access to the preview.