For more than a decade, businesses have embraced data as their new source of competitive advantage and the key ingredient to business success. We have all heard the "data is the new oil" theme that accompanies this vision. Generative AI (GenAI) has now come along not only to continue this theme, but to put to work the estimated 80% of data that sits unused in our corporations. That's right – GenAI is the catalyst to better utilize not just structured data from our systems and data lakes, but the troves of unstructured data in SharePoint, S3 buckets, Box, and many other repositories.
This opens up a whole new paradigm of governance, one that covers not only the data itself but also the management of the technology.
A quick definition: data governance refers to the tools, technologies, people, and processes aimed at making data trustworthy and available for use in the organization. (Read our blog "What is GenAI Governance" to learn more.)
An important first step is establishing a GenAI governance committee for your GenAI governance initiative. This committee is responsible for establishing your north star as it relates to privacy, compliance, legal, security, and the other dimensions you need to consider. The team is cross-functional, with members from across the enterprise to ensure proper representation.
This article will cover some of the key considerations and AI Governance best practices for implementing a governance program for your GenAI initiatives.
Governance policy development
One of the primary AI Governance best practices, and the key output of this committee, is your GenAI policy documentation. It should provide guidance across the organization on how to build these applications, how to use them, who should be allowed to use them, approval processes, and so on. One question I often ask, and am often asked in return: should the steering committee mandate compliance or influence it (i.e., sell compliance with the program)? In practice, it makes very little difference: the group will mandate, but will still have to sell the other teams on complying.
This policy should contain most of the elements we cover in the rest of this article: permitted and non-permitted use cases, security and privacy, ethical considerations, approval processes, and so on. One observation from the past year, though, is that the policy needs to be regularly reviewed and adaptable, because this technology keeps changing. When ChatGPT took the world by storm in November 2022 and was opened for anyone to use, most organizations responded by blocking access to it from corporate networks. Fast forward 12 months, and co-pilots are now embedded in every SaaS application you use, from Salesforce to Microsoft 365. As with ChatGPT, very few guardrails exist in these apps to prohibit malicious use or the uploading of sensitive information.
Guidelines for Ethical Use of GenAI
One of the most important decisions of the steering committee will be to determine what your organization views as ethical or unethical use cases, especially as it pertains to the usage of consumer data, personal healthcare data, and other sensitive data categories. Some of the key principles within AI Governance best practices are as follows:
- Fairness and Non-Discrimination – Ensure equal treatment and avoid bias in AI outcomes. The challenges of built-in bias in traditional AI systems are well documented, and GenAI will open up another raft of bias issues to consider and be prepared to deal with.
- Transparency and Explainability – Make AI decision-making processes understandable. This is another holdover from traditional AI/ML scoring or profiling approaches, and with GenAI it gets a lot more complex. Most GenAI systems consist of foundation models as well as retrieval-augmented generation (RAG) components such as vector databases. While the vector databases are somewhat auditable, and one can see which context records informed a response from the LLM, it is much harder to understand which content or articles the foundation model based its part of the response on. (A sketch of capturing this audit trail follows this list.)
- Privacy and Data Protection – Safeguard personal information and respect user privacy. A key challenge for GenAI apps is how to handle sensitive or personal data in these kinds of applications and whether it is appropriate to use that information.
- Beneficence – Aim to do good and avoid causing harm with AI applications. This probably goes without saying, and most organizations might gloss over it, but there are already plenty of examples of GenAI being exploited with bad outcomes, even when no harm was intended.
- Accountability – Define who is responsible for AI actions and decisions. A major part of the ethical guidelines is defining who owns decisions, and how one goes about getting clarification and approval of a use case when in doubt.
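Returning to the transparency point above, here is a minimal sketch of how a RAG application might record which context records informed each response so that answers can be audited later. The `retriever`, `llm`, and `audit_log` objects, and their method names, are hypothetical stand-ins for whatever vector store, model client, and log sink your stack actually uses:

```python
import json
import time
import uuid

def answer_with_audit(user_id: str, question: str, retriever, llm, audit_log):
    """Answer a question via RAG while recording which context informed the response.

    `retriever`, `llm`, and `audit_log` are placeholders for whatever
    vector store, model client, and log sink your stack actually uses.
    """
    # Retrieve context records from the vector database.
    chunks = retriever.search(question, top_k=5)

    # Build the prompt from the retrieved context.
    context = "\n\n".join(c["text"] for c in chunks)
    response = llm.complete(f"Context:\n{context}\n\nQuestion: {question}")

    # Record the full lineage of the answer: who asked, which context
    # records were retrieved, and what came back.
    audit_log.write(json.dumps({
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "question": question,
        "context_record_ids": [c["id"] for c in chunks],
        "response": response,
    }) + "\n")
    return response
```

Even with lineage like this, note that what the foundation model contributed from its own training data remains opaque; the audit trail only explains the retrieved context.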
Stakeholder training and enterprise-wide enablement
Once the Generative AI governance framework and policy guidelines are in place, the next major task is to train the various stakeholder groups across the organization. The training program starts, though, with senior leadership buy-in and support. As you structure the training program, keep a couple of key considerations in mind:
- Course content should cover the foundational aspects for each audience grouping. For example, there might be foundational training that is the same for every employee, while developers building GenAI apps will need different training to support their specific roles. Typical topics include:
  - Foundational elements of what GenAI is and how it differs from traditional AI
  - Example use cases as they pertain to your particular business
  - Ethics and legal considerations
  - Data security and privacy
- Training delivery and frequency should be tailored to your organization, its culture, and the mechanisms you have in place for in-person, on-demand, and virtual delivery.
- Incorporating AI Governance best practices into your training program, and then getting feedback, iterating, and adjusting the training on an ongoing basis, will be critical to keep pace with both the rapid innovation in the technology stack and the new use cases and risks being identified.
Data privacy and security guardrails for GenAI
One of the key learnings from our traditional world of analytics is the need to go beyond training people to do the right things, and to also deploy technologies that enhance and enforce privacy and security. That is, if someone should not be able to see customer data from France due to GDPR considerations, then create rules and access controls in your data systems that take care of that automatically.
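As a minimal illustration of what such automatic enforcement can look like in application code (not any particular product's API; the field and attribute names here are made up for the example), an attribute-based filter might restrict rows by region:

```python
from dataclasses import dataclass

@dataclass
class User:
    id: str
    # Regions this user is cleared to see customer data for.
    permitted_regions: set[str]

def filter_customer_rows(user: User, rows: list[dict]) -> list[dict]:
    """Drop rows the user is not authorized to see, based on data region.

    The `region` field and `permitted_regions` attribute are illustrative;
    a real system would drive this from a policy engine and classification tags.
    """
    return [row for row in rows if row.get("region") in user.permitted_regions]

# Example: an analyst without France clearance never sees French customer data,
# regardless of what query they run.
analyst = User(id="u123", permitted_regions={"US", "UK"})
rows = [
    {"customer": "Acme", "region": "US"},
    {"customer": "Exemple SA", "region": "FR"},
]
print(filter_customer_rows(analyst, rows))  # Only the US row is returned.
```

The principle is that the system enforces the rule regardless of what the user asks for, rather than relying on the user to do the right thing.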
Building data access, security, and safety guardrails into your GenAI application becomes an absolutely critical best practice, but this is easier said than done. Early-stage guardrails have been released into the market, but unfortunately, most of them are binary in their approach and do not take into consideration that some individuals might need to see sensitive data (e.g., the salary info of their own employees) while others should not see it at all. A one-size-fits-all approach will not work in the majority of use cases. In addition, your guardrails need to be applied at three points of interaction in your GenAI app (a minimal sketch of all three follows the list):
- Prompts or inputs into the co-pilot or chatbot need to be screened for sensitive data, and for malicious attempts to elicit toxic or harmful behavior from the system.
- RAG queries that collect context information to make your response relevant to your business must replicate the original source access permissions and automatically filter query results based on the user and those permissions.
- Responses from the GenAI app have to be screened again, similarly, for the presence of sensitive or private data, and for toxic or malicious content that the LLM might have generated.
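Here is one way those three enforcement points might be wired together. This is a sketch under assumptions: `screen_text` and `permitted_chunks` stand in for real PII/toxicity detectors and a real policy engine, and the `retriever` and `llm` objects are hypothetical:

```python
class GuardrailViolation(Exception):
    """Raised when a prompt or response fails a guardrail check."""

def screen_text(text: str) -> str:
    """Placeholder: detect sensitive data or toxic content, redact or reject.

    A real implementation would call a PII classifier and a toxicity model.
    """
    if "SSN" in text:  # trivially simplistic stand-in for a real detector
        raise GuardrailViolation("sensitive data detected")
    return text

def permitted_chunks(user, chunks):
    """Keep only the context records the user could see in the source system.

    The `allowed_users` field is illustrative; real systems would check
    permissions replicated from the original repository.
    """
    return [c for c in chunks if user.id in c["allowed_users"]]

def guarded_answer(user, prompt, retriever, llm):
    # 1. Screen the incoming prompt.
    prompt = screen_text(prompt)

    # 2. Filter RAG results down to what this user is allowed to see.
    chunks = permitted_chunks(user, retriever.search(prompt, top_k=5))
    context = "\n\n".join(c["text"] for c in chunks)

    # 3. Screen the generated response before returning it.
    response = llm.complete(f"Context:\n{context}\n\nQuestion: {prompt}")
    return screen_text(response)
```

Packaging the checks as a reusable module like this, rather than inlining them in one app, is what makes the next point achievable.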
An additional GenAI best practice to consider is that you will need to bring your guardrails to every new application you build. Hard-coding some guardrails into your first app might be simple enough for the time being, but it is pivotal to design for scale and growth. Too many of these projects are built only to be blocked by security, privacy, and legal teams upon completion due to insufficient security and privacy guardrails.
Continuous Monitoring
Most organizations we speak to view this as the first governance activity they wish to invest in, simply because they have no idea what people are asking or inputting into these new systems. Monitoring, auditing, and visualizing the presence of sensitive data in apps, user prompts, and responses across your entire estate is pivotal. In addition, you want fine-grained audits so you can track and understand patterns and, if needed, understand in depth how a specific response came about, given the complexity of GenAI application interactions.
And of course, a key aspect of continuous monitoring is assessing risks, making adjustments or remediations, and continuing to iterate on your use cases.
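As a sketch of what those fine-grained audits can feed, the snippet below aggregates guardrail audit events to show which apps are seeing the most sensitive data. The JSONL format and field names (`app`, `flags`) are assumptions for the example:

```python
import json
from collections import Counter

def summarize_audit_log(path: str) -> Counter:
    """Count sensitive-data detections per application from a JSONL audit log.

    Assumes each line is a JSON event with an `app` name and a `flags` list
    that includes "sensitive_data" when a prompt or response tripped a detector.
    """
    counts: Counter = Counter()
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            if "sensitive_data" in event.get("flags", []):
                counts[event["app"]] += 1
    return counts

# Example: which co-pilots are seeing the most sensitive data?
# for app, n in summarize_audit_log("audit.jsonl").most_common():
#     print(app, n)
```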
Embrace GenAI Governance Best Practices With Privacera AI Governance (PAIG)
Privacera AI Governance (PAIG) is the industry's first open-standards-based GenAI governance solution, building on years of experience delivering similar solutions for regular analytical workloads across hybrid data estates. PAIG specifically helps implement data security and safety guardrails in the following ways:
- Scanning and protecting training and fine-tuning data used as part of model training or RAG systems, to ensure sensitive data is classified and protected via masking, encryption, or redaction.
- Real-time scanning and securing of prompts and responses to protect against sensitive data leakage and toxic or malicious usage.
- Fine-grained filtering and tag-based controls for RAG-based queries, based on user permissions and data classification access rights.
- Continuous monitoring, observability and auditing across all your GenAI apps.
What sets PAIG apart from other approaches is that it is independent of your choice of LLM or vector database. Most organizations will have multiple apps, and individual apps are themselves beginning to consist of multiple foundation models and multiple vector databases. PAIG is your centralized solution across your entire GenAI app estate: it allows you to accelerate innovation while giving security and privacy teams the observability and assurance that proper controls are in place for any use case.
For a deeper understanding, get our whitepaper on Privacera AI Governance (PAIG) and discover how we help organizations streamline governance for AI initiatives.