Is Your Data Leaking Via ChatGPT?

In November 2022, OpenAI released ChatGPT, a generative artificial intelligence (GAI) tool, which has since taken the world by storm. Only two months after its launch, it had over 100 million users, making it “the fastest-growing consumer application in history.” And ChatGPT is not alone.

While any new technology presents risks, that doesn’t mean the risks are all brand new. In fact, companies might find that they already have many of the people, processes, and technology in place to mitigate the risks of GAI tools but need to fine-tune them to include this new technology category.

That being said, companies should not assume they are already protected from these risks; instead, they should be intentional in their risk mitigation.

How might GAI put sensitive information at risk?

Most GAI tools are designed to take a “prompt” from the user, which might be entered or pasted text, a text file, an image, live or recorded audio, a video, or some combination of these, and then perform some action based on that prompt. ChatGPT, for example, can summarize the content of a prompt (“summarize these meeting notes…”) and debug source code (“identify what’s wrong with this code…”).

The risk is that these prompts can be reviewed by the GAI tool developers and might be used to train a future version of the tool. In the case of public tools, the prompts can become part of the language model itself, feeding it content and data it would not otherwise have. If these prompts only include information that’s already public, there may be little risk; however, in the example above, if the meeting notes included sensitive information or the source code was proprietary, this confidential data has now left the security team’s control, potentially putting compliance obligations and intellectual property protections at risk.

In the most extreme situation, someone’s input might become someone else’s output.

And that’s not only theoretical.

In January 2023, Amazon issued an internal communication restricting ChatGPT use after it discovered ChatGPT results “mimicked internal Amazon data.” And while the initial data used to train ChatGPT stopped being collected in 2021, ChatGPT isn’t done training. In fact, in the future, when an engineer asks for help optimizing their source code, the answer might look just like the proprietary data from Samsung’s semiconductor division.

With one report indicating two-thirds of jobs could be at least partially automated by GAI tools, what other sensitive information might end up being used as training data?

Did You Know?

The average cost of an insider incident is a shocking $15 million, according to the 2024 Data Exposure Report.


It’s all in the balance

An intentional, balanced approach across people, process, and technology is the only way to truly mitigate the risk of data leaking via a GAI tool.

Assemble your stakeholders

Start by getting your core stakeholders together (Human Resources, Legal, IT, and Security) to define the policies and procedures for GAI tool use in alignment with executive guidance.

Who can use it, and when?

What can be used as input?

How can the output be used?

Some companies, like Amazon, choose to allow GAI tool use as long as their employees are careful; others, like Samsung, have decided to ban its use altogether.
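
Once stakeholders have answered the three questions above, it can help to write the answers down in a form that tooling can check. The following is a minimal sketch, not a prescribed format: the class name `GaiUsePolicy`, the role names, and the data classification labels are all hypothetical placeholders for whatever your organization already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaiUsePolicy:
    """Hypothetical record of the three policy questions."""
    allowed_roles: frozenset          # Who can use it?
    allowed_input_classes: frozenset  # What can be used as input?
    output_requires_review: bool      # How can the output be used?


def may_submit(policy: GaiUsePolicy, role: str, data_class: str) -> bool:
    """Return True only if both the role and the data class are permitted."""
    return role in policy.allowed_roles and data_class in policy.allowed_input_classes


# Example values are assumptions for illustration only.
policy = GaiUsePolicy(
    allowed_roles=frozenset({"engineering", "marketing"}),
    allowed_input_classes=frozenset({"public"}),
    output_requires_review=True,
)

print(may_submit(policy, "engineering", "public"))        # permitted role and data class
print(may_submit(policy, "engineering", "confidential"))  # data class not permitted
```

A company that bans GAI tool use altogether would simply leave `allowed_roles` empty, so the same structure covers both ends of the spectrum.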

Communicate and educate

Provide explicit guidance in your Acceptable Use Policy

Require proactive training on the risks associated with GAI tool use

Let employees know where they can go for assistance on proper use and how to report improper use

Stop, contain and correct

Monitor employee activity involving untrusted GAI tools

Block paste activity into unapproved GAI tools

Generate alerts for file uploads to GAI tools

Respond with real-time training for those employees who make a mistake
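
The controls above can be pictured as a simple decision made each time content is about to leave for a GAI tool. The sketch below is illustrative only and does not represent how any particular product works: the hostname allowlist, the regular expressions, and the three-way allow/alert/block policy are all assumptions, and a real deployment would rely on the organization’s own data classification rules.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; not a complete definition of "sensitive".
SENSITIVE_PATTERNS = {
    "confidential_marker": re.compile(
        r"\b(confidential|internal only|proprietary)\b", re.IGNORECASE
    ),
    "api_key": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{16,}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical allowlist of approved GAI tool hostnames.
APPROVED_TOOLS = {"approved-gai.example.com"}


@dataclass
class Decision:
    action: str    # "allow", "alert", or "block"
    reasons: list  # names of the rules that triggered the action


def check_prompt(destination_host: str, prompt: str) -> Decision:
    """Decide what to do with a prompt before it reaches a GAI tool."""
    findings = [
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)
    ]
    if destination_host not in APPROVED_TOOLS:
        # Unapproved tool: block sensitive content, otherwise alert for monitoring.
        if findings:
            return Decision("block", findings)
        return Decision("alert", ["unapproved_tool"])
    if findings:
        # Approved tool, but the content looks sensitive: alert for follow-up.
        return Decision("alert", findings)
    return Decision("allow", [])


decision = check_prompt("chat.example.org", "Summarize these CONFIDENTIAL meeting notes")
print(decision.action, decision.reasons)
```

An “alert” or “block” result is also a natural trigger for the real-time training mentioned above, since it identifies the moment an employee made a mistake.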

Many of the risks of GAI tools aren’t new, and with an intentional, holistic approach, organizations can mitigate GAI tool risk without losing the momentum of possibility.

Ready to learn how your company can protect itself without falling behind?

If so, you might be ready to set up a free consultation with our data protection advisors. You’ll get the tools and guidance needed to create a transparent, sustainable and measurable insider threat program…no strings attached. Just fill out this form and we’ll get in touch.
