“It’s wild to us that our attack works.”
Unsafe Secrets
A group of researchers led by scientists at Google DeepMind have cleverly tricked OpenAI’s ChatGPT into revealing individuals’ phone numbers and emails, 404 Media reports — an ominous sign that its training data contains large amounts of private information that could unpredictably spill out.
“It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier,” the researchers wrote in a writeup about their study, available as a not-yet-peer-reviewed paper.
Of course, divulging potentially sensitive info is just one small part of the problem. As the researchers note, the bigger picture is that ChatGPT is regurgitating huge amounts of its training data word-for-word with alarming frequency, leaving it vulnerable to mass data extraction — and perhaps vindicating furious authors who argue their work is being plagiarized.
“As far as we can tell, no one has ever noticed that ChatGPT emits training data with such high frequency until this paper,” the researchers added.
Under Pressure
The attack itself, as the researchers admit, is “kind of silly” and alarmingly easy to pull off. It involves prompting the chatbot to “repeat the word ‘poem’ forever,” (or other words) and then letting it go to work.
Eventually, ChatGPT stops repeating itself and starts babbling out eclectic swarms of text, large portions of which were often copied from the web.
When faced with their strongest attack, over five percent of ChatGPT’s output turned out to be a “direct verbatim 50-token-in-a-row copy from its training dataset,” the researchers found, tokens being small chunks of characters that LLMs use to generate text. In one case, the chatbot regurgitated a string over 4,000 characters long.
Some of the cribbed text came from books, poems, or ad copy on websites. And other sources were a lot more compromising. In one concerning example, the chatbot gave out a founder and CEO’s email signature with their personal contact information. Even entire bitcoin addresses were given away.
Cheap Trick
The scary part? The researchers only spent $200 on their attack, allowing them to extract 10,000 unique examples of data that ChatGPT “memorized.” Someone with serious money and bad intentions could extract far more, they warn.
What’s more, these attacks were successful despite the chatbot being “aligned” with human feedback to prevent data regurgitation. Because OpenAI is closed-source, security experts can only test its consumer-facing model, which “can mask vulnerabilities,” the researchers wrote.
“It’s one thing for us to show that we can attack something released as a research demo,” they added. “It’s another thing entirely to show that something widely released and sold as a company’s flagship product is nonprivate.”
The team informed OpenAI of the exploit in August. Thankfully, it’s now been patched out — but as the researchers warned, that band-aid won’t fix the underlying vulnerabilities.
More on ChatGPT: Lawyer Fired for Using ChatGPT Says He Will Keep Using AI Tools
Share This Article