OpenAI Threatening to Ban Users for Asking Strawberry About Its Reasoning

“Additional violations of this policy may result in loss of access to GPT-4o with Reasoning.”

Ban Hammer

OpenAI claims that its latest AI model, code-named “Strawberry” and released as o1-preview, is supposed to be capable of “reasoning.” But understanding how its thought process works, apparently, is something that the ChatGPT maker is serious about keeping off-limits.

As Ars Technica reports, OpenAI is now threatening to ban users who try to get the large language model to reveal how it thinks, a glaring example of how the company has long since abandoned its original vision of championing open-source AI.

According to accounts on social media, users are receiving emails from the Microsoft-backed startup informing them that their requests made to ChatGPT have been flagged for “attempting to circumvent safeguards.”

“Additional violations of this policy may result in loss of access to GPT-4o with Reasoning,” the emails state.

Hush Hush

This clampdown is more than a bit ironic given that a lot of the hype around Strawberry was built around its “chain-of-thought” reasoning that allowed the AI to articulate how it arrived at an answer, step by step. OpenAI chief technology officer Mira Murati called this a “new paradigm” for the technology.

Reports vary on what triggers the violations. As Ars found, some users claim that using the term “reasoning trace” is what got them in trouble. Others say that even using the word “reasoning” on its own was enough to alert OpenAI’s systems. Users can still see what is essentially a summary of Strawberry’s thought process, but it’s cobbled together by a second AI model and is heavily watered down.

In a blog post, OpenAI argues that it needs to hide the chain-of-thought so that it doesn’t need to put a filter on how its AI thinks, in case it says stuff that isn’t compliant with safety policies while thinking out loud. That way, developers can safely see its “raw” thought process behind the scenes.

But as the company freely admits, this measure also helps it maintain a “competitive advantage,” staving off competitors from trying to ride its coattails.

Red Alert

The flipside of this approach, however, is that it concentrates more responsibility for aligning the large language model into the hands of OpenAI, instead of democratizing it. That poses a problem for red-teamers, or programmers who try to hack AI models to make them safer.

“I’m not at all happy about this policy decision,” AI researcher Simon Willison wrote on his blog, as quoted by Ars. “As someone who develops against LLMs, interpretability and transparency are everything to me — the idea that I can run a complex prompt and have key details of how that prompt was evaluated hidden from me feels like a big step backwards.”

As it stands, it seems that OpenAI is continuing down a path of making its AI models ever more opaque black boxes.

