OpenAI’s New ChatGPT Agent Tries to Do It All

Isa Fulford, the research lead for OpenAI’s new ChatGPT agent, needed to order a bunch of cupcakes, so she asked the AI tool to do it for her. “I was very specific about what I wanted, and it was a lot of cupcakes,” she says. “That one took almost an hour—but it was easier than me doing it myself, because I didn’t want to do it.”

OpenAI has launched a new agent for ChatGPT that uses a virtual browser to complete tasks and can generate downloadable files, specifically PowerPoint presentations and Excel spreadsheets. While not a full replacement for the Microsoft suite of workplace tools, the features included in this agent from OpenAI could obviate some users’ reliance on Microsoft’s enterprise software. The two companies are longtime partners and currently in contract negotiations over ongoing access to OpenAI’s models.

The release is part of OpenAI’s ongoing efforts to turn its nearly three-year-old chatbot into a money-making product. No small feat, despite the tool’s millions of users, when you factor in the costs to train and run powerful AI models as well as the exorbitant salaries required to retain top-tier staff members.

An agent, in this context, refers to an AI tool that is able to—or at least attempts to—navigate third-party software and websites and make decisions on its journey to complete digital tasks, following an initial set of instructions from the user. “Agent” is the buzziest of buzzwords right now for companies looking to sell generative AI tools, especially those with an eye on enterprise customers.

“We’ve tried to build a product with a whole lot of enterprise use cases,” says Yash Kumar, the product lead on the ChatGPT agent. In addition to its file-generating capabilities, the agent can fill out online forms, use a programming terminal, and call the public APIs of online services like Google Drive and SharePoint.

This isn’t the first agent released by OpenAI in 2025. The new ChatGPT agent brings together aspects of OpenAI’s web-browsing Operator and its long-processing deep research features, both released earlier this year and considered to be agents by the startup. “I was on the deep research team, and Yash was on the Operator team,” Fulford says. “We realized that the two products are very complementary, and basically decided to combine teams.” The ChatGPT agent can switch between interacting with a visual browser, where it can click around like Operator does, and a text-based browser, where it can process loads of websites like deep research does.

The rollout of the ChatGPT agent is coming first to Pro, Plus, and Team subscribers, starting today for Pro users. Enterprise and Education subs will likely receive access to the feature later in the summer. At launch, Pro users are generally capped at 400 agent prompts a month, with 40 prompts allowed for the other tiers of paying users. It’s unclear when this feature will roll out for free users of ChatGPT.

Fulford shared her experience of the agent taking an hour to procure cupcakes as an example of OpenAI’s tool needing a long time to complete a task during the testing process. Not every request will require that kind of time investment. Still, users should be prepared to wait as agents frolic around the internet.

In a prelaunch demo for WIRED, Kumar used the ChatGPT agent to automate a range of tasks, from consumer uses like planning a date night, to enterprise-focused examples like parsing Excel sheets for a financial analyst and making a slide deck that unpacks Nvidia’s Q1 earnings. Whereas planning a night out with the ChatGPT agent—going through your calendar, finding a restaurant with availability—may take five minutes, generating an earnings-based slide deck is more research-intensive and may take around 25 minutes. “You can do as many things as you want in parallel,” Kumar says. According to him, an average task with the ChatGPT agent takes around 10 or 15 minutes.

From potentially knowing the types of cuisine my partner prefers, based on past chats, to building a slide deck with formatting that’s aligned with what I may usually request, many of these potential tasks could benefit from accessing ChatGPT’s memory feature. Even though OpenAI wants to integrate memory with the ChatGPT agent eventually, it won’t be part of the initial launch. “It’s not that we don’t think it’s safe,” Kumar says. “We’re just taking an extra precaution.” He mentions the potential for prompt injection attacks as one example of why OpenAI wants to learn more before hooking up the ChatGPT agent to stored user memories.

Both of the OpenAI staff members emphasized that having the user still feel like they are in control, even as the agent automates tasks, is critical. “We have a list of websites where we think it’s risky to go. These include things like social media or financial transactions,” Kumar says. Building upon the “watch mode” rolled out with Operator earlier this year, the agent has a similar setting where software tasks deemed to involve a high level of personal risk require the user to watch the AI tool actively and not swipe away from the web page.

After my call with OpenAI, the afternoon before the launch of this agent, it was a new “replay” feature that I couldn’t get out of my head. “You can replay the conversation,” Kumar says. “Before agent, a lot of the conversations were not as long, relatively speaking.” I tried to imagine what the almost hour-long screen recording of Fulford’s agent looked like as it scrambled to find the perfect cupcakes. Where did it go first? Where did the tool potentially get lost? I pictured myself in five years, potentially speed-scrubbing through replays of my AI agent’s actions more often than clicking around the internet myself. If the era of AI agents sticks around, which is far from guaranteed, the way we use the web will fundamentally change.
