OpenAI has introduced a new AI agent called Operator that's designed to make everyday tasks easier, from making dinner reservations and ordering groceries to filling out forms.
In a demo video posted Thursday, the company shows how the AI agent, running in a special browser, can interact with web pages by typing, clicking and scrolling. Users describe the task they want done, and Operator can handle multiple requests at the same time, such as shopping on Etsy while making a dinner reservation elsewhere.
It can "see" via screenshots and "interact" in the same way a mouse and keyboard would allow within a browser, according to OpenAI. Operator, which OpenAI describes as "one of our first agents," is currently available in a limited preview.
With competitors like Google and Anthropic already offering similar AI agents, OpenAI is working to narrow the gap. Operator is also part of OpenAI's larger effort to make its generative AI more useful by automating more aspects of daily life, potentially moving closer to the promise that AI will forever change the way we interact with technology.
"The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses," the company said in a blog post.
The tool is powered by a new model called the Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. It's trained to interact with graphical user interfaces, including the buttons, menus and text fields people see on a screen.
If any problems arise, the company said Operator can use its reasoning capabilities to self-correct or return control to the user. It's also trained to ask the user to take over for tasks that require certain inputs, such as login credentials or payment details.
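For readers curious how such a loop might look in practice, here is a minimal sketch in Python. It is not OpenAI's code or API: every name in it (Action, StepResult, run_agent, SENSITIVE_FIELDS) is hypothetical, and the "model" is just a function the caller supplies. It only illustrates the pattern described above, in which the agent looks at a screenshot, picks a mouse or keyboard action, and hands control back to the user when a sensitive field such as a password comes up.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical set of fields the agent should never fill in by itself.
SENSITIVE_FIELDS = {"password", "card_number"}


@dataclass
class Action:
    kind: str          # "click", "type", "scroll" or "done"
    target: str = ""   # name of the on-screen element (illustrative only)
    text: str = ""     # text to type, if any


@dataclass
class StepResult:
    status: str        # "acted", "handoff" or "done"
    action: Action


def run_agent(
    propose_action: Callable[[str], Action],
    take_screenshot: Callable[[], str],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[StepResult]:
    """Run a screenshot -> action loop, stopping for user takeover when needed."""
    log: List[StepResult] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()        # the agent "sees" the page
        action = propose_action(screenshot)   # stand-in for the model's decision
        if action.kind == "done":
            log.append(StepResult("done", action))
            break
        if action.kind == "type" and action.target in SENSITIVE_FIELDS:
            # Return control to the user instead of typing credentials.
            log.append(StepResult("handoff", action))
            break
        perform(action)                       # click, type or scroll
        log.append(StepResult("acted", action))
    return log


if __name__ == "__main__":
    # A scripted stand-in for the model, for demonstration only.
    script = iter([
        Action("click", "search_box"),
        Action("type", "search_box", "pizza"),
        Action("type", "password", "hunter2"),
    ])
    for step in run_agent(lambda shot: next(script), lambda: "fake-screenshot", print):
        print(step.status, step.action.kind, step.action.target)
```

In this sketch the handoff is a simple rule keyed on field names. The company describes Operator as being trained to make that call, so a real system would presumably rely on the model's judgment rather than a fixed list.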
The tool is now available to ChatGPT Pro subscribers in the US at operator.chatgpt.com.