
ZDNET's key takeaways
- Gemini 3.1 Pro is now available.
- It builds on the benchmark progress Gemini 3 established for Google.
- Model capabilities are ultimately relative, one expert said.
Another week, another "smarter" model -- this time from Google, which just released Gemini 3.1 Pro.
Gemini 3 has outperformed several competitor models since its release in November, beating Copilot in a few of our in-house task tests, and has generally received praise from users. Google said this latest Gemini model, announced Thursday, achieved "more than double the reasoning performance of 3 Pro" in testing, based on its 77.1% score on the ARC-AGI-2 benchmark for "entirely new logic patterns."
Also: Gemini vs. Copilot: I compared the AI tools on 7 everyday tasks, and there's a clear winner
The latest model follows a "major upgrade" to Gemini 3 Deep Think last week, which boasted new capabilities in chemistry and physics alongside new accomplishments in math and coding, according to Google. The company said the Gemini 3 Deep Think upgrade was built to address "tough research challenges -- where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete." Google said Gemini 3.1 Pro undergirds that science-heavy investment, calling the model the "upgraded core intelligence that makes those breakthroughs possible."
Late last year, Gemini 3 scored 38.3% on the Humanity's Last Exam (HLE) benchmark test, then the highest of any available model. Developed to combat increasingly beatable industry-standard benchmarks and better measure model progress against human ability, HLE is meant to be a more rigorous test, though benchmarks alone aren't sufficient to determine performance.
According to Google, Gemini 3.1 Pro now bests that score at 44.4%. The Deep Think upgrade technically scored higher, at 48.4%, but it's a mode rather than a standalone model, and it leverages longer inference times for stronger reasoning performance. Similarly, the Deep Think upgrade scored 84.6% on the ARC-AGI-2 logic benchmark -- higher than 3.1 Pro's aforementioned 77.1%.
Also: The making of Gemini 3 - how Google's slow and steady approach won the AI race (for now)
Because 3.1 Pro is designed for daily use, its benchmark scores are still notable next to Deep Think's, given that mode targets heavier-weight science and engineering tasks.
All that said, Anthropic's Claude Opus 4.6 still tops the Center for AI Safety (CAIS) text capability leaderboard (for reasoning and other text-based queries), which averages other relevant benchmark scores outside of HLE. Anthropic's Opus 4.5, Sonnet 4.5, and Opus 4.6 also beat Gemini 3 in terms of safety, according to the CAIS risk assessment leaderboard.
Hype management
Benchmark records aside, the lifecycle of a model doesn't end with a splashy release. At the current rate of AI development, new models are impressive only relative to their competition -- time and testing will tell where 3.1 Pro excels or falls short. Gemini 3 gives the new model a strong foundation, but that may only last until the next lab releases a state-of-the-art upgrade.
Also: Inside Google's AI plan to end Android developer toil - and speed up innovation
"The test numbers seem to imply that it's got substantial improvement over Gemini 3, and Gemini 3 was pretty good, but I don't think we're really going to know right away, and it's not available except to the more expensive plans yet," said ZDNET senior contributing editor David Gewirtz of the release. "The shoe hasn't yet fallen on GPT 5.3 either, and I think when it does, we'll have a more universal set of upgrades that we can readdress."
While we wait for that model to drop, Gewirtz looked into GPT-5.3-Codex, OpenAI's most recent coding-specific release, which famously helped build itself.
Try it yourself
Developers can access Gemini 3.1 Pro in preview today through the API in Google's AI Studio, Android Studio, Google Antigravity, and Gemini CLI. Enterprise customers can try it in Vertex AI and Gemini Enterprise, and regular users can find it in NotebookLM and the Gemini app.