Does Gemini 3.1 Pro Matter?

February 20, 2026 · Episode Links & Takeaways

HEADLINES

Weird Vibes at AI India Summit

The AI Impact Summit kicked off in New Delhi — the first time this event has been held in a developing country. The choice of India was symbolically important, putting a political call to address AI inequality front and center. UN Secretary-General António Guterres called for AI to "belong to everyone" and proposed a Global Fund on AI. India itself declared ambitions to become a global AI power, with Adani and Reliance each committing $100B+ to local data centers. But all eyes were on the Altman-Amodei moment — the two refused to hold hands during an awkward group photo with PM Modi, even as a chart from Epoch AI went viral suggesting Anthropic is on pace to overtake OpenAI in revenue by mid-year. Dario read a generic speech off his iPhone; Altman was more eloquent on iterative deployment and democratic access. Swyx nailed it in a post called "Why do AI conferences keep not getting AI?" — lamenting that the powers that be "care more about bad photo ops and hobnobbing with celebrities than they care about the builders." Ultimately, the less time you spend caring about what's said at events like this and the more time you spend building, the better off you'll be.


Walmart Turns to AI After Soft Earnings

Walmart is leaning heavily into AI as a growth driver after a mixed quarter — the company briefly crossed a $1 trillion market cap but lost the revenue crown to Amazon after 17 years. New CEO John Furner flagged shopping assistant Sparky as core to their strategy: about half of online customers have used it, and those users ordered 35% more. US CEO David Guggina said Sparky is helping them "evolve from traditional search to intent-driven commerce."

Amazon Tracks Employee AI Usage

Amazon is using an internal system called Clarity to measure AI tool adoption across the company — not just engineering but supply chain optimization too. Employee evaluations now include questions about AI usage, asking staff how they've "accomplished more with less" and for examples where they "remained innovative, force multiplied using AI and delivered results while reducing or not growing headcount."

Accenture: No AI, No Promotion

Accenture is telling senior managers that AI adoption is now a requirement for career progression. Per an email viewed by the FT, "use of our key tools will be a visible input to talent discussions" during the summer promotion cycle. The FT noted AI holdouts are a major problem across consulting — senior figures are more set in their ways than juniors. One source called the internal tools "broken slop generators." This is an interesting bellwether. The biggest issue across our surveys at AIDB is the problem of time — people don't have time to learn the technology that would save them time, and most companies don't create specific carve-outs for learning. The second most frequent complaint is tool quality: "at home I'm using Opus 4.6, and at work I have a terrible old version of Copilot." To understand why Accenture is pushing this hard, you only need to glance at their stock — down 17% YTD and 45% over the past year.

MAIN STORY

Does Gemini 3.1 Pro Matter?

We are now firmly in an era where the benchmark state of the art rotates weekly among labs — and raw benchmark leadership matters less than ever as a barometer of a model's importance. The question with any new release isn't "is it the best?" but "what does it do uniquely well, and where does it fit in your model portfolio?" Gemini 3.1 Pro matters not because it tops some leaderboards (though it does), but because it pushes on the cost frontier and flexes multimodal capabilities that other frontier models simply can't match.

GEMINI RISING?

Benchmarks: SOTA Across the Board, But Table Stakes
At first blush there's a lot to be impressed by — but benchmark leadership is now table stakes.
Gemini 3.1 Pro scores 77.1% on ARC-AGI-2 (up from 31.1% for Gemini 3 Pro, vs. 68.8% for Opus 4.6), takes SOTA on TerminalBench 2.0 at 68.5%, and lands near-SOTA on SWE-bench Verified at 80.6% (Opus 4.6: 80.8%). It jumped from 6th to 1st on Artificial Analysis's overall intelligence index, leading on 6 of their 10 evaluations. One weak spot: GDPval real-world agentic tasks, where it trailed Sonnet 4.6, Opus 4.6, GPT-5.2, and GLM-5.

The Cost Frontier Is the Real Story
Benchmark leadership lasts weeks, not quarters. The real game is making intelligence ambient and cheap.
Gemini 3.1 Pro achieved 77.1% on ARC-AGI-2 at less than a dollar per task — compared to $3.60 for Opus 4.6. Pricing stays at $2/$12 per million input/output tokens (same as Gemini 3 Pro), meaning Google more than doubled the headline score at zero incremental price. Artificial Analysis noted it costs less than half as much as Opus 4.6 to run.
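For a rough sense of how that per-task figure falls out of token pricing, here's a minimal sketch. The $2/$12 rates are the ones from the release; the token counts are hypothetical round numbers chosen for illustration, not measured ARC-AGI-2 traces.

```python
# Back-of-the-envelope cost per task at Gemini 3.1 Pro list pricing.
# The $2 / $12 per-million-token rates come from the announcement; the
# token counts below are hypothetical, not measured ARC-AGI-2 traces.

INPUT_PRICE_PER_TOKEN = 2.00 / 1_000_000    # $2 per million input tokens
OUTPUT_PRICE_PER_TOKEN = 12.00 / 1_000_000  # $12 per million output tokens

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single model call at list pricing."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# Even a generous budget (say a 10k-token prompt plus 60k tokens of
# output and reasoning) lands well under a dollar per task.
print(f"${task_cost(10_000, 60_000):.2f}")  # $0.74
```

The point of the exercise: at these rates, even long reasoning traces stay comfortably under the dollar-per-task mark, versus the roughly $3.60 per task cited for Opus 4.6.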

Multimodal Is Where Gemini Flexes
Digging deeper, what people are using Gemini 3.1 Pro for is just a little different from what they use other models for.
Alongside the model, Google Labs launched Photoshoot for the Pomelli app — product photography generated from a single image. Sundar Pichai's launch tweet got ~1M views; the Photoshoot tweet hit 12.2M. Replit launched Replit Animation, powered by Gemini 3.1 Pro, for vibe-coding infographic videos. And the community demos lean heavily into spatial and scientific work — Daniel Z vibe-coded a double wishbone suspension simulation, DeepMind shared a realistic city-builder app, and Jeff Dean showed heat transfer analysis of a CAD file rendered as a visualization.

First Impressions and Where It Fits
The greatest gains won't come from switching wholesale from one model to the next, but from understanding what each model does best.
Early coding impressions are positive but limited — the model is still rolling out across Google's ecosystem, which is itself a challenge (as Ethan Mollick has noted, the Google AI landscape is so diverse it's hard to know what lives where). Latent Space acknowledged there isn't much to say beyond noting a catch-up and partial overtake. The deeper point is that Gemini's multimodal capabilities enable products and use cases that aren't possible with other models — and that matters as we head deeper into the agent age.