Anthropic Accidentally Revealed Their Most Powerful Model Ever

March 27, 2026 · Episode Links & Takeaways

HEADLINES

Anthropic Confirms Claude Mythos After Data Leak

A data leak has revealed that Anthropic is testing a powerful new model called Claude Mythos. A draft blog post was found in an unsecured, publicly searchable database — part of a trove of nearly 3,000 unpublished assets. Anthropic confirmed the model is real, calling it a "step change" in performance and "the most capable we've built to date." Mythos represents a new tier above Opus — larger, more intelligent, and dramatically better at coding, academic reasoning, and cybersecurity. However, Anthropic says the model is expensive to serve and they're taking a slower, more cautious release approach, starting with early access customers focused on cybersecurity applications. For those wondering about the name "Capybara" floating around, it appears to be an alternate codename for the same model.

Gemini 3.1 Flash Live

Google has dropped a small voice model that could have big implications. Gemini 3.1 Flash Live brings real-time dialogue to voice models — most voice models until now have been turn-based, causing awkward stumbles and terrible interruption handling. Flash Live is designed to work like a human conversation with continuous back and forth. Home Depot already has the model deployed, citing major improvements in handling complex details like product codes and noisy environments. The obvious implication is for personal voice agents on mobile — and with Apple using Gemini to power the new Siri, the long winter of Siri not understanding a word we say may finally be ending.

Shopify Launches Tinker

One of my weirder predictions for 2026 was that Shopify would play an outsized role in the positive normalization of AI — and Tinker is exactly what that looks like. It's a free mobile app with over a hundred AI tools for e-commerce, letting merchants generate logos, product photos, advertising videos and more in a playful, experimental canvas. The entire concept is about flattening the learning curve: apps are arranged by outcome, prompts are handled on the back end, and Shopify's director of product Rousseau Kazi put it perfectly — "If you want more artists, lower the cost of paint."

OpenAI: Codex Plugins, Erotica Shelved, and IPO Pressure

Codex got a big upgrade with the integration of plugins, and OpenAI used the occasion to twist the knife on Anthropic's recent Claude Code rate limit changes, resetting Codex usage limits across all plans and telling developers to "build unlimited things." Meanwhile, OpenAI has shelved adult mode indefinitely, consolidating resources around coding and enterprise. The advisory council was unanimously against it, the age detection system had a 12% failure rate, and staff were departing over the issue. This is part of a broader pattern of OpenAI killing side quests (Sora, instant checkout, now erotica), and that's a good thing rather than a sign of flailing. Nothing kills a business faster than the sunk cost fallacy.

On the competition front, Anthropic is reportedly eyeing a Q4 IPO, possibly as soon as October, which puts OpenAI on the clock since Altman has said he'd prefer to go first.

MAIN STORY

The Era of Vertical AI Models

A persistent question in AI has been how much custom models would matter. Early evidence pointed aggressively toward the Bitter Lesson, Rich Sutton's famous 2019 essay arguing that general methods that scale with compute end up beating carefully encoded human knowledge. BloombergGPT, the poster child for domain-specific training, got smoked by general models. But coming into 2026, a new type of data started to change the equation: last-mile usage data, the millions of real interactions that companies like Cursor and Intercom sit on. This week brought two major proof points suggesting the era of vertical AI models may have arrived.

POST-TRAINING FOR SUCCESS

Cursor's Composer 2
"Just" Kimi K2.5 — except it beats Opus 4.6.
Cursor released Composer 2 earlier this month, matching GPT-5.4 and beating Opus 4.6 on coding benchmarks at much lower cost. Controversy erupted when an X user revealed the model was built on Kimi K2.5 with reinforcement learning applied. Cursor didn't deny it: dev relations rep Lee Robinson confirmed the model started from an open-source base, with three-quarters of the compute coming from Cursor's own training. The takeaway: reinforcement learning on quality interaction data can vault an adequate base model into the top tier.

Intercom's Fin Apex
The highest-performing, fastest, and cheapest customer service model.
Intercom CEO Eoghan McCabe announced Fin Apex, a dedicated customer service model that beats GPT-5.4 and Opus 4.5 on resolution rates — with 65% fewer hallucinations and dramatically lower cost. CPO Paul Adams framed the implications sharply: vertical models can and will outperform general models, durable differentiation will move down the stack to the model layer, and successful companies will need to be full-stack across app, AI, and model layers.

Decagon
Over 80% of model traffic now runs on in-house models.
Decagon co-founder Ashwin Sreenivas revealed they've built a network of specialized models handling different parts of the customer service interaction — detection, orchestration, response generation, and evaluation. Rather than relying on a single general model, they optimize each layer independently for speed and quality.

The Bigger Picture
Post-training from experience — the next phase of the Bitter Lesson.
Clém Delangue from Hugging Face noted this is part of a broader wave, with Pinterest, Airbnb, Notion, Cursor, and now Intercom all finding it better, cheaper, and faster to train open models in-house. Eoghan McCabe's companion post referenced Andrej Karpathy's prediction of "speciation" in AI: smaller models with a cognitive core, optimized for specific tasks. Interestingly, Sutton himself may have predicted this exact dynamic on the Dwarkesh podcast last year, suggesting that systems trained from experience could supersede those built on human knowledge. These new vertical models are post-trained from experience, exactly as Sutton described. This doesn't mean every company with customer data will spin up its own model, but these results are encouraging enough that many more will try.