Sonnet 4.6 Changes the Agent Math

February 18, 2026 · Episode Links & Takeaways

HEADLINES

Apple Is Ramping Up AI Wearables

Glasses, a Pendant, and Camera AirPods
Bloomberg's Mark Gurman reports Apple is fast-tracking a trio of AI wearables: smart glasses competing directly with Meta Ray-Bans, a pendant that can be worn as a pin or necklace, and AirPods with expanded AI capabilities. All three connect to an iPhone as a hands-free interface for AI Siri. The pendant and AirPods are the low-end offerings, with low-res cameras that give the assistant visual context but aren't good enough for photos. The glasses are more upscale, with high-res cameras, speakers, and microphones; production is reportedly targeted to start in December, with a public release next year. Between this, the March 4th hardware event, and the Mac Mini's rise as the device of choice for OpenClaw agents, Apple's AI strategy has become a major topic of discussion. The hyperscalers are spending hundreds of billions on data centers while Apple guides to a 19% CapEx drop, shipping Mac Minis, licensing Google's models for about a billion dollars a year, and potentially bringing this trifecta of AI wearables to market alongside a functional AI Siri.

Spotify's Top Devs Haven't Written a Line of Code Since December

During last week's earnings call, Spotify co-CEO Gustav Söderström reported his most senior engineers haven't written a single line of code since December. He gave a concrete example: a developer gives Claude instructions for a bug fix over Slack on their phone during the morning commute, and Spotify's internal platform validates it and pushes to production before they arrive at the office. Söderström emphasized: "This is a big change. It is real. It is happening fast. We are retooling the entire company for this age."

Meta Signs Massive NVIDIA Deal for "Millions" of AI Chips

Big tech partnering with NVIDIA is an everyday occurrence; the story here is the scale. The largest data centers currently contain several hundred thousand GPUs, so a purchase of millions implies Meta plans to build multiple new facilities at world-leading scale. NVIDIA produced only around 5 million AI chips last year, so this could be a strategic move to corner the market. The multi-year deal covers current Blackwell GPUs, next-gen Rubin chips, standalone Grace CPUs, and next-gen networking. Analysts estimate its value in the tens of billions of dollars, soaking up a large portion of Meta's $135B 2026 CapEx plan. It's an interesting commitment by Meta to keep paying the NVIDIA tax rather than pursue alternatives, though it may simply come down to volume: NVIDIA is the only chipmaker with a proven track record at this scale. Jensen Huang said: "No one deploys AI at Meta's scale."

Wall Street Pauses the Software Selloff

Both major indices eked out slight gains on Tuesday as sentiment began a cautious turnaround. The Mag 7 is at 5-month lows, and AI-exposed software firms like Salesforce and Adobe are down 20%+ on the year. ServiceNow CEO Bill McDermott announced he'd buy $3M in company stock, making him the first major SaaS CEO to buy during this bloodbath, while multiple executives cancelled planned stock sales. Several private software companies (McAfee, Rocket Software, Perforce) released earnings early to show they haven't been disrupted by AI. It's far too early to say the SaaSpocalypse is over, but this week is giving investors a slight breather.

China's AI Giveaway War for Chinese New Year

This Chinese New Year, the AI companies are the ones handing out the red envelopes. Alibaba, Tencent, and ByteDance are all running massive giveaways to capture chatbot users — part of the push to get people onto nascent AI shopping agents. ByteDance said their Monday night promotion garnered 1.9 billion chatbot interactions; Alibaba reported 130 million first-time users so far this month. But Chinese AI companies face a tough monetization path — all the major labs still offer advanced features for free. Beijing-based AI founder Leon Fan noted: "In China, consumers know they can always find most online services for free. If one major AI chatbot started charging, people would immediately migrate."

MAIN STORY

New AI Models: Sonnet 4.6 and Grok 4.20

This was a big week for model releases — and the thing that's notable is just how different model evaluation is getting. It's much more discrete, much more specific, and honestly much more useful. Increasingly the discourse isn't about raw capability but about what specifically a new model adds to the capability set and how it plugs into people's model stack. The questions are about cost, contextual performance, discrete capabilities, and how those add up to new value around specific use cases.

WHY NEW MODELS MATTER

Sonnet 4.6 — "Opus-Level Intelligence at Sonnet Pricing"
"A different category of what you can build."

Sonnet 4.6 is Anthropic's clearest statement yet that the "plan with Opus, execute with Sonnet" architecture is becoming the optimal design. The model delivers Opus-level intelligence at $3/$15 per million input/output tokens (vs. $5/$25 for Opus) and is available to free users. It's the first Sonnet-class model with a million-token context window, which raises the question of how much Opus usage was really just about that context window.

Computer Use Is Becoming a Marquee Capability
From 14.9% to 72.5% in 18 months

In the 18 months since Anthropic started tracking computer use on the OSWorld benchmark, Sonnet models have jumped from 14.9% to 72.5%. The latest jump, from 61.4% (Sonnet 4.5) to 72.5% (Sonnet 4.6), essentially matches Opus 4.6 at 72.7%. The model still lags behind the most skilled humans, but the rate of progress is remarkable, and the prospect of models that can use computers the way humans do, without APIs, is a powerful one. The demo showed Claude renewing someone's license plate at the DMV, a signal that Anthropic isn't just going after enterprises but is targeting everyday consumer use cases too.

Benchmarks and Early Impressions
"A 15 percentage point jump in performance and accuracy"

On coding, it scores 79.6% on SWE-bench Verified (vs. 77.2% for Sonnet 4.5), roughly in line with Opus 4.5. It is now state of the art on agentic financial analysis and office-task benchmarks, beating Opus 4.6. Artificial Analysis named it the new leader on GDPval-AA, slightly ahead of Opus 4.6 on agentic, real-world knowledge-work tasks. Users preferred Sonnet 4.6 over Sonnet 4.5 70% of the time and even preferred it to Opus 4.5 59% of the time, finding it less prone to over-engineering, laziness, and hallucinations. One caveat from Artificial Analysis: Sonnet 4.6 used significantly more tokens than previous Sonnets and more than Opus 4.6, so the cost story isn't as simple as "cheaper model does better."

The Cost Dimension in Agent World
"Cost was the real bottleneck."

With agents looping hundreds of times per task, the pricing matters far more than people realize. Kailzer wrote: "Dropping to sonnet-tier pricing while staying near opus-level means the same budget goes 5x further. That's not a minor upgrade, that's a different category of what you can build." Zach Chmael added: "Opus-class reasoning at Sonnet pricing means you can actually afford to let agents think harder on every step without blowing through your API budget."
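To make the agent-loop arithmetic concrete, here is a minimal Python sketch of per-task cost. It assumes the list prices quoted in this section ($3/$15 for Sonnet 4.6 and $5/$25 for Opus, per million input/output tokens); the model keys, the function name `task_cost`, the step count, and the tokens per step are all hypothetical illustration values, not measurements. At equal token usage, the per-token saving at these prices works out to 5/3, roughly 1.67x, and the sketch also shows how extra token usage by Sonnet, the caveat Artificial Analysis raised, narrows that gap.

```python
# Minimal cost sketch for a looping agent task.
# Prices are the list prices quoted in the text (USD per million tokens);
# model keys and all workload numbers below are hypothetical.
PRICES_PER_MTOK = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.6": {"input": 5.00, "output": 25.00},
}


def task_cost(model: str, steps: int, input_per_step: int, output_per_step: int) -> float:
    """Return the USD cost of an agent task that calls `model` once per loop step."""
    price = PRICES_PER_MTOK[model]
    input_tokens = steps * input_per_step
    output_tokens = steps * output_per_step
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200 loop iterations, 20k input and 1k output tokens each.
    opus = task_cost("opus-4.6", 200, 20_000, 1_000)
    sonnet = task_cost("sonnet-4.6", 200, 20_000, 1_000)
    # Same task if Sonnet emitted 50% more output tokens per step (hypothetical).
    sonnet_verbose = task_cost("sonnet-4.6", 200, 20_000, 1_500)
    print(f"Opus:            ${opus:.2f}")
    print(f"Sonnet:          ${sonnet:.2f} ({opus / sonnet:.2f}x cheaper)")
    print(f"Sonnet, verbose: ${sonnet_verbose:.2f} ({opus / sonnet_verbose:.2f}x cheaper)")
```

Running it prints $25.00 for Opus, $15.00 for Sonnet (1.67x cheaper), and $16.50 for the more verbose Sonnet run (1.52x cheaper). Price and token usage are kept as separate inputs because the real comparison depends on both, and because every per-step difference gets multiplied by the number of loop iterations.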

Kailzer (X) Price point analysis 
Zach Chmael (X) Cost bottleneck

OpenClaw Implications
"The best model for OpenClaw ever."

Alex Finn wrote: "This is the best model for OpenClaw ever. It is human level at computer use — the most important part of Claw — for a fraction of the price." Toma reported burning through a stupid amount of money on Opus 4.6, then switching: "It feels almost the same but costs a fifth as much. For pure coding, Opus is still better, but for agentic workflows inside OpenClaw, Sonnet 4.6 performs nearly as well." OpenClaw quickly pushed an update to officially support the new model.

Was This Supposed to Be Sonnet 5?
"They'd rather relabel than overpromise."

There's been a surprising amount of conversation around rumors that this was originally Sonnet 5. Veer Masrani wrote that either Anthropic is saving "5" for a real generational shift (conservative branding), or we're entering the era of smaller, harder-won improvements. Sean Sullivan offered a more strategic read: "Sonnet 5 has been done for some time now, but Anthropic still has market leadership in API usage, meaning they don't have to drop it until someone comes up to compete." Without privileged information there's no way to know which read is right, but it's very clear that across all the companies we are in the era of smaller, harder-won improvements. That might have less to do with scaling constraints than with managing consumer expectations after the beating OpenAI took when GPT-5 wasn't a big enough leap to get people excited.

Veer Masrani (X) "Sonnet 5" rumors
Sean Sullivan (X) Strategic hold theory

Grok 4.20 Public Beta — The Elon Rorschach Test
"Unlike prior versions of Grok, 4.2 is able to learn rapidly."

Grok 4.20 is different from a standard release: it's a public beta without a benchmark scorecard, and it isn't fixed; Musk says it will improve weekly. He claimed it'll be "an order of magnitude smarter and faster than Grok 4 when the public beta concludes next month." The signature feature is a multi-agent approach in which four separate agents think independently, debate, and converge on an answer. The challenge in evaluating any xAI release is the Elon Rorschach test: the X algorithm feeds you either endless praise or endless dunking, depending on your priors. Among the few people who seem to sit between those camps, the impression is that it is, at minimum, improved. Dr. Derya Unutmaz reported strong results on biomedical questions, and Benjamin De Kraker found the agent teamwork system "cool and well done" but noted the real value comes when the agents aren't all the same model. There isn't yet enough to know what to make of it; the thing to watch is whether it actually improves as rapidly as promised.

Elon Musk (X) Grok 4.2 beta announcement 
Elon Musk (X) "Order of magnitude smarter"
Derya Unutmaz (X) Biomedical testing
Benjamin De Kraker (X) Multi-agent critique

Dreamer — A New Platform for Personal Agents
"The most ambitious full stack consumer and coding agent startup I've ever seen."

A new platform called Dreamer launched in beta, focused on abstracting away all the complexity around agent design. Ben Tossell from Ben's Bites wrote: "2026 is the year of the personal agent. Dreamer is the closest I've seen to making that accessible to everyone." The platform lets you build agentic apps by talking — describe what you want, and an AI agent called Sidekick builds it in minutes with no infrastructure fuss. Swyx called it "the right form factor for mass-adopted personal software agents" and noted that "very unexpected things happen when you let normies build their own AI apps." Joanna Stern, formerly of the WSJ, called it "the vibe coding/agent tool for normies — super simple to build little tools without deploying anything to a server."

Dreamer (X) Launch Thread
David Singleton (Co-Founder) Announcement Blog Post
Ben Tossell (X) "Year of the personal agent"
Ben’s Bites Dreaming up Personal Agents
Swyx (X) AI Engineer POV
Joanna Stern (X) Agents for Normies

ALSO REFERENCED