All of AI's New Models and Tools

April 9, 2026 · Episode Links & Takeaways

HEADLINES

The Spud Scare — And Why It Wasn't True

Thursday morning brought a brief panic: Axios reported that OpenAI also plans a staggered rollout of its new Spud model, citing cybersecurity risks, apparently matching Anthropic's Mythos approach. The AI world had thoughts. Dan Shipper's take landed best: "The new status symbol is making a model so powerful you can't release it." Within hours, though, Shipper updated: he'd spoken to OpenAI, and the Axios story conflated two things. OpenAI does have a cyber product being tested with a trusted group, but that product is not Spud. The story has since been corrected. We are playing with live ammunition out here.

Perplexity's Revenue Has Gone Vertical

Perplexity's bet on personal agents is paying off fast. Between shifting to usage-based pricing and launching Perplexity Computer in February, the company effectively doubled revenue in a single quarter, hitting $450M ARR with 100 million monthly active users and tens of thousands of enterprise clients. The finance world in particular seems to have found its tool: Geiger Capital summed up the vibe on X. One skeptic's counterpoint: Cowork and the GPT SuperApp will eventually "mog" this. Maybe, but the growth rate itself is hard to argue with.

GitHub Is Straining Under the Agentic Coding Wave

Last year, GitHub celebrated hitting 1 billion code commits for the first time. This year they're seeing 275 million commits per week — on pace for 14 billion by year end, with numbers still climbing. COO Kyle Daigle: "Since January, every month, every week almost now has some new peak stat for the highest usage rate ever." Commits to public repos from Claude Code alone have swelled 25x in six months. The surge is revealing limits in GitHub's infrastructure — outages are becoming more frequent. OpenClaw creator Peter Steinberger complained that the API wasn't designed with agents in mind. Daigle says they're pushing hard on CPUs and scaling. One more piece of evidence for just how fast things are changing.
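The run-rate math in that paragraph holds up. A quick back-of-the-envelope sketch in Python, using only the figures quoted above:

```python
# Back-of-the-envelope check on GitHub's stated commit run rate.
commits_per_week = 275_000_000       # "275 million commits per week"
annualized = commits_per_week * 52   # 52 weeks in a year

last_year_total = 1_000_000_000      # "1 billion code commits" last year
growth_multiple = annualized / last_year_total

print(f"{annualized / 1e9:.1f}B commits/year")   # ~14.3B, matching "on pace for 14 billion"
print(f"~{growth_multiple:.0f}x last year's total")
```

So the "14 billion by year end" pace implies roughly a 14x jump over last year's milestone, before accounting for the fact that the weekly number is still climbing.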

Anthropic Loses the DC Round — But the Fight Continues

A federal appeals court in DC denied Anthropic's application to suspend their Pentagon supply chain risk designation while the case works through the courts. The three-judge panel ruled the equitable balance favors the government — "judicial management of how the Department of War secures vital AI technology during an active military conflict" outweighs the financial harm to a single private company. The important nuance: there are two separate lawsuits. The California injunction still stands, meaning non-Pentagon agencies don't have to cancel Anthropic contracts, and Anthropic's models have already been restored to usai.gov. Legal analyst Charlie Bullock told The Information he was unsurprised, noting two of the three DC Circuit judges have been highly sympathetic to the Trump administration's executive authority claims. He predicts Anthropic will ultimately succeed at the Supreme Court — and the case is moving fast enough to get there this year.

MAIN STORY

All of AI's New Models and Tools

A huge part of this week's discourse has been about models we don't actually have access to — Mythos, and briefly Spud. But the rest of the AI industry is not slouching, and Anthropic themselves gave us something genuinely powerful to play with. Let's run through everything that actually shipped.

PLENTY TO PLAY WITH

Meta Muse Spark
"We're back" — but the benchmarks tell a more nuanced story.
Meta's first model release in over a year is Muse Spark, the first model from Meta Superintelligence Labs — the division assembled under Alexandr Wang, who came aboard through the partial acquisition of Scale AI. The Llama name is gone; the Muse family is natively multimodal, with tool use, visual chain of thought, and multi-agent orchestration. On SWE-Bench Pro it scored 52.4, putting it within a few points of Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 for coding — in the mix, but not leading. Meta buries those coding numbers and leads instead with visual comprehension: 86.4 on CharXiv Reasoning, beating Gemini 3.1 Pro by six points. That matches the model's stated purpose: unlike the other labs' increasing focus on coding and enterprise, Muse Spark is designed to drive personal agents. Zuckerberg on Threads: the model is "particularly strong in areas related to personal superintelligence like visual understanding, health, social content, shopping, games, and more." And notably, even in the personal realm: "We are building products that don't just answer your questions, but act as agents that do things for you." Ethan Mollick's honest assessment after trying it: "Fine so far, but really doesn't match the current Big Three models." François Chollet was harsher — "overoptimized for public benchmark numbers at the detriment of everything else." Wang responded directly, noting they were "pleasantly surprised by users' feedback in areas like visual coding, writing style, and reasoning queries." Former Meta AI researcher Vasuman put the most charitable frame on it: "Never fade Zuck."

Z.AI's GLM-5.1
The first open source model to beat the Western leaders on coding — and it was trained on Huawei chips.
Getting completely overshadowed by the Mythos announcement, Z.AI released GLM-5.1 — the first open source model to overtake leading Western models on coding benchmarks. It scored 58.4 on SWE-Bench Pro, beating GPT-5.4 at 57.7 and Opus 4.6 at 57.3. It's a 754B-parameter model (not one you'll run on your Mac Mini), but it's a full open source release with commercial licensing, meaning developers can, for the first time, build on top of a current-generation state-of-the-art model. The long-horizon capabilities are the headline: Z.AI claims the model spent eight hours autonomously building a Linux desktop using a self-review loop, and carried out over 600 iterations using more than 6,000 tool calls on a database optimization benchmark to deliver 6x the performance of a standard session. Z.AI leader Lou: "Agents could do about 20 steps by the end of last year. GLM-5.1 can do 1,700 right now." It was trained entirely on Huawei chips — another demonstration that the Chinese hardware stack can produce powerful results. The US is months ahead, not years.

Claude Managed Agents
"Go from prototype to launch in days" — Anthropic builds the deployment stack.
The biggest practical release of the week from Anthropic isn't Mythos — it's Claude Managed Agents, launched Wednesday, with the announcement tweet viewed 16 million times. The pitch: everything you need to build and deploy agents at scale, without having to build and maintain all the backend infrastructure yourself. Wired's description is the clearest: managed agents give developers a pre-built agent harness (the software tools, memory system, and other infrastructure that wraps around a model to help it work agentically), plus a built-in sandboxed environment, cloud-based autonomous running for hours, permission controls, and multi-agent monitoring. Anthropic's Katelyn Lessee: "A lot of customers we talked to previously had a whole bunch of engineers whose job it would have been to build and run those systems at scale. Now that we're giving them that out of the box, those engineers can focus on core competencies." The Notion demo was telling: product manager Eric Liu could drop a managed agent into Notion to handle client onboarding, running natively with full access in a virtual session, with no days of setup required. Early builders note the current limitation is the lack of persistent memory across sessions, which means the best use cases right now are transactional and discrete: event-triggered patches, scheduled daily briefs, fire-and-forget tasks via Slack. Jared Orkin's clear-eyed take: "You no longer need an engineer to run an overnight marketing analysis. You need one sharp operator and an afternoon. Someone still has to tune the prompt every Friday and act on the brief by 9 AM Monday. That's the job."
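To make the "agent harness" idea concrete, here is a toy sketch of the plumbing such a product bundles: a tool registry, permission controls, and a monitoring log around a fire-and-forget task. Every name here (Harness, register, the example tools) is invented for illustration; this is not Anthropic's actual SDK or API, just the shape of the infrastructure the article describes.

```python
# Conceptual toy, NOT Anthropic's API: the kind of plumbing a managed
# agent harness bundles so builders don't have to write it themselves.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    allowed: set[str] = field(default_factory=set)   # permission controls
    log: list[str] = field(default_factory=list)     # monitoring trail

    def register(self, name: str, fn: Callable[[str], str], permitted: bool = True):
        self.tools[name] = fn
        if permitted:
            self.allowed.add(name)

    def call(self, name: str, arg: str) -> str:
        # Every tool call is permission-checked and logged.
        if name not in self.allowed:
            self.log.append(f"DENIED {name}")
            return "permission denied"
        self.log.append(f"CALL {name}({arg!r})")
        return self.tools[name](arg)

# A fire-and-forget "daily brief" task wired into the harness.
harness = Harness()
harness.register("fetch_metrics", lambda q: f"metrics for {q}: up 12%")
harness.register("post_to_slack", lambda msg: f"posted: {msg}")
harness.register("delete_repo", lambda _: "gone", permitted=False)  # blocked by policy

brief = harness.call("fetch_metrics", "weekly signups")
result = harness.call("post_to_slack", brief)
blocked = harness.call("delete_repo", "main")   # -> "permission denied"
```

The point of the sketch is Lessee's quote: none of this logic is interesting, but someone had to build and operate it at scale, and that is the layer Anthropic is now selling pre-built.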

Notebooks in Gemini
Bringing NotebookLM's magic directly into the Gemini app.
It sounds like a small quality-of-life update, but this is a more significant shift than it first appears. Google introduced notebooks as a native feature in Gemini — allowing users to collate documents, sources, and custom instruction sets organized by project, much like Projects in ChatGPT or Claude. Previously, Gemini's GEMS feature was a sort-of-but-not-exactly equivalent. This brings NotebookLM's resource management directly into the Gemini app itself. Google VP Josh Woodward: "Most AI chatbots give you basic projects. Gemini just built you a second brain." The broader point: one of the most persistent criticisms of Google's AI suite is that even if people like the models, the product surface is so spread out across different apps that it becomes confusing. This is Google starting to make features portable across those surfaces, so that, effectively, any door you walk in gets you to the same room. For many Gemini users' day-to-day experience, this will be a bigger improvement than a new model version would have been.