The Models Trying to Replace Fable

Jun 18, 2026 · Episode Links & Takeaways

HEADLINES

What the G7 Taught Us About AI and Global Power

The G7 meeting in France this week was the most AI-heavy gathering of world leaders in the forum's history, and the subtext was impossible to miss. For the first time, US allies are reckoning with the fact that access to frontier AI isn't guaranteed. Dario Amodei and Demis Hassabis led the call for international cooperation, pushing for structured model access, chip deals excluding China, and a unified approach to AI risk. But where the rubber met the road, specifically any concrete movement on the Mythos and Fable ban, the US gave nothing. Trump's entire public comment amounted to: "Going fine."

The European mood was described as "particularly sour." Leaders who came expecting to strategize against China found themselves instead pleading for access to US models. UK Prime Minister Keir Starmer reportedly requested a carve-out for British nationals and was denied. And reporting from Wired added context to why: Anthropic had previously granted Mythos access to Korean telecom giant SK Telecom, and US government concern about its supposed China ties triggered the revocation — and apparently accelerated the broader ban. Whether that concern is justified is another matter: one analyst flatly said SK Telecom has nothing to do with Huawei or China, while a DC observer noted the phrase "China-linked" can sometimes be a thought-terminating move. Either way, the net result from the G7 is that most watchers came away with their timelines for Fable's return extended, not shortened.

Noam Shazeer Leaves Google for OpenAI

OpenAI has signed legendary AI researcher Noam Shazeer away from Google — a move that's hard to overstate. Shazeer was a lead author on "Attention Is All You Need," the 2017 paper that introduced the transformer and kicked off the LLM revolution. Google paid $2.7 billion to bring him back in 2024 via the Character AI acqui-hire, and now less than two years later he's walking out the door. Sam Altman called it a move he'd wanted since OpenAI's founding. As for Google: the rumor mill has been quiet on Gemini 3.5 Pro, which was supposed to land in June, and developer Yuchen Jin writes that Noam's departure "makes Gemini's future feel uncertain."

ChatGPT Sunsetting Pulse

The removal of side quests continues: OpenAI is killing Pulse, the personalized daily AI briefing it launched last year. The replacement pitch is scheduled tasks, which will now roll out to all paid tiers including the cut-price Go plan. One subscriber on the announcement thread put it plainly: "After sunsetting 4.5 and Pulse, will there be any reason to keep Pro subscription for someone who is not a coder and has zero interest in Codex?" The short answer, as noted on air, is that OpenAI probably isn't too worried about that right now.

MAIN STORY

The Models Trying to Replace Fable

We're now a week into the Fable 5 shutdown, and the AI builder community has moved from mourning to MacGyvering. The question animating builders, enterprises, and investors alike is the same: what combinations of models can approximate Fable-level performance — at a price that actually works? What's emerged isn't a single answer but a set of converging experiments, all pointing toward a future where inference optimization and smart routing are first-class competitive advantages rather than afterthoughts.

CAN OPEN SOURCE FILL THE FABLE GAP?

Kimi 2.7 Code
Strong benchmarks, mixed real-world results.
Moonshot's latest coding model claims a 22% improvement on Kimi CodeBench and 30% fewer reasoning tokens than its predecessor. But early practitioner reports are less enthusiastic than past Kimi releases — VentureBeat noted that teams already running K2.6 in production will benefit, but it's unlikely to convert people who weren't already Kimi users. On the Agent Arena leaderboard, the model ranked 19th overall and only 6th among open models.

VibeThinker 3B
A tiny model with outsized reasoning — and a design philosophy worth watching.
WeiboAI's VibeThinker 3B is generating buzz for what it represents more than what it delivers today: a 3-billion-parameter model posting benchmark scores in the range of Claude Opus 4.5. The key is that it's been super-tuned for reasoning while offloading knowledge to external databases. As researcher Drew Black put it: "Take a small model and crank its reasoning power up to 11. Then knowledge can live outside the model in a database." It's not an enterprise solution yet, but it points in a clear direction — capable reasoning on local hardware, at a fraction of the compute cost.

GLM 5.2 (ZAI)
The Chinese open model getting the most serious attention right now.
ZAI's GLM 5.2 dropped within days of the Fable ban, and the timing wasn't lost on anyone. On the frontend code arena it ranked behind Fable 5 but ahead of all Opus models; on Design Arena it placed ahead of Fable. The headline number getting shared: six cents with GLM 5.2 versus 49 cents with Opus 4.8 to build a comparable landing page, roughly an 8x difference. There's evidence of some benchmark tuning (internal evals have it behind GPT 5.5 and Opus 4.8), and there's plenty of chatter about distillation (the model apparently insists it's actually Claude). But the broader sentiment is clear: the gap between this and frontier proprietary models is smaller than most people expected, and the cost difference is enormous. The point isn't that everyone will switch to GLM 5.2; it's that there's more reason to seriously consider it than ever before.

Microsoft x DeepSeek
The irony writes itself.
Microsoft is reportedly considering a locally hosted fine-tune of DeepSeek V4 to power Copilot CoWork — and it's not just theoretical. Axios reports the company expects to make a lower-cost model available in coming weeks. The move is partly driven by CoWork's shift to usage-based pricing and the pressure that creates to reduce per-token costs. CNBC's Deirdre Bosa framed the bigger strategic question: does this give the Chinese AI stack a foot in the door in the US, given that DeepSeek is optimizing for Huawei chips? And as Gail Weiner noted, the irony is sharp: the US government bans Fable and Mythos because the weights are a national security asset — and simultaneously, the most embedded US enterprise software company on Earth is quietly shipping a Chinese model inside the productivity stack of every Fortune 500 running Microsoft 365.

Composer 2.5
Cost-efficient coding, with caveats.
Cursor's Composer 2.5, post-trained on a Kimi foundation for coding tasks, has been out a couple of weeks now, and the ground-level picture is mixed. Some practitioners are raving: Ryan Shaw says he hasn't used anything else in weeks and finds it stronger than GPT 5.5 medium. Engineer Yasser's framing cuts right to it: a 65% score for $1 versus 70% for $12 with Fable, so why pay 12x for a 5-point improvement? But after Artificial Analysis updated its benchmarks to emphasize agentic coding, Composer fell meaningfully relative to prior rankings, landing closer to the open Chinese models than to GPT 4.7 as previously positioned. The model seems to shine most for high-volume coding work where cost efficiency compounds; it's less compelling as a like-for-like Fable substitute.
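To make that cost framing concrete, here is a quick back-of-the-envelope sketch using only the two figures quoted above (65% for $1, 70% for $12). The variable names are illustrative, and the "marginal cost per point" metric is our own assumption about how to read the trade-off, not something the benchmark reports.

```python
# Figures quoted in the episode; everything derived below is simple arithmetic.
composer = {"score": 65, "cost_usd": 1.0}
fable = {"score": 70, "cost_usd": 12.0}

# How many times more expensive Fable is for the same task.
cost_ratio = fable["cost_usd"] / composer["cost_usd"]

# Absolute gain in percentage points, and the same gain relative to Composer's score.
point_gain = fable["score"] - composer["score"]
relative_gain_pct = round(point_gain / composer["score"] * 100, 1)

# Extra dollars paid for each additional percentage point (illustrative metric).
marginal_cost_per_point = (fable["cost_usd"] - composer["cost_usd"]) / point_gain

print(f"cost ratio: {cost_ratio:.0f}x")
print(f"gain: {point_gain} points ({relative_gain_pct}% relative)")
print(f"marginal cost: ${marginal_cost_per_point:.2f} per extra point")
```

Read this way, the five-point gap is about a 7.7% relative improvement for twelve times the spend. Whether those last points are worth it depends on how costly a failed task is, which is exactly where the like-for-like comparison gets harder.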

OpenRouter Fusion
"Fable-level intelligence at half the price" — via model panels.
OpenRouter has shipped Fusion, a compound model API that fans prompts out to a panel of models in parallel, has a judge model evaluate each response, and then synthesizes a final answer. Their benchmark: a panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro fused together beat solo GPT 5.5 and solo Opus 4.8 outright, and came within 1% of Fable 5 at roughly half the cost. Investor Anisha Charia's take is worth sitting with: compound workflows match the way serious AI users already work — using adversarial models to generate, review, iterate, and test. The implication is that model panels or councils may become the default, with routing logic as the real competitive differentiator.
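The fan-out, judge, synthesize pattern described above can be sketched in a few lines. This is a minimal illustration of the general idea, not OpenRouter's actual API: the `fuse` function, the callable signatures, and the stub models are all hypothetical stand-ins, and a real panel would call hosted models over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical signatures: a model maps a prompt to a text answer.
Model = Callable[[str], str]
Judge = Callable[[str, str], float]  # (prompt, answer) -> score
Synthesizer = Callable[[str, Dict[str, str], Dict[str, float]], str]


def fuse(prompt: str, panel: Dict[str, Model], judge: Judge,
         synthesize: Synthesizer) -> str:
    """Fan the prompt out to every panel model, score each answer, then merge."""
    # 1. Fan out: query all panel models in parallel.
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in panel.items()}
        answers = {name: future.result() for name, future in futures.items()}
    # 2. Judge: score each candidate answer.
    scores = {name: judge(prompt, answer) for name, answer in answers.items()}
    # 3. Synthesize: produce one final answer from the scored candidates.
    return synthesize(prompt, answers, scores)


# Stub panel for demonstration only; these are not real model calls.
panel = {
    "model_a": lambda p: "short",
    "model_b": lambda p: "a longer, more detailed answer",
    "model_c": lambda p: "medium answer",
}


def length_judge(prompt: str, answer: str) -> float:
    # Toy judge: longer answers score higher. A real judge would be a model.
    return float(len(answer))


def pick_best(prompt: str, answers: Dict[str, str], scores: Dict[str, float]) -> str:
    # Toy synthesizer: return the top-scored answer unchanged.
    return answers[max(scores, key=scores.get)]


result = fuse("Explain model panels.", panel, length_judge, pick_best)
print(result)
```

The toy synthesizer simply picks a winner; a production version would more plausibly hand the scored candidates to another model to merge. Either way, the interesting engineering lives in the panel composition and the judging logic, which is the point about routing as a differentiator.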

Harvey
The vanguard of what enterprise model routing looks like in practice.
Harvey provides perhaps the clearest window into where all of this is headed for serious enterprise deployments. President Gabe Pereyra laid out the challenge: the original thesis was that token costs would halve every six months. Instead, the shift from chat to agents caused costs to explode — more agents, longer runs, pricier frontier models. Harvey's answer is to build the infrastructure layer that helps law firms manage this complexity rather than just buying the most powerful model available and hoping for the best. The concrete experiment: Harvey worked with Fireworks to run an open-weight GLM 5.1 as the primary worker model, with Opus 4.7 as a closed frontier advisor for high-stakes tasks. The result was both cheaper and more performant than using Opus across the board. As Patrick Ojo put it: "The insight isn't that open source beat frontier — it's that smart routing beat brute force."
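As a rough illustration of the worker-plus-advisor setup, the sketch below routes every task to a cheap worker model and escalates only high-stakes tasks to a frontier advisor. Harvey has not published its routing logic here, so the `Task` type, the `high_stakes` flag, and the review-the-draft escalation step are all assumptions made for the example; the two model functions are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Model = Callable[[str], str]  # hypothetical: prompt in, text out


@dataclass
class Task:
    prompt: str
    high_stakes: bool  # assumed to be set upstream, e.g. by task type


def route(task: Task, worker: Model, advisor: Model) -> str:
    """Run the worker on everything; ask the advisor to review high-stakes drafts."""
    draft = worker(task.prompt)
    if not task.high_stakes:
        return draft
    # Escalation step (an assumption): the advisor reviews the worker's draft.
    return advisor(f"Task: {task.prompt}\nDraft: {draft}\nReview and correct the draft.")


# Stub models that count calls, standing in for an open-weight worker
# and a closed frontier advisor.
calls: Dict[str, int] = {"worker": 0, "advisor": 0}


def worker(prompt: str) -> str:
    calls["worker"] += 1
    return f"draft({prompt})"


def advisor(prompt: str) -> str:
    calls["advisor"] += 1
    return f"reviewed({prompt})"


tasks = [
    Task("summarize NDA", high_stakes=False),
    Task("flag indemnity risk", high_stakes=True),
    Task("extract dates", high_stakes=False),
]
outputs = [route(task, worker, advisor) for task in tasks]
print(calls)
```

In this toy run the expensive advisor is called once for three tasks. That call pattern is where the savings would come from: frontier pricing applies only to the slice of work that needs it.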