- The AI Daily Brief
- Posts
- Harness-as-a-Service
Harness-as-a-Service
April 30, 2026 · Episode Links & Takeaways
Sign up for AgentOS, our latest free training course. It’s a platform and model agnostic program to help you build a personal agentic operating system that can evolve with you over time. The program is self-directed and project based, running over four weeks.
HEADLINES
Big Tech Earnings: AI Demand Is Unquestionable
Today is one of those rare days where the headlines are all around the same theme — big tech earnings. As Sherwood's Shay Boloor put it, it's hard to take the AI bubble argument seriously when some of the largest companies on earth are still putting up these growth numbers: Google Cloud +63% YoY, Microsoft Azure +40%, Meta revenue +33%, AWS +28%. The big takeaway: the AI boom is in full effect. Everything from memory chip foundries to data center construction is running at 100% in a frankly vain attempt to keep up with endless demand for tokens. This is categorically different from last year, when analysts still had questions and AI optimists were betting on a forthcoming boom — that boom is now clearly here.
Bloomberg AI Is Top of Mind as Big Tech Firms Report Earnings
Rihard Jarc (X) YoY growth roundup across Google Cloud, Azure, Meta and AWS
Kobeissi Letter (X) $12T in market cap between the big four heading into earnings
Google Wins the Night
Google was the clear winner on big tech earnings night, delivering huge beats across the board. Cloud grew 63% YoY with a $460B backlog (up from $240B at the end of Q4), Gemini paid enterprise customers surged 40% quarter over quarter (meaning I may have to eat my hat for rating Google as low as I did on enterprise in yesterday's lab rankings), and Google's infrastructure is now processing 16 billion tokens a minute, up 60% QoQ. Even search ad revenue grew 19% — the prevailing narrative was that AI would cannibalize Google's core business, but the opposite is happening. Sundar Pichai told analysts AI is now the largest tailwind for cloud and that cloud revenue would have been higher if they could meet demand. CapEx guidance was nudged from $175-185B to $180-190B, but Q1 only annualized to ~$140B — the market read this as capital discipline and sent the stock up 7% overnight.
Bloomberg Alphabet Sales Beat Estimates on Google Cloud, AI Customers
CNBC Alphabet ups 2026 capex to as much as $190 billion, expects to 'significantly increase' in 2027
WSJ Google Profit Jumps 81% as Cloud Business Booms
Joe Carlson (X) Chart of Google Cloud's exponential backlog: "this is so crazy it literally looks fake"
Trond Wuellner (X) Gemini paid enterprise customer growth chart
Chubby (X) Argues Google has cracked the AI monetization thesis via search
Amazon and AWS Find Their Stride
Amazon's earnings were also extremely solid. Top line up 17% YoY, net profit up 77% (boosted by pre-tax income from their Anthropic investment, so that number says more about Anthropic than Amazon). AWS grew 28% — its fastest growth in nearly four years, coming out of the 12% low in 2023. The only hesitation was the costly buildout: Q1 CapEx was $43.2B, putting them slightly behind a $200B annual target pace and pushing free cash flow from $26B a year ago to $1.2B this quarter. Andy Jassy dismissed the concerns, noting most new supply is already spoken for — and with OpenAI now on AWS alongside Anthropic, that's a fairly safe bet. Jassy also boasted that Amazon's custom silicon (Trainium) is now one of the top three data center chip businesses in the world, with a hypothetical $50B ARR if it were standalone.
Bloomberg Amazon Reports Higher Spending to Fuel Cloud Unit Sales
WSJ Amazon Posts Double-Digit Growth Anchored by Booming Web Services
CNBC Amazon's cloud unit reports 28% sales growth, topping estimates
Wall St Engine (X) Jassy on Trainium demand and selling racks
The Transcript (X) Jassy on custom silicon as a top three data center chip business
Quartr (X) Jassy on Amazon being well positioned for "biggest inflections of our lifetime"
Shay Boloor (X) "AWS is now a $152B ARR business growing 28% per year"
Signull (X) On OpenAI being on Bedrock being a bigger deal than people realize
Microsoft's Perfectly Average Quarter
Microsoft sits in the middle of the three big cloud giants — they didn't commit to aggressive growth like Google, opting for a more conservative AI CapEx strategy, and they have scale and incumbency disadvantages versus AWS. Azure grew 39%, right in line with expectations. CFO Amy Hood projected 40% Azure growth to continue into Q2 and lifted CapEx by $25B to $190B, attributing the entire increase to higher component prices rather than new projects. Copilot grew to 20 million paid enterprise seats (up from 15M in January), with Nadella noting weekly engagement is now at the same level as Outlook — but 20M is still a drop in the ocean compared to ~320M Office 365 paid seats. Gene Munster said before the call that Microsoft needed to make a statement that they have AI-powered products beyond Azure that customers must have — Nadella didn't deliver that statement, and the company continues to perform like a perfectly average tech stock.
Bloomberg Microsoft Projects 'Modest' Cloud Acceleration Amid AI Jitters
WSJ Microsoft Reports Strong Cloud Growth, but Questions About AI Returns Persist
CNBC Microsoft calls for $190 billion in 2026 capital spending on soaring memory prices
Gene Munster (X) Pre-call note that Microsoft needs to make a statement on AI products beyond Azure
Meta Posts a Record Quarter, Stock Tanks Anyway
Meta delivered another record quarter — $56.3B in revenue, up 33% YoY (their highest growth rate since 2021), with both top line and net income beating forecasts by a fairly significant margin. But CapEx got hiked again from $135B to $145B for the year, and Meta disclosed a quarter-over-quarter decrease in daily active people (the first decline since they started reporting it in 2019, blamed on internet disruptions in Iran and a WhatsApp restriction in Russia). CFO Susan Li said Meta has consistently underestimated its compute needs. In any other year this would have been a blowout, but Meta was the biggest loser of earnings night — the market still hates the CapEx spend and sent the stock down 5% overnight. Jim Cramer summed it up: Meta did not offer enough reasons to spend the way the other companies did, they just told us they could do better with it.
WSJ Meta Reports Big Revenue Jump and Projected Spending Increase
WSJ Meta Stock Falls After-Hours Following Report of Higher CapEx
Jim Cramer (X) "Meta did not offer enough reasons to spend the way the other companies did"
MAIN STORY
Harness-as-a-Service
Nominally today's topic is Cursor's new Cursor SDK — a platform where, as Cursor's Lee Robinson put it, you can build local hackable agents with any model or ship products on top of managed cloud agents. But it's actually part of a broader phenomenon. In the past few weeks alone we've seen OpenAI update their Agents SDK, Anthropic release Claude managed agents, and Microsoft release hosted agents in Foundry. None of these are the same product, but they're all playing in a similar space. I'm proposing a new name for the category: Harness-as-a-Service — a new infrastructure category where companies sell access to their agent runtime, the engine that turns an LLM into something that can actually do work, the same way AWS sells access to compute and Stripe sells access to payment rails.
Cursor (X) Cursor SDK announcement thread
Futurum Group Cursor 3.2 Reframes the IDE as an Agent Execution Runtime
OpenAI The next evolution of the Agents SDK
AGENT BUILDERS KIT
Three Phases of Agent Development
Weights, then context, now harness — each phase layered on the last.
"Aha" on Twitter wrote a nice summary of how the agent landscape has evolved. Phase one was weights — bigger models, more data, RLHF, fine-tuning. Phase two was context — prompt engineering, few-shot, chain of thought, RAG. The same frozen model could behave completely differently based on what was put in front of it. Phase three is the harness engineering phase, and the shift is fundamental: the question changed from what should we tell the model to what environment should the model operate in. The model now sits inside a harness that includes persistent memory, reusable skills, standardized protocols like MCP and A2A, execution sandboxes, approval gates, and observability layers. Each phase didn't replace the previous — it layered on top.
Sam Altman on Harness vs Model
"I no longer think of the harness and the model as entirely separable things."
In a recent Ben Thompson interview, Altman was asked how important the harness is to making agents actually work. He said it's hard to overstate how critical it is, adding that even he doesn't always know when he fires something off in Codex and it does something amazing how much credit goes to the model versus the harness. We now have two very different vectors of increasing AI capability — the underlying models, and improvements in the harness that surrounds them.
The OpenClaw Hobbyist Era
Like 1970s computer kits — for the few willing to solder it together.
Part of what made this year feel so different was an open harness — OpenClaw — even though no one was calling it that. But OpenClaw was not plug and play. You had to pick the model, write the system prompt, define the tools, wire the agent loop, manage context, handle errors, orchestrate sub-agents, store state, deploy and monitor it. Anders Carlson recently wrote on LinkedIn about a forgotten era of computing in the 1970s — between the Altair 8800 and the Apple II — when people interacted with computers by ordering a kit, soldering iron in hand, and assembling it themselves. The OpenClaw era of harnesses, in the ancient days of two and a half months ago, was structurally similar.
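To make the "kit era" concrete, here's a minimal sketch of what hand-wiring an agent looked like: the loop, the tool registry, the dispatch, and the error handling are all the builder's problem. Everything here is illustrative, not from any real harness, and the model call is a stub standing in for a real LLM API.

```python
import os

def list_files(path: str) -> str:
    """A toy tool the agent can call."""
    return ", ".join(sorted(os.listdir(path)))

# Tool registry: in the kit era, you defined and wired this yourself.
TOOLS = {"list_files": list_files}

def call_model(messages):
    # Stand-in for a real LLM API call. This stub requests one tool
    # call, then finishes once it sees the tool result in context.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "list_files", "args": {"path": "."}}
    return {"final": "Done: see tool output above."}

def run_agent(task: str, max_steps: int = 5) -> str:
    # System prompt, context management, and the loop itself: all DIY.
    messages = [{"role": "system", "content": "You are a helpful agent."},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "final" in reply:                 # model says it's done
            return reply["final"]
        tool = TOOLS[reply["tool"]]          # hand-rolled tool dispatch
        try:
            result = tool(**reply["args"])
        except Exception as exc:             # error handling, also DIY
            result = f"tool error: {exc}"
        messages.append({"role": "tool", "content": result})
    return "max steps reached"

print(run_agent("What files are here?"))
```

Multiply this by sandboxing, sub-agent orchestration, state storage, deployment, and monitoring, and you have the soldering-iron workload the OpenClaw era demanded.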
What Harness-as-a-Service Actually Is
You bring three things; everything underneath is handled.
With these new tools the agent loop is prebuilt, tool dispatch is prebuilt, sandboxing is prebuilt, streaming, error handling, context compression — all prebuilt and tuned by teams whose full-time job is making those layers excellent. You bring three things: which model you want, what tools the agent has access to, and what task you're handing it. This isn't a shift in scale, it's a shift in kind. Just as the PC era didn't destroy the hobbyists who liked control over components, harness-as-a-service won't destroy DIY builders; but the productivity revolution of the 1990s happened because users got Dell desktops, not because more people learned to assemble motherboards. That's the promise of this new phase. And the interesting twist: while these tools look on the surface like they're only for developers, because agents now handle the coding and infrastructure, the audience of people who can build with something like the Cursor SDK has expanded dramatically — including non-developers like me.
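The "bring three things" contract can be sketched as an interface. The class and method names below are invented for illustration — they are not the Cursor SDK's actual API — and the managed runtime is stubbed out; the point is only the shape of what the caller supplies versus what the service hides.

```python
# HYPOTHETICAL harness-as-a-service interface: the caller supplies a
# model, a tool list, and a task. The loop, sandboxing, streaming, and
# context compression would all live behind run(). Names are invented.
from dataclasses import dataclass, field

@dataclass
class ManagedAgent:
    model: str                                  # 1. which model you want
    tools: list = field(default_factory=list)   # 2. what it can touch

    def run(self, task: str) -> str:            # 3. the task you hand it
        # A real service would execute the full agent loop in a managed
        # sandbox and stream progress; here we simulate a finished run.
        used = ", ".join(t.__name__ for t in self.tools) or "no tools"
        return f"[{self.model}] completed {task!r} using {used}"

def search_repo(query: str) -> str:             # toy tool for the demo
    return f"results for {query}"

agent = ManagedAgent(model="gpt-5.5", tools=[search_repo])
print(agent.run("fix the failing login test"))
```

Compare this with the dozens of lines a hand-rolled loop needs: the caller's surface area shrinks to three declarations, which is exactly the Dell-desktop move.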
Mike Piccolo (X) "The harness is the backend"
Kayvon Jafarzadeh (X) On Cursor turning an "AI wrapper" into an actual moat
Gagan Saluja (X) "Building a usable agentic IDE is years of harness work, not weeks of model fine-tuning"
Dan McAteer (X) "The agent harness IS the platform"
Dan Shipper (X) On the rise of "Codex-native, Cowork-native, Cursor-native" apps
Harnesses Change Model Performance
Same model, same week, two harnesses, two completely different results.
A new report from Endor Labs found that GPT-5.5 operating within Cursor's harness set a new record on their security correctness benchmark — 23.5%, narrowly beating the previous leader of Cursor + Opus 4.7 at 22.9%. Both were a few percentage points higher than Opus 4.7 in its native Claude Code harness or GPT-5.5 in its native Codex harness. The functionality test was even more stark: Opus jumped from 87.2% to 91.1% by switching to Cursor, while GPT-5.5 went from 61.5% to 87.2%. Alex Volkov from ThursdAI confirmed similar results on the WolfBench coding benchmark with an entirely different setup.
Endor Labs GPT-5.5 Sets a New Code Security Record with Cursor, not Codex, in Agent Security League
Alex Volkov (X) WolfBench AI results comparing harnesses
What People Are Already Building
Cursor agents in Gmail, in Chrome plugins, watching their own browser windows.
With the SDK just launching yesterday, people are already building MVPs. Jack Driscoll showed off a Cursor agent embedded directly into Gmail — share an email into chat, have the agent read the thread into context, edit code, fix problems, stream results back into the chat window. As he put it, the biggest difference is that the Cursor SDK isn't just calling an LLM with tools — it's exposing the same coding agent runtime Cursor already uses (repo context, edit, search terminal workflow, streaming status, model choice, locally hosted execution). Tejas Haveri built a bug-catching agent that works on his production code base and can see how the app is performing in its own browser window — a potentially big step toward fully autonomous coding agents, given that human verification is currently a massive bottleneck. Robert Bouschery embedded a Cursor agent in a Chrome plugin for IT triage, helping non-technical users dump code from the browser into a ticket instead of describing the bug and hoping for the best.
Jack Driscoll (X) Demo of Cursor agent embedded in Gmail
Jack Driscoll (X) On what makes the Cursor SDK different from a generic LLM-with-tools wrapper
Eric Zakariasson (X) Roundup thread of MVPs being built on the SDK
Tejas Haveri (X) Bug-catching agent that watches the app in its own browser window
Tejas Haveri (X) "It's mainly about closing the feedback loop"
Robert Bouschery (X) Cursor agent embedded in a Chrome plugin for IT triage
How to Try It (Even If You're Not a Developer)
Drop the cookbook into Claude or ChatGPT and see what unlocks.
If you want to check this out for yourself but you're not a developer, go to the Cursor announcement, click through to their GitHub cookbook, drop it into either Claude or ChatGPT (especially with context about your particular projects), and ask it to give you ideas for how this new harness-as-a-service product could change the way you build or think about something. Then when you realize it unlocks something you've been wanting to do forever, you have full permission to call in sick on Friday and dive all the way in.
