The Harness Era
What AI agents actually are, why the wrapper matters more than the model, and where crypto fits
For the past three years, every user interaction with AI started the same way. Someone typed a question. A model answered. The user typed a follow-up. The chatbot was useful, and it taught a billion people what large language models could do. It is also the version of AI we are about to leave behind.
The next phase will not feel like asking a model to explain something. It will feel like giving a system a goal and watching it gather context, use tools, take steps, check its own work, and come back with something done.
That is the shift from chatbots to agents.
An AI agent takes a goal, picks the next step, calls a tool, observes the result, and keeps going until the task is complete or it hits a stopping point. A chatbot answers prompts. An agent works toward an outcome. If you ask a chatbot how to book a flight, it explains the steps. If you ask an agent to book the flight, it searches options, applies your preferences, pauses for approval at the right moment, and completes the booking.
Agentic tooling has already eaten software development. It got there first because coding has the tightest feedback loop in knowledge work: edit code, run tests, read errors, try again. The labs also poured enormous effort into coding because better coding agents help them build the next generation of AI. The same pattern is now spreading into research, sales, operations, legal, and finance: anywhere the work is repetitive, information-heavy, and reversible.
What a harness is and why it now matters more than the model
A large language model on its own is a prediction engine. Input goes in, output comes out. The harness is everything around the model that turns it into a worker. Jensen Huang put it directly in a recent conversation with Michael Dell: these systems “put a harness around the large language model so that it can access memory, access the network, use tools, have local scratchpad memory, working memory, access long-term memory. And so that harness basically turns the brain into an agent, into a digital robot.”
The simplest version is a loop. The user gives a goal. The model decides what to do next. The harness checks whether the action is allowed. A tool runs. The result goes back into the model’s context. The model decides again. The loop continues until the task is finished, the agent gets stuck, or a limit is reached.
The agent loop: the model proposes an action, the harness gates it, the tool runs, and the result feeds back. The loop continues until the task completes, the agent gets stuck, or a limit is reached.
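The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production harness: every name in it (Action, run_agent, scripted_model, the two toy tools) is hypothetical, and the "model" is a scripted stand-in for a real LLM call so the example runs on its own.

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str       # name of the tool the model wants to call, or "finish"
    argument: str   # a single string argument, to keep the sketch small

def run_agent(goal, model, tools, allowed, max_steps=5):
    """Run the propose -> gate -> execute -> observe loop."""
    context = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = model(context)                        # the model proposes
        if action.tool == "finish":
            return "done", action.argument, context
        if action.tool not in allowed:                 # the harness gates
            context.append(f"denied: {action.tool}")
            continue
        result = tools[action.tool](action.argument)   # the tool runs
        context.append(f"{action.tool} -> {result}")   # the result feeds back
    return "limit", "", context                        # step limit reached

def scripted_model(context):
    """Stand-in for an LLM: picks an action from how much context exists."""
    if len(context) == 1:
        return Action("delete_files", "/")             # risky, will be denied
    if len(context) == 2:
        return Action("search", "flights to Lisbon")
    return Action("finish", context[-1])

tools = {
    "search": lambda query: f"3 results for {query!r}",
    "delete_files": lambda path: "deleted",
}
allowed = {"search"}  # the permission policy: only search may run

status, answer, context = run_agent("find flights", scripted_model, tools, allowed)
print(status)
print(answer)
```

The point of the sketch is the shape: the model never touches a tool directly. Every proposal passes through the harness, which can deny it, and the denial itself goes back into context so the model can try something else.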
It sounds basic, and the structure is basic, but this is where most of the leverage in AI now lives. The model supplies the reasoning. The harness supplies the working environment: memory, files, tool access, permissions, context management, logging, and the rules about when to pause for human approval. The harness chooses what information the model sees on each step. It exposes web search, databases, code execution, file editing, email, calendars, internal docs, and APIs. It enforces guardrails and stops the model before it does something risky.
The harness wraps the model in concentric layers. The model is replaceable; the layers around it (tools, memory, permissions, observability) are where the durable value sits.
A subagent pattern usually sits on top of this. A parent agent assigns one subagent to research a market, another to inspect a codebase, another to verify the final answer. Each subagent gets a narrow task, a restricted toolset, and its own context. The parent collects results and makes the final call. This is great when the work separates cleanly — research, verification, data gathering, testing, document review. It works less well when each step needs the full picture, because coordination costs creep up fast.
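As a rough sketch of that pattern, assuming a hypothetical Subagent type and a fake_worker function standing in for a model running inside its own loop: each subagent carries a narrow task, a restricted toolset, and a fresh context, and the parent makes the final call.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subagent:
    name: str
    task: str
    tools: frozenset   # the only tools this subagent may touch

    def run(self, worker):
        # Each subagent starts from a fresh context holding only its own task.
        context = [self.task]
        return worker(self.name, context, self.tools)

def parent(goal, subagents, worker):
    """Fan the work out, collect the results, and make the final call."""
    results = {s.name: s.run(worker) for s in subagents}
    # Illustrative rule: the parent accepts only if the verifier says "ok".
    approved = results.get("verifier") == "ok"
    return {"goal": goal, "approved": approved, "results": results}

def fake_worker(name, context, tools):
    """Stand-in for a model call; a real worker would run its own agent loop."""
    if name == "verifier":
        return "ok"
    return f"{name} used {sorted(tools)} on {context[0]!r}"

team = [
    Subagent("researcher", "size the market", frozenset({"web_search"})),
    Subagent("inspector", "read the codebase", frozenset({"read_file", "grep"})),
    Subagent("verifier", "check the final answer", frozenset({"run_tests"})),
]
report = parent("evaluate the acquisition target", team, fake_worker)
print(report["approved"])
```

Nothing in this sketch shares context between subagents, which is exactly why the pattern struggles when each step needs the full picture: the parent would have to copy that picture into every task.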
An important observation, increasingly shared by people who build with these systems: a well-designed harness running on a mid-tier model can beat a frontier model running on a thin scaffold. Well-designed coding harnesses have routinely beaten weaker setups even when both run the same underlying model, and in some cases have matched or beaten raw frontier models on SWE-bench-style tasks just by improving retrieval and scaffolding. The value in new model releases increasingly lives in how the model and the harness work together.
The underlying capability is moving fast. METR’s “time horizon” research measures the length of task an AI agent can complete autonomously, with task length expressed as the time a human would need to do the same work. The doubling time on that metric has accelerated from about seven months to roughly four months in the post-GPT-4o era. Models are getting more capable at sustained work, but the work itself only happens inside a harness that gives them somewhere to do it. This reframes where the value sits.
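To see what the gap between those two doubling times means in practice, a quick compounding calculation helps. This is plain exponential arithmetic, assuming a constant doubling time over the year; the function name is made up for the example and nothing here is METR's own code or data.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """How many times longer the task horizon gets after `months`."""
    return 2 ** (months / doubling_months)

for doubling in (7, 4):
    factor = growth_factor(12, doubling)
    print(f"{doubling}-month doubling: x{factor:.1f} per year")
```

A seven-month doubling time compounds to a bit more than a threefold increase per year; a four-month doubling time compounds to eightfold.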
The model has to be swappable
If the harness is where the leverage lives, the most important property of a harness is that the model is a replaceable component.
Different models are good at different things. Some are better at code. Some are better at long reasoning. Some are better at images, video, voice, design, or fast low-cost classification. Some are open source and run locally on a laptop. Others are frontier systems that cost more but handle the hard work. A good harness should route a request to the model that fits the job.
Use a small local model for summarization, tagging, formatting, search cleanup, or routine drafts. Use a strong frontier model for complex reasoning, planning, contract review, or anything where the cost of being wrong is high. Use a specialist for media generation. Use a cheaper model when latency and cost matter more than depth. Use a private local model when the data cannot leave the building.
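Those rules translate almost directly into a routing function. The sketch below is one possible ordering, assuming a hypothetical Task type and placeholder tier names ("small-local", "frontier", and so on) rather than any real model or provider.

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str                   # e.g. "summarize", "contract_review", "image"
    sensitive: bool = False     # the data cannot leave the building
    high_stakes: bool = False   # being wrong is expensive

ROUTINE = {"summarize", "tag", "format", "search_cleanup", "draft"}
MEDIA = {"image", "video", "voice"}

def route(task: Task) -> str:
    """Return the model tier for a task. Tier names are placeholders."""
    if task.sensitive:
        return "private-local"      # privacy wins over every other rule
    if task.kind in MEDIA:
        return "media-specialist"
    if task.high_stakes:
        return "frontier"
    if task.kind in ROUTINE:
        return "small-local"
    return "cheap-hosted"           # default when nothing else applies

print(route(Task("summarize")))
print(route(Task("contract_review", high_stakes=True)))
print(route(Task("summarize", sensitive=True)))
```

The order of the checks is itself a policy decision. Here privacy is checked first, so a sensitive task stays local even when it is also high-stakes; a team that weighs accuracy over data residency would order the rules differently.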
The reason this matters more for agents than for chatbots is volume. A typical chat exchange is a few hundred to a few thousand tokens. An agent that researches, plans, calls tools, and verifies its own work can burn through hundreds of thousands of tokens in a single task. Flat consumer subscriptions worked fine for chat. They will not survive agent workloads. Anthropic has already started repricing programmatic usage, pulling back the deep subsidies that made power users’ bills look like nothing, on the grounds that compute supply is structurally short. Expect the other labs to converge on the same pricing approach once the rest of the industry has burned through its venture-funded subsidy phase. Cost-aware routing inside a harness goes from nice-to-have to required infrastructure.
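A back-of-the-envelope calculation shows why volume breaks flat pricing. Every number below is a placeholder chosen for illustration, not a real price or a measured workload; swap in your own figures.

```python
def task_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in dollars of one task at a flat per-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens

# Placeholder inputs, not real prices or measurements.
PRICE = 5.00                # hypothetical blended dollars per million tokens
CHAT_TOKENS = 2_000         # one chat exchange
AGENT_TOKENS = 300_000      # one agent task
TASKS_PER_MONTH = 30 * 30   # 30 agent tasks a day for 30 days

chat_cost = task_cost(CHAT_TOKENS, PRICE)
agent_cost = task_cost(AGENT_TOKENS, PRICE)
monthly_agent_bill = agent_cost * TASKS_PER_MONTH

print(f"chat exchange: ${chat_cost:.2f}")
print(f"agent task:    ${agent_cost:.2f}")
print(f"agent month:   ${monthly_agent_bill:,.2f}")
```

Whatever price you plug in, the ratio between the two is set by the token counts alone: under these assumptions one agent task costs 150 times one chat exchange.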
There is a second reason model flexibility matters: nobody knows which models win.
DeepSeek showed how quickly the landscape can shift. A model that looked dominant in one quarter was matched by an open-weight competitor for a fraction of the price in the next. That kind of jump will keep happening. Building a workflow that depends on a single provider for cost, capability, latency, or privacy is a bet that the current ranking holds. It usually does not.
The right architecture is one where the harness owns everything durable (context, memory, files, permissions, workflows, tool access, user identity) and the model is the part you can swap when something better shows up next quarter.
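In code, that separation amounts to a narrow model interface and a harness that holds the state. The sketch below is one way to express it, with hypothetical names throughout (Model, EchoModel, Harness) and a toy model that just echoes what it was shown.

```python
from typing import Protocol

class Model(Protocol):
    """The only thing the harness requires of a model (assumed interface)."""
    def complete(self, prompt: str) -> str: ...

class EchoModel:
    """Toy stand-in for a provider client; echoes what it was shown."""
    def __init__(self, label: str):
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"

class Harness:
    def __init__(self, model: Model):
        self.model = model
        self.memory = []   # durable state lives in the harness, not the model

    def ask(self, prompt: str) -> str:
        # The harness decides what the model sees: recent memory plus the prompt.
        visible = " | ".join(self.memory[-3:] + [prompt])
        reply = self.model.complete(visible)
        self.memory.append(prompt)
        return reply

    def swap_model(self, model: Model) -> None:
        self.model = model   # memory is untouched by the swap

harness = Harness(EchoModel("model-a"))
harness.ask("first question")
harness.swap_model(EchoModel("model-b"))
print(harness.ask("second question"))
```

After the swap, the new model still sees the earlier question, because that history was never stored inside the old model in the first place.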
Why this matters for crypto
If agents become a primary interface for AI, they will need to transact. Researching the world is one thing. Acting on it requires moving money, accepting payments, paying for APIs, subscribing to services, settling between counterparties. Once agents move from answering questions to taking actions, they stop being only software tools and start becoming economic actors. That is where the payment layer becomes important.
The current internet does not have great rails for software-initiated payments. Cards assume a human at the checkout. Bank rails assume a treasury operator. Neither is built for an agent paying $0.02 for a single API call at 3 a.m.
Crypto already has those rails. Stablecoins are programmable, internet-native, and increasingly accepted across payment infrastructure. That makes them the default settlement layer for agent-initiated economic activity. Not because of ideology but because they are the only thing that fits the shape of the problem.
Three concrete predictions for the next 12 to 18 months:
Agent wallets become a product category. Stripe, Coinbase, and Visa have all shipped agent-payment SDKs in the last year. Stripe in particular is building the wiring for agentic commerce: one-time-use payment credentials for bots, partnerships with Cloudflare so an agent can autonomously buy and configure a domain, stablecoin infrastructure for microtransactions. The next step is purpose-built wallets with spending limits, approval rules, role-based permissions, revocation, and audit logs. A company will need to know which agent spent money, why, and whether the spend stayed inside policy.
Stablecoin volume from non-human accounts becomes meaningful. Small payments, API calls, cross-border settlement between agents, micro-payouts to data providers: the volume here is currently a rounding error, but it has the right shape to compound. By the end of 2026, “agent-initiated stablecoin transactions” will be a chart that exists and that people pay attention to.
Agent identity and reputation become a real problem. A wallet address is not enough. If agents transact with each other and with services, counterparties will want to know what they are dealing with: who deployed the agent, what permissions it has, what its history looks like, whether it is authorized to act on behalf of the human or company it claims to represent. Some of this gets solved with onchain attestations and verifiable credentials. Some of it gets solved by centralized registries. Most of it is unsolved today.
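The wallet controls named in the first prediction (spending limits, approval rules, revocation, audit logs) can be sketched as a policy layer that sits in front of whatever actually moves the money. This is an illustration under stated assumptions: AgentWallet and its fields are hypothetical, amounts are integer cents, and no real payment SDK or chain is involved.

```python
from dataclasses import dataclass, field

@dataclass
class AgentWallet:
    agent_id: str
    per_tx_limit: int      # cents; hard cap on any single payment
    daily_limit: int       # cents; hard cap on total spend per day
    approval_above: int    # cents; payments above this need a human sign-off
    revoked: bool = False
    spent_today: int = 0
    audit_log: list = field(default_factory=list)

    def pay(self, amount, payee, reason, approved_by=None):
        """Check policy, record the decision, and return it as a string."""
        if self.revoked:
            decision = "rejected: wallet revoked"
        elif amount > self.per_tx_limit:
            decision = "rejected: over per-transaction limit"
        elif self.spent_today + amount > self.daily_limit:
            decision = "rejected: over daily limit"
        elif amount > self.approval_above and approved_by is None:
            decision = "pending: needs human approval"
        else:
            decision = "paid"          # a real wallet would settle here
            self.spent_today += amount
        # Every attempt is logged, including the ones that were refused.
        self.audit_log.append(
            (self.agent_id, amount, payee, reason, approved_by, decision)
        )
        return decision

wallet = AgentWallet("research-bot", per_tx_limit=50_000,
                     daily_limit=100_000, approval_above=10_000)
first = wallet.pay(2, "api.example", "single API call")
second = wallet.pay(30_000, "vendor.example", "dataset license")
third = wallet.pay(30_000, "vendor.example", "dataset license",
                   approved_by="alice")
print(first, second, third, sep="\n")
```

The audit log answers the three questions a company will ask: which agent spent money, why, and whether the spend stayed inside policy. Identity and reputation, the third prediction, are a separate problem this sketch does not touch.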
Beyond those three, the secondary effects are worth tracking. Agents managing operations could hedge currency exposure, route idle balances into conservative yield, buy targeted insurance through prediction markets, or price small risks the way no human treasurer would have the time to. DeFi primitives will pick up demand they were never originally designed for: programmatic counterparties that never sleep, never miss a margin call, and care about gas costs more than yield percentages.
The takeaway
Agentic AI has already changed software development. It is the second major wave of AI adoption; the first was ChatGPT teaching the public what a language model is.
The chatbot interface helped people understand AI. The agent interface will help people use AI to get work done. The harness is what makes that possible without giving up control. It manages context, memory, tools, permissions, files, subagents, and model routing. It lets users and companies benefit from many models instead of betting everything on one.
That flexibility will matter more as the AI cost curve gets disciplined. Simple work moves to local and cheaper models. Sensitive work moves to private environments. Specialized work goes to specialized models. Hard problems keep their frontier compute, and the bill keeps showing up.
For crypto specifically, the near-term opportunity is plumbing. Wallets with controls. Stablecoin rails for agent-to-agent settlement. Identity and reputation for software counterparties. Permissioning that works at API speed instead of human speed.
The next phase of AI will be less about chatting with a model and more about directing systems of models that can act on your behalf. The winners will be the products that make that power useful, safe, and cheap enough to leave running while you sleep.



