Why They Rebuilt the CPU
NVIDIA and Microsoft rebuilt the CPU this month. Not the GPU. The CPU.
If you’d asked anyone a year ago which component these two would both re-architect, in the same week, in different cities, the answer would have been the GPU. The GPU is where the money is. Instead they both went and rebuilt the boring part.
And they rebuilt it for the same strange reason. Single-threaded latency.
Single-threaded latency is a metric the cloud era did not care about. The whole point of the cloud was throughput. You rented cores by the hour, ran lots of things in parallel, and if any one thing took fifty milliseconds longer, who noticed. Humans don’t notice fifty milliseconds.
Agents do.
When two competitors rebuild the same component for the same reason without coordinating, the workload underneath has actually changed. Nobody benefits from making this up. They wouldn’t have done the work if they hadn’t already seen, in their data, what the workload now looks like.
The workload is agents. Programs that reason, plan, act, use tools. Picture one waiting for a database query while a GPU sits idle next door. That isn’t a small inefficiency. That’s the whole pricing model of compute being quietly rewritten, while most people are still watching the part that already changed.
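A back-of-envelope sketch of why fifty milliseconds matters to an agent but not to a human. A person pays a delay once per request. An agent runs a chain of steps where each one waits on the last, so the delay is paid on every step. Every number below is hypothetical, and the model assumes no overlap between steps and a single agent on a single accelerator with nothing else batched onto it, which real systems try hard to avoid.

```python
# Back-of-envelope model of a sequential agent loop.
# All numbers are hypothetical, chosen only for illustration.

def agent_wall_time_ms(steps: int, gpu_ms_per_step: float, cpu_ms_per_step: float) -> float:
    """Wall-clock time when every step waits on the previous one (no overlap)."""
    return steps * (gpu_ms_per_step + cpu_ms_per_step)


def gpu_idle_fraction(gpu_ms_per_step: float, cpu_ms_per_step: float) -> float:
    """Share of wall-clock time the accelerator spends waiting on CPU-side work."""
    return cpu_ms_per_step / (gpu_ms_per_step + cpu_ms_per_step)


if __name__ == "__main__":
    steps = 200      # hypothetical: tool calls in one agent task
    gpu_ms = 100.0   # hypothetical: model inference time per step
    for cpu_ms in (50.0, 100.0):  # hypothetical: query, tool, and orchestration time per step
        total_s = agent_wall_time_ms(steps, gpu_ms, cpu_ms) / 1000
        idle = gpu_idle_fraction(gpu_ms, cpu_ms)
        print(f"cpu {cpu_ms:.0f} ms/step -> {total_s:.1f} s total, GPU idle {idle:.0%}")
```

Under these assumptions, an extra fifty milliseconds of CPU-side work per step turns a 30-second task into a 40-second one, and the accelerator goes from idle a third of the time to idle half the time. The single-threaded path, not the GPU, sets the pace.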
If you want to know what’s happening in computing, ignore the GPU. Watch the CPU.
I think you can see what’s happening at a deeper level if you ask what the unit of computing is.
For sixty years, the unit was the application. You wrote one, sold it, ran it. Operating systems were scaffolding for applications. CPUs were tuned to run them. Then for fifteen years the unit was the service. Software became something you rented. Uptime, scale, the monthly invoice. Now the unit is something else. An agent. Not the application you use, but a process that uses applications on your behalf.
Once you accept the agent as the unit, almost everything downstream has to change. Including the CPU. Especially the CPU.
There’s a second thing worth noticing. Both companies now talk about data centers the way oil companies talk about wells. Not metaphorically. Performance per watt. Gigawatts. Cooling loops. Negotiating with the local power grid before they break ground.
This is a different industry than it was three years ago. The binding constraint used to be silicon. Now it’s electricity. Whoever turns watts into tokens most efficiently owns the next decade.
You don’t usually notice the new binding constraint until it’s already binding. Oil companies thought the constraint was finding oil. Then refining it. Then distribution. Then politics. Each transition felt sudden afterwards and was obvious in advance only to people who happened to be paying attention. We’re at one of those moments.
Then the third thing.
Both companies released frontier-grade open models. NVIDIA released Nemotron 3 Ultra with open weights and open training scripts. Microsoft released seven models, one of them competitive with the best closed systems. They’re partners with the labs whose products they’re undercutting. Nobody on either stage acknowledged the awkwardness, because they didn’t have to. The release was the acknowledgment.
When competitors who could afford to keep their models closed start opening them, the model layer has stopped being the moat. It’s becoming what the operating system became around 2005. Important. Hard. Not where the margin lives.
So where does the margin live next.
The two CEOs disagree, and the disagreement is the story.
NVIDIA thinks the margin lives in infrastructure. Building an AI factory that costs fifty billion dollars and works on day one is so hard that whoever absorbs the complexity captures the toll. Chips, racks, networking, cooling, power delivery, the software stack that makes it all behave like one computer. Everyone above pays rent.
Microsoft thinks the margin lives in everything around the model. Context. Grounding. Identity. Governance. The environments where models practice your specific work. As the model commoditizes, the differentiator becomes whose knowledge it touches, whose evals grade it, whose policies it obeys. The moat is the thing the model touches.
Both could be right at once. They’re describing different layers of the same stack. The harder question, which neither CEO addressed, is which moat compounds faster. Infrastructure moats are deep but slow. They take capital, time, execution discipline that’s hard to fake. Environment moats are shallow but compound on usage, the way Google search did, the way social networks did.
There’s a pattern in business history worth keeping in mind. When a stack is unstable, the companies that own the most layers absorb the most option value. When the stack stabilizes, the integrated players lose to a modular ecosystem. Ford built the whole car when nobody else could build a good one. Toyota beat Ford by owning the system. Apple beat the modular PC industry once phones got too complicated to assemble from parts, and integration won again. Each cycle takes about fifteen or twenty years, and at the inflection point everyone is sure the current configuration is permanent.
We’re deep in an unstable phase. Both companies are vertically integrating in opposite directions. NVIDIA is moving up the stack, from chips to systems to data centers. Microsoft is moving down, from applications to models to silicon. They’ll meet somewhere in the middle in about three years, and whoever meets the customer at that meeting point owns the next era. The collision itself isn’t a prediction. It’s arithmetic: two companies expanding toward each other with enormous capital and no obvious boundary.
One more thing.
Both keynotes treated security and governance as architecture. Confidential computing on every GPU. Execution containers in the operating system. Identity layers for agents. None of this was central to either pitch a year ago. It’s here now because agents do things, not just say things, and software that acts in the world has different liability than software that talks about it.
Both companies have quietly accepted that the agent era doesn’t reach scale unless the governance layer holds. That’s more grown-up than most of the discourse, which is still arguing about whether models should be allowed to write poems.
Nobody actually knows where the margin will live in five years.
The history of computing is a history of confident predictions about where the margin would live, and of those predictions being wrong. PCs were going to commoditize hardware. Apple proved hardware was the margin. Cloud was going to commoditize infrastructure. NVIDIA proved infrastructure was the margin. AI was going to commoditize models. Now everyone is making the case that the margin is somewhere else, and they’re each making a different case, and they’re each partly right.
When two of the largest companies in the world disagree this clearly about where the value will accrue, picking a side is the wrong move. The disagreement is the information. The layer they’re fighting over is the layer that hasn’t been decided yet.
The decided layers are boring. The undecided layer is where the next ten years happens. Right now the undecided layer is the agent itself. What it is. Where it runs. Who owns its identity. How it gets better. Which of its parts is the actual moat.
Both companies have a thesis. Neither has a proof. The proof will come from whoever builds something on top of all this that nobody expected.
