As I rode in my Uber to Microsoft’s annual Build conference on Monday, I fondly recalled a time when you could get anywhere in San Francisco for $5. Those days are long gone. Venture capitalists lost their appetite to supply unlimited funding in a viciously competitive market, and Uber needed to show a path to profitability ahead of its 2019 IPO.
There are signs that the “$5 Uber era” of LLMs is over now, too. AI labs are subsidizing subscriptions to the tune of thousands of dollars, which can’t continue forever. This year Anthropic, OpenAI, and SpaceXAI are all going public—and like Uber seven years ago, they’ll need to take a hard look at their books. On June 1, the eve of the event, Microsoft sparked outrage by switching to token-based billing on GitHub Copilot. Some users said their bills jumped from $39 to over $3,000 per month.
Rather than backtracking on billing, Microsoft used the conference stage in California to make the case for using AI more pragmatically in the face of rising costs. I came away from the event thinking that Microsoft is the first company to get real about a world where intelligence is available on tap, but constrained by how many coins you can put in the meter. Here is what the company’s vision looks like in practice, and what it might tell us about how we’ll be paying for and pricing AI in the future.
Intelligence on and off the meter: A product approach
In his opening speech, CEO Satya Nadella addressed pricing concerns head-on. He promised “unmetered intelligence to every desk and every home,” an AI-era update to Bill Gates’s vision of “a computer on every desk.”
The most tangible way to experience that vision is the RTX Spark, a new laptop that Microsoft designed with Nvidia for AI workloads. The device can run a medium-sized, 128-billion-parameter model locally (frontier models have trillions of parameters), so developers can get a lot of work done without paying a penny for tokens. Microsoft is taking advantage of the fact that leading open-source models like Kimi-K2.6, which have a trillion parameters, are too big to fit on most laptops. It is betting that budget-conscious coders might not mind using a smaller model that is a year or two behind the frontier. The device will be released in the fall.
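Why parameter count decides what fits on a laptop comes down to simple arithmetic: weight memory is roughly parameters × bits per parameter ÷ 8. The sketch below assumes weights dominate memory use and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The precisions shown are common choices for running open models, not published specs for the RTX Spark or Kimi-K2.6.

```python
# Back-of-envelope memory needed just to store model weights.
# Assumption: weights dominate. KV cache, activations, and runtime
# overhead are ignored, so real requirements are higher.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


MODELS = {
    "128B local model": 128e9,
    "1T open-weights model": 1e12,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        for bits in (16, 8, 4):  # 16-bit floats, 8-bit, and 4-bit quantized weights
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):,.0f} GB")
```

At 4-bit precision, the 128-billion-parameter model needs about 64 GB for its weights alone, while a trillion-parameter model needs about 500 GB, far more memory than most laptops have.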
The RTX Spark laptop builds on earlier announcements showing that Microsoft wants to lower switching costs by being the place where you can use any model, agent, or harness. The laptop has a rebuilt smart terminal app that can run any coding agent harness, and it borrows popular terminal commands from the Mac ecosystem to make the switch easier for developers.
Even the GitHub Copilot Desktop app, also released at the conference, makes it easy to switch between models from OpenAI and Anthropic and open-source models running locally on your device.
When asked about the affordability of agentic coding, Mario Rodriguez, GitHub’s chief product officer, cited the automatic model routing feature in GitHub Copilot, which can delegate less complicated tasks to cheaper models. When I interviewed Kyle Daigle, GitHub’s chief operating officer, he lamented that developers tend to choose “the model of the day, or week, or hour,” even when the task doesn’t merit that kind of power. A person probably won’t manually switch to a cheaper model for a simple final step, “but the tools could.” I’ve also long argued that not every task needs a frontier model.
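The article doesn’t describe how Copilot’s router makes its decisions, so the following is only a minimal sketch of the general idea: estimate how hard a task is, then send it to the cheapest model that clears that bar. The model catalog, capability tiers, and prices are hypothetical placeholders, and the keyword heuristic stands in for whatever classifier a real router would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int  # hypothetical scale: 1 = basic, 3 = frontier
    usd_per_million_tokens: float  # hypothetical price


# Hypothetical catalog: names and prices are placeholders, not real offerings.
CATALOG = [
    Model("frontier-xl", capability=3, usd_per_million_tokens=15.0),
    Model("mid-tier", capability=2, usd_per_million_tokens=3.0),
    Model("small-local", capability=1, usd_per_million_tokens=0.0),
]


def required_capability(task: str) -> int:
    """Crude keyword heuristic standing in for a real complexity classifier."""
    text = task.lower()
    if any(word in text for word in ("architecture", "refactor", "concurrency")):
        return 3
    if any(word in text for word in ("implement", "write tests", "fix bug")):
        return 2
    return 1


def route(task: str, catalog: list[Model] = CATALOG) -> Model:
    """Send the task to the cheapest model that meets its required capability."""
    needed = required_capability(task)
    eligible = [m for m in catalog if m.capability >= needed]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


if __name__ == "__main__":
    for task in ("Rename this variable", "Implement pagination", "Refactor the payment module"):
        print(f"{task!r} -> {route(task).name}")
```

The design choice that matters is picking the *cheapest eligible* model rather than the most capable one, which is exactly the step a developer defaulting to “the model of the day” skips.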
I get the feeling the team built this model router feature for themselves after facing the same problem everyone else is right now—Microsoft itself has been cancelling Claude Code licenses to reduce costs.
Features like automatic model routing show that Microsoft understands how runaway costs hurt enterprises that need tighter control over spending. The AI labs won’t let large companies buy highly subsidized individual “Max” plans, so big companies end up paying full freight on every token they burn. One company that wasn’t properly monitoring usage is rumored to have spent an eye-watering half a billion dollars on Claude tokens in a single month.
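A half-billion-dollar month is what happens when nobody is watching the meter. As a minimal sketch of the kind of guardrail that catches runaway spend early, the hypothetical tracker below projects month-end spend from daily totals and escalates before the budget is gone. It assumes you can export daily spend from your provider’s usage reports; the class, the linear projection, and the $1,500 budget are illustrative assumptions, not any vendor’s billing feature.

```python
from dataclasses import dataclass


@dataclass
class SpendTracker:
    """Hypothetical monthly token-spend guard (not any vendor's billing API)."""
    monthly_budget_usd: float
    days_in_month: int = 30

    def projected_month_end(self, daily_spend_usd: list[float]) -> float:
        """Linear projection: average daily spend so far times days in the month."""
        if not daily_spend_usd:
            return 0.0
        return sum(daily_spend_usd) / len(daily_spend_usd) * self.days_in_month

    def status(self, daily_spend_usd: list[float]) -> str:
        spent = sum(daily_spend_usd)
        if spent >= self.monthly_budget_usd:
            return "block"  # budget exhausted: stop sending requests
        if self.projected_month_end(daily_spend_usd) > self.monthly_budget_usd:
            return "warn"  # on pace to overshoot: alert a human now, not at month end
        return "ok"


if __name__ == "__main__":
    tracker = SpendTracker(monthly_budget_usd=1_500)
    print(tracker.status([40] * 5))    # 200 spent, on pace for 1,200
    print(tracker.status([60] * 5))    # 300 spent, on pace for 1,800
    print(tracker.status([100] * 16))  # 1,600 spent
```

The “warn” tier is the important one: it fires while there is still budget left to change behavior, for example by routing more work to cheaper models.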
That wasn’t the only news that day: Microsoft’s research lab, led by Mustafa Suleyman, released a set of new, smaller, cheaper models spanning image, voice, transcription, coding, and reasoning.
Tackling costs through model optimization
But when you skip the latest models to save money, you run a higher risk of making a costly mistake. One answer was a phrase I heard more than 100 times at the one-day event: “hill climbing.” The idea is that you set an evaluation metric for a task (for example, a lookup that checks whether your AI customer service bot gives the right answers to common questions) and then keep automatically testing new instructions until the smaller model reaches an acceptable score.
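A minimal sketch of that loop in Python. The eval set, the stand-in “small model,” and the proposal step are toy assumptions so the code runs offline; this is not how GEPA, DSPy, or any Microsoft product implements it. In a real setup, `toy_model` would call the cheaper model and `toy_propose` would ask a stronger model to rewrite the instructions.

```python
from typing import Callable

Model = Callable[[str, str], str]  # (instruction, question) -> answer
EvalSet = list[tuple[str, str]]    # (question, expected answer)


def score(model: Model, instruction: str, evals: EvalSet) -> float:
    """Fraction of eval questions answered exactly right."""
    correct = sum(model(instruction, q).strip().lower() == a.lower() for q, a in evals)
    return correct / len(evals)


def hill_climb(model: Model, seed: str, propose: Callable[[str], str],
               evals: EvalSet, target: float = 0.9, max_steps: int = 20) -> tuple[str, float]:
    """Greedy hill climbing: accept a new instruction only if it scores higher."""
    best, best_score = seed, score(model, seed, evals)
    for _ in range(max_steps):
        if best_score >= target:
            break
        candidate = propose(best)
        candidate_score = score(model, candidate, evals)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# --- Toy stand-ins (hypothetical) so the sketch runs without an API key ---
FACTS = {"hours": "9 to 5", "ship": "yes"}
HINTS = ["hours=9 to 5", "ship=yes"]


def toy_model(instruction: str, question: str) -> str:
    """Pretend small model: it can only answer from facts stated in its instruction."""
    for key, answer in FACTS.items():
        if key in question.lower() and f"{key}={answer}" in instruction:
            return answer
    return "I don't know"


def toy_propose(instruction: str) -> str:
    """In practice a stronger model would rewrite the instruction; here we append a hint."""
    for hint in HINTS:
        if hint not in instruction:
            return f"{instruction}; {hint}"
    return instruction


if __name__ == "__main__":
    evals = [("What are your hours?", "9 to 5"), ("Do you ship abroad?", "yes")]
    print(hill_climb(toy_model, "You are a support bot.", toy_propose, evals))
```

Because a candidate is kept only when it strictly beats the current best, the score never goes down; the trade-off is that greedy search can stall on a plateau where no single change helps.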
Hill climbing is the thesis behind optimization frameworks like GEPA in DSPy, Andrej Karpathy’s autoresearch, and Codex’s /goal feature. Relatedly, you can use a smart model to train a dumb one, a process called distillation, bringing down costs while maintaining reliability. In a podcast appearance recorded at the conference, Nadella called private eval benchmarks used for hill climbing the “greatest IP” that a company could have.
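In practice, distillation means training a smaller neural network on a larger model’s outputs. The toy sketch below shows only the core idea, assuming a hypothetical `teacher` function that stands in for an expensive model and a one-parameter `student` that learns from the teacher’s labels alone, never from human labels.

```python
import math


def teacher(x: float) -> int:
    """Hypothetical stand-in for an expensive frontier model: a nonlinear rule."""
    return int(math.sin(3 * x) + x > 1.0)


def fit_student(xs: list[float]) -> float:
    """Learn one threshold t (student predicts 1 when x >= t) that best matches
    the teacher's labels. These are the only calls to the costly teacher."""
    labels = [teacher(x) for x in xs]

    def agreement(t: float) -> int:
        return sum(int(x >= t) == y for x, y in zip(xs, labels))

    return max(sorted(xs), key=agreement)


def student(x: float, threshold: float) -> int:
    """Cheap model: a single comparison, no calls to the teacher."""
    return int(x >= threshold)


if __name__ == "__main__":
    train = [i / 50 for i in range(50)]
    held_out = [(i + 0.5) / 50 for i in range(50)]
    t = fit_student(train)
    match = sum(student(x, t) == teacher(x) for x in held_out) / len(held_out)
    print(f"threshold={t:.2f}, agreement with teacher on held-out inputs={match:.0%}")
```

The expensive teacher runs once per training input; after that, every prediction costs a single comparison. Real distillation replaces the threshold with a smaller neural network and often trains on the teacher’s full probability outputs rather than hard labels.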
Mistakes will still be made, even with frontier models, so Microsoft also focused on security. To limit the damage when AI does make costly mistakes, the company released MXC, or Microsoft eXecution Containers, an operating system-level sandbox designed to securely contain untrusted code, plugins, and autonomous AI agents. OpenClaw creator Peter Steinberger made a surprise appearance on stage during a demo in which a team instructed their agent to delete all the files on their computer, only for the agent to be thwarted by protections their IT department had put in place through MXC. The message was: “OpenClaw is safe for work now.”
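MXC works at the operating-system level, and the article doesn’t describe its interface. As a minimal sketch of the deny-by-default idea behind the demo, here is a hypothetical Python policy check that confines an agent’s file actions to one workspace and refuses deletes unless explicitly allowed. An in-process check like this is much weaker than an OS-level sandbox, since a misbehaving agent could simply bypass it; `FileGuard` and `SandboxViolation` are illustrative names, not MXC APIs.

```python
from pathlib import Path


class SandboxViolation(PermissionError):
    """Raised when an agent action breaks the (hypothetical) policy."""


class FileGuard:
    """Deny-by-default file policy for an agent. Illustrative only: an OS-level
    sandbox enforces rules below the agent, where the agent cannot skip them."""

    def __init__(self, workspace: str, allow_delete: bool = False):
        self.workspace = Path(workspace).resolve()
        self.allow_delete = allow_delete

    def _inside_workspace(self, path: str) -> bool:
        # resolve() collapses ".." and follows symlinks, so "ws/../../etc" can't escape.
        target = Path(path).resolve()
        return target == self.workspace or self.workspace in target.parents

    def check(self, action: str, path: str) -> None:
        if not self._inside_workspace(path):
            raise SandboxViolation(f"{action} outside workspace: {path}")
        if action == "delete" and not self.allow_delete:
            raise SandboxViolation(f"delete not permitted: {path}")


if __name__ == "__main__":
    guard = FileGuard("/tmp/agent-workspace")
    guard.check("read", "/tmp/agent-workspace/notes.txt")  # allowed
    for action, path in [("delete", "/tmp/agent-workspace/notes.txt"),
                         ("write", "/tmp/agent-workspace/../../etc/passwd")]:
        try:
            guard.check(action, path)
        except SandboxViolation as err:
            print("blocked:", err)
```

In the demo’s terms, “delete all my files” fails twice over here: the paths fall outside the workspace, and deletes are off unless someone turns them on.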
To prove the point, Microsoft launched Autopilot, its take on hosted long-running agents, starting with Scout, the first of many agents, which was inspired by internal testing and experimentation. Autopilot runs agent frameworks like OpenClaw and Hermes Agent, but hosts them in a secure Microsoft environment with access to all of the context in your documents and applications. Executives also pointed to MDash, Microsoft’s multi-agent code review system, which, according to Nadella, caught bugs that even Anthropic’s Mythos model missed.
What Silicon Valley is missing
While many enterprise clients are trying to manage costs or struggling to measure return on investment, AI-pilled developers will always thirst for the most expensive, highest-performing frontier models. And for those, we will need data centers. Nadella said the company will keep up its brisk pace of building, yet he also acknowledged the societal cost better than any other leader I’ve seen in the space.
Tech Insider reported that half of announced U.S. data center capacity for 2026 has been cancelled or delayed, in part due to concerns about rising electricity prices and water usage. Nadella framed continued construction as something the tech industry needs to earn permission for, by committing to keeping electricity and water usage self-contained and by providing jobs and opportunities for local residents. A small protest against data centers formed at the entrance of the conference, and at one point I looked up and saw an airplane trailing a sign with the same anti-data center message.
Microsoft Build is normally held in Seattle, and hosting it in San Francisco drew a stark contrast between token-maxxing, AI-pilled engineers and the relentlessly pragmatic enterprise leaders who are trying to make this technology work in their companies. Richly compensated AI researchers burning hundreds of billions of free tokens per month aren’t living in the same world as a junior IT consultant at an enterprise healthcare company in Seattle, and that consultant swears by Microsoft’s products.
As I rode my Uber back to the airport at the end of the conference, I read a story about how Uber had capped its engineers’ token budget at a sensible $1,500 per month. If this represents about 10 percent of a typical Uber engineer’s salary, managers expecting a tenfold productivity gain will make up the difference through more pragmatic usage.
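Taking the figures above at face value, the economics fit in a few lines. This is a back-of-envelope sketch; the function name is illustrative, and it assumes salary is the only other per-engineer cost, whereas real employment costs also include benefits and overhead.

```python
def budget_math(token_budget_per_month: float, budget_pct_of_salary: float,
                productivity_multiplier: float) -> dict[str, float]:
    """Back-of-envelope economics of a capped AI token budget.
    Simplification: salary is the only other per-engineer cost."""
    monthly_salary = token_budget_per_month * 100 / budget_pct_of_salary
    total_monthly_cost = monthly_salary + token_budget_per_month
    # Cost per unit of output, relative to the same engineer without AI tools (1.0).
    relative_cost_per_output = total_monthly_cost / (monthly_salary * productivity_multiplier)
    return {
        "monthly_salary": monthly_salary,
        "annual_salary": monthly_salary * 12,
        "relative_cost_per_output": relative_cost_per_output,
    }


if __name__ == "__main__":
    # $1,500/month cap, assumed to be 10% of salary, with a hoped-for 10x productivity gain.
    print(budget_math(1_500, 10, 10))
```

Under these assumptions the cap implies a salary of about $180,000 a year and adds 10 percent to an engineer’s monthly cost; if output really rose tenfold, cost per unit of output would fall to about 11 percent of the baseline.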
Mike Taylor is the head of tech consulting at Every and a co-author of Prompt Engineering for Generative AI (O’Reilly). To read more essays like this, subscribe to Every, and follow us on X at @every and on LinkedIn.

