Microsoft AI CEO Mustafa Suleyman says the AI industry’s next chapter won’t be written by whoever builds the smartest model. It’ll be written by whoever can afford to run one at scale. And right now, that’s a very short list. In a post on X, Suleyman laid out a sharp, economics-first thesis—arguing that inference compute scarcity, not model intelligence, will define winners and losers for the next two to three years. The companies with the margins to buy tokens pull ahead. Everyone else gets rationed out.

“For the next couple years at least, the entire AI industry is going to be defined by this fact: demand is going to wildly outstrip supply, and so what matters is which companies / products have margin to pay for tokens,” he wrote. The products that can pay, he added, will improve fastest—because lower latency drives retention, retention generates data, and that data spins a flywheel of model improvement and adoption.
Why inference compute, not AI model training, is the real bottleneck in 2026
Suleyman’s argument flips the dominant AI narrative. For years, the industry obsessed over training bigger foundation models. But the acute crisis in 2026 is on the serving side—running those models for millions of users in real time.

Inference workloads now eat up roughly two-thirds of all AI compute spending, per Deloitte’s 2026 TMT Predictions. GPU lead times have stretched to nearly a year. High-bandwidth memory from major suppliers is sold out through 2026. And of the 16 GW of global data-centre capacity slated for this year, only about 5 GW is actually under construction—the rest remains announcements on paper.
How Mustafa Suleyman’s AI ‘flywheel’ gives high-margin products a compounding edge
This scarcity is where Suleyman’s flywheel logic takes over. Products with fat gross margins—enterprise legal tools, healthcare SaaS, Microsoft 365 Copilot—can absorb premium inference costs. That buys them lower latency. Lower latency keeps users coming back. Returning users generate rich, proprietary workflow data. That data fine-tunes and improves models. Better models drive more adoption and revenue. Repeat, faster each cycle.

Suleyman has used this exact framing before—at the October 2024 IA Summit, he said the winners in vertical AI would be those who “nailed the fine-tuning loop” and got their data flywheel spinning. Microsoft’s own numbers back it up: paid Copilot seats hit 15 million in Q2 FY2026, up 160% year-on-year, though still just 3.3% of the 450 million M365 commercial user base.
Consumer AI apps and low-margin AI startups face a token rationing problem
The uncomfortable corollary is that consumer AI apps and cash-strapped startups face a squeeze. Without the margins to buy premium inference, they get slower responses, weaker retention, and a flywheel that never starts spinning.
Some in the thread pushed back—arguing intelligence-per-dollar matters more, or that open-source and on-device models could crash inference costs entirely. But Suleyman’s bet is clear and well-funded. With Microsoft pouring over $80 billion a year into AI infrastructure, he’s banking on the idea that for the next couple of years, the business that can pay for tokens wins the intelligence race first.