- Most AI GPUs run at shockingly low utilization across production systems
- Companies are paying for twenty times more GPU capacity than needed
- Overprovisioning is rising sharply instead of improving year after year
Companies across the tech industry are racing to buy massive amounts of AI infrastructure, but most of it does barely any useful work at all.
A report from Cast AI, based on tens of thousands of Kubernetes clusters across AWS, Azure, and GCP, found that average GPU utilization sits at just 5%.
Many teams deploy sophisticated AI tools to manage their applications, yet those same tools are not used to optimize the underlying infrastructure.
The numbers are getting worse, not better
Organizations pay for roughly 20x more GPU capacity than their workloads actually use at any given moment.
The numbers come from direct measurements of production clusters and millions of compute resources before any optimization was applied.
“This is the third year we have published this report. The numbers are worse,” said Laurent Gil, co-founder and President of Cast AI. “CPU utilization fell to 8%, down from 10%. Memory dropped from 23% to 20%.”
The report also measured something called overprovisioning, which is the gap between what workloads actually need and what teams allocate to them.
CPU overprovisioning rose from 40% to 69% year over year, while memory overprovisioning now stands at 79%.
This means organizations reserve nearly twice as many CPU resources and four times as much memory as their workloads actually consume.
In short, organizations pay for infrastructure that their workloads don’t even request, and the trend is accelerating instead of improving.
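The relationship between a utilization figure and a "paying for X times more than you use" claim is simple arithmetic. A minimal sketch, using the percentages reported in the article (the formula itself is my assumption about how the multiple is derived, not Cast AI's published methodology):

```python
# Back-of-the-envelope check: at a given utilization fraction, how many
# times more capacity is being paid for than is actually consumed?

def capacity_multiple(utilization: float) -> float:
    """Multiple of paid capacity over consumed capacity."""
    return 1 / utilization

# 5% GPU utilization -> paying for roughly 20x the capacity in use
print(f"{capacity_multiple(0.05):.0f}x")  # 20x
```

The same formula applied to the 8% CPU figure gives roughly 12.5x, which is why even "cheap" CPU cores add up at scale.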
The situation gets even more expensive when comparing CPU and GPU costs directly. A CPU core sitting idle costs only cents per hour, but a GPU sitting idle costs dollars per hour.
For the first time since EC2 launched in 2006, GPU prices are rising instead of falling.
In January 2026, AWS raised H200 Capacity Block prices by 15%, citing supply and demand, which broke a two-decade precedent.
“At 5% utilization, the math doesn’t work,” the report states. The hoarding instinct makes sense because lead times are long, yet that same hoarding feeds the scarcity loop that drives prices even higher.
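To see why the math doesn't work at 5%, it helps to put rough numbers on the idle time. The hourly rates below are hypothetical placeholders (the article says only "cents" versus "dollars" per hour); the utilization figures are the ones reported:

```python
# Illustrative annual cost of the idle fraction of a resource.
# Hourly rates are assumed for illustration, not taken from the report.

HOURS_PER_YEAR = 24 * 365  # 8760

def idle_cost_per_year(hourly_rate: float, utilization: float) -> float:
    """Yearly spend on the portion of the resource that sits unused."""
    return hourly_rate * HOURS_PER_YEAR * (1 - utilization)

cpu_waste = idle_cost_per_year(0.04, 0.08)  # CPU core at ~$0.04/hr, 8% utilized
gpu_waste = idle_cost_per_year(3.00, 0.05)  # GPU at ~$3.00/hr, 5% utilized
print(f"CPU core: ~${cpu_waste:,.0f}/yr idle; GPU: ~${gpu_waste:,.0f}/yr idle")
```

Even with these conservative placeholder rates, a single 95%-idle GPU wastes two orders of magnitude more per year than an idle CPU core, which is why GPU utilization dominates the bill.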
Not every cluster performs this badly: one organization hit 49% utilization on H200s and 30% on H100s, well above the 5% average.
The difference comes down to automation rather than luck or better hardware. The tools to fix this already exist, including automated rightsizing, GPU sharing or time slicing, and Spot instance management.
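The core idea behind automated rightsizing can be sketched in a few lines. This is my own illustration of the general technique (shrink requests toward observed peak usage plus a safety headroom), not Cast AI's actual algorithm:

```python
# Minimal rightsizing sketch (illustrative): recommend a resource request
# based on observed peak usage plus headroom, never exceeding the current request.

def rightsize(current_request: float, observed_peak: float,
              headroom: float = 0.2) -> float:
    """Recommended request: observed peak plus a headroom fraction."""
    recommended = observed_peak * (1 + headroom)
    return min(recommended, current_request)

# A pod requesting 4 CPU cores that peaks at 0.5 cores gets ~0.6 recommended,
# reclaiming most of the overprovisioned capacity.
print(rightsize(current_request=4.0, observed_peak=0.5))
```

Running this continuously rather than once is the difference the report highlights: usage drifts, so a static request that was right last quarter is overprovisioned today.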
Still, most teams never get there, because overprovisioning feels safer than running out of capacity, and that safety comes at a steep price.
The teams that closed the gap stopped treating resource efficiency as a manual, one-time task and started treating it as an automated, continuous process.
But Cast AI's data shows that most companies seem willing to keep paying rather than change their habits.
Follow TechRadar on Google News and add us as a preferred source to get our expert news, reviews, and opinion in your feeds.

