- GPUs handle prefill operations by converting prompts into key-value caches
- SambaNova RDUs generate tokens at high throughput and low latency
- Intel Xeon 6 processors handle workload distribution and execute compiled code
Intel and SambaNova Systems have launched a joint hardware blueprint combining GPUs, SambaNova RDUs, and Intel Xeon 6 processors for large-scale inference workloads.
The system assigns GPUs to prefill operations, RDUs to decoding, and Xeon CPUs to execution and orchestration duties across agent-driven environments.
"Agentic AI is moving into production, and the successful pattern we're seeing is GPUs to start the job, Intel Xeon 6 to run it, and SambaNova RDUs to finish it fast," said Rodrigo Liang, CEO and co-founder of SambaNova Systems.
CPU is the execution and control layer
This design is scheduled to be available in the second half of 2026 for enterprises, cloud providers, and sovereign deployments.
The architecture places Intel Xeon 6 processors at the center of system control, where they manage workload distribution, execute code, and coordinate tool interactions.
This includes handling compilation, validating outputs, and maintaining communication between simultaneous processes.
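The host-CPU role described here, fanning out many concurrent agent tool calls and validating each result before it is used, can be sketched in a few lines. This is an illustrative sketch only: the names `handle_tool_call` and `validate` are hypothetical and are not part of any Intel or SambaNova API.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_tool_call(call: dict) -> dict:
    # In a real system this might compile code, query a vector database,
    # or relay an inter-agent message; here we simply echo a result.
    return {"id": call["id"], "result": f"done:{call['op']}"}

def validate(result: dict) -> bool:
    # Output validation is one of the control duties assigned to the CPU.
    return result["result"].startswith("done:")

calls = [{"id": i, "op": op} for i, op in enumerate(["build", "retrieve", "msg"])]

# The thread pool stands in for the CPU dispatching simultaneous agent work.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_tool_call, calls))

assert all(validate(r) for r in results)
print([r["result"] for r in results])
# prints ['done:build', 'done:retrieve', 'done:msg']
```

The point of the sketch is the shape of the workload, not the scale: many small, latency-sensitive coordination tasks rather than one large compute kernel.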
"When thousands of simultaneous coding agents are producing tool calls, retrieval requests, code builds, and encrypted inter-agent messages, the CPU is not a background component; it's the system's executive and action layer," said Harry Ault, CRO of SambaNova.
The statement frames the CPU as the primary layer responsible for system behavior rather than a supporting component.
According to SambaNova, Xeon 6 delivers more than 50% faster LLVM compilation times compared with Arm-based server CPUs.
It also delivers up to 70% faster vector database performance compared with other x86-based systems.
These figures relate to execution speed within coding and retrieval workflows. In this configuration, GPUs process the prefill stage by converting prompts into key-value caches.
SambaNova RDUs operate as the decoding layer, generating tokens at high throughput and low latency.
Xeon 6 processors function as both host CPUs and execution engines, managing system-level operations and running compiled workloads.
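The three-stage split can be made concrete with a minimal sketch: a prefill stage that turns a prompt into a key-value cache, a decode stage that appends to that cache token by token, and a host orchestrator that routes work between them. The class names `PrefillEngine`, `DecodeEngine`, `Orchestrator`, and `KVCache` are assumptions for illustration, not a real API, and the tensor math is replaced with placeholders.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Key-value entries produced during prefill, reused during decoding."""
    keys: list
    values: list

class PrefillEngine:
    """Stands in for the GPU stage: converts a prompt into a KV cache."""
    def prefill(self, prompt: str) -> KVCache:
        tokens = prompt.split()
        # Real engines compute attention keys/values per token; we fake it.
        return KVCache(keys=list(tokens), values=list(tokens))

class DecodeEngine:
    """Stands in for the RDU stage: generates tokens one at a time."""
    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        out = []
        for i in range(max_new_tokens):
            token = f"tok{i}"  # placeholder for a sampled token
            cache.keys.append(token)   # each new token extends the cache
            cache.values.append(token)
            out.append(token)
        return out

class Orchestrator:
    """Stands in for the Xeon host: routes each stage to its device."""
    def __init__(self):
        self.prefill_engine = PrefillEngine()
        self.decode_engine = DecodeEngine()

    def run(self, prompt: str, max_new_tokens: int = 4) -> list:
        cache = self.prefill_engine.prefill(prompt)            # GPU stage
        return self.decode_engine.decode(cache, max_new_tokens)  # RDU stage

print(Orchestrator().run("summarize this report"))
# prints ['tok0', 'tok1', 'tok2', 'tok3']
```

The design rationale is that prefill is compute-bound and parallel while decoding is sequential and latency-bound, so assigning them to different hardware, with a CPU coordinating the handoff of the KV cache, lets each stage run on the chip best suited to it.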
"Production inference is moving toward heterogeneous hardware; no single chip type is optimal for every stage of an agentic workflow," said Banghua Zhu, co-founder and CTO at RadixArk.
He added that combining RDUs with Xeon CPUs allows systems to maintain compatibility with existing software environments.
The system is designed to run inside existing air-cooled data centers without requiring new builds.
According to the companies, this allows inference workloads to scale without additional strain on water and energy resources.
As Nvidia and Groq continue to focus on improving inference throughput and latency, this announcement adds a layer of competition.
It offers an alternative approach that distributes workloads across multiple hardware layers rather than relying on a single processing model.

