Intel bets you'll stack cheap GPUs to avoid spending top dollar on Nvidia pros
Computex When it comes to AI accelerators, Intel isn’t very competitive, and its newly announced Battlemage workstation cards don’t do much to change that. But at least they’re cheap. Really cheap.
For the purposes of AI, we can mostly ignore the $299 Intel Arc B50, which is positioned as a more traditional workstation GPU for graphics-intensive workloads.
But the x86 giant is pushing the more performant (and power-hungry) B60 for both graphics and AI inference applications. It hasn’t set an official price yet, but Vivian Lien, vice president and general manager of client graphics at Intel, expects the cards will account for about $500 of the overall cost of a PC. On the open market, we suspect real-world pricing will come in a bit higher than that, though.
On paper, the B60 falls somewhere between Nvidia’s RTX 4000 Ada and 4500 Ada Generation GPUs, which currently cost between $1,250 and $2,400.
But if AI inference is your primary focus, you’re probably looking at something like Nvidia’s RTX Pro 6000 workstation cards announced back in March at GTC, which boast roughly 4.5-5x higher INT8 performance, and 4x the memory capacity and bandwidth of Intel’s B60. Those cards are currently retailing in the neighborhood of $8,565 each. That means the B60 would come in at about 1/17th the price of a typical inference-focused Nvidia GPU.
Here’s a brief rundown of how the two compare. We’ve included the B50 in there just for reference.
| | Arc Pro B50 | Arc Pro B60 | RTX Pro 6000 |
|---|---|---|---|
| Memory capacity | 16 GB | 24 GB | 96 GB |
| Memory bandwidth | 224 GB/s | 456 GB/s | 1,792 GB/s |
| INT8 perf | 170 TOPS | 197 TOPS | 877-1,007 TOPS |
| FP4 perf | N/A | N/A | 1,755-2,015 TFLOPS |
| TDP | 70 W | 120-200 W | 300-600 W |
| Price | $299 | ~$500 | ~$8,500 |
Note: all performance figures are for dense integer / floating-point performance without sparsity enabled.
Competition through parallelism
As you can see, on its own the B60 can’t hold a candle to Nvidia’s latest workstation cards. But, if you cram four of them into a workstation chassis, you’re at least in the same ballpark. And that’s exactly what Intel expects customers to do. In fact, it envisions systems with as many as eight of these chips on board. Intel is calling this concept Project Battlematrix.

Intel’s Project Battlematrix promises to cram up to eight cheap and cheerful Arc Pro B60s into a box for 192GB of vRAM and 1.5 petaOPS of compute
For an eight-GPU system, you’re looking at about 1.5 petaOPS of dense INT8 performance, 192GB of vRAM, and 3.6TB/s of aggregate memory bandwidth. More importantly, assuming Lien’s $500-per-card estimate actually plays out, you’re looking at about $4,000 worth of GPUs. Even if the cards end up being closer to $750 apiece, that’s still far cheaper than buying a pair of RTX Pro 6000s.
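The arithmetic is easy enough to check. Here's a quick sketch using the per-card figures from the table above; the $500 price is Intel's estimate rather than a list price:

```python
# Aggregate specs for an eight-card Project Battlematrix box, using the
# per-card Arc Pro B60 figures from the table above. The $500 card price
# is Intel's own estimate, so treat the cost as a rough guide.
NUM_GPUS = 8
INT8_TOPS = 197        # dense INT8 per card, no sparsity
VRAM_GB = 24           # per card
BANDWIDTH_GBS = 456    # per card
EST_PRICE_USD = 500    # Intel's estimate; street pricing may run higher

print(f"Compute:   {NUM_GPUS * INT8_TOPS / 1000:.2f} petaOPS INT8")  # ~1.58
print(f"Memory:    {NUM_GPUS * VRAM_GB} GB vRAM")                    # 192
print(f"Bandwidth: {NUM_GPUS * BANDWIDTH_GBS / 1000:.2f} TB/s")      # ~3.65
print(f"GPU cost:  ${NUM_GPUS * EST_PRICE_USD:,}")                   # $4,000
```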
In fact, you could conceivably rack up two eight-GPU Intel systems for less than the price of one dual-GPU Nvidia workstation.
We say rack because if you plan to deploy more than one of these, at least in the US, a single system would already be pushing the limits of a standard 15 amp, 120 volt circuit (1,800W). In such a configuration, we expect you'll need to dial the B60s' TDP down to something closer to 120W to avoid tripping a breaker under load.
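For the curious, the breaker math works out something like this. Note that the 80 percent continuous-load rule of thumb and the 300W allowance for the rest of the system are our assumptions:

```python
# Rough power budget for an eight-card box on a single 15 A / 120 V US
# circuit. The 80 percent continuous-load derating and the 300 W host
# allowance (CPU, fans, drives, PSU losses) are assumptions.
CIRCUIT_W = 15 * 120            # 1,800 W nominal
CONTINUOUS_W = CIRCUIT_W * 0.8  # ~1,440 W for sustained loads
HOST_W = 300                    # assumed host-system draw

for gpu_tdp in (200, 120):
    total = 8 * gpu_tdp + HOST_W
    verdict = "fits within" if total <= CONTINUOUS_W else "exceeds"
    print(f"8x B60 at {gpu_tdp} W + host = {total} W, "
          f"{verdict} the {CONTINUOUS_W:.0f} W continuous budget")
```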
Each individual B60 consumes less power (120W-200W) than the RTX Pro 6000 workstation cards (300W-600W), but they’re also less power efficient. This is especially true when you take into consideration that Nvidia’s latest chips offer native support for 4-bit data types, which the B60 does not.
While an RTX Pro 6000 may be 4.5x-5x faster at INT8, it’s closer to 9-10x faster at FP4, and 18x-20x faster if you can take advantage of sparsity.
But for an inference workstation, that might not be as big a deal as you think. That's because, as impressive as 4 petaFLOPS of sparse FP4 might sound, inference workloads tend to be memory-bandwidth-bound rather than compute-bound. Inference doesn't really benefit from sparsity, either.
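A rough way to see why: at batch size one, generating each token means streaming more or less the full set of weights out of memory, so aggregate bandwidth puts a hard ceiling on tokens per second. Here's a quick sketch with an illustrative 40GB model and the bandwidth figures from the table above, assuming tensor parallelism spreads the weight reads evenly across cards:

```python
# Why decode is bandwidth-bound: each generated token requires reading
# roughly the whole model from vRAM once. The 40 GB model size is
# illustrative (around a 70B-parameter model at 4-5 bits per weight).
MODEL_GB = 40

configs = {
    "8x Arc Pro B60":  8 * 456,   # GB/s aggregate
    "2x RTX Pro 6000": 2 * 1792,  # GB/s aggregate
}

for name, bw_gbs in configs.items():
    print(f"{name}: ~{bw_gbs / MODEL_GB:.0f} tokens/s ceiling at batch size 1")
```

By that crude yardstick, the eight-card Intel box actually edges out a pair of RTX Pro 6000s on aggregate bandwidth.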
More compute can be helpful during the inference prefill stage, when the model is processing your prompt. However, this is most noticeable for workloads like summarizing a report, or systems that need to serve large quantities of concurrent requests.
What's more, the B60 might not natively support 4-bit datatypes, but it can still run many 4-bit quantized models. For example, Llama.cpp, which popularized the GGUF quantization format, has supported Intel GPUs via SYCL for at least a year now. Just because the weights are stored at lower precision doesn't mean the activations (the compute-heavy bit) have to be.
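As a rough illustration, a 4-bit GGUF model can be loaded through llama-cpp-python compiled against the SYCL backend. This is a minimal sketch; the model filename is hypothetical, and any Q4-quantized GGUF should behave similarly:

```python
# Minimal sketch: running a 4-bit GGUF quantized model via llama-cpp-python
# built with the SYCL backend for Intel GPUs. The model file is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload every layer to the GPU
)

out = llm("Summarize this report in one sentence:", max_tokens=64)
print(out["choices"][0]["text"])
```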
While Intel is mostly marketing the B60 as an inference card, it's still a GPU and could just as easily be used for model fine-tuning, a topic we've explored in detail before.
Multi-user environments
Having multiple GPUs in a box also presents a couple of unique opportunities for deployment, particularly in a lab environment, where you might have multiple users sharing resources.
While there are numerous ways of sharing GPU resources, like temporal slicing and resource partitioning, each comes with its own drawbacks. When you’ve got eight GPUs in a box, you could have eight different users each running their own workloads on a dedicated GPU.
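Even before fancier partitioning schemes arrive, plain process-level pinning can carve up a box like this. Here's a sketch using the oneAPI runtime's device selector environment variable; the user-to-GPU mapping and the launched command are purely illustrative:

```python
# Sketch: pin each user's job to one of the eight cards by restricting the
# SYCL runtime to a single Level Zero device before launch. The mapping and
# the launched command are hypothetical.
import os
import subprocess

USER_TO_GPU = {"alice": 0, "bob": 1}  # hypothetical user-to-card mapping

def launch_for(user: str, cmd: list[str]) -> subprocess.Popen:
    env = os.environ.copy()
    env["ONEAPI_DEVICE_SELECTOR"] = f"level_zero:{USER_TO_GPU[user]}"
    return subprocess.Popen(cmd, env=env)

launch_for("alice", ["python", "run_inference.py"])  # hypothetical workload
```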
We’re told Intel is currently working on adding SR-IOV support. The tech enables a PCIe device to appear as multiple virtual devices and should vastly simplify the process of partitioning GPU resources or just passing them through to virtual machines.
Technically, this is also possible on Nvidia's RTX Pro graphics cards – at least on the 6000-series models. Each chip supports Nvidia's multi-instance GPU (MIG) partitioning tech, which allows you to split the chip into either two 48GB partitions or four 24GB partitions.
However, unlocking that tech may require an Nvidia vGPU license on top of the chips’ already high price.
Stepping up its software game
Multi-GPU systems do involve some compromises. Speeds and feeds don’t always translate into real-world performance if the software isn’t up to snuff.
But the software situation around Intel’s Xe graphics architecture has improved steadily over the past year. And while getting workloads to run across multiple GPUs can introduce complexity, it’s a pretty well understood problem at this point.
With the launch of its B60 platform, Intel has committed to further improvements to the software ecosystem, including the introduction of pre-baked container images, which will ship with everything you need to get a particular framework up and running. As we understand it, vLLM will be among the first container environments offered, but we may also see containers for Ollama and Llama.cpp before long.
As we've previously explored, these kinds of container environments can make deploying AI workloads and development environments considerably easier: rather than wrangling dependencies and package managers and then chasing down version-support bugs, that work is automated and runs in a sandboxed environment.
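Once inside such a container, the serving workflow should presumably look like stock vLLM. A minimal sketch, assuming the standard vLLM Python API and an illustrative model:

```python
# Minimal vLLM sketch, assuming Intel's container exposes the standard
# vLLM Python API. The model choice is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain SR-IOV in two sentences."], params)
print(outputs[0].outputs[0].text)
```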
These container environments are expected to start rolling out alongside the cards in Q3, with SR-IOV, virtual desktop infrastructure, and other management functionality following in the fourth quarter. ®