Microsoft deploys world's first 'supercomputer-scale' GB300 NVL72 Azure cluster — 4,608 GB300 GPUs linked together, with each NVL72 rack acting as a single, unified accelerator capable of 1.44 exaFLOPS of FP4 inference

Microsoft deploys GB300 NVL72 supercluster inside Azure
(Image credit: Microsoft / Nvidia)

Microsoft has just upgraded its Azure cloud platform with Nvidia's Blackwell Ultra, deploying what it calls the world's first large-scale GB300 NVL72 supercomputing cluster. The cluster comprises 64 NVL72 racks housing a total of 4,608 GB300 GPUs. Within each rack, the GPUs are connected by an NVLink 5 switch fabric that delivers 130 TB/s of aggregate bandwidth; across racks, the cluster is stitched together with Nvidia's Quantum-X800 InfiniBand networking fabric, which provides 800 Gb/s of interconnect bandwidth per GPU.


At rack level, each NVL72 system is said to deliver 1,440 petaflops of FP4 Tensor performance, powered by 37 terabytes of unified "fast memory," which breaks down to 20 TB of HBM3E attached to the GPUs and 17 TB of LPDDR5X attached to the Grace CPUs. As mentioned earlier, this memory is pooled over NVLink 5, letting each rack operate as a single, unified accelerator with 130 TB/s of direct bandwidth between its GPUs. That pooled-memory design is one of the most impressive aspects of the GB300 NVL72.
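The per-rack figures above can be sanity-checked with simple arithmetic. This is a back-of-the-envelope sketch, not an official Nvidia breakdown; the per-GPU HBM figure is just the rack total divided by 72 GPUs:

```python
# Back-of-the-envelope check of the per-rack GB300 NVL72 figures from the article.
GPUS_PER_RACK = 72   # GB300 NVL72: 72 Blackwell Ultra GPUs per rack
HBM3E_TB = 20        # HBM3E attached to the GPUs, per rack
LPDDR5X_TB = 17      # LPDDR5X attached to the Grace CPUs, per rack

# Pooled "fast memory" per rack: 20 TB + 17 TB = 37 TB, matching the stated total.
fast_memory_tb = HBM3E_TB + LPDDR5X_TB

# Implied HBM3E per GPU (assumption: evenly divided), roughly 278 GB.
hbm_per_gpu_gb = HBM3E_TB * 1000 / GPUS_PER_RACK

print(fast_memory_tb)          # 37
print(round(hbm_per_gpu_gb))   # 278
```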

The Quantum-X800 InfiniBand platform gives each of the 4,608 GPUs 800 Gb/s of bandwidth at the rack-to-rack level. The result is that every GPU can communicate with every other GPU, both within a rack and across racks.
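Scaling those per-rack and per-GPU numbers up to the full cluster gives a sense of the deployment's size. These aggregates are simple multiplication of the figures quoted in the article (peak, not sustained numbers), not official Microsoft or Nvidia totals:

```python
# Cluster-level aggregates implied by the article's figures (assumption: simple scaling).
TOTAL_GPUS = 4608
GPUS_PER_RACK = 72
RACK_FP4_PFLOPS = 1440   # FP4 Tensor performance per NVL72 rack
PER_GPU_IB_GBPS = 800    # Quantum-X800 InfiniBand bandwidth per GPU

racks = TOTAL_GPUS // GPUS_PER_RACK                        # 64 NVL72 racks
cluster_fp4_exaflops = racks * RACK_FP4_PFLOPS / 1000      # ~92 EFLOPS FP4 peak
ib_aggregate_tbps = TOTAL_GPUS * PER_GPU_IB_GBPS / 8 / 1000  # scale-out bandwidth, TB/s

print(racks)                 # 64
print(cluster_fp4_exaflops)  # 92.16
print(ib_aggregate_tbps)     # 460.8
```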


The GB300 NVL72 cluster is liquid-cooled, using standalone heat exchangers and facility loops designed to minimize water usage under intense workloads. Nvidia says Microsoft had to reimagine every layer of its data center for this deployment, and Microsoft is keen to point out that this is only the first of many such clusters, with plans to roll out GB300 across the globe at full hyperscale. OpenAI and Microsoft already use GB200 clusters for training models, so this serves as a natural extension of their exclusive partnership.

Nvidia itself is heavily invested in OpenAI: the two recently signed a Letter of Intent (LoI) for a major strategic partnership under which the chipmaker will progressively invest up to $100 billion in OpenAI. In return, OpenAI will build its next-gen AI infrastructure on Nvidia GPUs, deploying at least 10 gigawatts (GW) worth of accelerators, starting with Vera Rubin next year. This GB300 NVL72 supercluster can thus be seen as a precursor that all but materializes that investment, with Microsoft deploying Nvidia hardware in a cluster that serves OpenAI.


Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.
