Elon Musk is doubling the world's largest AI GPU cluster: Colossus set to hit 200,000 GPUs 'soon,' with 300,000 floated in the past

Four banks of xAI's HGX H100 server racks, holding eight servers each.
(Image credit: ServeTheHome)

Billionaire Elon Musk has taken to Twitter / X to boast that his remarkable xAI data center is set to double its firepower “soon.” He was commenting on a recent video tour of his xAI Colossus AI supercomputer, in which TechTuber ServeTheHome marveled at the gleaming rows of Supermicro servers packed with 100,000 state-of-the-art Nvidia enterprise GPUs.

So, the xAI Colossus AI supercomputer is, in Musk’s words, “Soon to become a 200k H100/H200 training cluster in a single building.” Its 100,000-GPU incarnation, which only began AI training about two weeks ago, was already notable. “Soon” might indeed be soon in this case, but Musk’s history of slipping tech timelines (e.g., Tesla's Full Self-Driving, Hyperloop, SolarCity) means his forward-looking boasts warrant general caution.

The xAI Colossus has already been dubbed an engineering marvel, and praise for the supercomputer’s prowess isn’t limited to the usual Musk toadies. Nvidia CEO Jensen Huang described the project as a “superhuman” feat that had “never been done before.” xAI engineers must have worked long and hard to set up the Colossus supercomputer in just 19 days; projects of this scale and complexity typically take up to four years to get running, Huang indicated.

“Soon to become a 200k H100/H200 training cluster in a single building” https://t.co/2YvdmqXp1W (October 28, 2024)

What will the 200,000 H100/H200 GPUs be used for? This very considerable computing resource will probably not be tasked with making scientific breakthroughs for the benefit of mankind. Instead, the 200,000 power-hungry GPUs are likely destined to train AI models and chatbots like Grok 3, ramping up the potency of its machine-learning-distilled ‘anti-woke’ retorts.

This isn’t the endgame for xAI Colossus hardware expansion, far from it. Musk has previously touted a Colossus packing 300,000 Nvidia H200 GPUs.

At the current pace of upgrades, we could even see Musk tweeting about reaching this 300,000 goal before 2024 is out. If anything delays ‘Grok 300,000,’ it could be factors outside of Musk’s control, such as GPU supply. We have also previously reported that on-site power generation had to be beefed up to cope with even stage 1 of xAI's Colossus, so that’s another hurdle, alongside complex liquid cooling and networking hardware.


Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech; from business and semiconductor design to products approaching the edge of reason.
