Elon Musk shows off Cortex AI supercluster — first look at Tesla's 50,000 Nvidia H100s

Elon Musk inside the Giga Texas AI supercluster, Cortex.
(Image credit: Elon Musk via X)

Elon Musk’s supercomputing efforts pressed forward this week, as the technocrat shared a video of his newly renamed “Cortex” AI supercluster on X. The recent expansion to Tesla’s “Giga Texas” plant will contain 70,000 AI servers and will require 130 megawatts (MW) of cooling and power at launch, scaling up to 500 MW by 2026.

Video of the inside of Cortex today, the giant new AI training supercluster being built at Tesla HQ in Austin to solve real-world AI pic.twitter.com/DwJVUWUrb5 (August 26, 2024)

Musk’s video of the Cortex supercluster shows off the in-progress assembly of a staggering number of server racks. From the fuzzy video, the racks seem to be laid out in an array of 16 compute racks per row, with four or so non-GPU racks splitting the rows. Each compute rack holds 8 servers. Somewhere between 16 and 20 rows of server racks are visible in the 20-second clip, so rough napkin math estimates that around 2,000 GPU servers can be seen, less than 3% of the estimated full-scale deployment.
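For readers who want to check the napkin math, here is a minimal Python sketch. It uses only the figures above (16 compute racks per row, 8 servers per rack, 16 to 20 visible rows, 70,000 planned servers); the constant and function names are made up for illustration, and the inputs are eyeballed from a fuzzy video, so treat the results as rough.

```python
# Rough estimate of how many GPU servers are visible in the clip.
# All inputs are approximate readings from the video, not official figures.
RACKS_PER_ROW = 16        # compute racks per row (non-GPU racks excluded)
SERVERS_PER_RACK = 8      # servers in each compute rack
VISIBLE_ROWS = (16, 20)   # low and high guesses for the rows in frame
PLANNED_SERVERS = 70_000  # reported full-scale deployment


def visible_servers(rows: int) -> int:
    """Servers visible if `rows` rows of compute racks are in frame."""
    return rows * RACKS_PER_ROW * SERVERS_PER_RACK


def share_of_planned(servers: int) -> float:
    """Fraction of the planned deployment that `servers` represents."""
    return servers / PLANNED_SERVERS


for rows in VISIBLE_ROWS:
    servers = visible_servers(rows)
    print(f"{rows} rows: {servers:,} servers, "
          f"{share_of_planned(servers):.1%} of planned")
```

The low end (16 rows) works out to 2,048 servers, or about 2.9% of the planned total, which lines up with the roughly 2,000 figure. If all 20 rows are counted, the estimate rises to 2,560 servers, a little under 4%.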

Musk shared in Tesla’s July earnings call that the Cortex supercluster will be Tesla’s largest training cluster to date, containing “50,000 [Nvidia] H100s, plus 20,000 of our hardware.” That 20,000 figure is smaller than the one Musk previously shared, with tweets from June estimating Cortex would house 50,000 units of Tesla’s Dojo AI hardware. Previous remarks from the Tesla CEO also suggest that Tesla’s own hardware will come online at a later date, with Cortex expected to be solely Nvidia-powered at launch.

The Cortex training cluster is being built to “solve real-world AI,” per Musk’s post on X. According to Tesla’s Q2 2024 earnings call, this means training Tesla’s Full Self-Driving (FSD) autopilot system, which will power consumer Teslas and the promised “Cybertaxi” product, and training AI for the Optimus robot, an autonomous humanoid robot expected to begin limited production in 2025 for use in Tesla’s manufacturing process.

Cortex first turned heads in the press thanks to the massive fans under construction to chill the entire supercluster, shown off by Musk in June. The fan stack cools the Supermicro-provided liquid cooling solution, built to handle an eventual 500 MW of cooling and power at full scale. For context, an average coal power plant may output around 600 MW of power.

We're nothing without our fans.

(Image credit: Elon Musk via X)

Cortex joins Elon Musk's stable of supercomputers in development. So far, the first of Musk's data centers to become operational is the Memphis Supercluster, owned by xAI and powered by 100,000 Nvidia H100s. All 100,000 of Memphis' GPUs are connected with a single RDMA (remote direct memory access) fabric, and are likewise cooled with help from Supermicro. Musk has also announced plans for a $500 million Dojo supercomputer in Buffalo, New York, another Tesla operation.


The Memphis Supercluster is also expected to upgrade its H100 base to 300,000 B200 GPUs, but Blackwell production delays caused by design flaws have pushed this massive order back by several months. As one of the largest single customers of Nvidia AI GPUs, Musk seems to be following Jensen Huang's CEO math: "The more you buy, the more you save." Time will tell whether this rings true for Musk and his supercomputer collection.

Dallin Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Dallin has a handle on all the latest tech news. 
