DeepSeek AI made its Fire-Flyer File System (3FS) parallel file system fully open source this week as part of its Open Source Week event. The disruptive AI company from China brags that 3FS can hit 7.3 TB/s of aggregate read throughput in its own data center clusters, where DeepSeek has been using 3FS to organize its servers since at least 2019.
3FS is a Linux-based parallel file system designed for AI-HPC operations, where GPU nodes constantly access many data storage servers while training LLMs. 3FS differs from other file systems largely in its near-singular prioritization of random read speed above all else, to the point of almost completely ignoring read caching.
When training AI models, compute units need to access random training data constantly, and each piece of data is typically read only once. A read cache is therefore nearly useless, and 3FS largely does away with it. In fact, a read cache may even be harmful when training LLMs: because these models learn associations from what they are fed, reading the same data in the same order repeatedly could lead the model to link completely unrelated data as a set.
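The access pattern described above can be illustrated with a small simulation. This is a minimal sketch, not 3FS code: the helper names `lru_hit_rate` and `shuffled_epochs` are hypothetical, and a simple LRU cache stands in for a file system read cache. It compares a shuffled, read-once-per-epoch training workload against a workload that keeps hitting a small hot set.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, cache_size):
    """Fraction of accesses served from an LRU cache holding `cache_size` items."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / len(accesses)


def shuffled_epochs(num_samples, num_epochs, seed=0):
    """Training-style order: every sample read once per epoch, reshuffled each epoch."""
    rng = random.Random(seed)
    order = []
    for _ in range(num_epochs):
        epoch = list(range(num_samples))
        rng.shuffle(epoch)
        order.extend(epoch)
    return order


# Cache sized at 1% of the dataset (an arbitrary illustrative choice).
one_epoch = lru_hit_rate(shuffled_epochs(10_000, 1), cache_size=100)
three_epochs = lru_hit_rate(shuffled_epochs(10_000, 3), cache_size=100)

# Contrast: a workload that cycles over a hot set of 50 items.
hot_set = lru_hit_rate([i % 50 for i in range(10_000)], cache_size=100)

print(f"shuffled, 1 epoch:  {one_epoch:.4f}")
print(f"shuffled, 3 epochs: {three_epochs:.4f}")
print(f"hot set of 50:      {hot_set:.4f}")
```

In a single epoch no sample is ever re-read, so the cache cannot score a hit; across several reshuffled epochs it barely helps unless it is nearly as large as the dataset, which is the situation the article describes.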
The team responsible for operating one of DeepSeek's deep learning clusters, Fire-Flyer 2, published a paper last August outlining the use of 3FS in the custom-built system. In Fire-Flyer 2, DeepSeek used 180 storage nodes, each loaded with 16 16TB SSDs and two 200Gbps NICs. These nodes served 10,000 PCIe Nvidia A100 GPUs, built out in much cheaper servers than Nvidia's proprietary DGX-A100 products.
Across the whole array, DeepSeek claims it benchmarked 3FS at 6.6 TiB/s of aggregate read throughput (roughly 7.3 TB/s in decimal units), while training tasks running in the background contributed a further 1.4 TB/s of read throughput. In comparison, competing file system Ceph first reached 1.1 TB/s of read throughput in early 2024, on a cluster of 68 nodes, each loaded with 10 16TB SSDs and 2 x 100 Gbps networking.
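To put the two clusters on a common footing, the quoted figures can be reduced to per-node throughput and compared against each node's raw network ceiling. The sketch below takes the 6.6 and 1.1 figures at face value as decimal terabytes per second, which is an assumption (if a figure is in binary TiB/s, its per-node number rises by about 10%), and it ignores protocol overhead. The function names are hypothetical.

```python
def per_node_gb_per_s(aggregate_tb_per_s, nodes):
    """Average read throughput each storage node delivers, in GB/s."""
    return aggregate_tb_per_s * 1000 / nodes


def nic_ceiling_gb_per_s(nics_per_node, gbit_per_nic):
    """Raw per-node network ceiling in GB/s (8 bits per byte, no overhead)."""
    return nics_per_node * gbit_per_nic / 8


# Figures as quoted in the article; units assumed to be decimal.
clusters = {
    "3FS (Fire-Flyer 2)": {"tb_per_s": 6.6, "nodes": 180, "nics": 2, "gbit": 200},
    "Ceph (early 2024)": {"tb_per_s": 1.1, "nodes": 68, "nics": 2, "gbit": 100},
}

for name, c in clusters.items():
    per_node = per_node_gb_per_s(c["tb_per_s"], c["nodes"])
    ceiling = nic_ceiling_gb_per_s(c["nics"], c["gbit"])
    print(
        f"{name}: {per_node:.1f} GB/s per node, "
        f"{ceiling:.0f} GB/s NIC ceiling, "
        f"{per_node / ceiling:.0%} of ceiling"
    )
```

Under these assumptions both systems run at well over half of their raw network capacity per node, so much of the headline gap comes from 3FS running on more nodes with faster links rather than from the software alone.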
In that paper, 3FS was credited as a crucial part of DeepSeek's software stack for training its AI models. It was tested on the Fire-Flyer 2 HPC solution, which achieved 80% of the performance of Nvidia's DGX-A100 server solution for 50% of the price and 60% of the power draw.
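Those three percentages translate directly into efficiency ratios. As a quick worked calculation (the function name is hypothetical, and the inputs are simply the figures quoted above, normalized so that the DGX-A100 equals 1.0):

```python
def relative_efficiency(perf_ratio, resource_ratio):
    """Performance delivered per unit of resource, relative to the baseline."""
    return perf_ratio / resource_ratio


# Fire-Flyer 2 relative to DGX-A100, as quoted in the article.
perf, price, power = 0.80, 0.50, 0.60

perf_per_dollar = relative_efficiency(perf, price)
perf_per_watt = relative_efficiency(perf, power)

print(f"performance per dollar: {perf_per_dollar:.2f}x the DGX-A100")
print(f"performance per watt:   {perf_per_watt:.2f}x the DGX-A100")
```

Taken at face value, that works out to about 1.6 times the performance per dollar and roughly 1.33 times the performance per watt.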
Those curious about trying out the Fire-Flyer File System and its random-read-forward style for AI-HPC solutions can find the full download on DeepSeek's GitHub page. We'd be surprised if this new open-source system does not become a hit with enthusiasts and enterprise AI-HPC users alike, though it may have to overcome some level of anti-Chinese tech fear to hit blockbuster status.