"Just give me all the bacon and eggs you have. Wait … I worry what you heard was, 'Give me a lot of bacon and eggs.' What I said was, give me all the bacon and eggs you have."
- Ron Swanson
Substitute high-capacity NVMe™ SSDs for the bacon and eggs, and you’ll get a good picture of the current reality of the data center storage market. AI is consuming all the bacon and eggs we have, and it’s only going to get hungrier from here.
In this blog, I’ll discuss three drivers for the near-term consumption of faster storage for AI.
- AI accelerators are… accelerating.
- Industry adoption of generative AI: No more cold data.
- Storage software innovation to optimize for TCO.
AI workloads don’t need fast storage; our old, rickety HDD platform is doing fine.
Yes, it’s true that many AI workloads have been designed around large, mostly sequential block reads, the optimal access pattern for HDDs. But that was with Gen3 and Gen4 AI accelerators. As Gen5 accelerators like the NVIDIA H100 become widely deployed and the industry readies for the B100 and beyond, HBM bandwidth is growing far faster than the bandwidth of the rest of the data center system architecture.
Here I’m comparing the bandwidth of one unit of AI compute: the HBM bandwidth of 1 accelerator, the DRAM bandwidth of 1 CPU with 8 channels at 1 DIMM per channel (DPC), the bandwidth of 4 NVMe SSDs, and the bandwidth of 24 EAMR HDDs at their maximum transfer rate. I chose 24 HDDs because it’s typical to overprovision HDDs roughly 6 to 1 versus NVMe SSDs to hit bandwidth requirements, and I chose the maximum transfer rate to model the best case for HDDs. Note that the y-axis scale is log base 2.
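To make that gap concrete, here’s a quick back-of-the-envelope comparison in Python. The bandwidth figures are rough assumptions for illustration only (not vendor specifications or measured results), so substitute your own hardware’s numbers:

```python
import math

# Rough per-"unit of AI compute" bandwidth figures in GB/s.
# All values are illustrative assumptions, not vendor-quoted or measured numbers.
bandwidth_gb_s = {
    "HBM, 1 accelerator (H100-class)": 3350.0,   # assumed ~3.35 TB/s HBM3
    "DRAM, 1 CPU, 8 channels @ 1 DPC": 307.0,    # assumed 8 x ~38.4 GB/s DDR5
    "NVMe, 4 x Gen5 SSDs": 4 * 14.0,             # assumed ~14 GB/s sequential read each
    "HDD, 24 x EAMR @ max transfer": 24 * 0.29,  # assumed ~290 MB/s outer-track rate
}

hbm = bandwidth_gb_s["HBM, 1 accelerator (H100-class)"]
for name, bw in bandwidth_gb_s.items():
    # log2 mirrors the chart's y-axis; the ratio shows how far each tier trails HBM
    print(f"{name:34s} {bw:8.1f} GB/s  log2 = {math.log2(bw):5.1f}  HBM/x = {hbm / bw:7.1f}")
```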
AI accelerators are rapidly increasing their compute capabilities, enabled by advances in HBM. This trend isn’t slowing down: the gap between HBM bandwidth and the bandwidth of DRAM and NVMe SSDs is widening with every generation.
This acceleration is pushing historically HDD-based AI workloads onto high-capacity NVMe storage, like the Micron 6500 ION. We’re seeing this take place across many customers as H100-class GPUs finally become available and are deployed into more enterprise environments.
While the advancement of AI accelerator capabilities is driving general storage use cases, faster storage can also address emerging AI workloads.
Industry adoption of generative AI: No more cold data
While the initial creation and training of generative AI models like LLMs are done by a few organizations on massive clusters of AI systems, the everyday use case of on-prem inference and fine-tuning is driving AI system adoption into most companies.
Taking a trained model (like a chatbot) and then fine-tuning that model on a company’s proprietary data is becoming a common practice. Here at Micron, we’re using a variety of chatbot-focused tools along with code-generation tools that are trained on our data. Because of the sensitivity of this training data, fine-tuning must be done on-prem and kept within the local infrastructure.
Where is that training data most likely to be stored? Generally, on disparate HDD-based storage hardware from any number of vendors. In the past, the common data flow has been from the hot tier (SSD) to the warm tier (HDD with SSD cache) to the cold tier (slow HDD, possibly powered off) to archive (tape). As AI models advance, new models will need to be re-trained on proprietary data repeatedly, which means that pulling data from the cold tier and below will cripple the ability to fine-tune effectively. Data is going to warm up, driving the adoption of faster, high-capacity storage systems.
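A rough staging-time estimate shows why. The dataset size and per-tier throughput below are hypothetical, chosen only to illustrate the order-of-magnitude gap between tiers:

```python
# Hypothetical: time to stage a proprietary fine-tuning corpus from each storage tier.
# Dataset size and per-tier throughput are assumptions for illustration only.
DATASET_TB = 50

tier_throughput_gb_s = {
    "hot: NVMe SSD pool": 50.0,
    "warm: HDD with SSD cache": 5.0,
    "cold: slow / powered-off HDD": 0.5,
    "archive: tape": 0.3,
}

for tier, gb_s in tier_throughput_gb_s.items():
    hours = DATASET_TB * 1000 / gb_s / 3600  # TB -> GB, then seconds -> hours
    print(f"{tier:30s} ~{hours:6.1f} hours to stage {DATASET_TB} TB")
```

At these assumed rates, a re-training pass that starts from the cold or archive tier can spend a day or more just moving data before the GPUs see a single token.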
Storage software innovation to optimize for TCO
For generative AI use cases to be viable, efficient utilization of AI systems is critical. Large language models consume enormous amounts of HBM and DRAM. In many cases it is optimal to throw ever more clustered AI resources at an LLM to complete training as fast as possible. For cases like fine-tuning or large-scale inference, solving larger problem sets with less hardware, at the cost of time, will be the right play for TCO.
This optimization is driving the development of innovative AI storage software stacks to effectively use fast NVMe SSDs to expand HBM or DRAM and to optimize the data path. Here are a few examples we’ve tested:
- Big accelerator memory (BaM): A research project that replaces the NVMe driver so AI accelerators can access NVMe SSDs directly. It currently works with GNN workloads and can squeeze every bit of small-block I/O performance from the fastest NVMe SSDs.
- DeepSpeed ZeRO-Inference: Software that offloads model state for inference to NVMe SSDs, enabling efficient use of the system’s GPU, memory and storage for large-scale inference workloads (a configuration sketch follows this list).
- NVIDIA GPUDirect Storage: Technology that enables a direct data path between GPU memory and NVMe SSD storage, bypassing the CPU bounce buffer. It greatly improves storage performance on busy systems.
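As a concrete illustration of the kind of configuration involved, here is a minimal sketch of a ZeRO-Inference-style DeepSpeed config that spills model parameters to a local NVMe SSD. The mount path is hypothetical, and the exact keys and defaults vary by DeepSpeed version, so treat it as illustrative rather than a drop-in config:

```python
# Illustrative DeepSpeed ZeRO stage-3 config with parameter offload to NVMe.
# The nvme_path is a hypothetical mount point; field names and defaults vary by version.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                      # partition all model state
        "offload_param": {
            "device": "nvme",            # spill parameters to NVMe instead of DRAM
            "nvme_path": "/local_nvme",  # hypothetical mount for a fast data center SSD
            "pin_memory": True,
        },
    },
    "train_micro_batch_size_per_gpu": 1,
}

# This dict is passed to deepspeed.initialize(...) wrapped around a PyTorch model;
# the engine then pages tensors in from the SSD as each layer executes.
```

The faster the SSD’s small-block read performance, the less time the accelerator spends waiting on those paged-in tensors, which is exactly where high-performance NVMe earns its keep.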
All the bacon and eggs
Storage requirements for AI workloads have lagged behind those for HBM and main memory. Each generation of AI accelerator over the past 8 years has increased performance by five times or more over the previous generation. Early AI workloads were limited by compute resources and memory bandwidth, but at the current rate of GPU advancement, it’s only a matter of time before most AI workloads require some form of SSD storage.
With generative AI becoming a common enterprise workload, solutions for efficient fine-tuning and training are driving innovations in storage software. High-performance NVMe SSDs can play a role as “slow” memory to enable efficient use of expensive and rare AI system resources.
Micron is working closely with our industry partners to understand the unique requirements of AI workloads. We’re excited about the possibilities our technology will enable across the entire AI system architecture: HBM, memory and data center storage.