

The Micron 9400 NVMe SSD is the top PCIe Gen4 SSD for AI Storage

By Wes Vaske - 2023-09-19
According to their website, MLCommons was started in 2018 “…to accelerate machine learning innovation and increase its positive impact on society...” Today, MLCommons maintains and develops six different benchmark suites and is developing open datasets to support future state-of-the-art model development. The MLPerf Storage Benchmark Suite is the latest addition to the benchmark collection.

As a member of the MLCommons Storage Working Group, I’ve helped develop benchmark rules and processes to help ensure that benchmark results are meaningful to researchers, customers, and vendors alike. We’ve just published the first round of submissions, including results for the Micron 9400 SSD.

But why do we need a new benchmark utility that’s specific to AI workloads?

Characterizing the storage workload for AI Training systems faces two unique challenges that the MLPerf Storage Benchmark Suite aims to address: the cost of AI accelerators and the small size of available datasets.
The first is straightforward: AI accelerators can be expensive, complex compute systems, and most storage vendors won’t have enough AI systems available just to analyze how their products scale in storage solutions.

The second issue is that the openly available datasets are small compared to what is commonly used in the AI industry. Whereas the datasets available to MLCommons and its participants may get as large as 150 gigabytes, datasets used in production are frequently tens to hundreds of terabytes. Modern servers can easily have 1 to 2 terabytes of DRAM, so a small benchmark dataset ends up cached in system memory after the first training epoch, and subsequent epochs run from that in-DRAM data rather than from storage. Production datasets would not see the same behavior because they are too large to fit in memory.
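To make the caching effect concrete, here is a minimal sketch of the arithmetic. It assumes an idealized cache that can hold on to as much of the dataset as fits in DRAM, so it gives a best-case (lowest) estimate of how much of each later epoch still has to be read from storage. The function name and the 50 TB example dataset are illustrative choices, not part of the benchmark.

```python
GB = 10**9   # decimal units, as storage vendors typically quote them
TB = 10**12


def min_fraction_from_storage(dataset_bytes: float, cache_bytes: float) -> float:
    """Best-case share of an epoch (after the first) that must come from storage.

    Assumption: an idealized cache keeps `cache_bytes` worth of the dataset
    resident between epochs. Real page-cache behavior with shuffled reads
    is usually worse than this bound.
    """
    if dataset_bytes <= 0:
        raise ValueError("dataset_bytes must be positive")
    if dataset_bytes <= cache_bytes:
        return 0.0  # the whole dataset fits in memory: storage is idle after epoch 1
    return (dataset_bytes - cache_bytes) / dataset_bytes


# A 150 GB open dataset on a server with 1 TB of DRAM: fully cached.
small = min_fraction_from_storage(150 * GB, 1 * TB)

# A hypothetical 50 TB production dataset on the same server:
# almost every read still goes to storage.
large = min_fraction_from_storage(50 * TB, 1 * TB)

print(f"150 GB dataset: {small:.0%} of reads from storage after epoch 1")
print(f"50 TB dataset:  {large:.0%} of reads from storage after epoch 1")
```

Under these assumptions the small dataset stops exercising storage entirely after the first epoch, while the large one keeps it busy on every epoch, which is why a storage benchmark needs datasets larger than system memory.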

MLPerf Storage addresses the first issue by emulating the accelerators in standard CPU-based servers. At the low level, MLPerf Storage uses the same AI frameworks as the commonly used workloads (PyTorch, TensorFlow, etc.), but it replaces the compute portion of the platform with a “sleep time” that is found experimentally by running the real workload on systems with the actual AI accelerators.

Comparisons of the emulated accelerators and real accelerators show that the workloads are extremely similar.
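The sleep-based emulation can be illustrated with a short sketch. This is not the benchmark's code: it is a simplified, single-threaded loop in which `load_batch` and `compute_seconds` are hypothetical stand-ins for the framework data loader and the experimentally measured compute time. Real framework data loaders also prefetch, overlapping I/O with compute, which this sketch leaves out.

```python
import time
from typing import Callable, Iterable


def run_emulated_epoch(
    batches: Iterable[int],
    load_batch: Callable[[int], object],
    compute_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run one epoch against real storage with the accelerator replaced by a sleep.

    Returns the fraction of wall-clock time spent in emulated compute,
    a simple proxy for accelerator utilization. `sleep` and `clock` are
    injectable so the loop can be tested without waiting.
    """
    busy = 0.0
    start = clock()
    for batch_id in batches:
        load_batch(batch_id)     # real I/O against the storage under test
        sleep(compute_seconds)   # stands in for the accelerator's compute step
        busy += compute_seconds
    total = clock() - start
    return busy / total if total > 0 else 0.0
```

In this sketch, slower storage stretches the time spent in `load_batch`, so the returned fraction drops. That is the sense in which storage performance limits how busy the emulated accelerators can be kept.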

MLPerf Storage addresses the second issue by creating datasets that are similar to actual production datasets but replicated to be much larger. The benchmark supports various data storage technologies, such as filesystems and object storage, as well as multiple data formats, such as serialized NumPy arrays, TFRecord files, HDF5 files, and more.

In addition to solving these problems, in a previous blog post with John Mazzie, we showed that the AI training workload is more complex than many expect – the workload is both bursty and latency sensitive.

The MLPerf Storage Benchmark Suite is a great way to exercise storage systems in a way that represents real AI Training workloads without requiring expensive AI accelerators while also supporting dataset sizes representative of real-world datasets.

Now we’re proud to announce that the Micron 9400 NVMe SSD supports 17 accelerators in the 3D Medical Imaging benchmark (Unet3D). This translates to 41 samples per second, or 6.1 GB/s of I/O throughput.
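As a rough sanity check, the two published figures can be related to each other. The sketch below assumes 1 GB = 10^9 bytes and treats the reported numbers as exact, although they are rounded, so the derived values are approximate. The helper names are illustrative.

```python
GB = 10**9  # assumption: decimal gigabytes

ACCELERATORS = 17        # emulated accelerators supported
SAMPLES_PER_SEC = 41     # reported sample rate
THROUGHPUT_BPS = 6.1 * GB  # reported I/O throughput in bytes per second


def implied_bytes_per_sample(throughput_bps: float, samples_per_sec: float) -> float:
    """Average bytes read per sample implied by throughput and sample rate."""
    return throughput_bps / samples_per_sec


def per_accelerator(value: float, accelerators: int) -> float:
    """Even split of an aggregate figure across the emulated accelerators."""
    return value / accelerators


sample_mb = implied_bytes_per_sample(THROUGHPUT_BPS, SAMPLES_PER_SEC) / 10**6
samples_each = per_accelerator(SAMPLES_PER_SEC, ACCELERATORS)
gbps_each = per_accelerator(THROUGHPUT_BPS, ACCELERATORS) / GB

print(f"Implied average sample size: {sample_mb:.0f} MB")
print(f"Per accelerator: {samples_each:.2f} samples/s, {gbps_each:.2f} GB/s")
```

Under these assumptions, each sample works out to roughly 149 MB, and each emulated accelerator consumes a little over two samples per second.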

Armed with this benchmark, which is easy to run and representative of real AI Training environments, the Micron Data Center Workloads Engineering team will be presenting data across storage devices and solutions so that we can all better understand how to tune and design storage to increase accelerator utilization.

Wes Vaske

Wes Vaske is a Senior Member of Technical Staff on the Micron Data Center Workloads Engineering team in Austin, Texas. He analyzes enterprise workloads to understand the performance effects of Flash and DRAM devices on applications and provides 'real-life' workload characterization to internal design and development teams. Wes's specific focus is Artificial Intelligence applications and developing the tools for tracing and system observation.
