In the dynamic technology landscape, computing system architectures are undergoing a profound transformation. As we approach an era of unprecedented computational demands coupled with the slowing progress of Moore’s Law, computer architects and design engineers are fundamentally reimagining how we design, build, and configure systems. The future of computer architecture and energy-efficient hardware is being shaped by substantial new workload requirements, and several enabling technologies are emerging to address these needs through modularity.
New Demands Driving Changes in System Architectures
As so many have already noted, the AI revolution is here. Artificial Intelligence (AI), and specifically Deep Learning (DL), has transcended the realm of research labs and is now integral to the growth plans of cloud companies and various industries. From healthcare diagnostics and screenings to financial predictions, AI algorithms are driving decision-making processes. Storing and processing all that data efficiently has become increasingly difficult, with the massive scale of LLM and GenAI training leading the push on capabilities.
One especially interesting challenge as the volume of data keeps growing is how to feed it to the main computational engines powering DL model training – typically the GPU or, in some cases, AI-specific processors. The data sets are becoming so large that they not only exceed the capacity of the GPU’s directly attached High Bandwidth Memory (HBM) but can even outstrip local system memory capacity. In addition to DRAM, NAND Flash storage devices will soon become another critical component in enabling greater AI capabilities. Scaling available capacity in systems with the proper performance increasingly comes down to delivering cost-effective bandwidth while minimizing the communication steps and intermediate stages that reduce performance and waste power. Related to this is a general decoupling of the data path, the main route where data flows, from the control path in order to better optimize both. Similar optimizations have been made in the past in other fields, such as networking with SDN and OpenFlow, but they are now being applied to AI-specific architectures.
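As a rough illustration of this decoupling, here is a minimal sketch in Python of a prefetch pipeline that overlaps storage reads with computation, so the accelerator is not left idle waiting on the data path. The function names (read_batch_from_nvme, train_step) are hypothetical placeholders, not any specific library’s API.

```python
import queue
import threading

def read_batch_from_nvme(batch_id):
    # Hypothetical placeholder: a real pipeline would issue large,
    # sequential reads against an NVMe-backed dataset shard here.
    return f"batch-{batch_id}".encode()

def train_step(batch):
    # Hypothetical placeholder: a real pipeline would launch GPU work here.
    pass

def producer(q, num_batches):
    for i in range(num_batches):
        q.put(read_batch_from_nvme(i))  # storage I/O runs off the critical path
    q.put(None)                         # sentinel: no more data

def main(num_batches=1000, prefetch_depth=4):
    q = queue.Queue(maxsize=prefetch_depth)  # bounded queue sets prefetch depth
    t = threading.Thread(target=producer, args=(q, num_batches))
    t.start()
    while (batch := q.get()) is not None:
        train_step(batch)               # compute overlaps the next reads
    t.join()

if __name__ == "__main__":
    main()
```

The design choice worth noting is the bounded queue: it separates how data is delivered from when it is consumed – the same data-path/control-path split described above, scaled down to a few lines.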
Another significant difficulty in meeting the near-term demands of AI is similar to previous industry challenges – how to keep up with frequent technological change and newly optimized system architectures. While the time required to develop new, complex system hardware designs has not dramatically improved, demand has greatly increased for the newest advancements and newly optimized systems that provide competitive advantage through better performance or efficiency. With major workload requirements becoming apparent only in the past one or two years, this pace of change makes it exceedingly difficult to deliver the best known solutions in a timely manner.
New Modular Technologies Enabling System Architectures
In order to help keep pace with the dynamic tech landscape, organizations have started developing modular computing models, such as the Open Compute Project (OCP) Modular Hardware System (MHS), to break down some of the complexity of systems into distinct parts. While there have always been pluggable modules and cards that enable various functionality, the traditional approach in modern systems has been to integrate core computing sub-systems together with the processor on a main system board. This integration provides a single, low-cost implementation with fixed ratios of functionality, which can be a beneficial solution in high volume. However, the time to build and test such systems, along with the development cost (NRE) required, makes it prohibitive to frequently build new configurations with the latest components, as would be ideal for the highest performance on critical workloads.
By dividing complex system designs into smaller, interchangeable boards for each sub-system – especially including standard-footprint Host Processor Modules (HPMs) that include only the CPU and memory – new system designs can be assembled from existing common modules. Boards delivering new technology can also be developed more quickly and used with existing common boards to significantly shorten the time to deploying new system capabilities. For example, you can imagine a new memory expansion module that provides higher capacity being assembled in a given chassis with the same storage backplane, HPM, and networking as used with the previous memory expansion module. This common-footprint modularity allows systems to be configured in much more flexible ways that better align to specific workload needs – a greater value when next-generation needs are not always well known.
Offering some of the same benefits as board-level modularity, another beneficial technology is the recent standardization of common interfaces for chiplets. By breaking monolithic chips down into separate dies, chiplets allow for efficient fabrication and cost-effective production. They enable advanced fabrication methods for specific components while using older methods for others, expanding product ranges and improving efficiency. As new techniques are discovered and developed to advance the state of the art in computing performance, each chiplet can be independently designed and upgraded. This fosters flexibility, faster adoption, and easier maintenance of new functions, including those that can accelerate new AI models and emerging workloads.
Supporting Future Systems Needs with EDSFF Standards
The Enterprise and Datacenter Standard Form Factor (EDSFF) industry standards, published by SNIA’s SFF Technology Affiliate group, are also playing a role in meeting the evolving demands of system architectures. The EDSFF standards are a set of interdependent specifications that enable connector-compatible pluggable modules in specific form factors (FF): the E1 and E3. The E1 FFs, E1.S (short) and E1.L (long), fit vertically in a 1U rack space. The E3 FFs, E3.S (short) and E3.L (long), fit vertically in a 2U rack space.
When we first started developing these standards in the industry, several of us shared a belief that they should be versatile – working optimally for our main target of datacenter-optimized storage devices while also supporting new technology adoption and applications.
For storage, EDSFF NVMe drives offer several advantages over legacy form factors:
- They enable higher storage density, allowing for more storage capacity and performance in the same physical space.
- EDSFF drives possess better thermal characteristics, with slimmer profiles and greater surface area.
- They simplify power delivery, using +12V as the main voltage rail from the system.
- They improve signal integrity for high-speed interfaces and support higher power through a common, inexpensive EDSFF-standard connector rated for signaling up to 112 Gbps.
These advantages are particularly valuable for AI workloads, which often require large amounts of high-performance storage for model training data. Smaller EDSFF drives provide high storage performance density by scaling out the number of drives in a system, reducing the time that a GPU or processor must wait for the next set of data. Larger form factors can support higher capacities per drive and are thus ideal for providing a performant, high-capacity storage tier for the very large training data sets often housed in external storage systems.
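To make the scaling-out point concrete, here is a back-of-the-envelope sizing sketch. All of the numbers below are assumed example values for illustration, not measurements or specifications of any particular product.

```python
import math

# Hypothetical sizing: how many drives does it take to sustain a
# target aggregate read bandwidth for feeding accelerators?
TARGET_INGEST_GBPS = 50.0    # assumed aggregate read bandwidth needed (GB/s)
DRIVE_SEQ_READ_GBPS = 7.0    # assumed per-drive sequential read rate (GB/s)
EFFICIENCY = 0.8             # assumed derating for real-world access patterns

drives_needed = math.ceil(TARGET_INGEST_GBPS / (DRIVE_SEQ_READ_GBPS * EFFICIENCY))
print(f"Drives needed: {drives_needed}")  # -> 9 with these example numbers
```

The point of the arithmetic is simply that aggregate bandwidth scales with drive count, which is exactly what dense, smaller form factors make practical within a single chassis.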
Due to the better thermal characteristics of EDSFF drives, system designs can be optimized to further improve GPU cooling and gain higher performance at a given airflow. Smaller form factor drives can also add storage functionality in less space at the front of a system, providing room for front air ducts and openings that supply fresh air to downstream system components.
The flexibility of configuring systems for various workloads with many pluggable EDSFF storage devices has already been leveraged in server systems now in production. Based on particular workload requirements, the storage capacity, performance, and power can be tuned in the same system to meet a wide variety of needs.
Beyond storage, we’ve also recently seen the introduction of the first new devices leveraging the EDSFF family of form factors and standard high-speed interface. The new CXL® protocol has provided a means to connect devices with low latency over the same physical-layer signaling and interconnect as PCIe. Since system processors and other chips can support both CXL and PCIe on the same pins, the EDSFF device slots in systems can often already support new devices that connect through the CXL protocol. In JEDEC, the industry organization driving memory device and many other semiconductor-related standards, we recently released the industry’s first memory module specification for CXL, called the CMM. These CMM devices attach standard DRAM devices through an on-module controller with a CXL interface and plug into EDSFF-compliant system slots to expand system memory capacity without re-designing the system boards.
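In practice, CXL memory expanders such as these typically appear to the operating system as memory-only (CPU-less) NUMA nodes. As a minimal sketch, assuming a Linux system with the standard sysfs node layout, the following lists the memory-only nodes that software could then target with standard NUMA placement tools:

```python
from pathlib import Path

def memory_only_nodes():
    """List NUMA nodes with no CPUs, which is how CXL memory
    expanders (e.g., CMM devices) commonly appear on Linux."""
    nodes = []
    for node_dir in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        cpulist = (node_dir / "cpulist").read_text().strip()
        if not cpulist:  # no CPUs attached: likely an expansion/CXL node
            nodes.append(node_dir.name)
    return nodes

if __name__ == "__main__":
    print("Memory-only NUMA nodes:", memory_only_nodes())
```

Once identified, allocations can be steered to such a node with existing tooling (for example, `numactl --membind`) – part of what makes this form of capacity expansion attractive is that the software model is the familiar one.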
Thus, system configuration flexibility has already expanded to memory and storage with EDSFF pluggable modules, and more types of devices, including processing and networking, will likely also attach via PCIe or CXL. These devices enhance system flexibility, capability, and performance, making them an ideal choice for AI-driven system architectures.
Looking Forward to Future System Architectures
In summary, the future of system architectures lies in adaptability, scalability, and innovation. As we embrace AI, modular designs, and cutting-edge technologies, system designers and architects play a pivotal role in shaping the digital landscape – more innovation must come at the system level, from chip package to system chassis and rack. Modular system designs, chiplets, and pluggable modules like EDSFF drives serve as a bridge between frequently evolving demands and robust, top-performance system designs. They are a key enabler of a more optimal evolution of systems, since flexibility is essential to supporting the future demands of AI and other advanced workloads.