
Disaggregated NVMe Scratch Pad: Breaking the GPU Memory Barrier
Corespan Team
As AI models grow larger and data pipelines become more demanding, one constraint continues to limit performance: GPU memory. Even with advances in HBM and GDDR, workloads frequently exceed on-board capacity—forcing compromises in model size, performance, or cost.
A new approach is emerging to address this challenge: disaggregated NVMe scratch-pad storage.
The Problem with Traditional Architectures
In conventional server-centric designs, NVMe storage is tightly coupled to individual GPUs or hosts. This creates several inefficiencies:
- Overprovisioning for peak demand, leaving capacity underutilized
- Static resource allocation, limiting flexibility across workloads
- Unpredictable performance, especially under variable load
Scaling storage often means scaling compute—adding GPUs or entire nodes just to gain more NVMe capacity. This is both inefficient and costly.
A Disaggregated Approach
Corespan rethinks this model by decoupling NVMe storage from compute resources and turning it into a shared, intelligent scratch-pad layer.
Instead of being tied to a single server, NVMe flash becomes part of a pooled resource, accessible by any GPU, CPU, or accelerator across the cluster.
This enables:
- Dynamic allocation of scratch storage based on real-time demand
- Independent scaling of storage and compute
- Higher utilization across infrastructure
The result is a more flexible and efficient architecture that aligns resources with workload needs.
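The pooled model above can be sketched as a small capacity allocator that reserves scratch space per job and returns it to the pool the moment a job completes. This is an illustrative sketch only; the class, job names, and sizes are hypothetical, not Corespan's actual API:

```python
# Minimal sketch of pooled NVMe scratch allocation (all names hypothetical).

class ScratchPool:
    """Tracks a shared pool of NVMe scratch capacity, in GiB."""

    def __init__(self, total_gib):
        self.total_gib = total_gib
        self.allocations = {}  # job_id -> GiB reserved

    def used(self):
        return sum(self.allocations.values())

    def allocate(self, job_id, gib):
        """Reserve scratch space for a job; fail if the pool is exhausted."""
        if self.used() + gib > self.total_gib:
            raise MemoryError(f"pool exhausted: {self.used()}/{self.total_gib} GiB in use")
        self.allocations[job_id] = self.allocations.get(job_id, 0) + gib

    def release(self, job_id):
        """Reclaim a job's scratch space as soon as it completes."""
        return self.allocations.pop(job_id, 0)


pool = ScratchPool(total_gib=4096)
pool.allocate("train-llm", 1500)
pool.allocate("video-8k", 900)
pool.release("train-llm")   # capacity returns to the shared pool immediately
print(pool.used())          # 900
```

Because the pool is shared, capacity freed by one workload is instantly available to any other GPU or accelerator in the cluster, rather than sitting idle inside one server.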
Extending GPU Memory with NVMe
NVMe scratch pads act as a high-speed extension of GPU memory, sitting just below HBM in the hierarchy.
This is critical for workloads such as:
- Large-scale AI training
- High-throughput inference
- Ultra-high-resolution video processing
When data exceeds GPU memory, the scratch pad absorbs overflow without forcing:
- Model sharding
- Aggressive compression
- Performance degradation
While NVMe is slower than HBM, it offers significantly higher capacity at a lower cost—making it an effective balance between performance and scalability.
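The overflow behavior described above can be sketched as a tiered buffer that evicts cold data to scratch storage once an HBM budget is exceeded, with bytes standing in for GPU tensors and a local directory standing in for the NVMe pad. The budget and names here are hypothetical:

```python
import os
import tempfile

class SpillBuffer:
    """Toy HBM-to-NVMe spill: evicts oldest entries to scratch files
    once the in-memory budget is exceeded."""

    def __init__(self, hbm_budget_bytes, scratch_dir=None):
        self.budget = hbm_budget_bytes
        self.in_memory = {}   # stand-in for on-GPU tensors
        self.on_disk = {}     # key -> scratch file path
        self.scratch_dir = scratch_dir or tempfile.mkdtemp(prefix="nvme_scratch_")

    def _used(self):
        return sum(len(v) for v in self.in_memory.values())

    def put(self, key, data):
        # Spill oldest entries to scratch until the new data fits the budget.
        while self.in_memory and self._used() + len(data) > self.budget:
            victim, blob = next(iter(self.in_memory.items()))
            path = os.path.join(self.scratch_dir, victim)
            with open(path, "wb") as f:
                f.write(blob)
            self.on_disk[victim] = path
            del self.in_memory[victim]
        self.in_memory[key] = data

    def get(self, key):
        # Transparent read-back: callers never see which tier holds the data.
        if key in self.in_memory:
            return self.in_memory[key]
        with open(self.on_disk[key], "rb") as f:
            return f.read()


buf = SpillBuffer(hbm_budget_bytes=8)
buf.put("act-0", b"aaaa")
buf.put("act-1", b"bbbb")
buf.put("act-2", b"cccc")   # exceeds the budget: "act-0" spills to scratch
```

The key property is transparency: the workload keeps addressing data by key, and the tier it lives in is an orchestration decision, not a model-architecture decision.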
From Local SSDs to a Shared Fabric
Traditional approaches—like local PCIe SSDs or on-board flash—provide limited benefits:
- Capacity is locked to a single GPU
- Scaling requires hardware changes
- Performance varies across nodes
Corespan’s architecture replaces this with a shared NVMe fabric that is:
- Provisioned as a unified resource
- Scheduled across workloads
- Monitored for consistent performance
This ensures predictable throughput and balanced utilization at scale.
Intelligent Data Placement and Lifecycle Management
Scratch-pad workloads are highly write-intensive and short-lived, placing stress on SSD endurance and performance.
Corespan addresses this with an intelligent orchestration layer that:
- Uses NVMe Flexible Data Placement (FDP) to optimize writes
- Reduces garbage collection overhead
- Applies over-provisioning strategies to extend drive life
Because the storage is disaggregated, the system can:
- Route the most demanding workloads to the most suitable media
- Reclaim and repurpose capacity instantly when jobs complete
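The lifetime-aware placement idea can be illustrated with a toy director in the spirit of NVMe Flexible Data Placement: every write from one job shares a placement handle, so the job's data occupies the same erase units and can be invalidated as a unit when the job ends, with no garbage-collection copy-forward of live data. All names here are hypothetical:

```python
from collections import defaultdict

class PlacementDirector:
    """Toy model of lifetime-aware data placement (FDP-style)."""

    def __init__(self, num_handles):
        self.num_handles = num_handles
        self.assigned = {}               # job_id -> placement handle
        self.blocks = defaultdict(list)  # handle -> blocks written

    def handle_for(self, job_id):
        # Co-locate every write from one job on a single placement handle
        # (one handle per job while handles remain available).
        if job_id not in self.assigned:
            self.assigned[job_id] = len(self.assigned) % self.num_handles
        return self.assigned[job_id]

    def write(self, job_id, block):
        self.blocks[self.handle_for(job_id)].append(block)

    def reclaim(self, job_id):
        # When the job completes, its handle is invalidated wholesale:
        # the data shared a lifetime, so nothing needs copy-forward.
        handle = self.assigned.pop(job_id, None)
        return len(self.blocks.pop(handle, [])) if handle is not None else 0


director = PlacementDirector(num_handles=8)
for i in range(3):
    director.write("train-llm", f"block-{i}")
director.write("video-8k", "frame-0")
freed = director.reclaim("train-llm")  # all 3 blocks freed as one unit
```

Grouping writes by lifetime is what reduces garbage-collection overhead: blocks that die together leave no live pages behind to relocate.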
Direct, Low-Latency Access
Modern architectures increasingly enable GPUs to access NVMe directly, bypassing the CPU to reduce latency.
Corespan builds on this by providing:
- Optimized data paths for accelerators
- Low-latency, high-bandwidth access to shared storage
- Seamless reallocation across clusters and tenants
This eliminates the need for manual reconfiguration or hardware changes when workloads shift.
Key Considerations
While NVMe scratch pads offer significant benefits, several factors must be managed:
- Performance Gap: NVMe latency and bandwidth trail HBM by orders of magnitude, requiring intelligent orchestration to stage and prefetch data
- Endurance: High write cycles demand durable SSDs and smart placement strategies
- Thermals: Proper cooling is essential to prevent throttling
- Software Stack: Efficient configuration and orchestration are critical for optimal results
The Bottom Line
Disaggregated NVMe scratch-pad storage represents a fundamental shift in how infrastructure supports modern workloads.
By separating storage from compute and managing it as a shared, intelligent resource, organizations can:
- Overcome GPU memory limitations
- Improve infrastructure utilization
- Scale more efficiently
- Deliver consistent, predictable performance
As AI and HPC workloads continue to evolve, this architecture provides a practical path forward—balancing performance, flexibility, and cost in a way traditional designs cannot.