Disaggregated NVMe Scratch Pad: Breaking the GPU Memory Barrier

Corespan Team

As AI models grow larger and data pipelines become more demanding, one constraint continues to limit performance: GPU memory. Even with advances in HBM and GDDR, workloads frequently exceed on-board capacity—forcing compromises in model size, performance, or cost.

A new approach is emerging to address this challenge: disaggregated NVMe scratch-pad storage.

The Problem with Traditional Architectures

In conventional server-centric designs, NVMe storage is tightly coupled to individual GPUs or hosts. This creates several inefficiencies:

  • Overprovisioning for peak demand, leaving capacity underutilized
  • Static resource allocation, limiting flexibility across workloads
  • Unpredictable performance, especially under variable load

Scaling storage often means scaling compute—adding GPUs or entire nodes just to gain more NVMe capacity. This is both inefficient and costly.

A Disaggregated Approach

Corespan rethinks this model by decoupling NVMe storage from compute resources and turning it into a shared, intelligent scratch-pad layer.

Instead of being tied to a single server, NVMe flash becomes part of a pooled resource, accessible by any GPU, CPU, or accelerator across the cluster.

This enables:

  • Dynamic allocation of scratch storage based on real-time demand
  • Independent scaling of storage and compute
  • Higher utilization across infrastructure

The result is a more flexible and efficient architecture that aligns resources with workload needs.
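The pooled model can be illustrated with a small, hypothetical allocator sketch. The class and method names below are illustrative, not Corespan APIs: a single shared capacity budget grants scratch space to jobs on demand and reclaims it the moment a job completes.

```python
class ScratchPool:
    """Toy model of a pooled NVMe scratch pad: one shared capacity
    budget that any job in the cluster can draw from on demand."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # job_id -> GB granted

    def free_gb(self):
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, job_id, size_gb):
        """Grant scratch space if the pool has room; deny otherwise."""
        if size_gb > self.free_gb():
            return False
        self.allocations[job_id] = self.allocations.get(job_id, 0) + size_gb
        return True

    def release(self, job_id):
        """Reclaim a finished job's scratch space back into the pool."""
        self.allocations.pop(job_id, None)


pool = ScratchPool(capacity_gb=100)
pool.allocate("train-job", 60)   # granted; 40 GB remain
pool.allocate("infer-job", 50)   # denied; only 40 GB remain
pool.release("train-job")        # training completes; 100 GB free again
pool.allocate("infer-job", 50)   # now granted
```

The point of the sketch is the contrast with server-local SSDs: here no job owns a fixed drive, so capacity freed by one workload is immediately available to the next.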

Extending GPU Memory with NVMe

NVMe scratch pads act as a high-speed extension of GPU memory, sitting just below HBM in the hierarchy.

This is critical for workloads such as:

  • Large-scale AI training
  • High-throughput inference
  • Ultra-high-resolution video processing

When data exceeds GPU memory, the scratch pad absorbs overflow without forcing:

  • Model sharding
  • Aggressive compression
  • Performance degradation

While NVMe is slower than HBM, it offers significantly higher capacity at a lower cost—making it an effective balance between performance and scalability.
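The overflow behavior described above can be sketched as a two-tier buffer. This is a simplified illustration, not Corespan's implementation: a small "fast" tier (standing in for HBM) spills its least-recently-used blocks to a temporary file (standing in for pooled NVMe scratch) and faults them back on access.

```python
import os
import tempfile
from collections import OrderedDict

class TieredBuffer:
    """Toy two-tier store: a bounded fast tier (standing in for HBM)
    backed by a file on scratch storage (standing in for pooled NVMe).
    Least-recently-used blocks are spilled when the fast tier is full."""

    BLOCK = 4096  # bytes per block

    def __init__(self, fast_blocks):
        self.fast_blocks = fast_blocks
        self.fast = OrderedDict()              # block_id -> bytes, LRU order
        self.spill = tempfile.TemporaryFile()  # NVMe-backed in a real system
        self.spilled = {}                      # block_id -> file offset

    def put(self, block_id, data):
        self.fast[block_id] = data
        self.fast.move_to_end(block_id)
        while len(self.fast) > self.fast_blocks:
            victim, vdata = self.fast.popitem(last=False)  # evict LRU block
            offset = self.spill.seek(0, os.SEEK_END)
            self.spill.write(vdata)
            self.spilled[victim] = offset

    def get(self, block_id):
        if block_id in self.fast:
            self.fast.move_to_end(block_id)
            return self.fast[block_id]
        # Miss: fault the block back in from scratch storage.
        self.spill.seek(self.spilled[block_id])
        data = self.spill.read(self.BLOCK)
        self.put(block_id, data)
        return data


buf = TieredBuffer(fast_blocks=2)
for i in range(4):  # write 4 blocks into a 2-block fast tier
    buf.put(i, bytes([i]) * TieredBuffer.BLOCK)
assert buf.get(0) == bytes(TieredBuffer.BLOCK)  # block 0 faulted back in
```

The working set is never truncated or compressed; cold blocks simply move down the hierarchy, which is the trade the article describes: more capacity at higher latency.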

From Local SSDs to a Shared Fabric

Traditional approaches—like local PCIe SSDs or on-board flash—provide limited benefits:

  • Capacity is locked to a single GPU
  • Scaling requires hardware changes
  • Performance varies across nodes

Corespan’s architecture replaces this with a shared NVMe fabric that is:

  • Provisioned as a unified resource
  • Scheduled across workloads
  • Monitored for consistent performance

This ensures predictable throughput and balanced utilization at scale.

Intelligent Data Placement and Lifecycle Management

Scratch-pad workloads are highly write-intensive and short-lived, placing stress on SSD endurance and performance.

Corespan addresses this with an intelligent orchestration layer that:

  • Uses NVMe Flexible Data Placement (FDP) to group writes by expected lifetime
  • Reduces garbage collection overhead
  • Applies over-provisioning strategies to extend drive life

Because the storage is disaggregated, the system can:

  • Route the most demanding workloads to the most suitable media
  • Reclaim and repurpose capacity instantly when jobs complete
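A placement policy of this kind can be sketched in a few lines. The media classes, endurance ratings, and function below are hypothetical examples, not Corespan's actual scheduler: each job's scratch allocation is routed to the cheapest media class whose endurance rating still covers the job's expected write rate.

```python
# Illustrative placement policy: route write-heavy scratch workloads
# to higher-endurance media, lighter ones to cheaper flash.
MEDIA_CLASSES = [
    # (name, drive-writes-per-day the media is rated to absorb)
    ("high-endurance-tlc", 3.0),
    ("standard-tlc", 1.0),
    ("qlc", 0.3),
]

def place(job_writes_per_day):
    """Pick the cheapest media class whose endurance rating still
    covers the job's expected drive-writes-per-day."""
    for name, rating in reversed(MEDIA_CLASSES):  # cheapest first
        if job_writes_per_day <= rating:
            return name
    return MEDIA_CLASSES[0][0]  # most durable tier as a fallback

print(place(0.2))  # light spill traffic lands on cheap QLC
print(place(2.5))  # heavy checkpoint/shuffle traffic lands on durable TLC
```

Because the pool is shared, this routing decision is made per job at allocation time rather than being fixed by whichever drive happens to sit in a given server.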

Direct, Low-Latency Access

Modern architectures increasingly enable GPUs to access NVMe directly, bypassing the CPU to reduce latency.

Corespan builds on this by providing:

  • Optimized data paths for accelerators
  • Low-latency, high-bandwidth access to shared storage
  • Seamless reallocation across clusters and tenants

This eliminates the need for manual reconfiguration or hardware changes when workloads shift.

Key Considerations

While NVMe scratch pads offer significant benefits, several factors must be managed:

  • Performance Gap: NVMe is slower than GPU memory, requiring intelligent orchestration
  • Endurance: High write cycles demand durable SSDs and smart placement strategies
  • Thermals: Proper cooling is essential to prevent throttling
  • Software Stack: Efficient configuration and orchestration are critical for optimal results

The Bottom Line

Disaggregated NVMe scratch-pad storage represents a fundamental shift in how infrastructure supports modern workloads.

By separating storage from compute and managing it as a shared, intelligent resource, organizations can:

  • Overcome GPU memory limitations
  • Improve infrastructure utilization
  • Scale more efficiently
  • Deliver consistent, predictable performance

As AI and HPC workloads continue to evolve, this architecture provides a practical path forward—balancing performance, flexibility, and cost in a way traditional designs cannot.
