Disaggregated NVMe Scratch Pad: Breaking the GPU Memory Barrier

Corespan Team

Beyond the Hype: The Hard Specs Driving Corespan’s Disaggregated NVMe Architecture

In our previous post, we explored the conceptual benefits of using a disaggregated NVMe scratchpad to offload GPU VRAM. The argument for SSD/NVMe is strong: its much higher capacity lets it serve as a high-speed "staging area" or "scratch disk" for massive datasets that cannot fit in the GPU's limited VRAM or on-chip memory. We discussed how separating storage from compute allows for shared pools of high-speed storage, ultimately driving better cost efficiency for AI and ultra-high-resolution video processing.

The Benefits of Using SSD/NVMe as a GPU Scratchpad

Using SSDs as a scratchpad or cache layer provides several advantages for data-heavy applications like AI training and scientific simulations:

  1. Massive Capacity: NVMe drives offer terabytes of space, allowing you to handle datasets far larger than what the GPU's internal memory can hold.
  2. Direct Data Path (GPUDirect Storage): Technologies like NVIDIA GPUDirect Storage allow data to move directly between NVMe storage and GPU memory via the PCIe bus, bypassing the CPU and system RAM to reduce latency and CPU load.
  3. Reduced VRAM Bottlenecks: You can use the NVMe drive as a secondary cache for model weights or large assets, effectively extending the "working memory" of the GPU.
  4. Improved Life Span: Offloading frequent "temporary" writes to a dedicated scratch SSD prevents premature wear on your primary system drive.
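To make the "working memory extension" idea in point 3 concrete, here is a minimal, hedged sketch (not Corespan code, and deliberately simplified): a size-limited in-memory store stands in for VRAM, and a scratch directory stands in for the NVMe tier. When the memory budget is exceeded, the coldest entries spill to scratch files and are transparently reloaded on access. All names (`ScratchSpillCache`, `mem_budget_bytes`) are illustrative.

```python
import os
import tempfile
from collections import OrderedDict

class ScratchSpillCache:
    """Toy model of VRAM-plus-scratch tiering: hot items live in a
    size-limited in-memory dict ("VRAM"); evicted items spill to files
    in a scratch directory ("NVMe") and are reloaded on access."""

    def __init__(self, mem_budget_bytes, scratch_dir=None):
        self.mem_budget = mem_budget_bytes
        self.mem = OrderedDict()          # key -> bytes, in LRU order
        self.mem_used = 0
        self.scratch_dir = scratch_dir or tempfile.mkdtemp(prefix="scratch_")

    def _spill_path(self, key):
        return os.path.join(self.scratch_dir, f"{key}.bin")

    def put(self, key, blob):
        if key in self.mem:               # overwrite: release the old bytes first
            self.mem_used -= len(self.mem[key])
        self.mem[key] = blob
        self.mem.move_to_end(key)
        self.mem_used += len(blob)
        # Evict the coldest entries to scratch until we fit the budget.
        while self.mem_used > self.mem_budget and len(self.mem) > 1:
            cold_key, cold_blob = self.mem.popitem(last=False)
            self.mem_used -= len(cold_blob)
            with open(self._spill_path(cold_key), "wb") as f:
                f.write(cold_blob)

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        # Miss: reload from scratch and re-insert as hot (may spill another key).
        with open(self._spill_path(key), "rb") as f:
            blob = f.read()
        self.put(key, blob)
        return blob
```

A real deployment would move data in large aligned blocks over a direct DMA path rather than through Python file I/O, but the tiering logic (budget, eviction, transparent reload) is the same shape.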

But concepts only get you so far. Enterprise architects and infrastructure leaders need to know how the hardware performs under the hood.

Customers have told us they want less marketing jargon and more hard data. You need to know exactly how our architecture handles maintenance, scales to meet massive datasets, and routes data without bottlenecks. Here are the core specifications of the Corespan architecture and the direct business outcomes they deliver.

1. Zero-Downtime Serviceability for Maximum Cluster Uptime

When training large-scale AI models, taking down a compute node—or an entire cluster—for routine storage maintenance is an unacceptable disruption. Your infrastructure must support continuous operation.

  • Hot-Swappable Architecture: Corespan supports hot-swapping at both the individual SSD/NVMe drive level and the PCIe RAID card level.
  • Continuous Access: Administrators can access all NVMe/SSD drives without incurring system downtime (the chassis simply slides out on rails for physical access).
  • Intelligent Recognition: The system utilizes dynamic enumeration (via the 3.4 Amber driver) across all PCIe cards and drives, ensuring that newly introduced hardware is instantly recognized and provisioned into the shared pool without requiring system reboots.

The Business Outcome: Drastically reduced maintenance windows and higher ROI on GPU compute, as your accelerators are never left sitting idle waiting for storage nodes to reboot.
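The dynamic enumeration described above can be sketched as a simple snapshot diff. On Linux, hot-plugged NVMe controllers appear under `/sys/class/nvme`; the sketch below (our illustration, not the Amber driver itself) compares two snapshots of that directory to detect added or removed devices.

```python
import os

# Illustrative sysfs path: on a Linux host, NVMe controllers appear
# under /sys/class/nvme as they are hot-plugged.
NVME_CLASS_DIR = "/sys/class/nvme"

def snapshot(path=NVME_CLASS_DIR):
    """Return the set of currently enumerated NVMe controller names."""
    try:
        return set(os.listdir(path))
    except FileNotFoundError:
        return set()          # no NVMe subsystem present on this host

def detect_hotplug(before, after):
    """Diff two snapshots, returning (added, removed) controller sets."""
    return after - before, before - after
```

In production, the driver reacts to hot-plug events pushed by the kernel rather than polling, but the observable result is the same: a newly inserted drive shows up in the enumeration without a reboot.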

2. Massive Scalability & Hardware Flexibility

AI workloads scale exponentially, and your storage tier needs to scale with them, without forcing you into rigid vendor lock-in or premature hardware obsolescence.

  • Extreme Density: Each chassis features 10 PCIe slots, and we support up to eight SSD/NVMe drives per Gen5 x16 slot.
  • Independent Power: The architecture includes external power capabilities to comfortably support over 80 drives.
  • Media Agnostic: The system natively supports U.2, U.3, and E3.S NVMe SSD form factors.
  • Simultaneous Operations: You don’t have to choose between standards; the chassis supports both U.2 (SFF-8639) Gen5 x4 and U.3 (SFF-TA-1001) Gen5 x8 drives simultaneously in the same environment.

The Business Outcome: Future-proofed infrastructure. You can maximize your existing hardware investments while seamlessly integrating the next generation of NVMe form factors, scaling capacity exactly when and where your workloads demand it.
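The density figures above imply some simple back-of-the-envelope bandwidth math. PCIe Gen5 runs at 32 GT/s per lane with 128b/130b encoding, so a x16 slot carries roughly 63 GB/s each direction; split across eight drives, that is just under 8 GB/s per drive. The sketch below computes the raw line rate only (real-world throughput is lower after protocol overhead), and the eight-drive split is the chassis configuration described above, not a measured result.

```python
# Back-of-the-envelope PCIe Gen5 bandwidth for the chassis layout above.
# Raw line rate only: protocol overhead lowers real-world numbers.

GT_PER_S = 32                 # PCIe Gen5 transfer rate per lane (GT/s)
ENCODING = 128 / 130          # 128b/130b line-encoding efficiency

def lane_gbytes_per_s(lanes):
    """Usable GB/s for a link of the given width (one direction)."""
    return GT_PER_S * ENCODING * lanes / 8   # divide by 8: bits -> bytes

slot_bw = lane_gbytes_per_s(16)   # one Gen5 x16 slot: ~63 GB/s
per_drive = slot_bw / 8           # eight drives sharing the slot: ~7.9 GB/s
```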

3. Uncompromised Data Paths & Throughput

High capacity is meaningless if the data gets bottlenecked on its way to the GPU. Corespan’s architecture is engineered to guarantee the latency and throughput required for persistent, high-write scratchpad use.

  • Direct GPU Communication: Full NVMe DMA (Direct Memory Access) support between NVMe drives and internal GPUs.
  • Dedicated Bandwidth: The architecture is configurable, providing either oversubscribed or non-oversubscribed paths between the CPU and NVMe/SSD drives.

The Business Outcome: Faster time-to-insight. Using DMA storage kernel drivers, data moves directly between the Corespan scratchpad and the GPUs over dedicated lane bandwidth, bypassing the CPU entirely and eliminating the I/O stalls that typically plague data-heavy operations.
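"Oversubscribed versus non-oversubscribed" reduces to a lane-count ratio: the aggregate drive-side lanes divided by the host-side link width. A ratio of 1.0 means every drive can burst at full rate simultaneously; anything above 1.0 means drives contend for the uplink. The sketch below illustrates the arithmetic with hypothetical numbers, not Corespan's actual topology.

```python
def oversubscription_ratio(downstream_lanes, upstream_lanes):
    """Ratio of aggregate drive-side lanes to the host-side link.
    1.0 = non-oversubscribed (all drives can burst at full rate);
    >1.0 = oversubscribed (drives contend for the uplink)."""
    return downstream_lanes / upstream_lanes

# Hypothetical example: eight x4 drives behind one x16 uplink
# gives a 2:1 oversubscribed path.
ratio = oversubscription_ratio(downstream_lanes=8 * 4, upstream_lanes=16)
```

Oversubscription is not inherently bad: scratchpad workloads rarely drive every device at peak simultaneously, so a 2:1 design can cut slot cost with little observed impact, while latency-critical paths warrant the 1:1 configuration.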

4. Enterprise-Grade Data Management at the Edge

Disaggregated scratch data still requires robust management and redundancy, especially when dealing with mission-critical pipelines.

  • Advanced RAID Support: Native support for RAID 0, RAID 1, and RAID 10 configurations.
  • Granular Control: Administrators can configure up to 4 separate RAID arrays per single PCIe slot.

The Business Outcome: Superior risk mitigation and workload optimization. You have the granular flexibility to balance raw speed (RAID 0 for pure scratch data) with redundancy (RAID 1/10 for critical persistence) on a per-workload or per-tenant basis within the exact same chassis.
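The speed-versus-redundancy trade-off above is easy to quantify. The sketch below (illustrative helper, not a Corespan tool) computes usable capacity for the three supported RAID levels from a drive count and per-drive size, making the cost of redundancy explicit when planning per-workload arrays.

```python
def raid_usable_capacity(level, n_drives, drive_tb):
    """Usable capacity (TB) for the RAID levels the chassis supports."""
    if level == "raid0":      # pure striping: all capacity, no redundancy
        return n_drives * drive_tb
    if level == "raid1":      # n-way mirror: one drive's worth of capacity
        return drive_tb
    if level == "raid10":     # striped mirrored pairs: half the raw capacity
        if n_drives % 2:
            raise ValueError("RAID 10 needs an even drive count")
        return n_drives * drive_tb / 2
    raise ValueError(f"unsupported level: {level}")
```

For example, four 8 TB drives yield 32 TB of pure scratch as RAID 0 but 16 TB with pair-level redundancy as RAID 10, which is exactly the per-workload decision the granular per-slot arrays enable.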

Ready to build?

Corespan’s disaggregated NVMe scratch fabric isn’t just a conceptual shift; it’s a rigorously engineered hardware solution built for the reality of modern data centers.

Download brief