Engineered by NXON for RoCEv2-optimized, multi-petabyte AI training environments.
Modern AI training requires storage that keeps pace with the world's fastest GPUs. NXON integrates WEKA's high-performance file system with our expertly engineered RoCEv2 RDMA fabric, enabling:
- Consistent sub-millisecond latency, even during metadata-heavy checkpoint bursts
- Aggregate throughput of ~3 TB/s reads and ~1.5 TB/s writes
- Up to 40 GB/s per GPU node, with no manual tuning
- Scaling to 140+ GPU clients with zero performance degradation
- Predictable checkpoint times → better GPU utilization
- Under 2 racks of footprint for 4.7 PB of usable Tier-1 NVMe
- Distributed, parallel metadata architecture
- Single NVMe tier (no tiering overhead)
- Kernel-bypass I/O client for maximum bandwidth
- Unified POSIX namespace for datasets and checkpoints
- 2 × 400 GbE uplinks per backend node
- Fully lossless, RoCEv2-enabled RDMA fabric
Designed, engineered, and validated by NXON
Our team delivered:
- RoCEv2 RDMA fabric tuning (PFC, ECN, buffer optimization)
- Lossless Ethernet configuration
- GPU node I/O optimization (CPU pinning + NIC affinity)
- Non-disruptive expansion capability
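As an illustrative sketch of what this class of tuning involves (not NXON's actual runbook), lossless RoCEv2 configuration on NVIDIA ConnectX-class NICs typically combines PFC on a dedicated priority, DSCP-based classification, and IRQ/CPU affinity. The interface name, core range, and worker binary below are assumptions.

```shell
#!/bin/sh
# Illustrative RoCEv2 fabric-tuning sketch. Interface names, priority and
# core choices, and the worker binary are assumptions for NVIDIA
# ConnectX-class NICs -- not NXON's actual configuration.

IFACE=eth0          # hypothetical storage-fabric interface
IRQ_CPUS=2-7        # hypothetical cores reserved for NIC interrupts

# 1. Enable Priority Flow Control on priority 3 (a common RoCE convention),
#    making the Ethernet fabric lossless for RDMA traffic.
mlnx_qos -i "$IFACE" --pfc 0,0,0,1,0,0,0,0

# 2. Trust DSCP markings so switches classify RoCE traffic into the
#    lossless queue.
mlnx_qos -i "$IFACE" --trust dscp

# 3. Pin the NIC's interrupts to cores local to the adapter's NUMA node,
#    so RDMA completions are handled close to the GPU I/O path.
for irq in $(grep "$IFACE" /proc/interrupts | cut -d: -f1); do
    echo "$IRQ_CPUS" > "/proc/irq/$irq/smp_affinity_list"
done

# 4. Launch the I/O-heavy process bound to the same NUMA node
#    (CPU pinning + NIC affinity).
numactl --cpunodebind=0 --membind=0 ./training_io_worker   # hypothetical binary
```

These steps require the NIC vendor tooling and root access on the host; ECN thresholds and switch-side buffer settings are tuned per fabric and are not shown here.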
Cluster Throughput
- ~3 TB/s reads
- ~1.5 TB/s writes
Per-Node GPU Throughput
- Up to 40 GB/s from dual-200 GbE GPU servers
- ~80% link efficiency
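The ~80% figure follows directly from the link math: dual 200 GbE gives 400 Gb/s of raw line rate, i.e. 50 GB/s per node, of which 40 GB/s is delivered. A quick sanity check:

```python
# Sanity check of the per-node link-efficiency figure from the numbers above.
links = 2
line_rate_gbps = 200                        # gigabits per second per link
raw_gb_per_s = links * line_rate_gbps / 8   # bytes: 400 Gb/s -> 50 GB/s

delivered_gb_per_s = 40                     # observed per-node throughput
efficiency = delivered_gb_per_s / raw_gb_per_s

print(f"raw capacity: {raw_gb_per_s:.0f} GB/s, efficiency: {efficiency:.0%}")
# → raw capacity: 50 GB/s, efficiency: 80%
```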
Latency
- Stable <0.5 ms latency, even under heavy metadata load
- Millions of IOPS with no tail-latency spikes
Scalability
- Sustained performance across 140+ GPU nodes
- Zero loss in per-node throughput
- Automatic backend load balancing
- High throughput with lower power consumption
- 4.7 PB usable achieved in under 2 racks
- Lower cooling and datacenter footprint
- Predictable checkpointing, enabling higher training cadence
- Reduced GPU idle time → higher GPU utilization
- Smoother operations with no manual tuning
- Non-disruptive scaling as datasets and models grow
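To see why checkpointing becomes predictable at this write bandwidth, a back-of-the-envelope estimate using the cluster's ~1.5 TB/s aggregate writes is enough; the 3 TB checkpoint size below is a hypothetical example, not a figure from this deployment.

```python
# Back-of-the-envelope checkpoint-time estimate at the cluster's ~1.5 TB/s
# aggregate write throughput. The checkpoint size is a hypothetical example.
write_tb_per_s = 1.5

def checkpoint_seconds(checkpoint_tb: float) -> float:
    """Time to flush one checkpoint at full aggregate write bandwidth."""
    return checkpoint_tb / write_tb_per_s

# e.g. a hypothetical 3 TB checkpoint:
print(f"{checkpoint_seconds(3.0):.1f} s")  # → 2.0 s
```

Seconds-scale, deterministic flush times are what let training jobs checkpoint frequently without stalling the GPUs.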
NXON provides end-to-end GPU infrastructure expertise:
- Storage architecture design
- RoCEv2 fabric engineering
- Deployment in under four days
- Data migration and parallel copy strategies
- Ongoing tuning for scale-out AI workloads
Ready to Accelerate Your AI Training Pipeline?
NXON builds storage architectures designed for the next generation of AI.
