Success Story
Thailand · 45 days

From concept to a 960-GPU H200 supercluster in just 45 days

NXON.AI co-designed, delivered, and operationalized a 128-node NVIDIA H200 cluster with 3.2 Tbps per node networking, WEKA storage, and near-perfect NCCL scaling to power a fast-growing AI video platform's next-generation models.

Asia's Fastest-Growing AI Video Generation Platform

by NXON.AI

When one of Asia's fastest-growing AI video generation platforms faced unprecedented demand for large-scale model training, they needed more than just hardware — they needed a partner capable of designing, delivering, and operationalizing a hyperscale AI compute environment faster than anyone else in the region could commit to.

They selected NXON.AI.

Their requirements were clear:

  • Train next-generation text-to-video and multimodal models
  • Migrate petabyte-scale datasets with zero data loss
  • Achieve state-of-the-art training speed and scaling efficiency
  • Complete end-to-end delivery in under 45 days

What followed became a benchmark project for the region — and a defining proof of capability for NXON.AI.

The Challenge

The customer's existing H100 environment could no longer support training cycles fast enough to keep up with commercial product releases. Their goal was to leap ahead of competitors by deploying an NVIDIA H200-based supercluster — at a scale never before delivered in Thailand.

Key hurdles:

  • GPU shortage worldwide: procure a full 128-node H200 cluster in < 21 days
  • Cross-border data logistics: migrate multi-PB datasets within 14 days
  • High-density networking: 3.2 Tbps per node with zero packet loss under load
  • Storage scaling: expand from 2 PB → 4 PB in < 3 weeks with no downtime
  • Delivery pressure: cluster must enter production within 45 days of PO

This was not a normal deployment. This was precision engineering under extreme time pressure.

960-GPU H200 Supercluster Overview

NXON.AI's 128-node H200 supercluster deployed for AI video generation platform

NXON.AI Solution

NXON.AI co-designed and delivered a 960-GPU NVIDIA H200 supercluster, powered by:

  • 128 × Dell XE9680 GPU servers
  • 8-rail 400 GbE RoCEv2 fabric (3.2 Tbps per node, 1:1 non-blocking)
  • WEKA high-performance storage with sub-120 µs latency
  • Multi-domain architecture: compute, storage, management, boundary security
  • GPU‒NIC 1:1 binding + GPUDirect RDMA for full training efficiency
  • Custom-tuned NCCL + UCX stack for lossless distributed training

Result:

AllReduce: 369.6 GB/s, AllToAll: 47+ GB/s, zero packet loss, and near-perfect scaling from 64 → 120 nodes.
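The fabric and AllReduce figures above can be sanity-checked with simple arithmetic. The nccl-tests tools report a "bus bandwidth" for AllReduce as busbw = algbw × 2(n−1)/n; the 185 GB/s algorithm bandwidth below is a hypothetical input chosen for illustration, since the source quotes only the resulting 369.6 GB/s figure.

```python
# Sanity checks on the per-node fabric and NCCL figures quoted above.

def node_bandwidth_tbps(rails: int, rail_gbps: int) -> float:
    """Aggregate per-node bandwidth in Tbps from rail count x rail speed."""
    return rails * rail_gbps / 1000

def allreduce_busbw(algbw_gbs: float, n_ranks: int) -> float:
    """nccl-tests bus bandwidth for AllReduce: busbw = algbw * 2*(n-1)/n."""
    return algbw_gbs * 2 * (n_ranks - 1) / n_ranks

# 8 rails of 400 GbE per node -> 3.2 Tbps, matching the spec above.
print(node_bandwidth_tbps(8, 400))            # 3.2

# Hypothetical: ~185 GB/s algorithm bandwidth across 960 ranks gives a
# bus bandwidth matching the reported 369.6 GB/s AllReduce figure.
print(round(allreduce_busbw(185.0, 960), 1))  # 369.6
```

The 2(n−1)/n factor reflects that ring AllReduce moves each byte roughly twice across the interconnect as rank count grows, so busbw approaches 2× algbw at scale.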

Delivery Highlights

  • Hardware procurement: 14 days, despite global GPU supply constraints
  • Racking & cabling: 10 days; 3,000+ cables at a 0.1% error rate
  • Data migration: 3 + 14 days; on-prem plus overseas multi-PB transfer
  • Cluster commissioning: 12 days; compute, storage, NCCL, and fabric tuning
  • Storage expansion (2 PB → 4 PB): 21 days, live, with no service interruption
  • Full go-live in 45 days — ahead of schedule
  • Fastest known H200 cluster deployment in Asia at launch

Precision racking with 3,000+ cables at 0.1% error rate

8-rail 400 GbE RoCEv2 fabric providing 3.2 Tbps per node

Expert team conducting final cluster tuning and validation

Business Impact

Before NXON.AI → after:

  • Training bottlenecks due to H100 limits → 4× faster end-to-end training throughput
  • Long iteration loops and slow product releases → new model versions deployed weekly
  • Scaling limited to 256 GPUs → seamless scaling to 960 GPUs with linear efficiency
  • Data lake performance < 40 GB/s → > 310 GiB/s read throughput, 11.6M IOPS
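The before/after storage figures above mix decimal and binary units (GB/s vs GiB/s); a quick conversion, assuming the standard 1 GiB = 2^30 bytes and 1 GB = 10^9 bytes, puts the improvement at roughly 8×:

```python
# Convert the reported read throughput from GiB/s to GB/s and compare
# against the pre-migration data-lake baseline quoted above.

GIB = 2**30   # bytes per GiB (binary unit)
GB = 10**9    # bytes per GB (decimal unit)

after_gib_s = 310   # reported WEKA read throughput (GiB/s)
before_gb_s = 40    # prior data-lake throughput (GB/s), stated upper bound

after_gb_s = after_gib_s * GIB / GB
print(round(after_gb_s, 1))                # 332.9 (GB/s)
print(round(after_gb_s / before_gb_s, 1))  # 8.3 (x improvement, at least)
```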

The platform now supports:

  • Multi-tenant AI workloads (fine-tuning, large-batch training, generative pipelines)
  • On-demand GPU Pod provisioning via NXON MaaS
  • Enterprise SLA with 99%+ uptime guarantees
  • Future-ready expansion to GB200 or B200 clusters with no redesign required

Why This Project Matters

  • Fastest hyperscale AI cluster buildout in Thailand
  • One of the first H200 clusters globally to surpass target NCCL performance
  • Proves NXON.AI as the region's most advanced sovereign GPU cloud builder
  • A reference platform for national-level AI R&D, enterprise workloads, and LLM training

"NXON.AI proved that speed, scale, and engineering precision can coexist. What normally takes 4‒6 months was executed in 45 days — without compromise."

— Customer CTO, AI Video Research Division

At a Glance

  • GPU Nodes: 128 (960 H200 GPUs)
  • Node Bandwidth: 3.2 Tbps (8 × 400 GbE)
  • Storage Performance: 310 GiB/s read, 11.6M IOPS
  • SLA: 99–99.5% uptime guarantee
  • Time to Production: 45 days

Project Highlights

Project: 960-GPU H200 Supercluster Deployment
Timeline: 45 days
Location: Thailand
Organization: NXON.AI

Key Achievements

  • Fastest H200 cluster deployment in Asia at launch
  • 960 NVIDIA H200 GPUs across 128 nodes
  • 3.2 Tbps per node networking
  • AllReduce: 369.6 GB/s performance
  • Zero packet loss under load
  • 45-day delivery from PO to production

Ready for Similar Success?

Contact our team to discuss how we can help you achieve breakthrough results.