When one of Asia's fastest-growing AI video generation platforms faced unprecedented demand for large-scale model training, they needed more than just hardware — they needed a partner capable of designing, delivering, and operationalizing a hyperscale AI compute environment faster than anyone else in the region could commit to.
They selected NXON.AI.
Their requirements were clear:
- Train next-generation text-to-video and multimodal models
- Migrate petabyte-scale datasets with zero data loss
- Achieve state-of-the-art training speed and scaling efficiency
- Complete end-to-end delivery in under 45 days
What followed became a benchmark project for the region — and a defining proof of capability for NXON.AI.
The Challenge
The customer's existing H100 environment could no longer support training cycles fast enough to keep up with commercial product releases. Their goal was to leap ahead of competitors by deploying an NVIDIA H200-based supercluster — at a scale never before delivered in Thailand.
Key hurdles:
| Technical Barrier | Customer Constraint |
|---|---|
| Global GPU shortage | Full 128-node H200 cluster procurement in < 21 days |
| Cross-border data logistics | Migration of multi-PB datasets within 14 days |
| High-density networking | 3.2 Tbps per node, zero packet loss under load |
| Storage scaling | Expand from 2 PB → 4 PB in < 3 weeks, no downtime |
| Delivery pressure | Cluster in production within 45 days of PO |
This was not a normal deployment. This was precision engineering under extreme time pressure.

*NXON.AI's 128-node H200 supercluster deployed for AI video generation platform*
NXON.AI Solution
NXON.AI co-designed and delivered a 960-GPU NVIDIA H200 supercluster (128 nodes × 8 GPUs per node), powered by:
- ✓ 128 × Dell XE9680 GPU servers
- ✓ 8-rail 400 GbE RoCEv2 fabric (3.2 Tbps per node, 1:1 non-blocking)
- ✓ WEKA high-performance storage with sub-120 µs latency
- ✓ Multi-domain architecture: compute, storage, management, boundary security
- ✓ GPU-NIC 1:1 binding + GPUDirect RDMA for full training efficiency
- ✓ Custom-tuned NCCL + UCX stack for lossless distributed training (see the sketch after this list)
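To give a sense of what "custom-tuned NCCL + UCX" typically involves on an 8-rail RoCEv2 fabric with 1:1 GPU-NIC binding, here is a minimal per-rank launcher sketch. The HCA names (mlx5_0 through mlx5_7), the GID index, and the UCX transport list are illustrative assumptions, not NXON.AI's actual tuning profile.

```python
"""Minimal sketch: NCCL/UCX environment for an 8-rail RoCEv2 fabric.

Device names (mlx5_0..mlx5_7), GID index, and transport lists are
illustrative assumptions, not the project's actual configuration.
Launch with: torchrun --nproc-per-node=8 --nnodes=128 ... this_file.py
"""
import os

import torch
import torch.distributed as dist

NUM_RAILS = 8  # one RDMA NIC per GPU (1:1 binding)


def configure_fabric_env() -> None:
    # Restrict NCCL to the eight RDMA rails (hypothetical HCA names).
    os.environ.setdefault(
        "NCCL_IB_HCA", ",".join(f"mlx5_{i}" for i in range(NUM_RAILS))
    )
    # RoCEv2 needs a routable GID; index 3 is a common default.
    os.environ.setdefault("NCCL_IB_GID_INDEX", "3")
    # Enable GPUDirect RDMA when GPU and NIC share a PCIe switch.
    os.environ.setdefault("NCCL_NET_GDR_LEVEL", "PXB")
    # UCX side: RDMA plus CUDA transports for non-NCCL data paths.
    os.environ.setdefault("UCX_TLS", "rc_x,cuda_copy,cuda_ipc")
    os.environ.setdefault(
        "UCX_NET_DEVICES", ",".join(f"mlx5_{i}:1" for i in range(NUM_RAILS))
    )


def main() -> None:
    configure_fabric_env()
    local_rank = int(os.environ["LOCAL_RANK"])  # provided by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Smoke test: one small all-reduce across the full job.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)
    if dist.get_rank() == 0:
        print(f"world={dist.get_world_size()} all_reduce sum={t.item():.0f}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In practice values like these are settled by iterating against the live fabric with tools such as nccl-tests rather than chosen up front.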
Delivery Highlights
| Phase | Duration | Notes |
|---|---|---|
| Hardware procurement | 14 days | Despite global GPU supply constraints |
| Racking & cabling | 10 days | 3,000+ cables, 0.1% error rate |
| Data migration | 3 + 14 days | On-prem + overseas multi-PB transfer (see the verification sketch below) |
| Cluster commissioning | 12 days | Compute, storage, NCCL, fabric tuning |
| Storage expansion (2 PB → 4 PB) | 21 days | Live, no service interruption |
- ✓ Full go-live in 45 days, ahead of schedule
- ✓ Fastest known H200 cluster deployment in Asia at launch
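The "zero data loss" requirement implies end-to-end verification rather than trust in the transfer tool alone. One common approach is to compare SHA-256 manifests of the source and destination trees; the sketch below illustrates the idea, with paths and manifest format as assumptions rather than the tooling actually used in this migration.

```python
"""Sketch: verify a bulk transfer by comparing SHA-256 manifests.

Illustrative only; not the migration tooling used in the project.
Usage: python verify_transfer.py /path/to/source /path/to/destination
"""
import hashlib
import sys
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from pathlib import Path


def hash_one(root: Path, path: Path) -> tuple[str, str]:
    """Return (relative path, sha256 digest) for a single file."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(8 * 1024 * 1024):  # 8 MiB chunks
            h.update(block)
    return str(path.relative_to(root)), h.hexdigest()


def build_manifest(root: Path, workers: int = 32) -> dict[str, str]:
    """Hash every file under root in parallel worker processes."""
    files = [p for p in root.rglob("*") if p.is_file()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(partial(hash_one, root), files))


def main(src: str, dst: str) -> None:
    src_m, dst_m = build_manifest(Path(src)), build_manifest(Path(dst))
    missing = sorted(set(src_m) - set(dst_m))
    corrupt = sorted(k for k in set(src_m) & set(dst_m) if src_m[k] != dst_m[k])
    if missing or corrupt:
        sys.exit(f"FAILED: {len(missing)} missing, {len(corrupt)} mismatched")
    print(f"OK: {len(src_m)} files verified")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

At multi-PB scale the same idea is usually applied incrementally, shard by shard, so verification overlaps with the transfer instead of following it.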

*Precision racking with 3,000+ cables at 0.1% error rate*

*8-rail 400 GbE RoCEv2 fabric providing 3.2 Tbps per node*

*Expert team conducting final cluster tuning and validation*
Business Impact
| Before NXON.AI | After NXON.AI |
|---|---|
| Training bottlenecks due to H100 limits | 4× faster end-to-end training throughput |
| Long iteration loops → slow product releases | New model versions deployed weekly |
| Scaling limited to 256 GPUs | Seamless scaling to 960 GPUs with linear efficiency (see the sketch below) |
| Data lake performance < 40 GB/s | > 310 GiB/s read throughput, 11.6M IOPS |
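"Linear efficiency" can be read against the standard throughput-based definition: measured throughput divided by what perfect linear scaling from a smaller baseline would predict. A minimal sketch, using purely hypothetical numbers rather than measured figures from this project:

```python
def scaling_efficiency(
    base_gpus: int, base_tput: float, gpus: int, tput: float
) -> float:
    """Throughput-based scaling efficiency relative to a baseline run."""
    ideal = base_tput * (gpus / base_gpus)  # perfect linear scaling
    return tput / ideal


# Hypothetical numbers: if 256 GPUs sustain 100k samples/s, linear
# scaling to 960 GPUs predicts 375k; sustaining 360k would be 96%.
print(scaling_efficiency(256, 100_000, 960, 360_000))  # -> 0.96
```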
The platform now supports:
- Multi-tenant AI workloads (fine-tuning, large-batch training, generative pipelines)
- On-demand GPU Pod provisioning via NXON MaaS
- Enterprise SLA with 99%+ uptime guarantees
- Future-ready expansion to GB200 or B200 clusters with no redesign required
Why This Project Matters
- ✓ Fastest hyperscale AI cluster buildout in Thailand
- ✓ One of the first H200 clusters globally to surpass target NCCL performance
- ✓ Proves NXON.AI as the region's most advanced sovereign GPU cloud builder
- ✓ A reference platform for national-level AI R&D, enterprise workloads, and LLM training
"NXON.AI proved that speed, scale, and engineering precision can coexist. What normally takes 4‒6 months was executed in 45 days — without compromise."
— Customer CTO, AI Video Research Division
At a Glance
| Metric | Value |
|---|---|
| GPU Nodes | 128 (960 H200 GPUs) |
| Node Bandwidth | 3.2 Tbps (8 × 400 GbE) |
| Storage Performance | 310 GiB/s read, 11.6M IOPS |
| SLA | 99–99.5% uptime guarantee |
| Time to Production | 45 days |
