Success Story
Thailand · 45 days

From concept to a 960-GPU H200 supercluster in just 45 days

NXON.AI co-designed, delivered, and operationalized a 128-node NVIDIA H200 cluster with 3.2 Tbps per node networking, WEKA storage, and near-perfect NCCL scaling to power a fast-growing AI video platform's next-generation models.

Asia's Fastest-Growing AI Video Generation Platform

by NXON.AI

When one of Asia's fastest-growing AI video generation platforms faced unprecedented demand for large-scale model training, they needed more than just hardware — they needed a partner capable of designing, delivering, and operationalizing a hyperscale AI compute environment faster than anyone else in the region could commit to.

They selected NXON.AI.

Their requirements were clear:

  • Train next-generation text-to-video and multimodal models
  • Migrate petabyte-scale datasets with zero data loss
  • Achieve state-of-the-art training speed and scaling efficiency
  • Complete end-to-end delivery in under 45 days

What followed became a benchmark project for the region — and a defining proof of capability for NXON.AI.

The Challenge

The customer's existing H100 environment could no longer support training cycles fast enough to keep up with commercial product releases. Their goal was to leap ahead of competitors by deploying an NVIDIA H200-based supercluster — at a scale never before delivered in Thailand.

Key hurdles:

  • GPU shortage worldwide: procure a full 128-node H200 cluster in < 21 days
  • Cross-border data logistics: migrate multi-PB datasets within 14 days
  • High-density networking: 3.2 Tbps per node with zero packet loss under load
  • Storage scaling: expand from 2 PB → 4 PB in < 3 weeks with no downtime
  • Delivery pressure: cluster must enter production within 45 days of PO

This was not a normal deployment. This was precision engineering under extreme time pressure.

960-GPU H200 Supercluster Overview

NXON.AI's 128-node H200 supercluster deployed for AI video generation platform

NXON.AI Solution

NXON.AI co-designed and delivered a 960-GPU NVIDIA H200 supercluster, powered by:

  • 128 × Dell XE9680 GPU servers
  • 8-rail 400 GbE RoCEv2 fabric (3.2 Tbps per node, 1:1 non-blocking)
  • WEKA high-performance storage with sub-120 µs latency
  • Multi-domain architecture: compute, storage, management, boundary security
  • GPU‒NIC 1:1 binding + GPUDirect RDMA for full training efficiency
  • Custom-tuned NCCL + UCX stack for lossless distributed training

Result:

AllReduce: 369.6 GB/s, AllToAll: 47+ GB/s, zero packet loss, and near-perfect scaling from 64 → 120 nodes.
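The fabric and AllReduce figures above can be sanity-checked with simple arithmetic. The nccl-tests tools report a "bus bandwidth" for AllReduce as busbw = algbw × 2(n−1)/n; the 185 GB/s algorithm bandwidth below is a hypothetical input chosen for illustration, since the source quotes only the resulting 369.6 GB/s figure.

```python
# Sanity checks on the per-node fabric and NCCL figures quoted above.

def node_bandwidth_tbps(rails: int, rail_gbps: int) -> float:
    """Aggregate per-node bandwidth in Tbps from rail count x rail speed."""
    return rails * rail_gbps / 1000

def allreduce_busbw(algbw_gbs: float, n_ranks: int) -> float:
    """nccl-tests bus bandwidth for AllReduce: busbw = algbw * 2*(n-1)/n."""
    return algbw_gbs * 2 * (n_ranks - 1) / n_ranks

# 8 rails of 400 GbE per node -> 3.2 Tbps, matching the spec above.
print(node_bandwidth_tbps(8, 400))            # 3.2

# Hypothetical: ~185 GB/s algorithm bandwidth across 960 ranks gives a
# bus bandwidth matching the reported 369.6 GB/s AllReduce figure.
print(round(allreduce_busbw(185.0, 960), 1))  # 369.6
```

The 2(n−1)/n factor reflects that ring AllReduce moves each byte roughly twice across the interconnect as rank count grows, so busbw approaches 2× algbw at scale.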

Delivery Highlights

  • Hardware procurement: 14 days, despite global GPU supply constraints
  • Racking & cabling: 10 days; 3,000+ cables at a 0.1% error rate
  • Data migration: 3 + 14 days; on-prem plus overseas multi-PB transfer
  • Cluster commissioning: 12 days; compute, storage, NCCL, and fabric tuning
  • Storage expansion (2 PB → 4 PB): 21 days, live, with no service interruption
  • Full go-live in 45 days — ahead of schedule
  • Fastest known H200 cluster deployment in Asia at launch

Precision racking with 3,000+ cables at 0.1% error rate

8-rail 400 GbE RoCEv2 fabric providing 3.2 Tbps per node

Expert team conducting final cluster tuning and validation

Business Impact

Before NXON.AI → after:

  • Training bottlenecks due to H100 limits → 4× faster end-to-end training throughput
  • Long iteration loops and slow product releases → new model versions deployed weekly
  • Scaling limited to 256 GPUs → seamless scaling to 960 GPUs with linear efficiency
  • Data lake performance < 40 GB/s → > 310 GiB/s read throughput, 11.6M IOPS
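The before/after storage figures above mix decimal and binary units (GB/s vs GiB/s); a quick conversion, assuming the standard 1 GiB = 2^30 bytes and 1 GB = 10^9 bytes, puts the improvement at roughly 8×:

```python
# Convert the reported read throughput from GiB/s to GB/s and compare
# against the pre-migration data-lake baseline quoted above.

GIB = 2**30   # bytes per GiB (binary unit)
GB = 10**9    # bytes per GB (decimal unit)

after_gib_s = 310   # reported WEKA read throughput (GiB/s)
before_gb_s = 40    # prior data-lake throughput (GB/s), stated upper bound

after_gb_s = after_gib_s * GIB / GB
print(round(after_gb_s, 1))                # 332.9 (GB/s)
print(round(after_gb_s / before_gb_s, 1))  # 8.3 (x improvement, at least)
```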

The platform now supports:

  • Multi-tenant AI workloads (fine-tuning, large-batch training, generative pipelines)
  • On-demand GPU Pod provisioning via NXON MaaS
  • Enterprise SLA with 99%+ uptime guarantees
  • Future-ready expansion to GB200 or B200 clusters with no redesign required

Why This Project Matters

  • Fastest hyperscale AI cluster buildout in Thailand
  • One of the first H200 clusters globally to surpass target NCCL performance
  • Proves NXON.AI as the region's most advanced sovereign GPU cloud builder
  • A reference platform for national-level AI R&D, enterprise workloads, and LLM training

"NXON.AI proved that speed, scale, and engineering precision can coexist. What normally takes 4‒6 months was executed in 45 days — without compromise."

— Customer CTO, AI Video Research Division

At a Glance

  • GPU Nodes: 128 (960 H200 GPUs)
  • Node Bandwidth: 3.2 Tbps (8 × 400 GbE)
  • Storage Performance: 310 GiB/s read, 11.6M IOPS
  • SLA: 99–99.5% uptime guarantee
  • Time to Production: 45 days

Project Highlights

Project: 960-GPU H200 Supercluster Deployment
Timeline: 45 days
Location: Thailand
Organization: NXON.AI

Key Achievements

  • Fastest H200 cluster deployment in Asia at launch
  • 960 NVIDIA H200 GPUs across 128 nodes
  • 3.2 Tbps per node networking
  • AllReduce: 369.6 GB/s performance
  • Zero packet loss under load
  • 45-day delivery from PO to production

Ready for Similar Success?

Contact our team to discuss how we can help you achieve breakthrough results.