This white paper unveils NXON.AI's engineering principles and implementation strategy that deliver high performance, scalability, and sustainability for sovereign AI cloud ecosystems.
Training large language models (LLMs) with trillions of parameters demands synchronized communication between thousands of GPUs. Traditional TCP/IP networks introduce latency through CPU processing, context switching, and retransmission overhead, creating a bottleneck that limits scaling efficiency.
NXON.AI's solution: Integrate Remote Direct Memory Access (RDMA) to bypass CPU intervention, enabling direct data movement between GPU memories and reducing latency by orders of magnitude.
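To make the CPU-bypass data path concrete, here is a minimal sketch of a one-sided RDMA write using the standard libibverbs API: the sender registers a buffer with the NIC, then posts an RDMA_WRITE work request that the adapter executes without involving the remote host's CPU. Queue-pair setup and the exchange of the peer's address and rkey are omitted; `rdma_write_example` and its parameters are illustrative, not NXON.AI's production code.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal sketch: one-sided RDMA write via libibverbs.
 * Assumes a queue pair (qp) already connected and in RTS state,
 * and that the peer has shared its buffer address and rkey. */
int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       uint64_t remote_addr, uint32_t rkey)
{
    const size_t len = 4096;
    char *buf = calloc(1, len);
    if (!buf) return -1;
    strcpy(buf, "payload DMA'd straight into remote memory");

    /* Register local memory so the NIC can DMA from it directly. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr) { perror("ibv_reg_mr"); free(buf); return -1; }

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,   /* one-sided: no remote CPU */
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* The NIC moves the data; no kernel copy, no context switch. */
    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr)) { perror("ibv_post_send"); return -1; }
    return 0;
}
```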
Figure: TCP vs RDMA architecture comparison.

RDMA Protocol Comparison
| Protocol | Transport Layer | Ecosystem | Cost | Performance | Adoption |
|---|---|---|---|---|---|
| InfiniBand | Proprietary | Closed | High | Best-in-class | Limited |
| iWARP | TCP | Open | Medium | Weak | Declining |
| RoCEv2 | UDP/IP | Open Ethernet | Low | Excellent | Growing |
RoCEv2 achieves InfiniBand-like performance while maintaining Ethernet openness and flexibility—a decisive factor for scalable AI networks.
NXON.AI's data center architecture leverages RoCEv2 to achieve ultra-efficient interconnects for its 400G‒800G GPU clusters.
Key Benefits:
- Open Standard: Interoperable across multi-vendor ecosystems.
- Low Cost: Uses standard Ethernet hardware and software.
- Scalable: Supports >32,000 ports in 800G networks.
- Rapid Deployment: Lead time reduced to 1‒2 months.
- Dynamic Latency Control: Consistent sub-5µs operational latency.
NXON.AI's RoCE network integrates:
- •PFC & ECN for lossless and efficient traffic management.
- •AI-driven Load Balancing (AILB) and Rate-Aware Load Balancing (RALB).
- •Telemetry-based Congestion Control with gRPC and NETCONF.
- •1:1 Oversubscription Ratio using 400G Ruijie RG-S6990-128QC2XS switches.
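A quick worked example of what a 1:1 (non-blocking) ratio means at the leaf: capacity toward servers equals capacity toward the spine. The 64/64 port split below is an illustrative assumption for a 128-port 400G chassis, not NXON.AI's published allocation.

```c
#include <stdio.h>

/* Oversubscription ratio = server-facing capacity / fabric-facing capacity.
 * A 1:1 ratio means the leaf can forward all ingress traffic upstream
 * without contention. The port split below is illustrative. */
int main(void)
{
    double down_gbps = 64 * 400.0;  /* 64 x 400G ports toward servers */
    double up_gbps   = 64 * 400.0;  /* 64 x 400G ports toward spines  */
    printf("oversubscription = %.1f:1\n", down_gbps / up_gbps);  /* 1.0:1 */
    return 0;
}
```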
Performance Metrics:
- 97% bandwidth utilization
- <5µs latency
- 128x400G ports per chassis
Figure: RoCEv2 packet header structure (Ethernet / IP / UDP / InfiniBand BTH / payload / ICRC).

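For orientation, the sketch below spells out the RoCEv2 encapsulation as C structures: the standard Ethernet, IP, and UDP headers are followed by the InfiniBand Base Transport Header (BTH), with UDP destination port 4791 assigned to RoCEv2. Field layout follows the InfiniBand specification; this is an illustrative definition, not a parser.

```c
#include <stdint.h>

/* RoCEv2 carries InfiniBand transport semantics over routable UDP/IP.
 * IANA assigns UDP destination port 4791 to RoCEv2. */
#define ROCEV2_UDP_DST_PORT 4791

/* InfiniBand Base Transport Header (12 bytes), per the IB specification.
 * All multi-byte fields are big-endian on the wire. */
struct ib_bth {
    uint8_t  opcode;   /* operation: SEND, RDMA WRITE, ACK, ...          */
    uint8_t  flags;    /* solicited event, migration, pad count, version */
    uint16_t pkey;     /* partition key                                  */
    uint32_t dest_qp;  /* 8 reserved bits + 24-bit destination queue pair */
    uint32_t apsn;     /* ack-request bit + 7 reserved + 24-bit PSN       */
} __attribute__((packed));

/* On-wire order: Ethernet -> IPv4/IPv6 -> UDP (dport 4791) -> BTH
 * -> optional extended headers -> payload -> ICRC (4 bytes). */
```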
PFC and ECN Comparison
| Feature | PFC | ECN |
|---|---|---|
| Function | Pause upstream traffic before buffers overflow | Signal senders to slow down before loss |
| Scope | Hop-by-hop | End-to-end |
| Mechanism | Reactive per-priority pause frames | Proactive packet marking |
| Analogy | Traffic cop halting vehicles | GPS rerouting before congestion |
Together, these mechanisms keep the fabric lossless and queues stable across AI workloads: ECN throttles senders early, while PFC acts as a last-resort backstop against packet drops.
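As a concrete illustration of the ECN side, the sketch below implements the WRED-style probabilistic marking commonly configured on RoCE switch egress queues: no marking below a low threshold, marking of every packet above a high threshold, and a linear ramp in between. The threshold and probability values are hypothetical placeholders that would be tuned per deployment.

```c
#include <stdbool.h>
#include <stdlib.h>

/* WRED-style ECN marking decision for an egress queue.
 * All thresholds below are illustrative placeholders. */
#define KMIN_CELLS 100     /* below this depth: never mark        */
#define KMAX_CELLS 400     /* above this depth: mark every packet */
#define PMAX       0.20    /* marking probability just below KMAX */

/* Returns true if the packet should be marked CE (Congestion
 * Experienced) given the current queue depth in buffer cells. */
bool ecn_should_mark(int queue_cells)
{
    if (queue_cells <= KMIN_CELLS)
        return false;                     /* queue healthy    */
    if (queue_cells >= KMAX_CELLS)
        return true;                      /* heavy congestion */

    /* Linear ramp between the two thresholds. */
    double p = PMAX * (double)(queue_cells - KMIN_CELLS)
                    / (double)(KMAX_CELLS - KMIN_CELLS);
    return ((double)rand() / RAND_MAX) < p;
}
```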
NXON.AI monitors CNP and NAK packets for real-time congestion insight:
- CNP (Congestion Notification Packets) trigger flow reduction before queue overflow.
- NAK (Negative Acknowledgement) packets highlight retransmission causes.
These are continuously monitored via ERSPAN mirroring and gRPC streaming, enabling predictive maintenance and adaptive performance tuning.
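On the sender side, the usual reaction to those CNPs is DCQCN, the congestion-control algorithm implemented by mainstream RoCEv2 NICs (Zhu et al., SIGCOMM 2015): each CNP raises a congestion estimate and multiplicatively cuts the sending rate, and the rate recovers once CNPs stop arriving. The sketch below is a simplified rendering of the core update rules, omitting DCQCN's additive- and hyper-increase stages.

```c
/* Simplified DCQCN sender state, after Zhu et al. (SIGCOMM 2015).
 * rc/rt are rates (e.g. in Gbps); g is the EWMA gain (1/256 in the paper). */
struct dcqcn_state {
    double rc;     /* current sending rate          */
    double rt;     /* target rate kept for recovery */
    double alpha;  /* congestion estimate in [0, 1] */
};

#define DCQCN_G (1.0 / 256.0)

/* On receiving a CNP: remember the current rate as the recovery
 * target, cut the rate multiplicatively, and raise alpha. */
void dcqcn_on_cnp(struct dcqcn_state *s)
{
    s->rt    = s->rc;
    s->rc   *= 1.0 - s->alpha / 2.0;
    s->alpha = (1.0 - DCQCN_G) * s->alpha + DCQCN_G;
}

/* Periodic timer with no CNP seen: decay alpha and move the rate
 * halfway back toward the target (fast-recovery step). */
void dcqcn_on_quiet_timer(struct dcqcn_state *s)
{
    s->alpha = (1.0 - DCQCN_G) * s->alpha;
    s->rc    = (s->rt + s->rc) / 2.0;
}
```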
In AI training and high-performance computing (HPC) environments, RoCEv2 enables deterministic performance under extreme workloads.
Use Cases:
- Multi-node LLM training (1‒10 trillion parameters)
- AIGC and multimodal inference
- Scientific simulation and industrial rendering
Results:
- Sub-5µs latency
- 97% link utilization
- Zero-loss inter-GPU communication
Through the deployment of RoCEv2, NXON.AI transforms Ethernet into a high-performance, lossless backbone for AI infrastructure. This achievement represents a leap forward in network innovation, sustainability, and sovereignty.
NXON.AI stands at the forefront of Asia's AI future—delivering speed, stability, and sovereignty in every packet transmitted.
NXON.AI builds next-generation AI infrastructure, integrating GPU clusters, high-speed networks, and orchestration technologies to empower sovereign and enterprise AI ecosystems across Asia.
For more information, visit www.nxon.ai.
