At HUAWEI CONNECT 2025, Huawei Rotating Chairman Xu Zhijun unveiled an ambitious and well-structured technology roadmap. In a high-profile move, he announced four of the company's latest AI chip developments alongside a SuperPod solution plan, which Huawei billed as the world's most powerful computing SuperPods and clusters.
Prior to this, Huawei had already launched the CloudMatrix 384 (CM384) SuperPod, which interconnects 384 NPUs through optical links enabled by 6,912 LPO optical modules. At the conference, Huawei stated that more than 300 CM384 units have been deployed to date, serving over 20 customers. The upcoming Atlas 950 SuperPod and Atlas 960 SuperPod will support large-scale parallel computing with 8,000 and 15,000 cards respectively, further pushing the boundaries of SuperPod scale and computing power and blurring the line with the traditional concept of "clusters".
As an industry media outlet focusing on optical communications, CFOL is particularly concerned with three questions: What impact will SuperPods and the Lingqu Bus have on the optical communications industry? What innovations do Huawei's latest AI chips feature? And how do they compare to NVIDIA's solutions? With these questions in mind, we reviewed Mr. Xu's speech and share the key takeaways below.
The concept of "SuperPod" was first proposed by NVIDIA. Technically, it adopts a Scale-Up model that tightly couples a large number of computing chips into a single high-speed interconnection domain, effectively addressing the key challenge of coordinated scheduling of chips in large-scale computing clusters.
Xu Zhijun emphasized at the conference: "SuperPods have become the new normal for AI infrastructure construction."
From the perspective of industrial application demand, the scaling up of SuperPods aligns with both global and domestic demand for computing power. According to a survey by Hexian Research C&C, leading global tech companies such as OpenAI, Microsoft, xAI, and Meta have all launched projects to build GPU clusters with over 100,000 cards. Domestically, as the construction of intelligent computing centers entered a fast track in 2024, data center projects built around 10,000-card clusters are accelerating, making the "10,000-card scale" the mainstream choice for China's current AI computing needs. Huawei's new-generation SuperPods appear fully capable of meeting demand at that scale.
Notably, NVIDIA and Huawei exhibit distinct differences in the interconnection methods and scales of their SuperPods:
• NVIDIA: Copper Interconnection as Core, Optical Interconnection as Supplement. NVIDIA has long relied on copper interconnection as the core technical solution for its Scale-Up SuperPods. For example, its NVL72 product integrates 72 GPUs in a single cabinet, with high-speed connections between GPUs via short-range copper cables. The larger-scale NVL576, meanwhile, uses high-speed InfiniBand or Ethernet to "Scale-Out" 8 NVL72 cabinets, forming a complete cluster through optical interconnection. In NVIDIA's architecture, copper dominates within a single SuperPod, while optical interconnection is used only for cluster expansion between multiple SuperPods (see the sketch after this list).
• Huawei: All-Optical Interconnection to Break SuperPod Scale Boundaries. Huawei has pursued an optical-interconnection-centric route to build ever-larger SuperPods. At the conference, it further proposed the concept of "SuperPod + Cluster" to keep addressing China's AI computing power bottlenecks, with all links connected via optical interconnection technology. From CFOL's perspective, Huawei is blurring the traditional boundary between "SuperPod" and "Cluster" by expanding the computing scale of individual SuperPods; its SuperPods essentially already possess the core capabilities of data center clusters.
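To make the two philosophies concrete, here is a minimal sketch: it classifies a pair of accelerators as reachable inside one scale-up domain (copper-class reach in NVIDIA's design) or only across domains via the scale-out (optical) network. The chip counts come from the figures above; the function and its labels are our own simplification, not either vendor's API.

```python
def link_type(chip_a: int, chip_b: int, scale_up_size: int) -> str:
    """Classify a chip pair: inside one scale-up domain, or crossing
    domains and therefore on the scale-out (optical) network."""
    same_domain = chip_a // scale_up_size == chip_b // scale_up_size
    return "scale-up (intra-domain)" if same_domain else "scale-out (inter-domain)"

# NVIDIA NVL576: 8 cabinets of 72 GPUs each -- copper inside a cabinet.
print(link_type(10, 60, scale_up_size=72))    # scale-up (same cabinet)
print(link_type(10, 100, scale_up_size=72))   # scale-out (cabinets 0 and 1)

# Huawei CM384: all 384 NPUs form a single optical scale-up domain.
print(link_type(10, 300, scale_up_size=384))  # scale-up (one domain)
```

The larger the scale-up domain, the more chip pairs fall on the fast path, which is precisely why Huawei's all-optical route keeps pushing the domain size upward.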
Lingqu Protocol: Enabling CPU-GPU Collaboration Under All-Optical Interconnection Architecture
In terms of SuperPod interconnection protocols, Huawei launched Lingqu (UnifiedBus, or UB), a new computing system architecture. It establishes a technical foundation for resource pooling and equal collaboration among diverse components, including CPUs, NPUs, GPUs, memory (MEM), and switches. Its core breakthrough lies in enabling efficient scheduling of these computing units over all-optical interconnection.
This architectural choice further widens the gap between Huawei and NVIDIA in their SuperPod computing support systems, and directly determines the core positioning of their SuperPods in application scenarios.
From NVIDIA's technical layout, its SuperPods have always centered on GPU interconnection, focusing on compute-intensive scenarios such as AI training and high-performance computing. Whether in the earlier NVL72 and NVL576 or the upcoming NVL144, the architectural designs enhance intelligent computing capability by optimizing inter-GPU connection efficiency, without integrating general-purpose CPUs into the core of the SuperPod. This gives NVIDIA's SuperPods strong specialized performance in pure intelligent computing scenarios.
In contrast, Huawei has adhered to the "GPU + CPU collaboration" technical route since the launch of the CM384 SuperPod. For instance, the CM384 integrates 384 Ascend 910C NPUs and 192 Kunpeng 920 CPUs, with efficient collaboration between the two types of computing cards enabled by optical interconnection technology. This design allows Huawei's SuperPods to serve both intelligent computing needs (e.g., AI and large-model training) and general computing scenarios (e.g., data processing and business logic), adapting to diverse enterprise-level requirements. It demonstrates particularly strong adaptability in complex business scenarios that require the two types of computing power to work together, as the sketch below illustrates.
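As a rough illustration of the pooling idea behind UnifiedBus, the toy sketch below aggregates heterogeneous devices into one resource pool and admits a job against the pooled totals rather than any single server's limits. The NPU and CPU counts mirror the CM384 figures above; the class names, memory figure, and scheduling logic are purely illustrative assumptions, not Huawei's actual protocol or API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Device:
    kind: str       # "NPU", "CPU", "MEM", ...
    capacity: int   # abstract units: cards, cards, GB

# One CM384-like pod: 384 NPUs and 192 CPUs behind a single interconnect
# (the memory figure is a made-up placeholder).
POOL = [Device("NPU", 384), Device("CPU", 192), Device("MEM", 100_000)]

def can_schedule(job: dict[str, int]) -> bool:
    """Admit a job if the pooled capacity of every requested device
    kind covers the request -- no per-server boundaries apply."""
    totals: dict[str, int] = defaultdict(int)
    for dev in POOL:
        totals[dev.kind] += dev.capacity
    return all(totals[kind] >= need for kind, need in job.items())

# A mixed workload: heavy AI training plus general-purpose processing.
print(can_schedule({"NPU": 256, "CPU": 128}))  # True: fits in one pooled pod
print(can_schedule({"NPU": 512}))              # False: exceeds the pod
```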
Breaking Memory Bottlenecks: Hardware Upgrades or Hardware Collaboration?
As generative AI moves toward large-scale applications, attention must be paid not only to the "peak computing power" of AI chips but also to memory bandwidth and power constraints. Memory bottlenecks have become a core obstacle to the sustainable development of the AI industry. Both Huawei and NVIDIA have focused their strategies on hardware upgrades and technical collaboration, albeit with different priorities.
Hardware Upgrades: High-Bandwidth Memory (HBM) as a Core Competitiveness for High-End Computing Chips
NVIDIA's H100 chip has long dominated the high-end market, with HBM bandwidth approaching 4TB/s in its top configurations. The H20 chip, based on the Hopper architecture, features 96GB of HBM3 memory at a bandwidth of 4.0TB/s, supporting large-scale AI computing. In contrast, Huawei's newly announced Ascend 950 series has achieved key breakthroughs in HBM technology: its self-developed HBM reaches a bandwidth of 4TB/s, with memory capacity expanded to 144GB. The Ascend 950DT further incorporates the self-developed HiZQ 2.0 HBM technology and adopts a 128B fine-grained memory access design (four times more efficient than the previous generation), enabling AI chips to process unstructured data more accurately and efficiently while significantly reducing data-read latency at the hardware level.
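Why bandwidth matters this much is easy to see with back-of-envelope arithmetic: in autoregressive decoding, every generated token re-reads the model weights, so memory bandwidth rather than peak FLOPs usually sets the throughput ceiling. The sketch below uses the 4TB/s and 144GB figures cited above; the 70B-parameter FP8 model is our own assumption.

```python
# Back-of-envelope: bandwidth-bound decode throughput for one chip.
hbm_bandwidth = 4e12   # bytes/s  (4 TB/s, figure cited above)
hbm_capacity = 144e9   # bytes    (144 GB, figure cited above)
model_bytes = 70e9     # assumed: 70B parameters at FP8 (1 byte each)

assert model_bytes <= hbm_capacity, "weights must fit in one chip's HBM"

# Each decoded token streams all weights through the memory system once,
# so the ceiling is simply bandwidth / model size.
tokens_per_sec = hbm_bandwidth / model_bytes
print(f"Bandwidth-bound ceiling: ~{tokens_per_sec:.0f} tokens/s")  # ~57
```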
Hardware Collaboration: NVIDIA's Acquisition to Address "Collaboration Bottlenecks"
Around the same time, NVIDIA announced it would spend over $900 million to recruit Rochan Sankar, CEO of AI hardware startup Enfabrica, along with its core team, and to license the company's key technologies. In AI clusters, if network components respond slowly or are poorly cost-effective, even high-performance computing chips sit idle waiting for cross-chip data transfers, wasting massive resources. Enfabrica's core technology targets exactly this "collaboration bottleneck": its Elastic Memory Fabric System (EMFASYS) optimizes the data transmission architecture between chips, enabling thousands of computing chips to integrate and operate collaboratively, an innovative path to breaking through memory bottlenecks. The deal represents NVIDIA's key move to solve the "collaboration efficiency" problem in large-scale AI clusters.
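A simple utilization model shows why this matters. If part of a training step is cross-chip communication that cannot be hidden behind compute, the chip idles for that fraction of wall-clock time, and a faster or better-overlapped fabric recovers it. All numbers below are illustrative assumptions, not Enfabrica or NVIDIA specifications.

```python
def utilization(compute_ms: float, comm_ms: float, overlap: float = 0.0) -> float:
    """Fraction of wall-clock time a chip spends computing when a share
    `overlap` of its communication is hidden behind compute."""
    exposed = comm_ms * (1.0 - overlap)
    return compute_ms / (compute_ms + exposed)

# A step with 8 ms of compute and 6 ms of cross-chip traffic:
print(f"no overlap:         {utilization(8, 6):.0%}")        # 57%
print(f"75% of comm hidden: {utilization(8, 6, 0.75):.0%}")  # 84%
```

The same arithmetic explains the resource waste described above: at 57% utilization, nearly half of every chip-hour is spent waiting on the fabric rather than computing.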
Precision Formats: Huawei's HiF8 Balances Efficiency and Precision
In terms of "precision formats" for computing chips, the industry currently widely uses FP16 (half-precision) and FP8 (8-bit precision). These formats sacrifice some precision to reduce storage usage and data transmission volume, thereby achieving significant improvements in storage and computing efficiency. FP16 is a "balanced choice between precision and efficiency," suitable for mid-range inference and lightweight training scenarios that require a certain level of precision. FP8, on the other hand, is an "extreme optimization prioritizing efficiency," better suited for high-concurrency needs in large-scale AI inference. Huawei's self-developed HiF8 format, while retaining the efficiency of FP8, achieves precision close to that of FP16 through an innovative dynamic bit-domain design and tapered precision optimization—breaking new ground in both "low overhead" and "high precision," and providing more flexible precision options for AI computing.