NVIDIA GB200 NVL72

Supercharging Next-Generation AI and Accelerated Computing


The GB200 Grace Blackwell Superchip is a key component of the NVIDIA GB200 NVL72. It connects two high-performance NVIDIA Blackwell Tensor Core GPUs to an NVIDIA Grace CPU over the NVIDIA® NVLink®-C2C interconnect. The GB200 NVL72 links 36 of these superchips, for a total of 36 Grace CPUs and 72 Blackwell GPUs, in a single liquid-cooled, rack-scale NVLink domain.

Supercharging Next-Generation AI and Accelerated Computing

LLM Inference: 30X
LLM Training: 4X
Energy Efficiency: 25X
Data Processing: 18X

Highlights

Real-Time LLM Inference

GB200 NVL72 introduces cutting-edge capabilities and a second-generation Transformer Engine that enables FP4 AI. Coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time LLM inference performance for trillion-parameter language models. This advance is made possible by a new generation of Tensor Cores, which introduce new microscaling formats that provide high accuracy and greater throughput. Additionally, GB200 NVL72 uses NVLink and liquid cooling to create a single massive 72-GPU rack that overcomes communication bottlenecks.
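The microscaling formats mentioned above attach a shared scale factor to each small block of values rather than one scale per tensor. The Python sketch below is a simplified, hypothetical illustration of that idea, not the actual MXFP4/E2M1 format or NVIDIA's Transformer Engine implementation: it quantizes a vector in blocks of 32 values that share one scale.

```python
# Illustrative sketch of block-scaled ("microscaling") quantization.
# Simplified stand-in for the general idea; not NVIDIA's implementation.
import numpy as np

def quantize_blockwise(x, block=32, levels=7):
    """Quantize x to a signed 4-bit-style grid with one scale per block."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(xp).max(axis=1, keepdims=True) / levels  # per-block scale
    scales[scales == 0] = 1.0                                 # avoid divide-by-zero
    q = np.clip(np.rint(xp / scales), -levels, levels)        # integer codes
    return q, scales

def dequantize_blockwise(q, scales, orig_len):
    return (q * scales).reshape(-1)[:orig_len]

if __name__ == "__main__":
    x = np.random.randn(100).astype(np.float32)
    q, s = quantize_blockwise(x)
    x_hat = dequantize_blockwise(q, s, len(x))
    print("max abs error:", np.abs(x - x_hat).max())
```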

Massive-Scale Training

GB200 NVL72 includes a faster second-generation Transformer Engine featuring FP8 precision, enabling a remarkable 4X faster training for large language models at scale. This breakthrough is complemented by fifth-generation NVLink, which provides 1.8 terabytes per second (TB/s) of GPU-to-GPU interconnect, along with InfiniBand networking and NVIDIA Magnum IO™ software.
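As a quick sanity check, the 1.8 TB/s per-GPU NVLink figure and the 130 TB/s rack-level NVLink bandwidth listed in the specification table are consistent with simple scaling across 72 GPUs. The short sketch below assumes the rack figure is the per-GPU figure summed over the NVLink domain:

```python
# Back-of-the-envelope check of the NVLink figures quoted in this datasheet.
# Assumption: the 130 TB/s rack figure is the 1.8 TB/s per-GPU bandwidth
# summed across all 72 GPUs in the NVL72 NVLink domain.
per_gpu_nvlink_tbps = 1.8   # fifth-generation NVLink, per GPU
gpus_per_rack = 72          # NVL72 NVLink domain

aggregate_tbps = per_gpu_nvlink_tbps * gpus_per_rack
print(f"Aggregate NVLink bandwidth: {aggregate_tbps:.1f} TB/s")  # ~129.6, i.e. ~130 TB/s
```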

Energy-Efficient Infrastructure

Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy consumption. Liquid cooling increases compute density, reduces the amount of floor space used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink domain architectures. Compared to NVIDIA H100 air-cooled infrastructure, GB200 delivers 25X more performance at the same power while reducing water consumption.

Data Processing

Databases play critical roles in handling, processing, and analyzing large volumes of data for enterprises. GB200 takes advantage of the high-bandwidth memory performance, NVLink-C2C, and dedicated decompression engines in the NVIDIA Blackwell architecture to speed up key database queries by 18X compared to CPUs and deliver 5X better total cost of ownership (TCO).
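As an illustration of the kind of query such acceleration targets, the sketch below uses the RAPIDS cuDF library, an NVIDIA GPU DataFrame library that this datasheet does not name, to run a simple groupby aggregation on the GPU; actual speedups depend on data size, query shape, and hardware.

```python
# Hypothetical illustration using RAPIDS cuDF (not named in this datasheet):
# a GPU DataFrame groupby aggregation, the kind of query pattern that
# benefits from high memory bandwidth and hardware decompression.
import cudf

df = cudf.DataFrame({
    "region": ["NA", "EU", "NA", "APAC", "EU"],
    "revenue": [120.0, 85.5, 230.0, 99.9, 150.0],
})
summary = df.groupby("region").agg({"revenue": ["sum", "mean"]})
print(summary)
```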

Technical Specifications

| | GB200 NVL72¹ | GB200 Grace Blackwell Superchip |
|---|---|---|
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core² | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core² | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core² | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core² | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS | 5 PFLOPS |
| FP32 | 6,480 TFLOPS | 180 TFLOPS |
| FP64 | 3,240 TFLOPS | 90 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory | Up to 13.5 TB HBM3e | Up to 384 GB HBM3e |
| GPU Memory Bandwidth | 576 TB/s | 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| CPU Memory | Up to 17 TB LPDDR5X | Up to 480 GB LPDDR5X |
| CPU Memory Bandwidth | Up to 18.4 TB/s | Up to 512 GB/s |
1. Preliminary specifications. May be subject to change.
2. With sparsity.
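Since the GB200 NVL72 column aggregates 36 Grace Blackwell Superchips (36 Grace CPUs and 72 Blackwell GPUs), most rack-level entries are simply 36 times the per-superchip entries. The short sketch below, using only numbers from the table above, checks that scaling:

```python
# Cross-check of the specification table: the GB200 NVL72 column should be
# roughly 36x the GB200 Grace Blackwell Superchip column (36 superchips per rack).
SUPERCHIPS_PER_RACK = 36

# (metric, per-superchip value, rack value) taken from the table above.
rows = [
    ("FP4 Tensor Core (PFLOPS)",       40, 1_440),
    ("FP8/FP6 Tensor Core (PFLOPS)",   20,   720),
    ("FP16/BF16 Tensor Core (PFLOPS)", 10,   360),
    ("FP64 (TFLOPS)",                  90, 3_240),
    ("CPU cores",                      72, 2_592),
    ("GPU memory bandwidth (TB/s)",    16,   576),
]

for name, per_chip, rack in rows:
    scaled = per_chip * SUPERCHIPS_PER_RACK
    print(f"{name}: 36 x {per_chip} = {scaled} (table lists {rack})")
```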