NVIDIA has released its official MLPerf Inference v3.1 performance benchmarks, running on the company's fastest AI chips such as the Hopper H100, the GH200 Grace Hopper Superchip & the Ada Lovelace L4.
NVIDIA Dominates The AI Landscape With Hopper & Ada Lovelace GPUs, Strong Performance Showcased In MLPerf v3.1
Today, NVIDIA is releasing its first performance benchmarks within the MLPerf Inference v3.1 benchmark suite, which covers a wide range of industry-standard AI use cases. The workloads include recommender systems, natural language processing, large language models, speech recognition, image classification, medical imaging, and object detection.
The two new benchmarks in this round are DLRM-DCNv2 and GPT-J 6B. The first is a larger, multi-hot dataset representation of real-world recommenders that uses a new cross-layer algorithm to deliver better recommendations and carries twice the parameter count of the previous version. GPT-J, on the other hand, is a small-scale LLM whose open-source base model was released in 2021; in MLPerf, this workload is used for text summarization.
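For a feel of what the GPT-J summarization workload involves, here is a minimal sketch that runs the open-source GPT-J 6B checkpoint through Hugging Face Transformers. The prompt format and generation settings are illustrative assumptions for this sketch, not the MLPerf harness itself:

```python
# Minimal, illustrative GPT-J 6B summarization query (not the MLPerf harness;
# the prompt format and generation settings are assumptions for this sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.float16
).to("cuda")

article = "..."  # the document to be summarized
prompt = f"Summarize the following article:\n\n{article}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, i.e. the summary itself
summary = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```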
NVIDIA also showcased a conceptual real-life workload pipeline, in which a single application chains a range of AI models together to fulfill a user query or task. All of the models will be available on the NGC platform.
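As a rough illustration of what chaining multiple models looks like, the sketch below wires a speech-recognition stage into an LLM and then a text-to-speech stage. The stage functions are hypothetical placeholders for this sketch, not NGC or NVIDIA APIs:

```python
# Hypothetical multi-model pipeline (placeholder stages, not NGC APIs);
# in a real deployment each stage would be served by a separate model.

def transcribe(audio: bytes) -> str:
    """Speech-recognition stage: audio in, text out."""
    raise NotImplementedError  # placeholder for an ASR model

def answer(query: str) -> str:
    """LLM stage: reasons over the transcribed query."""
    raise NotImplementedError  # placeholder for an LLM

def synthesize(text: str) -> bytes:
    """Text-to-speech stage: voices the response."""
    raise NotImplementedError  # placeholder for a TTS model

def handle_request(audio: bytes) -> bytes:
    query = transcribe(audio)   # audio -> text
    reply = answer(query)       # text  -> text
    return synthesize(reply)    # text  -> audio
```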
In terms of performance, the NVIDIA H100 was tested across the entire MLPerf v3.1 Inference set (Offline) against competing accelerators from Intel's Habana Labs, Qualcomm (Cloud AI 100), and Google (TPUv5e). NVIDIA delivered leadership performance across all workloads.
To make things a little more interesting, the company notes that these results were locked in about a month ago, since MLPerf requires roughly a month between submission and the publication of final results. Since then, NVIDIA has announced TensorRT-LLM, a new software stack that boosts inference performance by up to 8x, as we detailed here. We can expect NVIDIA to submit MLPerf results with TensorRT-LLM in a future round as well.
But coming back to the benchmarks, NVIDIA's GH200 Grace Hopper Superchip also made its first MLPerf submission, delivering a 17% improvement over the H100 GPU. The gain mainly comes from its larger memory capacity (96 GB of HBM3 versus 80 GB) and 4 TB/s of memory bandwidth.
The GH200's Hopper GPU uses the same core configuration as the H100, but one key contributor to the higher performance is automatic power steering between the Grace CPU and the Hopper GPU. Since the Superchip integrates power delivery for both the CPU and the GPU on the same board, the platform can shift power budget from the CPU to the GPU (and vice versa) depending on the workload. That extra headroom lets the GPU clock higher and sustain more throughput. NVIDIA also mentioned that the Superchip tested here was running in its 1,000W configuration.
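The power steering itself happens automatically at the platform level, but its effect can be observed from software by sampling the GPU's power draw under load. Below is a minimal sketch using NVIDIA's NVML Python bindings (pynvml), assuming they are installed:

```python
# Minimal sketch: sample GPU power draw via NVML (pynvml) to watch how much
# of the shared board power budget the GPU is drawing under load.
# The CPU/GPU power steering itself is a platform feature, not set here.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU in the system

try:
    for _ in range(10):
        milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)
        print(f"GPU power draw: {milliwatts / 1000:.1f} W")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```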
In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network’s edge.
The GH200 links a Hopper GPU with a Grace CPU in one superchip. The combination provides more memory, bandwidth and an ability to automatically shift power between the CPU and GPU to optimize performance. Separately, H100 systems that pack eight H100 GPUs delivered the highest throughput on every MLPerf inference test in this round.
Grace Hopper Superchips and H100 GPUs led across all MLPerf’s data center tests, including inference for computer vision, speech recognition and medical imaging, in addition to the more demanding use cases of recommendation systems and the large language models (LLMs) used in generative AI. Overall, the results continue NVIDIA’s record of demonstrating performance leadership in AI training and inference in every round since the launch of the MLPerf benchmarks in 2018.
via NVIDIA
The NVIDIA L4 GPU, based on the Ada Lovelace architecture, also made a strong entry in MLPerf v3.1. Not only did it run every workload, it did so very efficiently, delivering up to 6x the performance of modern x86 CPUs (dual-socket Intel Xeon Platinum 8380) within a 72W TDP in a low-profile form factor. The L4 also offered up to a 120x uplift in video/AI pipelines spanning decoding, inference, and encoding. Lastly, the NVIDIA Jetson Orin gained up to 84% more performance purely through software updates, underscoring NVIDIA's continued investment in its software stack.