Intel has released the November 2023 update to its MLPerf Training 3.1 results, delivering a 103% performance boost compared to the 90% it forecast back in June. Only three accelerator vendors currently submit GPT-3 results to MLPerf: Intel, NVIDIA and Google - making Intel's Gaudi 2 currently the only viable alternative to NVIDIA's GPUs (is that even the right term anymore?) for MLPerf AI workloads.
Intel showcases competitive price/performance against NVIDIA's cutting-edge Hopper chips in the latest MLPerf 3.1 results
Intel was also quick to point out that Xeon is the only CPU submitting training results to the MLPerf benchmark. Without further ado, here are the slides presented:
As you can see, Intel's Gaudi team initially predicted a 90% performance gain from FP8 but was able to deliver a 103% gain on the GPT-3 industry benchmark, cutting its time to train (across 384 accelerators) from 311.94 minutes, or 5.2 hours, down to just over 2.5 hours, or 153.58 minutes. Intel also presented several slides to aid TCO (total cost of ownership) based decision making, showing that the Gaudi 2 chip delivers comparable performance to the NVIDIA H100 while carrying a lower server cost - making it competitive in price/performance.

On GPT-J-99, Gaudi 2 shines even more - coming in just slightly behind NVIDIA's new Hopper chips. While the conversation back in June was about Gaudi 2 merely being a viable alternative to NVIDIA's chips and significantly behind the H100 (only trading blows with the older A100 design), the Gaudi 2 chip is now just slightly behind the H100 and GH200-96G configurations. The H100 is just 9% faster and the GH200-96G just 12% faster than Gaudi 2 in the Server throughput benchmarks. That lead extends to 28% in the Offline benchmarks. Gaudi 2 outperformed the A100 by nearly 2x in both cases.
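The speedup arithmetic behind the headline number is easy to verify from the times quoted above:

```python
# Time-to-train for GPT-3 on 384 Gaudi 2 accelerators, per Intel's MLPerf submissions
june_minutes = 311.94      # June 2023 (MLPerf Training 3.0, BF16)
november_minutes = 153.58  # November 2023 (MLPerf Training 3.1, FP8)

speedup = june_minutes / november_minutes
gain_pct = (speedup - 1) * 100
print(f"{speedup:.2f}x faster ({gain_pct:.0f}% gain)")  # → 2.03x faster (103% gain)
```

That 2.03x ratio is where both the "2x performance leap" and the "103% gain" figures come from - they describe the same result.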
Lastly, Intel also noted that Xeon is the only CPU currently submitting MLPerf benchmarks and emphasized its commitment to AI workloads.

About the Intel Gaudi 2 results: Gaudi 2 continues to be the only viable alternative to NVIDIA's H100 for AI compute needs, delivering significant price-performance. The MLPerf results for Gaudi 2 demonstrated the AI accelerator's growing training performance:

- Gaudi 2 demonstrated a 2x performance leap with the implementation of the FP8 data type on the v3.1 training GPT-3 benchmark, reducing time-to-train by more than half compared to the June MLPerf benchmark and completing the run in 153.58 minutes on 384 Intel Gaudi 2 accelerators. The Gaudi 2 accelerator supports FP8 in both E5M2 and E4M3 formats, with the option of delayed scaling when necessary.
- Intel Gaudi 2 demonstrated training of the Stable Diffusion multi-modal model with 64 accelerators in 20.2 minutes, using BF16. In future MLPerf training benchmarks, Stable Diffusion performance will be submitted with the FP8 data type.
- On 8 Intel Gaudi 2 accelerators, benchmark results were 13.27 and 15.92 minutes for BERT and ResNet-50, respectively, using BF16.

About the 4th Gen Xeon results:
Intel remains the only CPU vendor to submit MLPerf results. The MLPerf results for 4th Gen Xeon highlighted its strong performance:

- Intel submitted results for ResNet-50, RetinaNet, BERT and DLRM-dcnv2.
- The 4th Gen Intel Xeon Scalable processors' results for ResNet-50, RetinaNet and BERT were similar to the strong out-of-box performance results submitted for the June 2023 MLPerf benchmark.
- DLRM-dcnv2 is a new model since June's submission, with the CPU demonstrating a time-to-train of 227 minutes using just 4 nodes.
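For readers curious about the two FP8 formats named in the Gaudi 2 summary above, here is a minimal sketch of their numeric ranges, assuming the usual OCP FP8 conventions (E5M2 is IEEE-like, reserving the all-ones exponent code for inf/NaN; E4M3 reclaims that code for finite values, keeping only one NaN pattern):

```python
def fp8_max_finite(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value representable in an FP8 format.

    ieee_like=True  (E5M2): the all-ones exponent code is reserved for inf/NaN.
    ieee_like=False (E4M3): the all-ones exponent code holds finite values,
                            except mantissa all-ones, which encodes NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        max_exp = (2 ** exp_bits - 2) - bias       # top exponent code reserved
        max_mant = 2 - 2 ** (-man_bits)            # mantissa 1.11...1
    else:
        max_exp = (2 ** exp_bits - 1) - bias       # top exponent code usable
        max_mant = 2 - 2 * 2 ** (-man_bits)        # mantissa 1.11...0 (all-ones is NaN)
    return max_mant * 2 ** max_exp

print(fp8_max_finite(5, 2, True))    # E5M2 → 57344.0
print(fp8_max_finite(4, 3, False))   # E4M3 → 448.0
```

E4M3 trades range for an extra mantissa bit of precision, which is why FP8 training recipes typically use E4M3 for weights and activations and E5M2 for gradients - and why hardware support for both formats, plus delayed scaling, matters for results like the GPT-3 submission above.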