FPT GPU Cloud Benchmark: Performance Comparison of GPUs for AI & Machine Learning

Author: Vũ Tuấn Kiệt
14:09 25/03/2025

Benchmarking is crucial for evaluating GPU performance in AI and machine learning. This study measures training speed and scalability across various GPU types to help users choose the best fit for their workloads.

Beyond internal assessments, we compare FPT GPU Cloud's performance with similar vendors to highlight key advantages in processing power, memory bandwidth, and scalability. These insights help customers select the most efficient GPU cloud services for their AI needs.

Check out FPT AI Factory’s fork of the Optimum Habana trainer code. The H100 benchmarks can be reproduced by following the instructions in that repository.

The following benchmarks use Habana's Optimum Habana v1.7 trainer code to compare the performance of the NVIDIA HGX H100 and HGX H200 on FPT's platforms with that of a similar vendor.

H100 results (samples per second) – FPT’s Metal Cloud, K8S, DGX, and VM; batch size 54

Configuration                                      1 GPU  2 GPUs  3 GPUs  4 GPUs  6 GPUs  8 GPUs
Similar Vendor’s H100 80GB SXM                     142.3     275   400.6   521.8   740.3   962.2
  Compared to 1 GPU (times faster)                     -    1.93    2.82    3.67    5.20    6.76
Metal Cloud – Bare Metal H100 80GB SXM             144.2   283.4   418.9   550.7   799.4  1056.3
  Compared to Similar Vendor                        101%    103%    105%    106%    108%    110%
  Compared to 1 GPU (times faster)                     -    1.97    2.91    3.82    5.54    7.33
FPT K8S H100 80GB SXM                              143.8   282.4   417.0   546.7   792.8  1046.5
  Compared to Similar Vendor                        101%    103%    104%    105%    107%    109%
  Compared to 1 GPU (times faster)                     -    1.96    2.90    3.80    5.51    7.28
DGX H100 80GB SXM                                  143.8   282.2   417.2   547.7   793.4  1047.0
  Compared to Similar Vendor                        101%    103%    104%    105%    107%    109%
  Compared to 1 GPU (times faster)                     -    1.96    2.90    3.81    5.52    7.28
FPT VM H100 80GB SXM (no NVLink)                   143.0   261.7   376.6   459.5       -       -
  Compared to Similar Vendor                        101%     95%     94%     88%       -       -
  Compared to 1 GPU (times faster)                     -    1.83    2.63    3.21       -       -
  Compared to Metal Cloud                            99%     92%     90%     83%       -       -

A dash marks a cell with no reported value.
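
The derived rows in the H100 table are plain ratios of the throughput figures. The Python sketch below recomputes the "Compared to 1 GPU" row for the similar vendor and the "Compared to Similar Vendor" row for Metal Cloud, using values copied from the table. The function and variable names are illustrative and are not taken from the benchmark repository. Because the published throughputs are already rounded, a ratio recomputed this way can differ from a published one by 0.01.

```python
# Throughput in samples per second, copied from the H100 table (batch size 54).
GPU_COUNTS = [1, 2, 3, 4, 6, 8]
VENDOR_H100 = [142.3, 275.0, 400.6, 521.8, 740.3, 962.2]
METAL_CLOUD_H100 = [144.2, 283.4, 418.9, 550.7, 799.4, 1056.3]


def speedup_vs_single_gpu(throughputs):
    """Divide each throughput by the 1-GPU throughput (first entry)."""
    base = throughputs[0]
    return [round(t / base, 2) for t in throughputs]


def percent_of_baseline(throughputs, baseline):
    """Express each throughput as a whole-number percentage of the baseline."""
    return [round(100 * t / b) for t, b in zip(throughputs, baseline)]


vendor_speedup = speedup_vs_single_gpu(VENDOR_H100)
metal_vs_vendor = percent_of_baseline(METAL_CLOUD_H100, VENDOR_H100)

for gpus, speedup, pct in zip(GPU_COUNTS, vendor_speedup, metal_vs_vendor):
    print(f"{gpus} GPU(s): vendor speedup {speedup:.2f}x, Metal Cloud at {pct}% of vendor")
```

The same two functions apply unchanged to the K8S, DGX, and VM rows once their throughput lists are substituted.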

H200 results (samples per second) – FPT’s Metal Cloud at batch sizes 54, 95, and 110 (bz54, bz95, bz110)

Configuration                                      1 GPU  2 GPUs  3 GPUs  4 GPUs  6 GPUs  8 GPUs
Metal Cloud – Bare Metal H200 141GB SXM (bz54)     158.8   312.4   460.7   600.9   881.4  1165.1
  Compared to Similar Vendor’s H100                 112%    114%    115%    115%    119%    121%
  Compared to Metal Cloud H100                      110%    110%    110%    109%    110%    110%
  Compared to Similar Vendor’s Bare Metal H200      101%    101%    102%    101%    104%    105%
  Compared to 1 GPU (times faster)                     -    1.84    2.71    3.53    5.18    6.85
Metal Cloud – Bare Metal H200 141GB SXM (bz95)     169.4   332.9   489.2   649.7   917.4  1238.1
  Compared to Similar Vendor’s H100                 119%    121%    122%    125%    124%    129%
  Compared to Metal Cloud H100                      117%    117%    117%    118%    115%    117%
  Compared to Similar Vendor’s Bare Metal H200      107%    108%    108%    110%    108%    112%
  Compared to 1 GPU (times faster)                     -    1.96    2.87    3.82    5.39    7.28
Metal Cloud – Bare Metal H200 141GB SXM (bz110)    173.9   341.4   505.8   651.0   973.7  1190.0
  Compared to Similar Vendor’s H100                 122%    124%    126%    125%    132%    124%
  Compared to Metal Cloud H100                      121%    120%    121%    118%    122%    113%
  Compared to Similar Vendor’s Bare Metal H200      110%    111%    112%    110%    115%    107%
  Compared to 1 GPU (times faster)                     -    2.01    2.97    3.83    5.72    6.99
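
The H200 figures also make it possible to check which batch size gave the highest throughput at each GPU count, and how much each larger batch size gained over batch size 54. The sketch below is a minimal illustration that uses only the throughput values from the H200 table; the names are hypothetical and the analysis is not part of the published benchmark.

```python
# H200 throughput in samples per second, keyed by batch size (from the H200 table).
GPU_COUNTS = [1, 2, 3, 4, 6, 8]
H200_THROUGHPUT = {
    54: [158.8, 312.4, 460.7, 600.9, 881.4, 1165.1],
    95: [169.4, 332.9, 489.2, 649.7, 917.4, 1238.1],
    110: [173.9, 341.4, 505.8, 651.0, 973.7, 1190.0],
}


def best_batch_size_per_gpu_count(throughput_by_batch, gpu_counts):
    """Map each GPU count to the batch size with the highest throughput."""
    best = {}
    for i, gpus in enumerate(gpu_counts):
        best[gpus] = max(throughput_by_batch, key=lambda bz: throughput_by_batch[bz][i])
    return best


def gain_over_reference(throughputs, reference):
    """Percentage gain of each throughput over the matching reference value."""
    return [round(100 * (t / r - 1), 1) for t, r in zip(throughputs, reference)]


best = best_batch_size_per_gpu_count(H200_THROUGHPUT, GPU_COUNTS)
gain_110_over_54 = gain_over_reference(H200_THROUGHPUT[110], H200_THROUGHPUT[54])

print("Best batch size per GPU count:", best)
print("Gain of bz110 over bz54 (%):", gain_110_over_54)
```

On these figures, batch size 110 is fastest from 1 to 6 GPUs, while batch size 95 is fastest on 8 GPUs (1238.1 vs. 1190.0 samples per second).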

FPT AI Factory optimizes GPU performance through advanced infrastructure and software enhancements.

  • Metal Cloud delivers the highest H100 throughput at every GPU count, and its lead over the similar vendor widens as GPUs are added: from 101% of the vendor’s throughput on 1 GPU to 110% on 8 GPUs.
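  • FPT K8S trails Metal Cloud by about 1% (1046.5 vs. 1056.3 samples per second on 8 GPUs) due to additional overhead, but it still reaches 109% of the similar vendor’s throughput on 8 GPUs.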
  • FPT VM (without NVLink) nearly matches Metal Cloud on 1 GPU (99%) but falls to 83% of Metal Cloud’s throughput on 4 GPUs, reinforcing NVLink’s role in scaling efficiency.
  • Across all configurations, performance scaling is sublinear, with diminishing returns as the number of GPUs increases. Metal Cloud scales best among the H100 configurations (7.33× on 8 GPUs).
  • The HGX H200, with its larger VRAM (141GB vs. 80GB) and higher memory bandwidth (4.8TB/s vs. 3.35TB/s), supports larger batch sizes. At the largest batch size tested (110), it delivers 13% to 22% more throughput than the Metal Cloud H100 at batch size 54, depending on the GPU count.
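
One way to quantify the sublinear scaling described above is scaling efficiency: the measured multi-GPU throughput divided by the ideal linear throughput (GPU count times 1-GPU throughput). The sketch below applies that definition to three rows of the H100 table. The metric and the helper name are an illustration added here and do not appear in the published results.

```python
def scaling_efficiency(throughput_n, throughput_1, num_gpus):
    """Fraction of ideal linear scaling achieved on num_gpus GPUs."""
    return throughput_n / (num_gpus * throughput_1)


# (N-GPU throughput, 1-GPU throughput, N), samples per second from the H100 table.
CONFIGS = {
    "Metal Cloud H100, 4 GPUs": (550.7, 144.2, 4),
    "Metal Cloud H100, 8 GPUs": (1056.3, 144.2, 8),
    "FPT VM H100 (no NVLink), 4 GPUs": (459.5, 143.0, 4),
}

efficiencies = {
    name: scaling_efficiency(tn, t1, n) for name, (tn, t1, n) in CONFIGS.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of linear scaling")
```

With the table values, Metal Cloud retains roughly 95% of linear scaling on 4 GPUs and 92% on 8 GPUs, while the VM without NVLink retains about 80% on 4 GPUs.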

Learn more about FPT AI Factory's services HERE.

For more information and consultancy about FPT AI Factory, please contact: