
TFLOPS flopping? Keep an eye on what your hardware is actually optimized for and what you’re trying to accomplish. Many classical ML workloads and smaller neural networks run in fp32, while large-scale deep learning training (vision, speech, LLMs, etc.) tends to favor fp16/bf16. Latency- and/or cost-sensitive workloads may drop to int8/fp8, especially if VRAM becomes a bottleneck.
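
To make the precision trade-off concrete, here is a minimal sketch, assuming PyTorch: fp32 by default, a bf16/fp16 autocast region for training-style compute, and dynamic int8 quantization for a latency/VRAM-constrained inference path. The model and shapes are placeholders, not anything from the original post.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)   # stand-in for a real network
x = torch.randn(8, 1024, device=device)

# Default: full fp32 forward pass.
y_fp32 = model(x)

# Mixed precision: matmuls run in bf16 (or fp16) inside the autocast region,
# while numerically sensitive ops stay in fp32.
if device == "cuda":
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        y_bf16 = model(x)

# Inference-only int8: dynamic quantization of the linear layers (CPU path shown).
model_int8 = torch.ao.quantization.quantize_dynamic(
    model.cpu(), {nn.Linear}, dtype=torch.qint8
)
y_int8 = model_int8(x.cpu())
```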

Generally speaking, Hopper (H100/H200) dominates peak mixed-precision AI throughput and large-scale training workloads. Lovelace (L4/L40S) is strong and cost-effective for many AI, graphics, and inference workloads. Ampere (A10/A100) is a good all-rounder for VRAM-heavy workloads if the pricing is favorable.
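
If you want to match precision to whatever GPU you actually land on, a small probe like the following (again assuming PyTorch; the capability thresholds are general knowledge, not from the post) can decide the autocast dtype at runtime: bf16 needs Ampere (sm_80) or newer, while Hopper and Lovelace add fp8 paths via separate libraries such as Transformer Engine (not shown here).

```python
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)              # e.g. "NVIDIA A100", "NVIDIA L4"
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")

    # bf16 is supported on Ampere (sm_80) and newer; fall back to fp16 otherwise.
    if torch.cuda.is_bf16_supported():
        train_dtype = torch.bfloat16
    else:
        train_dtype = torch.float16                   # Turing/Volta fallback
    print(f"autocast dtype for training: {train_dtype}")
```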

Legacy workloads may already be optimized for Turing (T4) or Volta (V100), but net-new workloads should target Ampere or newer.
