Deep Magic: The 72-core (16 CPU, 40 GPU, 16 NPU) Apple M3 Max chip is perhaps 720 mm^2, contains 92 billion transistors, and is made on TSMC’s 3 nm N3B process. The 132-SM Nvidia H100 chip is 814 mm^2, contains 80 billion transistors, and is made on TSMC’s slightly less demanding 4 nm N4 process. Apple charges laptop buyers about $1500 for the M3 Max, but that includes something like a 200% markup over manufacturing cost, so figure that the variable manufacturing cost of an M3 Max is $500. And since the M3 Max is more complex in its architecture, larger in its transistor count, and made on a more demanding process, TSMC ought to be charging more for a marginal M3 Max than for a marginal H100. That puts the manufacturing cost of a marginal H100 at roughly 1% of its current market price, which is really not sustainable. It makes me wonder who the people are who have money to burn but not the programmer expertise to shift model training off to a slower but much cheaper chip. I mean, in the end quantity has a quality all its own. And things do not have to use CUDA libraries:

Austin Carr: Nvidia’s AI Chip Stands Out as Tech’s Hottest Product: ‘Nvidia Corp.’s <bloomberg.com/news/arti…> artificial intelligence accelerator…. With its 80 billion transistors, the H100 is the go-to workhorse for training… large language models…. A single H100 now lists for $57,000 on hardware vendor CDW’s online shop—and data centers are filled with thousands of them. When Nvidia Chief Executive Officer Jensen Huang delivered the company’s first AI server with an older generation of graphic processing units to OpenAI in 2016 <bloomberg.com/news/feat…>, few could’ve predicted the role these kinds of chips would play…. Nvidia’s graphics cards were then synonymous with video games, not machine learning…’ <bloomberg.com/news/news…>
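
For anyone who wants to check the back-of-envelope arithmetic, here is a minimal sketch in Python. Every input is a rough figure taken from the text above (the 720 mm^2 die size and the 200% markup are guesses, not measurements). The variable names and the `implied_cost` helper are purely illustrative, and treating the M3 Max's implied cost as a stand-in for the H100's marginal cost is an assumption of the argument, not a known TSMC price.

```python
# Back-of-envelope check. All inputs are the rough figures quoted in the text.
M3_MAX_PRICE_USD = 1500        # approximate price Apple charges for the M3 Max
MARKUP_OVER_COST = 2.0         # "something like a 200% markup" (a guess)
H100_LIST_PRICE_USD = 57_000   # CDW list price quoted in the Bloomberg excerpt

M3_MAX_TRANSISTORS = 92e9
M3_MAX_AREA_MM2 = 720          # "perhaps 720 mm^2" (a guess)
H100_TRANSISTORS = 80e9
H100_AREA_MM2 = 814


def implied_cost(price: float, markup: float) -> float:
    """Cost implied by a price and a markup expressed as a fraction of cost."""
    return price / (1.0 + markup)


# Implied variable manufacturing cost of one M3 Max.
m3_max_cost = implied_cost(M3_MAX_PRICE_USD, MARKUP_OVER_COST)

# Assumption: a marginal H100 costs no more to make than a marginal M3 Max.
h100_cost_share = m3_max_cost / H100_LIST_PRICE_USD

# Transistor density, in millions of transistors per mm^2.
m3_max_density = M3_MAX_TRANSISTORS / M3_MAX_AREA_MM2 / 1e6
h100_density = H100_TRANSISTORS / H100_AREA_MM2 / 1e6

print(f"Implied M3 Max manufacturing cost: ${m3_max_cost:,.0f}")
print(f"Share of H100 list price: {h100_cost_share:.2%}")
print(f"Density (M transistors/mm^2): M3 Max {m3_max_density:.0f}, H100 {h100_density:.0f}")
```

On these numbers the implied cost comes to a bit under 1% of the H100 list price, which is where the "1%" figure comes from.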

4:20 PM
Dec 29