akashbajwa.substack.com…
Pre-training is definitely changing. MoE, Mixture of Experts, aligns verticals with the long-term trends.
Agentic AI spreads a query across an MoE: many experts, each covering a few areas, so less of the pre-trained LLM has to be active for any one task. This downsizes enormous LLMs into smaller, active agentic tasks. Consider 800M learning points (parameters), with fewer than 50M in use. A truly dramatic downsizing.
DeepSeek showed this architectural approach.
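To make the "800M total, under 50M in use" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration only, not DeepSeek's implementation: the function names, the expert count (64), the per-expert size (12M), and the shared size (32M) are all hypothetical numbers chosen so the total lands on 800M.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def active_params(num_experts, params_per_expert, shared_params, k):
    """Count total parameters vs. parameters actually used per token."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active, total

# Hypothetical sizes: 64 experts of 12M each plus 32M shared = 800M total.
active, total = active_params(64, 12_000_000, 32_000_000, k=1)
print(f"{active / 1e6:.0f}M of {total / 1e6:.0f}M in use ({active / total:.1%})")

# Route one token across four experts, keeping the best two.
print(route_top_k([0.1, 2.0, -1.0, 1.5], k=2))
```

With these assumed sizes, only the shared layers and one expert run per token, so most of the model sits idle on any single query.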
OpenAI showed the power of time: instead of blurting out the answer quickly, reflect for 20% of the “tokens” expended and consider the alternatives one more time.
Quality of reflection is orders of magnitude better than speed and brute performance.
AI is progressing. Engage and understand. It is a force multiplier today.
Training, inference, reasoning, engaging.
Model size doubles every 5 months.
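As a quick back-of-the-envelope check on what a 5-month doubling implies, here is a small sketch. It takes the 5-month figure above as given and assumes smooth exponential growth; the function name is hypothetical.

```python
def growth_factor(months, doubling_months=5.0):
    """Multiplicative growth after `months`, assuming a fixed doubling period."""
    return 2 ** (months / doubling_months)

# Growth over one and two years under the assumed 5-month doubling.
for months in (12, 24):
    print(f"{months} months: about {growth_factor(months):.1f}x")
```

Under that assumption, size grows a little over 5x per year.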
#AI #DataCenters #HyperscaleDataCenter