Tony Peng 

I’m Tony Peng, former Global Head of Comms at Baidu and a former AI reporter. I’m a longtime AI observer with a keen focus on China’s AI development.

A Chinese whistleblower from Meta’s AI team on Llama 4:

After repeated training, the performance of the internal model still fails to reach open-source SOTA levels, and is even far behind them. Company leadership suggested mixing various benchmark test sets into the post-training process, aiming to produce a result that “looks okay” acros…

MY3 ranked No.10? No way.

Who leads in intelligent driving systems.

In this post, I’ll briefly explore DeepSeek’s latest paper, Inference-Time Scaling for Generalist Reward Modeling, published ahead of the rumored release of DeepSeek-R2.

The paper is fascinating: it introduces a new training method for reward models, a key component in reinforcement learning (RL) that scores LLM answers and helps guide them…

👀DeepSeek Reveals New Training Method Ahead of DeepSeek-R2 Release