A Chinese whistleblower from Meta’s AI team on Llama 4:
After repeated training runs, the internal model's performance still fails to reach open-source SOTA levels, and in fact lags far behind them. Company leadership suggested mixing various benchmark test sets into the post-training process, aiming to produce a result that "looks okay" across…
In this post, I’ll briefly explore DeepSeek’s latest paper, Inference-Time Scaling for Generalist Reward Modeling, published ahead of the rumored release of DeepSeek-R2.
The paper is fascinating: it introduces a new training method for reward models, a key component in reinforcement learning (RL) that scores an LLM's answers and helps guide the model…
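To make the reward model's role concrete, here is a minimal PyTorch sketch. It is not DeepSeek's architecture: a toy GRU encoder stands in for an LLM backbone, and all names and sizes are illustrative. The essential shape is the same, though: a tokenized prompt-plus-answer pair goes in, a single scalar score comes out, and an RL loop would use that score as its training signal.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Illustrative scalar reward model: encoder + linear reward head."""

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Toy stand-in for an LLM backbone; a real reward model would
        # typically reuse a pretrained transformer here.
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.reward_head = nn.Linear(hidden, 1)  # one scalar per sequence

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tokenized prompt+answer pairs
        x = self.embed(token_ids)
        _, h = self.encoder(x)                     # h: (1, batch, hidden)
        return self.reward_head(h[-1]).squeeze(-1)  # (batch,) scores

model = TinyRewardModel()
answers = torch.randint(0, 1000, (2, 16))  # two fake tokenized answers
print(model(answers))                       # one reward score per answer
```

In practice such a model is trained on human preference data (pairs of answers where one is ranked above the other), and its scores then replace the human judge inside the RL loop.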