Ben Dickson (@bdtechtalks): "New study shows LLM-as-a-judge can be fooled by “Master Key” examples: simple token sequences like “Thought process:” or “Let’s solve this problem step by step” or even single tokens such as colon or period. It causes the judge model to classify the response as correct. This aff…"

The app for independent voices

New study shows LLM-as-a-judge can be fooled by “Master Key” examples: simple token sequences like “Thought process:” or “Let’s solve this problem step by step” or even single tokens such as colon or period. It causes the judge model to classify the response as correct. This affects general-purpose LLMs (e.g., GPT-4o, Claude 4, o1, Qwen2.5) and specialized judge models (e.g., OmniJudge, General-Verifier).

How to solve it? Add a small batch of “master examples” to the dataset for training the judge model (the examples can be easily generated by truncating the model’s answers).

Master-RM, a model trained with this method, catches 100% of adversarial examples while keeping 96% of answer accordance with the best general-purpose models.

Jul 21

3:58 PM

The app for independent voices

Log in or sign up