Naama Rozen

AI safety and alignment researcher studying how alignment methods affect large language models, both in their internals and in their observable behavior. I develop empirical audits to test when safety mechanisms generalize and when they fail.