Today’s paper presents Avatar Forcing, a framework designed to generate real-time interactive head avatars capable of engaging in natural conversation. While existing talking head generation models can create lifelike avatars from static portraits, they typically focus on one-way communication, such as synchronizing lip movements with audio, rather than truly interacting with a user. This lack of responsiveness results in avatars that appear disengaged or emotionally flat during live exchanges. The paper addresses these limitations by proposing a system that models causal interactions, allowing an avatar to process and react to a user’s verbal and non-verbal cues with minimal latency.