Meng Li (@mengyoupanshan)

The app for independent voices

NVIDIA has recently unveiled its latest open-source foundation model, "NitroGen."

Officially introduced as a unified vision-to-action model, NitroGen is capable of playing games directly from raw frames. It takes video game frames as input and outputs corresponding gamepad actions.
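To make the frames-in, gamepad-actions-out interface concrete, here is a minimal sketch of such a control loop. Everything in it is an illustrative assumption rather than NitroGen's actual API: the `GamepadAction` fields, the toy `Frame` type, and the placeholder `toy_policy` are hypothetical stand-ins for a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Frame = List[List[int]]  # toy grayscale image: rows of pixel intensities


@dataclass
class GamepadAction:
    # Hypothetical layout: stick axes in [-1, 1] plus a set of pressed buttons.
    left_stick_x: float = 0.0
    left_stick_y: float = 0.0
    buttons: frozenset = field(default_factory=frozenset)


def toy_policy(frame: Frame) -> GamepadAction:
    """Placeholder policy (not a real model): steer toward the brighter half."""
    half = len(frame[0]) // 2
    left = sum(sum(row[:half]) for row in frame)
    right = sum(sum(row[half:]) for row in frame)
    if left == right:
        return GamepadAction()  # neutral stick, no buttons
    return GamepadAction(left_stick_x=-1.0 if left > right else 1.0)


def play(frames: List[Frame],
         policy: Callable[[Frame], GamepadAction]) -> List[GamepadAction]:
    """Run the frame -> action loop; a real agent would send each action to the game."""
    return [policy(frame) for frame in frames]


# Two tiny 2x2 frames: one brighter on the right, one uniform.
actions = play([[[0, 9], [0, 9]], [[5, 5], [5, 5]]], toy_policy)
```

In a real system the policy would be a learned network and the frames would come from the game's video output, but the shape of the loop would stay the same.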

Notably, NitroGen supports post-training: the model can be adapted to new games with only lightweight fine-tuning.

According to NVIDIA, unlike models trained with reinforcement learning, NitroGen is trained through large-scale imitation learning on human gameplay videos.

NitroGen reportedly uses an inverse dynamics model to infer player inputs from 40,000 hours of publicly available internet videos, turning unlabeled footage into large amounts of action-labeled training data so that the model can be trained purely by imitation learning.
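The idea can be sketched in a few lines, assuming a deliberately tiny toy world: a "frame" is just an integer position and an "action" is -1, 0, or +1. The function names and the rule-based `toy_inverse_dynamics` are hypothetical; a real inverse dynamics model would be a learned network over video frames, and the policy would not be a lookup table.

```python
from collections import Counter, defaultdict

# Toy stand-ins: a "frame" is an integer position, an "action" is -1, 0, or +1.
ACTIONS = (-1, 0, 1)


def toy_inverse_dynamics(frame, next_frame):
    """Hypothetical IDM: guess the action that explains frame -> next_frame."""
    delta = next_frame - frame
    # Pick the candidate action closest to the observed change.
    return min(ACTIONS, key=lambda a: abs(a - delta))


def pseudo_label(video):
    """Turn an unlabeled video (a list of frames) into (frame, action) pairs."""
    return [(f, toy_inverse_dynamics(f, nf)) for f, nf in zip(video, video[1:])]


def behavior_clone(pairs):
    """Fit a lookup-table policy: the most common inferred action per frame."""
    votes = defaultdict(Counter)
    for frame, action in pairs:
        votes[frame][action] += 1
    return {frame: counts.most_common(1)[0][0] for frame, counts in votes.items()}


# Unlabeled "gameplay videos": only frames, no recorded inputs.
videos = [[0, 1, 2, 3], [0, 1, 2, 2], [3, 2, 3, 3]]
dataset = [pair for video in videos for pair in pseudo_label(video)]
policy = behavior_clone(dataset)
```

The two-stage structure is the point here: the inverse dynamics step supplies the action labels that raw video lacks, and the imitation step then learns a frame-to-action mapping from those inferred labels.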

Of course, the team has also pointed out the model’s limitations:

NitroGen performs best in games designed for gamepads (e.g., action, platform, and racing games) but struggles in games heavily reliant on mouse and keyboard (e.g., real-time strategy and multiplayer online battle arena games).

The team stated that NitroGen aims to explore whether large-scale training on diverse human gameplay behaviors can give rise to emergent general embodied capabilities, similar to how scaling unlocks emergent behaviors in large language models.

AI Disruption

Dec 22

10:34 AM
