NVIDIA Cosmos: Power Robots Now
Open-source vision-language models for robotics, autonomous vehicles, and physical AI
2 Dec 2025 (Updated 28 Dec 2025) - Written by Lorenzo Pellegrini
NVIDIA unveils open AI models for autonomous driving and robotics
NVIDIA has released a suite of open-source AI models and infrastructure designed specifically for "physical AI", the field that connects digital intelligence with real-world robotics and autonomous vehicles. At the heart of this initiative is the Cosmos-Reason model, an evolution of NVIDIA’s earlier releases, now tailored to help robots and self-driving cars reason, plan, and interact with their environments in ways that resemble human cognition.
Introducing Cosmos-Reason: The brain behind physical AI
Cosmos-Reason is a 7-billion-parameter vision-language model (VLM) that stands out for its ability to understand space, time, and fundamental physics. Unlike traditional AI models, Cosmos-Reason is post-trained with physical common sense and embodied reasoning data, allowing it to navigate the complexities of the real world with spatial-temporal awareness. This makes it especially valuable for robotics and autonomous vehicles, where understanding context and predicting outcomes are critical.
The model excels at:
- Automating high-quality data curation and annotation for diverse training datasets.
- Acting as the "brain" for robot planning and reasoning, enabling deliberate, methodical decision-making.
- Interpreting environments and breaking down complex commands into actionable tasks, even in unfamiliar scenarios.
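As a rough illustration of the last point, the sketch below shows one way a planner might ask a vision-language model to decompose a command and then parse the reply into ordered steps. The prompt wording, the `build_prompt` and `parse_steps` helpers, and the sample reply are all hypothetical. No Cosmos-Reason API is called or implied.

```python
import re


def build_prompt(command: str) -> str:
    """Assemble a hypothetical decomposition prompt for a VLM."""
    return (
        "You are controlling a robot. Break the command into short, "
        "numbered steps, one action per line.\n"
        f"Command: {command}"
    )


def parse_steps(reply: str) -> list[str]:
    """Extract '1. do something' or '1) do something' lines, in order."""
    steps = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            steps.append(match.group(1))
    return steps


# Hand-written stand-in for a model reply (not real model output).
sample_reply = """Here is the plan:
1. Locate the red mug on the table
2. Grasp the mug by its handle
3) Place the mug in the sink
"""

steps = parse_steps(sample_reply)
for number, step in enumerate(steps, start=1):
    print(number, step)
```

Parsing the reply into a plain list keeps the downstream controller independent of how the model phrases its answer. Lines that are not numbered steps are ignored.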
How Cosmos-Reason powers autonomous driving
For autonomous vehicles, Cosmos-Reason serves as the inference engine behind advanced vision-language-action (VLA) models. It processes multi-camera and multi-temporal observation frames, along with high-level language inputs such as navigation instructions, to predict safe and compliant driving actions. The model’s architecture is built on causal structured reasoning, meaning it must justify its decisions based on observable evidence, which is intended to support safety and reliability.
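To make the inputs and the "justify with observable evidence" requirement concrete, here is a minimal data-structure sketch. The class names, fields, and the `is_justified` check are assumptions made for illustration only. They do not describe the model's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    camera: str       # which camera the observation came from
    description: str  # what was seen, in plain language


@dataclass
class DrivingObservation:
    frames: dict[str, list[bytes]]  # camera name -> frames, oldest first
    instruction: str                # high-level language input


@dataclass
class DrivingDecision:
    action: str
    evidence: list[Evidence] = field(default_factory=list)


def is_justified(decision: DrivingDecision, obs: DrivingObservation) -> bool:
    """Accept a decision only if every cited camera actually supplied frames."""
    if not decision.evidence:
        return False  # no evidence cited at all
    return all(bool(obs.frames.get(e.camera)) for e in decision.evidence)


# Placeholder byte strings stand in for real image frames.
obs = DrivingObservation(
    frames={"front": [b"t0", b"t1"], "left": [b"t0", b"t1"]},
    instruction="Turn left at the next intersection",
)
supported = DrivingDecision(
    "yield", [Evidence("front", "pedestrian entering the crosswalk")]
)
unsupported = DrivingDecision("accelerate")

print(is_justified(supported, obs))
print(is_justified(unsupported, obs))
```

The check is deliberately simple: it only confirms that a decision points back to cameras present in the observation, not that the reasoning itself is sound.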
The training process for Cosmos-Reason involves two key stages:
- Modality injection on large-scale driving data to learn the basic mapping from vision to action.
- Supervised fine-tuning on causation chain data, teaching the model to "think clearly before driving" by reasoning through the sequence of events.
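The ordering of the two stages can be sketched as a small schedule in which each stage starts from the checkpoint produced by the previous one. The stage and dataset identifiers below are hypothetical labels derived from the description above, and `fake_train` is a placeholder that records lineage rather than training anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingStage:
    name: str
    dataset: str
    objective: str


# Hypothetical labels; the article names the stages but not their configs.
SCHEDULE = (
    TrainingStage(
        "modality_injection",
        "large_scale_driving_data",
        "learn the basic mapping from vision to action",
    ),
    TrainingStage(
        "supervised_fine_tuning",
        "causation_chain_data",
        "reason through the sequence of events before acting",
    ),
)


def run_schedule(schedule, train_fn):
    """Run stages strictly in order, handing each checkpoint to the next stage."""
    checkpoint = None
    for stage in schedule:
        checkpoint = train_fn(stage, checkpoint)
    return checkpoint


def fake_train(stage, checkpoint):
    # Placeholder: append the stage name instead of updating model weights.
    return (checkpoint or []) + [stage.name]


lineage = run_schedule(SCHEDULE, fake_train)
print(lineage)
```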
Expanding the ecosystem: Cosmos World Foundation Models
Beyond Cosmos-Reason, NVIDIA has introduced the broader Cosmos World Foundation Models (WFMs), which are open, customizable, and multimodal. These models can simulate, reason, and generate data for downstream pipelines in robotics, autonomous vehicles, and industrial vision systems. Key capabilities include:
- Generating up to 30 seconds of continuous video from multimodal inputs, enabling advanced forecasting and scenario planning.
- Scaling simulations across various environments and lighting conditions for rapid, controllable data augmentation.
- Accelerating 3D inputs from physical AI simulation frameworks like CARLA and NVIDIA Isaac Sim™.
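As a sketch of what controllable augmentation across environments and lighting could look like on the request side, the snippet below enumerates every combination of two hypothetical condition lists and caps the clip length at the 30 seconds mentioned above. The dictionary keys, the condition names, and the `scenario_grid` helper are illustrative assumptions, not part of any Cosmos API.

```python
from itertools import product

MAX_CLIP_SECONDS = 30  # upper bound on continuous video stated above


def scenario_grid(environments, lighting, clip_seconds=10):
    """Build one generation request per (environment, lighting) pair."""
    if not 0 < clip_seconds <= MAX_CLIP_SECONDS:
        raise ValueError(
            f"clip_seconds must be in (0, {MAX_CLIP_SECONDS}], got {clip_seconds}"
        )
    return [
        {"environment": env, "lighting": light, "clip_seconds": clip_seconds}
        for env, light in product(environments, lighting)
    ]


# Hypothetical condition names chosen for the example.
grid = scenario_grid(["urban", "highway"], ["day", "dusk", "night"])
for request in grid:
    print(request)
```

Two environments and three lighting conditions give six requests. Adding a condition to either list grows the grid multiplicatively, which is what makes this style of augmentation quick to scale.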
Real-world applications and impact
With Cosmos-Reason and the Cosmos WFMs, developers can now create more intelligent, responsive, and safe AI agents for a wide range of applications:
- Robots that can interpret environments and execute complex tasks using common sense.
- Autonomous vehicles that can safely navigate diverse weather, lighting, and geolocation conditions.
- Industrial and urban systems that leverage real-time video analytics for automation, safety, and operational efficiency.
These models are not just theoretical: they are already being used to power next-generation autonomous driving systems, such as Alpamayo-R1, which is built directly on the Cosmos-Reason foundation.
Conclusion
NVIDIA’s release of open AI models for physical AI marks a significant milestone in the evolution of robotics and autonomous driving. By making these powerful tools available to developers worldwide, NVIDIA is accelerating innovation and paving the way for smarter, safer, and more capable machines that can truly understand and interact with the physical world.
