NVIDIA Unveils Cosmos 3, an AI Designed to Understand the Real World

Artificial intelligence is now capable of generating text, creating images, producing videos, and even writing code. However, one limitation remains: understanding the physical world. A model can describe a car, recognize a pedestrian, or identify an obstacle, but understanding how objects interact in space and anticipating their movements remains a major challenge. This is precisely the problem NVIDIA is seeking to solve with Cosmos 3.

Unveiled at GTC Taipei 2026 alongside Isaac GR00T, NVIDIA's foundation model for humanoid robots, Cosmos 3 marks a new milestone in the development of what NVIDIA calls “physical AI.” Unlike traditional generative models, it is not limited to understanding digital content. Its goal is to help robots, autonomous vehicles, and intelligent systems better interpret, anticipate, and interact with the real world.

For NVIDIA, this capability could significantly accelerate the development of robotics, autonomous vehicles, and future physical agents powered by artificial intelligence.

AI must now understand the physical world

For several years now, advances in AI have been based primarily on the understanding of language, images, and digital data. However, machines continue to face challenges when it comes to interacting with real-world environments.

A robot that needs to grasp an object, avoid an obstacle, or navigate a complex environment must understand much more than just the appearance of a scene. It must be able to anticipate the consequences of its actions, evaluate possible movements, and reason about physical interactions.

This issue is becoming particularly important as global investment in robotics is projected to exceed $260 billion by 2030.¹ Manufacturers are now seeking models capable of bridging the gap between digital perception and physical understanding.

It was against this backdrop that Cosmos 3 was designed.

A model designed for robots and autonomous vehicles

NVIDIA is introducing Cosmos 3 as the first fully open “omnimodel” dedicated to physical AI. The system was developed to serve as the foundation for a new generation of intelligent machines capable of interacting with their environment.

The company already offers two versions of the model. The Super version is designed for applications requiring high physical accuracy, particularly in industrial robotics and autonomous driving. A Nano version is also available for applications requiring faster response times and lower computing costs.

NVIDIA also announced the upcoming release of an Edge version designed to run directly on local devices. This approach addresses a major challenge facing the industry: enabling autonomous systems to make decisions without always relying on a cloud connection.

This strategy shows that NVIDIA is not only seeking to develop a high-performance model, but also to build a true physical AI ecosystem capable of adapting to different levels of infrastructure.

A massive dataset for learning about reality

One of the most impressive aspects of Cosmos 3 is the data used to train it.

According to NVIDIA, the model was trained on nearly 20 trillion tokens². This dataset includes:

nearly one billion images;
approximately 400 million real and synthetic videos;
ambient audio data;
text content;
traces of actions carried out by humans and robots.

This diversity allows the model to learn not only to recognize objects or situations, but also to understand the actions associated with these environments.

Unlike a traditional video generator, which focuses primarily on the visual appearance of a scene, Cosmos 3 seeks to model what is actually happening in the physical world.

According to Ming-Yu Liu, vice president of Cosmos Lab at NVIDIA, the goal is to learn the movements, interactions, and behaviors that characterize real-world environments².

From Perception to Action

The true innovation of Cosmos 3 lies in its ability to incorporate the concept of action.

To a human, watching someone open a door, move an object, or climb a staircase seems natural. To a machine, these actions represent a complex combination of movements, physical constraints, and sequential decisions.

Cosmos 3 specifically seeks to capture this aspect.

The model can generate extremely detailed action data, including:

movement paths;
the positions of robot end effectors;
joint angles;
mechanical arm movements;
the steps required to complete a task.

This information is essential for training robots to interact effectively with their environment.
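
To make the list above more concrete, here is a minimal sketch of what one record in an action trace might contain, and how joint angles relate to an end-effector position. The `ActionStep` class, its fields, and the two-link planar arm are illustrative assumptions; the source does not describe the actual data format produced by Cosmos 3.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActionStep:
    """One hypothetical timestep of a robot action trace (not an NVIDIA format)."""
    t: float                           # time in seconds
    joint_angles: Tuple[float, float]  # shoulder and elbow angles, in radians
    task_step: str                     # label for the current step of the task


def end_effector_xy(joint_angles, link_lengths=(0.5, 0.5)):
    """Forward kinematics of a planar two-link arm: joint angles -> (x, y) in metres."""
    q1, q2 = joint_angles
    l1, l2 = link_lengths
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def movement_path(trace: List[ActionStep]):
    """End-effector position at each timestep of the trace."""
    return [end_effector_xy(step.joint_angles) for step in trace]


# A three-step toy trace: arm stretched out, raised, then bent at the elbow.
trace = [
    ActionStep(0.0, (0.0, 0.0), "reach"),
    ActionStep(0.5, (math.pi / 2, 0.0), "lift"),
    ActionStep(1.0, (math.pi / 2, -math.pi / 2), "place"),
]
path = movement_path(trace)
```

In this toy example the joint angles are stored and the movement path is derived from them. A real robot would have more joints and a three-dimensional workspace, but the relationship between the listed quantities is the same.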

This approach gradually brings artificial intelligence systems closer to human physical reasoning, a capability considered essential for the emergence of truly autonomous agents.

Simulating the impossible to train machines more effectively

One of the most promising use cases involves generating rare or dangerous scenarios.

In the real world, it is often difficult, costly, or risky to replicate certain situations needed to train autonomous systems. Vehicle collisions, industrial accidents, and mechanical failures are rare events, but they are essential for developing robust systems.

Cosmos 3 allows users to virtually generate these types of scenarios to enrich training data.

This approach offers several advantages:

lower physical testing costs;
safer experiments;
shorter development cycles;
a wider variety of simulated scenarios.

NVIDIA even claims that certain training phases that might previously have taken several months can now be completed in just a few days².
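
As a rough illustration of enriching training data with generated scenarios, the sketch below computes how many synthetic rare-event samples would be needed for rare events to reach a target share of a training set. The function names, labels, and numbers are hypothetical, and the generation step is a stub, since the source does not describe how Cosmos 3 is invoked.

```python
import math
from fractions import Fraction


def synthetic_needed(n_common, n_rare_real, target_fraction):
    """Synthetic rare samples needed so rare events make up target_fraction of the set."""
    f = Fraction(str(target_fraction))  # exact arithmetic avoids float rounding
    if not 0 < f < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    # Solve (n_rare_real + s) / (n_common + n_rare_real + s) >= f for s.
    needed = (f * (n_common + n_rare_real) - n_rare_real) / (1 - f)
    return max(0, math.ceil(needed))


def build_training_mix(common, rare_real, make_synthetic, target_fraction):
    """Combine real data with generated rare scenarios (generator is a placeholder)."""
    n = synthetic_needed(len(common), len(rare_real), target_fraction)
    synthetic = [make_synthetic(i) for i in range(n)]
    return common + rare_real + synthetic


# Toy data: 1% of the real recordings contain a collision; we want 10%.
common = ["normal_drive"] * 9900
rare_real = ["collision"] * 100
mix = build_training_mix(
    common, rare_real, lambda i: f"synthetic_collision_{i}", 0.1
)
rare_share = (len(mix) - len(common)) / len(mix)
```

The point of the sketch is the arithmetic: when rare events are a tiny share of real recordings, the number of generated samples needed can easily exceed the number of real ones.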

An open model to accelerate the ecosystem

Like the Nemotron family, Cosmos 3 adopts an open strategy. NVIDIA wants to enable developers, researchers, and industry professionals to adapt the model to their own needs.

This openness contrasts with the trend observed among several major players in the sector, who favor more closed models.

The goal is to foster the emergence of an ecosystem capable of accelerating innovation in robotics, autonomous mobility, and smart systems.

Among the first partners announced are Agile Robots, Black Forest Labs, and Runway, demonstrating that NVIDIA is seeking to build a broad network around this new platform².

Toward an AI Capable of Understanding Reality

Cosmos 3 illustrates a profound evolution in artificial intelligence. After learning to understand language, images, and digital data, models are now seeking to develop a more nuanced understanding of the physical laws that govern the real world.

This development could have major implications for robotics, autonomous mobility, industry, and future agentic AI systems.

The challenge is no longer simply to create models capable of answering questions or generating content. It is now about building systems capable of interacting with their environment in a reliable, predictable, and autonomous manner.

With Cosmos 3, NVIDIA is not merely seeking to improve artificial intelligence. The company is attempting to bring machines closer to understanding the physical world—a challenge that remains one of the greatest frontiers in AI today.

How does Cosmos 3 work?

Cosmos 3 is based on a multimodal artificial intelligence architecture designed to understand physical environments and model interactions between objects, humans, and machines. Developed by NVIDIA, this model belongs to a new category of AI called “physical AI,” whose goal is no longer limited to processing text, images, or videos, but also includes understanding actions that take place in the real world.

Unlike traditional generative models, which focus primarily on digital content, Cosmos 3 seeks to represent the physical laws, movements, and behaviors observed in real-world environments. The system analyzes various types of multimodal data, including images, videos, text, sounds, and records of human or robotic actions.

Based on this information, the model learns to identify not only what is present in a scene, but also what is happening there, what movements are being made, what interactions are taking place, and what consequences may result from certain actions. This capability allows it to generate realistic physical simulations and produce data that can be used to train robots, autonomous vehicles, or other intelligent systems.

Key Features of Cosmos 3

Advanced physical understanding: analysis of interactions between objects, humans, and machines
Multimodal model: simultaneous processing of text, images, videos, audio, and actions
Simulation generation: creating realistic physical environments for AI training
Motion modeling: understanding trajectories, displacements, and dynamic behaviors
Action data generation: producing usable information for robotics and automation
Open architecture: flexibility and customization for specific industrial applications
Optimization for physical AI: accelerated development of autonomous robots and smart vehicles

Technical constraints and limitations

Significant computing power requirements for training and inference
Dependence on the quality and diversity of the physical data used
Difficulty in perfectly replicating certain complex real-world situations
Need for validation in real-world physical environments following simulation
Risks associated with bias in training data
Current limitations in understanding highly unpredictable or unprecedented situations

From a technological standpoint, Cosmos 3 illustrates the evolution of artificial intelligence toward a deeper understanding of the physical world. Models no longer seek merely to generate content or answer questions; they are gradually learning to interpret the mechanisms that govern real-world interactions between objects and individuals.

This approach reflects the growing prominence of agentic AI and intelligent robotics. The goal is to enable autonomous systems to make more reliable decisions by drawing on a better understanding of the environments in which they operate. These advances directly affect fields such as robotics, autonomous mobility, industrial data management, data engineering, and advanced simulation.

Key takeaway: Cosmos 3 is designed to understand and simulate the physical world, which NVIDIA expects to pave the way for a new generation of robots, autonomous vehicles, and intelligent agents that interact more effectively with their environment.

Learn more

The development of models capable of understanding the physical world is a key step in the evolution of artificial intelligence, particularly for robotics, autonomous vehicles, and simulated environments. On a related topic, check out our article “DINOv3 by Meta: Self-Supervision for Precise Visual Analysis”, which examines how advances in computer vision enable AI systems to better interpret their environment and interact with complex real-world situations.

References

1. MarketsandMarkets. (2025). Global Robotics Market Forecast.
https://www.marketsandmarkets.com

2. NVIDIA. (2026). Cosmos 3 Technical Presentation, GTC Taipei 2026.
https://www.nvidia.com
