
DeepMind unveils two "Robotics" models that boost robot intelligence

What if robots could finally think like us? Google DeepMind has just unveiled two artificial intelligence models, dubbed RT-X and AutoRT, capable of giving robots a much more nuanced understanding of their environment.
These systems, developed through research in multimodal learning, mark a major step forward toward cognitive robotics, where machines no longer simply execute commands, but analyze, learn, and explain their actions.

DeepMind isn't talking here about simple mechanical control models, but about true general-purpose intelligence architectures applied to robotics.

| Feature | RT-X | AutoRT |
|---|---|---|
| Model type | Unified model of robotic reasoning and action | Multi-robot orchestration and autonomy system |
| Main objective | Understand and follow instructions in natural language | Plan, coordinate, and optimize the operations of multiple robots simultaneously |
| Training | Based on data from more than 30 laboratories; 17 billion parameters | Continuous self-learning with autonomous feedback |
| Inputs | Vision, text, verbal interaction | Data from multiple sensors, vision, and performance feedback |
| Key skills | Contextual understanding, task transfer, explainable reasoning | Coordination, self-correction, robot fleet management |
| Adaptation speed | Up to 60% faster than DeepMind's previous systems | Real-time optimization powered by an automated scheduling engine |
| Application areas | Domestic, industrial, and experimental robotics | Multi-agent environments: warehouses, laboratories, hospitals |
| Autonomy | Language-based reasoning | Autonomous strategic control under human supervision |

A first demonstration video illustrates RT-X's ability to interpret complex instructions such as "Sort the objects on the table by color" and to adjust its movements autonomously.

In this second video, DeepMind demonstrates how RT-X and AutoRT work together to manipulate objects, avoid obstacles, or coordinate multiple robots within the same workspace.

DeepMind's models rely on a combination of computer vision, spatial reasoning, and natural language processing. Whereas older systems required specific training for each task, RT-X learns to generalize.

By combining images, descriptions, and verbal instructions, it becomes capable of developing a comprehensive action plan and justifying its choices. A robot can thus explain why it chooses a particular route or decides to move one object rather than another.
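To make this concrete, here is a deliberately toy sketch of a language-conditioned planner that produces both an action sequence and a justification, in the spirit of the "Sort the objects by color" example. The function and its logic are invented for illustration; they are not DeepMind's actual RT-X interface.

```python
# Hypothetical sketch: a planner that turns the instruction
# "sort the objects on the table by color" into ordered steps
# plus a human-readable rationale. Purely illustrative.

def plan_sort_by_color(objects):
    """Group (name, color) objects into color bins and return
    the pick-and-place steps along with a justification."""
    bins = {}
    for name, color in objects:
        bins.setdefault(color, []).append(name)
    steps = []
    for color in sorted(bins):          # deterministic bin order
        for name in bins[color]:
            steps.append(f"move {name} to the {color} bin")
    rationale = (
        f"Grouped {len(objects)} objects into {len(bins)} color bins "
        "so each bin holds objects of a single color."
    )
    return steps, rationale

steps, why = plan_sort_by_color(
    [("cube", "red"), ("ball", "blue"), ("block", "red")]
)
print(steps[0])  # → move ball to the blue bin
print(why)
```

The point of the sketch is the pairing: every plan comes with a rationale the robot can surface, which is the explainability property the article attributes to RT-X.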

According to DeepMind, RT-X is based on a multimodal architecture with 17 billion parameters, capable of integrating visual and textual cues to understand the context of an action.

What sets these models apart is their ability to learn from their mistakes.
AutoRT incorporates a self-evaluation mechanism that allows it to correct its actions and improve its performance without constant human supervision.
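A self-evaluation loop of this kind can be sketched as measure, score, correct, repeat. The scoring and correction rules below are invented for the example; AutoRT's actual mechanism is not public.

```python
# Illustrative self-correction loop: a gripper width converges on a
# target by evaluating its own error each attempt, with no human input.
# Numbers and the halving rule are assumptions for the demo.

def self_correcting_grasp(target, initial, tolerance=0.5, max_tries=10):
    """Iteratively adjust toward target, logging each self-evaluation."""
    width = initial
    log = []
    for attempt in range(1, max_tries + 1):
        error = target - width
        log.append((attempt, round(width, 2), round(error, 2)))
        if abs(error) <= tolerance:
            return width, log        # self-evaluated as good enough
        width += error / 2           # correct half the error and retry
    return width, log

width, log = self_correcting_grasp(target=4.0, initial=8.0)
print(width)   # → 4.5 (within tolerance after 4 attempts)
```

Each log entry is an attempt the system judged and corrected on its own, which is the trial-and-error behavior the researchers compare to developmental learning.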

DeepMind researchers compare this behavior to a form of developmental learning, similar to that of a child discovering the world through trial and error.

This approach leads to robotics that is more autonomous, more adaptive, and capable of operating in unpredictable situations.

The impact of these models goes beyond mere technical performance.
DeepMind designed RT-X and AutoRT as open and collaborative systems: more than 30 international laboratories are participating in their development as part of the Open X-Embodiment project.

This initiative aims to create a shared knowledge base among robots, where every learning experience can be shared.
A robot trained in a Tokyo laboratory could thus instantly benefit from the experience of another robot based in Zurich.
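The Tokyo-to-Zurich idea can be sketched as a shared experience pool: any lab contributes episodes, and every robot can query them. The class below is a hypothetical toy, not the Open X-Embodiment data format.

```python
# Toy sketch of a cross-lab shared experience pool. The class and its
# methods are invented to illustrate the idea of pooled robot data.
from collections import defaultdict

class SharedExperiencePool:
    """Episodes contributed by any lab become visible to every robot."""
    def __init__(self):
        self.episodes = defaultdict(list)  # task -> [(lab, outcome)]

    def contribute(self, lab, task, outcome):
        self.episodes[task].append((lab, outcome))

    def lookup(self, task):
        # A robot anywhere retrieves all recorded experience for a task.
        return self.episodes[task]

pool = SharedExperiencePool()
pool.contribute("Tokyo", "open_drawer", "success")
pool.contribute("Zurich", "open_drawer", "failure: handle slipped")
# A robot in Zurich now sees the Tokyo episode immediately.
print(pool.lookup("open_drawer"))
```

The design choice worth noting is that experience is keyed by task rather than by robot, so a success in one lab is retrievable by any embodiment attempting the same task.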

According to DeepMind’s estimates, the combined use of AutoRT and RT-X could increase the speed at which robots adapt to complex environments such as warehouses, hospitals, or homes by 60% [1].

These advances mark a profound shift: the robots of the future will no longer be mere executors, but true reflective agents, capable of reasoning, planning, and explaining their decisions.

An RT-X robot might say, “I moved this object to avoid a collision,” or “This surface seems unstable; I’m choosing a different foothold.”
This transparency, rare in robotics, marks a turning point toward explainable and responsible AI that is better integrated into human environments.

The increasing autonomy of robots raises questions of safety, accountability, and oversight.

DeepMind emphasizes the need for constant human oversight and the development of a robust safety framework. The RT-X and AutoRT models can only operate within defined and validated contexts.
The company explicitly rules out any military or surveillance use.

However, several researchers are calling for the establishment of international regulations for cognitive robotics, in order to provide a framework for these emerging technologies and ensure their ethical use [2].

With RT-X and AutoRT, DeepMind is bringing artificial intelligence a step closer to human cognition. These models pave the way for truly adaptive robots capable of understanding language, interacting naturally, and learning from their environment.

This convergence of perception, language, and action could transform robotics over the next decade: from industry to healthcare, from logistics to space exploration, robots are becoming thinking partners.

You can also read the article Gemini Gives Astronomers a New Eye: AI Detects the Mysteries of the Night Sky, which explores another application of general-purpose artificial intelligence in the scientific field.

1. DeepMind. (2025). Introducing RT-X and AutoRT: Toward General-Purpose Robots.
https://deepmind.google

2. European Robotics Forum. (2024). Ethics and Regulation of Cognitive Robotics.
https://roboticsforum.eu
