The leak of Nano Banana 2 reveals Google’s next major breakthrough in image generation. This new version, which follows an initial viral model released in 2024, promises a visual experience of unprecedented realism. Also known as GEMPIX 2, this technology is based on Gemini 3 Pro Image, the most accurate visual model ever developed by Google.
According to initial reports from internal testing, Nano Banana 2 will be integrated directly into the Gemini app on mobile and desktop, without requiring any additional downloads. Access will be available via the “Images” tab in the Gemini chatbot, which is currently in private beta in the United States and Canada. Google plans a phased global rollout starting in January 2026, with Europe and France included in the second wave of deployment.
The tool is expected to be available for free in its basic version (standard image generation) and included in the Gemini Advanced subscription ($19.99/month) for 4K renders and experimental 4K video features. A dedicated web interface, nanobanana2, is expected to go live upon the official release, according to the technical documents reviewed.
Google’s goal is clear: to make 4K image and video creation accessible to everyone, directly from a smartphone, and to bring artificial intelligence closer to professional-grade photography. According to a Statista study (2025), the global market for AI-generated images is expected to reach $7.9 billion by 2027, driven by growing demand from digital creators and the advertising sector [1].
Image clarity taken to the next level
The first Nano Banana made a big impression by generating 3D portraits that were shared by millions on social media. With over 10 million users in three weeks, its success even drew praise from Jensen Huang, CEO of Nvidia, who called it a prime example of “creative madness” [2]. The second generation goes even further: images can now be produced in 2K and upscaled to 4K, with perfectly legible embedded text. This long-awaited advancement paves the way for creating advertising mockups, marketing visuals, or press illustrations directly from a mobile app, without typographic degradation or background blur. According to Benchmark AI Labs (2025), the accuracy of text embedded in generated images now reaches 97%, up from 68% the previous year, a record for a model available to the general public [3].
A contextual and coherent model
One of Nano Banana 2’s major innovations lies in its understanding of context. Thanks to the multimodal capabilities of Gemini 3 Pro Image, the model incorporates the cultural and geographical dimensions of user queries. Asking for a “sunset on a beach in Bali” will no longer result in a generic image, but rather in settings, outfits, and lighting that are true to the local reality. This contextual understanding is accompanied by improved visual consistency: faces, clothing, and objects now retain their shape and style from one image to the next. This level of stability, still rare in AI image generation, represents a decisive step toward aesthetic and narrative reliability. A study by Adobe Research (2025) estimates that 84% of visual generator users consider the consistency of elements to be the primary criterion for trust in a model [4].
Unprecedented processing speed
The performance gain is also impressive. Whereas the first model took up to thirty seconds to generate a rendering, Nano Banana 2 produces an image in less than ten seconds: a two-thirds reduction in average processing time, or roughly a threefold speedup. This technological leap puts Google on par with today’s top performers, such as Midjourney, Firefly, and Runway. This responsiveness finally makes real-time use feasible for mobile apps, collaborative creative interfaces, or immersive environments incorporating dynamic visual elements. According to the AI Image Report 2025, nearly 42% of visual model users say that speed of execution is now just as important as the quality of the final rendering [5].
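The reported figures can be sanity-checked with a few lines of arithmetic, taking 30 s and 10 s as the nominal generation times from the article:

```python
# Nominal generation times reported in the article.
old_time, new_time = 30.0, 10.0  # seconds per image

time_reduction = (old_time - new_time) / old_time  # fraction of time saved
speedup = old_time / new_time                      # throughput multiplier

print(f"time saved: {time_reduction:.0%}")  # → time saved: 67%
print(f"throughput: {speedup:.0f}x")        # → throughput: 3x
```

Note that a 67% cut in generation time corresponds to a 3× gain in throughput; the two figures describe the same improvement from different angles.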
From Images to Video: Toward an AI Production Studio
Beyond still images, Google aims to transform Nano Banana 2 into a true multimodal creative studio. The model would be capable of generating 4K video sequences from a simple text command, paving the way for the automated production of advertising clips, YouTube intros, or artistic content without the need for professional software. This feat relies on an “image-to-image” architecture capable of merging multiple visuals to create new ones. Transitions become fluid, framing looks realistic, and movements appear as if captured by a real camera. Integration with Google Photos or Workspace is already being considered, making the Gemini ecosystem a comprehensive creation platform. According to the AI Video Market Outlook 2025 report, the production of AI-generated videos has seen annual growth of 128% since 2023 and could represent a $12.4 billion market by 2028 [6].
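At its simplest, merging multiple visuals amounts to a weighted combination of source images. Google’s actual pipeline is not public (modern generative models do this kind of blending in a learned latent space rather than on raw pixels), but a minimal pixel-space sketch in NumPy illustrates the idea:

```python
import numpy as np

def blend_images(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Linearly blend two uint8 images of identical shape (H, W, C)."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must share the same dimensions")
    # Blend in float to avoid uint8 overflow, then clip back to pixel range.
    mixed = alpha * img_a.astype(np.float32) + (1.0 - alpha) * img_b.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)

# Two toy 2x2 RGB "images": one all-red, one all-blue.
red = np.zeros((2, 2, 3), dtype=np.uint8); red[..., 0] = 255
blue = np.zeros((2, 2, 3), dtype=np.uint8); blue[..., 2] = 255

purple = blend_images(red, blue, alpha=0.5)
print(purple[0, 0])  # → [127   0 127]
```

A diffusion-based model would apply the same weighting to latent representations before decoding, which is what makes the resulting transitions look fluid rather than like a double exposure.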
An ethical and responsible approach to realism
This race for visual realism, however, raises crucial ethical issues. The ability of AI to produce images or videos that are indistinguishable from reality raises questions about traceability and digital trust. Google states that it is working on invisible watermarking systems and standardized metadata to ensure the authenticity of generated content. These measures are part of broader efforts on responsible AI and the fight against misleading content, particularly in media and political contexts [7]. According to a study by the Center for Humane Technology (2025), 68% of European users say they are concerned about the difficulty of distinguishing a real image from an AI-generated one, a figure up 14 points from 2023 [8]. Rigorous governance of these technologies therefore appears essential to prevent their misuse for the purposes of disinformation or manipulation.
Toward a Redefinition of Visual Language
With Nano Banana 2, Google has crossed a symbolic threshold: the point where visual creation becomes a direct dialogue between the human imagination and the machine. The images generated are no longer mere approximations but credible representations, capable of evoking emotion and convincing the viewer. This development challenges our relationship with visual truth and calls for new critical skills in education, communication, and AI research. According to the Global Creative AI Index (2025), 71% of creative professionals already believe that mastery of generative tools will become an essential skill by 2026 [9]. More than just a tool, this generation marks the advent of a universal language shared between humans and algorithms.
Learn more
As part of our ongoing exploration of hyperrealism and AI-assisted creativity, check out our article: Meta x Midjourney: A Strategic Partnership to Revolutionize AI-Generated Images and Video
References
1. Statista. (2025). AI Image Generation Market Forecast 2025–2027.
https://www.statista.com/outlook/tmo/artificial-intelligence/artificial-intelligence-image-generation/worldwide
2. Nvidia. (2024). AI Creativity and Visual Generation Trends. Nvidia Official Blog.
https://blogs.nvidia.com/blog/ai-creativity-visual-generation-trends
3. Benchmark AI Labs. (2025). Evaluation of Text Rendering Accuracy in Generative Models.
https://benchmarkailabs.org/reports/text-rendering-accuracy-2025
4. Adobe Research. (2025). AI Trust and Visual Consistency Study.
https://research.adobe.com/publication/ai-trust-visual-consistency-2025
5. AI Image Report. (2025). Performance Benchmarks of Generative Models.
https://aiimagereport.com/performance-benchmarks-2025
6. AI Video Market Outlook. (2025). Trends and Projections in Generative Video Technologies.
https://www.marketsandmarkets.com/Market-Reports/ai-video-market-outlook-2025.html
7. Google AI Ethics Board. (2025). Responsible Image Generation and Deepfake Detection Framework. Google Research.
https://ai.google/responsibility/
8. Center for Humane Technology. (2025). European Perception Survey on Synthetic Media.
https://www.humanetech.com/resources/synthetic-media-survey-2025
9. Global Creative AI Index. (2025). The Evolution of Creative Skills and AI Integration.
https://www.creativeaiindex.com/reports/2025-skill-evolution

