Imagen 4: Google’s new showcase for generative visual artificial intelligence

Following the highly anticipated release of Gemini 1.5, Google continues to gain momentum in generative artificial intelligence with the launch of Imagen 4, a new iteration of its image-generation model. Announced at Google I/O 2024, Imagen 4 exemplifies the convergence of technical performance and creative accessibility in AI-driven visual synthesis.

Currently available on a limited basis via the Music AI Sandbox or in ImageFX (within Search Labs), Imagen 4 is part of a suite of creative tools designed to make artificial intelligence more accessible for the creation of images and artistic content [1].

This model is based on a proprietary text-to-image generation architecture that combines complex semantic representations with remarkable photorealistic rendering capabilities. It thus positions itself as a direct competitor to the most advanced models, such as Midjourney v6, DALL·E 3, and Stable Diffusion XL Turbo, while distinguishing itself through native integration with Google services.

One of the key improvements introduced by Imagen 4 lies in its consistency across the elements of an image and its adherence to text prompts. Where previous models still struggled with anatomical details and spatial coherence (hands, perspectives, interactions between objects), Imagen 4 delivers significantly better results, particularly for faces, textures, and complex scenes.

Tests conducted by experts in generative AI indicate that the model excels at generating realistic photographs and conceptual scenes, as well as at reproducing technical objects and natural environments. Google is focusing on granular detail and in-depth linguistic understanding, enabling Imagen 4 to produce more accurate images from ambiguous or narrative prompts [2].

While Imagen 4’s capabilities are impressive, they also raise significant legal and ethical questions. Photorealistic image generation opens the door to visual misinformation and malicious misuse, particularly in the political, media, and educational spheres.

To address these risks, Google has announced that all images generated by Imagen 4 will include an invisible digital watermark using SynthID, a proprietary technology designed to automatically identify AI-generated images [3]. Additionally, the model is subject to safety filters, specifically to prevent the generation of violent, hateful, or sexually explicit content.

From a regulatory standpoint, Imagen 4 will also need to comply with the future requirements of the European AI Act, particularly regarding transparency, traceability, and copyright protection. Google’s liability for the distribution of potentially controversial images could become a major issue in the coming months.

The launch of Imagen 4 should not be viewed in isolation: it is part of a broader strategy to integrate generative AI into Google’s services. Eventually, cross-platform integration is planned for Workspace (Docs, Slides), Photos, YouTube, and Gemini. This development could redefine the user experience when it comes to visual creation.

In addition, Google positions Imagen 4 as a driver of professional innovation: product design, visual marketing, prototyping, editorial illustration… these are just some of the use cases targeted by this new generation of multimodal AI.

With Imagen 4, Google is reaffirming its ambition to become a key player in creative AI. While the model’s technical power is undeniable, it calls for a collective reflection on the use of these technologies: what limits should be set on the automation of the imagination? How can we preserve the authenticity of human-created works? And above all, how can we ensure that these tools remain at the service of ethical, transparent, and responsible creativity?

1. Google. (2024). Introducing Imagen 4 and the Music AI Sandbox.
https://blog.google/technology/ai/google-deepmind-imagen-4/

2. The Verge. (2024). Google’s Imagen 4 is here, and it’s shockingly good at generating realistic photos.
https://www.theverge.com/2024/5/14/google-imagen-4-ai-image-generation

3. DeepMind. (2024). SynthID expands to watermark AI-generated text, audio, and video.
https://www.deepmind.com/blog/synthid-expands