Google Unveils Gemini Omni
Google DeepMind introduced Gemini Omni, a groundbreaking AI model that processes text, images, audio, and video inputs within a single, unified system to generate and edit video content. Announced at Google I/O on May 19, 2026, this "any-to-any" multimodal AI represents a significant architectural shift from previous models that handled modalities separately.
Gemini Omni's core innovation lies in its native multimodality: a single model understands and generates content across formats rather than handing work off between separate systems. This unified approach allows for more coherent edits and reduces the artifacts that commonly appear in AI-generated media when separate models are chained in a pipeline. The model also incorporates a strong understanding of real-world physics, including gravity and fluid dynamics, so that generated scenes behave realistically. This "world model" capability is meant to bridge the gap between photorealism and meaningful storytelling.
A key feature of Gemini Omni is its conversational video editing. Users can refine generated or existing video footage through natural language instructions, with the model remembering context across multiple turns. This allows for iterative changes like altering backgrounds, adjusting lighting, or stabilizing shots without re-prompting from scratch. The model also maintains character consistency across clips, a crucial element for narrative content. In addition, Gemini Omni introduces AI avatars: reusable digital likenesses that users can create of themselves for use in generated videos.
The first model in the Omni family, Gemini Omni Flash, begins rolling out immediately to Google AI Plus, Pro, and Ultra subscribers via the Gemini app and Google Flow. Free access expands to YouTube Shorts and the YouTube Create app later in the week. Google Flow, an AI filmmaking tool, leverages Gemini Omni for creators, offering capabilities like script-to-video generation, scene assembly, and conversational editing. For developers and enterprises, API access will become available in the coming weeks. Google also offers tiered subscription plans, including a new $100 entry-level option, with the top-tier AI Ultra plan priced at $200 per month.
Gemini Omni builds upon Google's ongoing advancements in AI, including the Gemini Pro and Flash models featured at Google I/O 2024 and 2025, which introduced enhanced reasoning, longer context windows, and multimodal capabilities. The development of Omni also follows Google's work on models like Veo for video generation and Imagen for image generation, integrating these capabilities into a single, cohesive system. Google's broader AI strategy, as highlighted in 2024 and 2025, emphasizes making AI helpful for everyone, integrating it across its product portfolio, and prioritizing responsible AI development.
The introduction of Gemini Omni signals a significant step in Google's pursuit of Artificial General Intelligence (AGI) and positions the company at the forefront of AI-powered multimedia creation. Its native multimodality and advanced editing capabilities promise to democratize video production, offering new tools for creators, marketers, educators, and developers alike. The integration of Gemini Omni into widely used platforms like YouTube Shorts further amplifies its potential reach and impact.