Google's Gemini 2.0 brings agentic AI to the general public (updated)
Gemini 2.0 Flash is now available on the Gemini app.
By Liu Hongzuo
This article was originally published on 11 December 2024.
6 February 2025 update: Gemini 2.0 is here. Details below.
Gemini 2.0. Is this image also AI-generated? Image: Google.
Google announced a major update to its AI model, Gemini, on 11 December 2024.
The model is now Gemini 2.0, following the original Gemini 1.0 launch and the multimodal Gemini 1.5 Flash update earlier this year. The updated AI model offers even broader multimodal reasoning and introduces agentic AI to the package.
6 February 2025 update: Google has released Gemini 2.0 Flash to the public. It can be accessed via the Gemini app on desktop and mobile, and developers can build production apps using 2.0 Flash’s Gemini API in Google AI Studio and Vertex AI.
We’ve kept the original article intact and posted the full update at the end of the piece.
Gemini 2.0 Flash
Gemini 2.0 Flash is the low-latency version of the full Gemini 2.0 suite. Developers can access the new Gemini API (via Google AI Studio and Vertex AI), while end-users can experiment with 2.0 Flash Experimental on the Gemini website on desktop and mobile (with support in Gemini’s mobile app coming later).
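For a sense of what developer access looks like, below is a minimal sketch of calling Gemini 2.0 Flash through the Gemini API, assuming Google's Gen AI Python SDK (google-genai) and an API key from Google AI Studio; exact package and parameter names may differ depending on SDK version.

```python
# Minimal sketch of calling Gemini 2.0 Flash via the Gemini API.
# Assumes the google-genai Python SDK and an API key from Google AI Studio.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # replace with your own key

response = client.models.generate_content(
    model="gemini-2.0-flash",  # the 2.0 Flash model
    contents="Summarise what agentic AI means in two sentences.",
)

print(response.text)  # the model's text reply
```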
Gemini 2.0 Flash now supports both multimodal input and multimodal output, going beyond the input-only multimodality of Gemini 1.5 Flash.
Gemini 2.0 Flash can now respond with generated images mixed with text, as well as steerable, multilingual text-to-speech audio. It can also call on native Google tools (such as Google Search), execute code, and invoke functions defined by third-party apps.
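As an illustration of that third-party function-calling, here is a hedged sketch using the same Python SDK. The get_store_hours function is a made-up example, and the exact tool-configuration syntax may vary between SDK versions.

```python
# Sketch of letting Gemini 2.0 Flash call a developer-defined function.
# get_store_hours is a hypothetical example; tool syntax may differ by SDK version.
from google import genai
from google.genai import types

def get_store_hours(city: str) -> str:
    """Return opening hours for the store in the given city (stub data)."""
    return "10am to 9pm daily" if city.lower() == "singapore" else "unknown"

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What time does the Singapore store open?",
    config=types.GenerateContentConfig(tools=[get_store_hours]),
)

print(response.text)  # the model decides whether to call get_store_hours, then answers
```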
Other core capabilities of Gemini 2.0 Flash include multimodal reasoning, long-context understanding, complex instruction following and planning, and compositional function-calling. Together, these elements of perception, reasoning, acting, and learning are what make Gemini 2.0 Flash capable of agentic AI.
As with earlier generations of AI, the eventual end-user applications depend on what tools developers build with Gemini. Unlike generative AI, which creates content from a single input (prompt, result, repeat), agentic AI can understand longer, multi-step prompts, form a strategy, and execute a chain of tasks. It can also refine its future output based on feedback from its own work and the user’s preferences.
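To make that distinction concrete, the sketch below shows a generic, simplified agent loop: plan, act, observe, refine. It is an illustration of the general pattern, not Google's implementation; the planner here is a stub and the "tools" are stand-ins the developer would normally supply.

```python
# Conceptual sketch of an agentic loop: plan, act, observe.
# Generic illustration only, not Google's implementation; the planner is a stub
# so the example runs on its own.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool_name: str
    argument: str

def stub_plan(goal: str) -> list[Step]:
    """Stand-in planner: a real agent would ask the model to draft these steps."""
    return [Step("search", goal), Step("summarise", goal)]

def run_agent(goal: str, tools: dict[str, Callable[[str], str]]) -> list[str]:
    observations = []
    for step in stub_plan(goal):                       # plan
        result = tools[step.tool_name](step.argument)  # act: execute the step
        observations.append(result)                    # observe: keep the outcome
        # a real agent would also refine the remaining plan here using feedback
    return observations

# Hypothetical tools the developer exposes to the agent.
tools = {
    "search": lambda q: f"(pretend search results for: {q})",
    "summarise": lambda q: f"(pretend summary of findings on: {q})",
}
print(run_agent("compare mirrorless cameras under $1,500", tools))
```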
Examples of agentic AI with Google Gemini 2.0
Google has two ongoing projects that showcase the future of Gemini’s agentic AI capabilities.
One is Project Astra, which has become more conversant in multiple and mixed languages, accents, and uncommon words. Using Gemini 2.0, it can also call on Google Search, Google Lens, and Google Maps, and it now has an extended in-session memory of up to 10 minutes so it can better recall earlier parts of a conversation. Latency has improved too, with Google claiming that Project Astra can now understand language at roughly the pace of regular human conversation.
Project Mariner is the other example, an early prototype built using Gemini 2.0. It is a browser prototype that understands and reasons over the information on the user’s screen to complete in-browser tasks on their behalf (it requires an experimental Chrome extension). Google said it shows that agentic AI can technically be applied to browser navigation, even though it’s nowhere near ready for everyday use because of its low accuracy and slow speed.
Finally, Google also has Jules, an experimental AI coding agent for developers that works directly inside a GitHub workflow. Google said Jules can develop a plan and execute it under the developer’s supervision. More about Jules can be found on Google’s developer blog.
Core upgrades
According to Google, the updated Gemini 2.0 Flash is “more powerful than Gemini 1.5 Pro” while still delivering the speed and efficiency expected of its Flash models.
Gemini 2.0 Flash can also generate integrated responses combining text, audio, and images in a single API call. All image and audio output will carry invisible SynthID watermarks to combat disinformation and misattribution.
Gemini Advanced users can try a new agent called Deep Research, an AI assistant that helps users conduct research online. It works by running several Google Searches consecutively, with each new search building on the results of the previous one. Once complete, Deep Research generates a comprehensive report of key findings that can be exported to Google Docs.
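The "search, read, search again" pattern Google describes can be roughly sketched as below; web_search and next_query are hypothetical stand-ins, and the loop is a simplification rather than how Deep Research actually works internally.

```python
# Rough sketch of the iterative research pattern described for Deep Research:
# each new search query is derived from the results of the previous one.
# web_search and next_query are hypothetical stand-ins, not Google's internals.
def web_search(query: str) -> str:
    """Stand-in for a real web search call."""
    return f"(pretend results for: {query})"

def next_query(previous_results: str) -> str:
    """Stand-in: a real system would ask the model to propose the follow-up query."""
    return "follow-up question based on " + previous_results[:40]

def deep_research(topic: str, rounds: int = 3) -> str:
    findings = []
    query = topic
    for _ in range(rounds):
        results = web_search(query)   # run a search
        findings.append(results)      # collect key findings
        query = next_query(results)   # base the next search on those results
    # A real system would have the model write a structured report here.
    return "\n".join(findings)

print(deep_research("state of agentic AI in 2025"))
```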
6 February 2025 update: Google-provided benchmarks of its Gemini 2.0 models are at the bottom of this article.
Gemini 2.0 goes live
6 February 2025 update: Gemini 2.0 Flash is now available to the general public. It can be accessed via the Gemini app on desktop and mobile.
For developers, the updated Gemini API is now available in Google AI Studio and Vertex AI.
In addition, the experimental version of Gemini 2.0 Pro (for coding and complex prompts) is available to Gemini Advanced users on the app and in production tools like Google AI Studio and Vertex AI.
Gemini 2.0 Flash-Lite, a more cost-efficient model, is available in public preview in those production tools as well.
Finally, Gemini 2.0 Flash Thinking Experimental (which shows the model’s reasoning) is also available as a dropdown menu option in the Gemini app.
Benchmarks of Gemini 2.0 versions per Google's claims. Image: Google.
All versions of Gemini 2.0 offer multimodal reasoning, but only text output is available for now, with more output modalities to follow later.
Source: Google (blog) 1, 2, 3, 4, 5