Google DeepMind has officially released Gemma 4, a new family of open-weight AI models built from the same research that powers Gemini 3. Announced on April 2, 2026, the Gemma 4 family spans four model sizes designed to run on everything from smartphones to workstations, and all four ship under the commercially permissive Apache 2.0 license, giving developers unprecedented freedom to build, fine-tune, and deploy frontier AI locally. This article covers what makes Gemma 4 special, how it compares to competitors, and why it matters for the future of on-device AI.
What Is Gemma 4 and Why Does It Matter?
Gemma 4 represents Google DeepMind’s most ambitious open model release to date. Unlike previous Gemma generations, this release focuses specifically on agentic workflows and multimodal intelligence. Developers can now build autonomous AI agents that reason through multi-step tasks, call external functions, and generate structured JSON outputs natively.
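To make the agentic pattern concrete, here is a minimal sketch of a single tool-calling step. The JSON shape mirrors the common function-calling convention, not Gemma 4's documented format, and `fake_model` is a stand-in for an actual model call; everything here beyond the article's description is an illustrative assumption.

```python
import json

# Hypothetical tool the agent can dispatch to.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(prompt: str) -> str:
    # Stand-in for a real Gemma 4 call. A real agent would send `prompt`
    # (plus the tool schemas) to the model and get structured JSON back.
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Berlin"}})

def run_agent(prompt: str) -> str:
    # One step of an agentic loop: query the model, parse its JSON tool
    # call, and dispatch to the matching Python function.
    call = json.loads(fake_model(prompt))
    fn = TOOLS[call["tool"]]
    return fn(**call["arguments"])

result = run_agent("What's the weather in Berlin?")
```

In a full agent, the tool's return value would be appended to the conversation and the loop repeated until the model emits a final answer instead of a tool call.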
Since the original Gemma launch, the community has downloaded Gemma models over 400 million times. Furthermore, developers have created more than 100,000 community variants. Google calls this thriving community the “Gemmaverse.” Because Gemma 4 now ships under Apache 2.0 (replacing Google’s earlier custom license), enterprise adoption barriers have dropped significantly.
Gemma 4 Model Lineup: Four Sizes for Every Use Case
Google released Gemma 4 in four distinct sizes. Each model targets a specific hardware tier and use case. Consequently, developers can choose the right balance of performance and efficiency for their projects.
| Feature | E2B | E4B | 26B MoE | 31B Dense |
| --- | --- | --- | --- | --- |
| Active parameters | ~2B | ~4B | ~4B (128 experts) | 31B |
| Context window | 128K tokens | 128K tokens | 256K tokens | 256K tokens |
| Modalities | Text, image, video, audio | Text, image, video, audio | Text, image, video | Text, image, video |
| Best for | Phones, IoT, Raspberry Pi | Tablets, mid-range phones | Notebooks, consumer GPUs | Workstations, servers |
| Memory usage | Under 1.5 GB | ~3 GB | ~16 GB (quantized) | ~20 GB (quantized) |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
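The quantized memory figures in the table can be sanity-checked with a back-of-envelope estimate: weights at 4 bits per parameter, plus a runtime overhead margin. Both the 4-bit assumption and the ~20% overhead factor are illustrative choices, not published Gemma 4 numbers.

```python
def quantized_memory_gb(total_params_b: float, bits: float = 4.0,
                        overhead: float = 1.2) -> float:
    """Rough estimate: params * (bits / 8) bytes for weights, plus ~20%
    overhead for KV cache, activations, and runtime buffers (an assumption)."""
    weight_bytes = total_params_b * 1e9 * bits / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A 26B-total-parameter model at 4-bit lands near the ~16 GB the table cites.
print(quantized_memory_gb(26))  # ≈ 15.6
```

The same arithmetic puts the 31B Dense model at roughly 18.6 GB, in line with the table's ~20 GB figure.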
The 31B Dense variant currently ranks third on the LMArena open-model text leaderboard, with an estimated score of 1,452. Meanwhile, the 26B MoE model reaches 1,441 while activating only 4 billion parameters per token. Both models beat competitors with over 20 times more total parameters.
Multimodal Capabilities and On-Device Gemma 4 Performance
Every Gemma 4 model can process images and videos natively. The models support variable resolutions and excel at visual tasks like OCR and chart understanding. Additionally, the two smaller edge models (E2B and E4B) feature native audio input. This enables real-time speech recognition and understanding directly on a device, with no internet connection required.
On-device performance is particularly impressive. Google’s LiteRT-LM runtime allows the E2B model to run using under 1.5 GB of memory. As a result, it fits comfortably on Android phones, Raspberry Pi boards, and NVIDIA Jetson Orin Nano devices. Google also reports the edge models are up to 4x faster than previous versions and use 60% less battery, and that the E2B runs roughly three times faster than the E4B, making it the better choice for latency-sensitive applications.
Context windows have also expanded significantly. The edge models support 128K tokens, while the larger 26B and 31B models handle 256K tokens. Consequently, developers can feed entire codebases or large document collections into a single prompt. Google has also trained all Gemma 4 models in more than 140 languages.
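Whether a codebase actually fits in one of those windows can be estimated before sending anything to the model. The sketch below uses the rough heuristic of ~4 characters per token; real counts depend on the tokenizer, so treat the result as an approximation.

```python
def fits_in_context(texts, context_tokens=256_000, chars_per_token=4):
    """Estimate token count for a set of documents and check it against a
    context window, using a ~4 chars/token heuristic (an approximation)."""
    est_tokens = sum(len(t) for t in texts) // chars_per_token
    return est_tokens, est_tokens <= context_tokens

# Example: two source files against the 256K window of the larger models.
files = ["def main(): ..." * 1000, "# README\n" * 500]
tokens, ok = fits_in_context(files)
```

For the 128K-token edge models, the same check applies with `context_tokens=128_000`.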
Apache 2.0 License: A Major Shift for Gemma 4
Previous Gemma models shipped under Google’s custom Gemma license, which placed certain restrictions on commercial use and redistribution. Gemma 4, by contrast, arrives under Apache 2.0, the same permissive license used by many popular open-source projects.
Hugging Face co-founder Clement Delangue described this licensing change as a major milestone. Because Apache 2.0 removes commercial restrictions, enterprises can now integrate Gemma 4 into production products without legal concerns. Similarly, startups gain complete freedom to modify and deploy the models. Google positions this as a foundation for digital sovereignty, giving teams full control over their data, infrastructure, and AI models.
How Gemma 4 Competes With Llama 4 and Open-Source Rivals
The open-weight AI model space has grown increasingly competitive. Meta’s Llama 4 and Mistral have traditionally dominated the developer ecosystem. Meanwhile, Chinese models from Alibaba, Moonshot AI, and Z.AI now rival frontier proprietary systems. Therefore, Google’s Gemma 4 release is a direct challenge to these established players.
The 31B Dense variant ties with models from Kimi and Z.AI that have over 700 billion total parameters, yet achieves this with a fraction of the compute and memory requirements. Broad framework support is another key differentiator: Gemma 4 launched with day-one support for Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM, and over a dozen other tools.
For developers already running llama.cpp or Ollama locally, Gemma 4 is available for immediate download and testing. Early community benchmarks on an M2 Ultra show the 26B MoE model generating around 300 tokens per second.
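For Ollama users, querying a local model takes only the standard local REST API. The model tag `gemma4:26b` below is a guess at what the release might use in Ollama's registry (check `ollama list` for the real name); the endpoint and response shape are Ollama's documented `/api/generate` interface.

```python
import json
import urllib.request

# NOTE: the model tag is an assumption, not a confirmed registry name.
def build_request(prompt: str, model: str = "gemma4:26b") -> dict:
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    # Ollama serves a local REST API; POST /api/generate returns JSON
    # with the completion under the "response" key.
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("...")` requires a running Ollama server with the model already pulled.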
Where to Download and Run Gemma 4 Today
Google has made Gemma 4 available across multiple platforms on day one. Specifically, developers can access the models through Hugging Face, Kaggle, Ollama, Google AI Studio (31B and 26B MoE), and Google AI Edge Gallery (E4B and E2B). In addition, Android developers can prototype agentic flows through the AICore Developer Preview. Because Gemma 4 is the foundation for Gemini Nano 4, code written today will automatically work on Gemini Nano 4 devices later this year.
Fine-tuning is also straightforward: developers can customize the models using Google Colab, Vertex AI, or even a consumer gaming GPU.
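The reason fine-tuning fits on a consumer GPU is that parameter-efficient methods like LoRA train only small low-rank adapters rather than the full weights. The arithmetic below illustrates the scale; the layer count, hidden size, and rank are illustrative numbers, not Gemma 4's published architecture.

```python
def lora_trainable_params(num_layers: int, hidden_dim: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """LoRA adds two low-rank matrices (d x r and r x d) per adapted weight
    matrix, so each target contributes 2 * d * r trainable parameters."""
    return num_layers * targets_per_layer * 2 * hidden_dim * rank

# Illustrative (assumed) shape for a 31B-class model: 60 layers,
# hidden size 6144, four adapted matrices per layer, rank 16.
trainable = lora_trainable_params(60, 6144, 16)
fraction = trainable / 31e9  # well under 1% of the base parameters
```

Because only those ~47M adapter parameters need gradients and optimizer state, the memory cost of fine-tuning is a small slice of full training.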
Frequently Asked Questions About Gemma 4
What is Gemma 4?
Gemma 4 is Google DeepMind’s latest family of open-weight AI models. It includes four sizes (E2B, E4B, 26B MoE, and 31B Dense) built from the same research as Gemini 3. The models support text, image, video, and audio processing. They are released under the Apache 2.0 license for unrestricted commercial use.
Can Gemma 4 run on a phone or other edge device?
Yes. The E2B and E4B edge models are specifically designed for on-device inference. The E2B model runs using under 1.5 GB of memory. As a result, it works on Android phones, Raspberry Pi boards, and similar low-power devices. Google also reports the edge models are up to 4x faster than previous Gemma versions.
What license does Gemma 4 use?
Gemma 4 is released under the Apache 2.0 license, one of the most permissive open-source licenses available. This allows full commercial use, modification, and redistribution. Previous Gemma models used a more restrictive custom license, so Gemma 4 removes barriers for enterprise and startup adoption.
How does Gemma 4 compare to Llama 4?
The Gemma 4 31B Dense model ranks third on the LMArena open-model text leaderboard, competing directly with models that have 20x more parameters. Gemma 4 differentiates itself with native multimodal support, on-device optimization, and broader day-one framework compatibility. However, Llama 4 still holds a larger community of deployed applications.
Where can I download Gemma 4?
Gemma 4 is available immediately on Hugging Face, Kaggle, and Ollama. The 31B and 26B models are also accessible through Google AI Studio. Furthermore, the edge models (E2B and E4B) are available through Google AI Edge Gallery. Android developers can also access Gemma 4 through the AICore Developer Preview.
Conclusion: Gemma 4 Signals a New Era for Open AI
Gemma 4 is more than an incremental update. It represents Google DeepMind’s clearest commitment yet to making frontier AI accessible, efficient, and truly open. With four model sizes, multimodal capabilities, native function calling, and an Apache 2.0 license, Gemma 4 gives developers everything they need to build the next generation of AI applications. Most importantly, it brings this intelligence to edge devices without sacrificing quality.
Whether you are building an AI agent, exploring on-device inference, or looking for a powerful open model for your startup, Gemma 4 deserves a spot on your shortlist.
Source: Google DeepMind: Gemma 4 Model Page
