Google released Gemini 3 on April 21, 2026, with the lowest-key launch we've seen from them in a while. No keynote, no demo reel — just a blog post, updated docs, and a quiet API rollout. The reason this still matters: tool use is now native to the model, not a layer on top.

What's actually new

Three things to know.

Native tool calling. Gemini 3 doesn't translate function-calling requests through an external orchestration layer the way prior versions did. The model itself produces structured tool invocations as part of its standard output. In practice this means lower latency on agent loops and noticeably better behavior on long multi-call sequences — the model retains its sense of what it called five turns ago.
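To make the agent-loop point concrete, here is a minimal sketch of the pattern: the model emits a structured tool invocation, the loop executes it, and the result is fed back as another turn. Everything here is illustrative — `fake_model`, the message shape, and the `get_weather` tool are stand-ins, not the actual Gemini API.

```python
import json

def get_weather(city):
    # Stub tool: a real implementation would call a weather service.
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    # Stand-in for a real model client. It emits one structured tool
    # call, then a final text answer once a tool result is in history.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Oslo"}}}
    return {"text": "It is 21 C in Oslo."}

def run_agent(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["text"]
        result = TOOLS[call["name"]](**call["args"])
        # Append the structured result so the model sees its own
        # call history on later turns.
        messages.append({"role": "tool", "content": json.dumps(result)})

print(run_agent("What's the weather in Oslo?"))
```

The latency claim follows from this shape: with native tool calling, each iteration is one model call rather than a model call plus an orchestration-layer translation step.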

Million-token context, finally usable. Gemini has had a 1M-2M token context window for a while, but recall above 200k tokens degraded fast. Gemini 3's needle-in-a-haystack scores are now meaningfully better — Google's published numbers show ~92% recall at 1M tokens, roughly 12 points up from Gemini 2.

Pricing matches the field. Input is $0.30 per million tokens, output $1.20. Almost identical to GPT-5.5's new pricing. The era where Gemini was decisively the cheapest frontier model is over.
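At these rates, per-request cost is simple arithmetic — rate times tokens, per million. A quick sketch using the figures above (the 200k-token example workload is our own, not from the announcement):

```python
def request_cost(input_tokens, output_tokens,
                 in_rate=0.30, out_rate=1.20):
    """Dollar cost at the stated rates: $0.30 per 1M input tokens,
    $1.20 per 1M output tokens."""
    return (input_tokens / 1_000_000) * in_rate \
         + (output_tokens / 1_000_000) * out_rate

# e.g. a 200k-token document plus a 2k-token answer:
print(f"${request_cost(200_000, 2_000):.4f}")  # → $0.0624
```

Even a full 1M-token input comes to $0.30 per request, which is why the long-context improvements below are economically interesting and not just a benchmark result.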

Where it actually wins

Two places.

Long-document workloads. If your application loads entire books, codebases, or transcripts and asks questions across them, Gemini 3 is now genuinely the best option. The recall improvement at the long end is the kind of change that matters in production.

Multimodal. Gemini's video and image understanding is the strongest of any frontier model we've tested. If your application processes screenshots, diagrams, or video, it's worth a real evaluation.

Where it still loses

Coding. The eval scores are competitive, but real-world coding tests put it behind Claude Sonnet 4.6 by a meaningful margin. If you're running a coding agent, this release does not change the picture from our comparison.

Agent reliability. Beyond eight tool calls, Gemini 3 still trails Claude in our internal evals — Google's own published evals show the same ordering, with a smaller delta. The gap is shrinking, but it's there.

What we'd do

If you're already running Gemini in production: upgrade. The tool-use change alone is worth it.

If you're on Claude or GPT and curious: run a real eval against your specific workload before switching anything. The right model for you is the one that performs best on your tasks, not the one with the freshest launch post.

If your application is long-document or multimodal: this is the most interesting model release for you in 2026 so far.

Sources

  1. Google DeepMind — Gemini 3 release notes · Apr 21, 2026
  2. Google AI Studio — Updated Gemini API docs — tool use · Apr 21, 2026