OpenAI just shipped 3 production voice models: gpt-realtime-2, Translate, and Whisper

TL;DR

On May 7, 2026, OpenAI released three new voice models in its API: gpt-realtime-2 (GPT-5-class reasoning, 128K context, multi-tool calling, audible narration), gpt-realtime-translate (70+ input → 13 output languages, $0.034/min), and gpt-realtime-whisper (streaming STT, $0.017/min). Production voice agents are now a real category.

gpt-realtime-2 is OpenAI's first voice model with GPT-5-class reasoning. Its context window expands from 32K to 128K.
gpt-realtime-translate handles 70+ input languages → 13 output languages, keeping pace with the speaker. Priced at $0.034/min.
gpt-realtime-whisper streams speech-to-text live. Priced at $0.017/min.
gpt-realtime-2 is priced at $32 per million audio input tokens and $64 per million audio output tokens.
gpt-realtime-2 can call multiple tools at once and audibly narrate actions in flight ('checking your calendar', 'looking that up now').

OpenAI just released three new realtime voice models in the API. The headline model, gpt-realtime-2, is the first OpenAI voice model with GPT-5-class reasoning. The context window jumps from 32K to 128K. The launch landed on May 7, 2026.

The second model is gpt-realtime-translate, which does live speech translation from 70+ input languages into 13 output languages, keeping pace with the speaker. The third is gpt-realtime-whisper, a streaming speech-to-text model that transcribes live as a person talks.

The feature that matters most for builders is multi-tool calling with audible narration. While reasoning, gpt-realtime-2 can call multiple tools simultaneously and say aloud what it is doing, with phrases like "checking your calendar" or "looking that up now." That removes the silent-pause UX problem that has held voice agents back.
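The narrate-then-run-in-parallel pattern described above can be sketched in plain Python with asyncio. This is a minimal illustration of the pattern only, not OpenAI's API: the tool functions, the `speak` callback, and `run_tools_with_narration` are all hypothetical names invented for this example.

```python
import asyncio

# Hypothetical tools: stand-ins for real calendar and search backends.
async def check_calendar(day: str) -> str:
    await asyncio.sleep(0.02)  # simulate network latency
    return f"calendar:{day}:free at 3pm"

async def web_lookup(query: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"search:{query}:2 results"

async def run_tools_with_narration(calls, speak):
    """Announce every pending call, then run all of them concurrently.

    calls: list of (narration_phrase, coroutine_factory) pairs.
    speak: callback that would hand a phrase to the audio output.
    """
    for phrase, _ in calls:
        speak(phrase)  # fill the silence before the tools return
    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(factory() for _, factory in calls))

def demo():
    spoken = []  # collects phrases instead of synthesizing audio
    calls = [
        ("checking your calendar", lambda: check_calendar("friday")),
        ("looking that up now", lambda: web_lookup("flights to Oslo")),
    ]
    results = asyncio.run(run_tools_with_narration(calls, spoken.append))
    return spoken, results
```

Calling `demo()` returns the narration phrases in order plus one result per tool. Because the tools run concurrently, total wait time is roughly that of the slowest tool, not the sum of all of them.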

Pricing tells you the deployment shape. gpt-realtime-2 is $32 per million audio input tokens and $64 per million audio output tokens. gpt-realtime-translate is $0.034 per minute. gpt-realtime-whisper is $0.017 per minute. Per-minute billing makes Translate and Whisper costs easy to forecast from call volume, which lowers the risk of surprise bills behind production traffic. The gpt-realtime-2 token price is premium, aimed at high-value live conversations (sales calls, support escalations, complex assistance) where one minute of high-quality reasoning is worth dollars, not cents.
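To make the pricing concrete, here is a small cost calculator built only from the prices quoted above. The function names are made up for this sketch. The token counts in the usage note are hypothetical placeholders, since the article does not say how many audio tokens a minute of speech consumes; real numbers should come from your own usage data.

```python
# Prices quoted in the article, in USD.
REALTIME2_INPUT_PER_M_TOKENS = 32.0   # per million audio input tokens
REALTIME2_OUTPUT_PER_M_TOKENS = 64.0  # per million audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017

def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-billed cost of a gpt-realtime-2 session (audio tokens only)."""
    total = (input_tokens * REALTIME2_INPUT_PER_M_TOKENS
             + output_tokens * REALTIME2_OUTPUT_PER_M_TOKENS)
    return total / 1_000_000

def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Minute-billed cost, as used by Translate and Whisper."""
    return minutes * rate_per_minute
```

For example, a hypothetical session that used 10,000 audio input tokens and 5,000 audio output tokens would cost `realtime2_cost(10_000, 5_000)`, about $0.64. Ten minutes of translation would cost `per_minute_cost(10, TRANSLATE_PER_MINUTE)`, about $0.34, and ten minutes of transcription about $0.17.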

Why this matters: until today, the production voice-agent stack required stitching together speech-to-text (STT), an LLM, and text-to-speech (TTS). That meant three separate models, three latencies, and three failure modes. OpenAI just collapsed that into one realtime model with reasoning. Builders should re-evaluate any voice product that was waiting for "real-time agent quality."
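A back-of-the-envelope sketch shows why the three-stage design hurts. It assumes the stages run strictly one after another and fail independently, which is a simplification (real pipelines often stream and overlap stages). The function names are hypothetical, and every number in the usage note is an illustrative placeholder, not a measurement.

```python
from math import prod

def cascade_latency_ms(stage_latencies_ms):
    """Sequential stages: per-stage latencies add up."""
    return sum(stage_latencies_ms)

def cascade_success_rate(stage_success_rates):
    """Independent stages: per-stage success rates multiply."""
    return prod(stage_success_rates)
```

With placeholder latencies of 300, 500, and 200 ms for STT, LLM, and TTS, `cascade_latency_ms([300, 500, 200])` gives 1000 ms before the user hears anything. If each stage succeeds 99% of the time, `cascade_success_rate([0.99, 0.99, 0.99])` is about 0.9703, so roughly 3 in 100 turns fail somewhere in the chain.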

What to watch: whether Anthropic and Google ship comparable voice models within the next 4-6 weeks. The pattern from the last 90 days is that whichever lab ships a category first, the other two follow within a quarter. Voice agents just became a frontier-lab race.

Sources

1. OpenAI: Advancing voice intelligence with new models in the API (May 7, 2026)
2. TechCrunch: OpenAI launches new voice intelligence features in its API (May 7, 2026)
