An AI just cracked 80-year-old math — plus Qwen 3.7, Gemini Omni, Cursor Composer 2.5

TL;DR

Four stories from May 18-23, 2026, reset the AI baseline. (1) An OpenAI reasoning model autonomously disproved Paul Erdős's 1946 unit-distance conjecture, the first time an AI has independently solved a central open math problem. Fields medalist Tim Gowers and Princeton's Noga Alon validated the 125-page proof. (2) Alibaba's Qwen 3.7 Max launched with a score of 56.6 on the Intelligence Index v4.0 (fifth globally, first in China, ahead of Gemini 3.5 Flash) and a 1M-token context window. (3) Google's Gemini Omni shipped at I/O 2026 with conversational video editing. (4) Cursor Composer 2.5 matched Claude Opus 4.7 and GPT-5.5 on coding benchmarks at roughly one-tenth the cost.

An OpenAI reasoning model autonomously disproved Erdős's 1946 unit-distance conjecture — first AI to solve a central open math problem.
Fields medalist Tim Gowers called it 'a milestone in AI mathematics'; Noga Alon (Princeton) called it 'an outstanding achievement'.
Alibaba's Qwen 3.7 Max scored 56.6 on the Intelligence Index v4.0: 5th globally, the #1 Chinese model, and ahead of Gemini 3.5 Flash.
Qwen 3.7 Max specs: 1M-token context, GPQA Diamond 92.4, SWE-Verified 80.4, hallucination rate 22.9% (the lowest reported among frontier models), $2.50/$7.50 per million input/output tokens.
Google's Gemini Omni launched video generation with conversational editing: say what to change, and the model re-renders the video.
Cursor Composer 2.5 hits 79.8% on SWE-Bench Multilingual, on par with Claude Opus 4.7 and GPT-5.5, at $0.50 per million input tokens (roughly one-tenth the cost).
The composable AI coding stack — Cursor + Claude Code + Codex together — is replacing single-tool workflows for high-velocity teams.

Four stories from May 18-23, 2026 reset the AI baseline. An AI autonomously cracked an 80-year-old math problem. China's flagship model overtook Gemini 3.5 Flash. Google's video model finally shipped. And the cheapest serious coding model is now roughly one-tenth the cost of the frontier.

OpenAI's autonomous math breakthrough (May 20). OpenAI announced that one of its general-purpose reasoning models autonomously disproved Paul Erdős's 1946 unit-distance conjecture. The model was given the problem statement and produced the 125-page proof itself, with no step-by-step human guidance and no partial proof to complete. The proof uses deep algebraic number theory (Golod-Shafarevich theory and infinite class field towers) to construct an infinite family of point configurations that realize n^(1+δ) unit distances for some fixed δ > 0, a polynomial improvement over the square-grid bound. Fields medalist Tim Gowers called the result "a milestone in AI mathematics." Princeton's Noga Alon called it "an outstanding achievement." External mathematicians (Gowers, Alon, Shankar, Tsimerman) wrote a companion paper. This is the first time an AI has autonomously solved a central open problem in a major math subfield.
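For readers who want the claim in symbols, here is one standard way to state the problem and where the reported result sits. The notation u(n) is introduced here for illustration and is not taken from the announcement; the first three lines are the classical background, and the last line is simply the claim as described above.

```latex
% u(n): the maximum number of unit distances determined by n points in the plane
% (notation chosen for this sketch).
\begin{align*}
u(n) &\ge n^{1 + c/\log\log n}
  && \text{(square-grid construction, Erd\H{o}s 1946)} \\
u(n) &= n^{1 + o(1)}
  && \text{(the unit-distance conjecture)} \\
u(n) &= O\!\left(n^{4/3}\right)
  && \text{(known upper bound, Spencer--Szemer\'edi--Trotter)} \\
u(n) &\ge n^{1 + \delta} \ \text{for infinitely many } n
  && \text{(the reported result, for some fixed } \delta > 0\text{)}
\end{align*}
```

The last line is incompatible with the second: a fixed δ > 0 cannot be absorbed into an o(1) exponent, which is why a construction of this kind counts as a disproof rather than a refinement.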

Qwen 3.7 Max: China's frontier landing (May 20-21). Alibaba unveiled Qwen 3.7 Max at the Alibaba Cloud Summit in Hangzhou. It scored 56.6 on the Intelligence Index v4.0, ranking 5th globally and #1 in China, ahead of Gemini 3.5 Flash. It has a 1-million-token context window. Benchmarks: GPQA Diamond 92.4 (ahead of Claude Opus 4.6 Max at 91.3), HMMT 2026 97.1 (highest in its comparison group), and SWE-Verified 80.4 (statistically tied with Opus 4.6 Max). Its hallucination rate is 22.9%, the lowest reported among frontier models. API pricing is $2.50 per million input tokens and $7.50 per million output tokens; cached input is discounted 90%, to $0.25 per million. There are two variants: Max (the text-only flagship) and Plus (multimodal, with vision).
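To make the pricing concrete, here is a minimal sketch of the per-request arithmetic using the Qwen 3.7 Max prices quoted above. The `PriceSheet` class, the `request_cost` function, and the idea of a single `cached_fraction` are illustrative assumptions, not part of any Alibaba API, and the token counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSheet:
    """USD prices per one million tokens (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


# Prices as quoted in the article for Qwen 3.7 Max.
QWEN_37_MAX = PriceSheet(input_per_m=2.50, output_per_m=7.50, cached_input_per_m=0.25)


def request_cost(prices: PriceSheet, input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Estimate the USD cost of one request.

    cached_fraction is the share of input tokens billed at the cached rate;
    how caching is actually applied by the provider is an assumption here.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (fresh * prices.input_per_m
             + cached * prices.cached_input_per_m
             + output_tokens * prices.output_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 100k input tokens, 5k output tokens.
    print(f"{request_cost(QWEN_37_MAX, 100_000, 5_000):.4f}")       # 0.2875
    print(f"{request_cost(QWEN_37_MAX, 100_000, 5_000, 0.8):.4f}")  # 0.1075
```

With these made-up token counts, serving 80% of the input from cache cuts the request cost by more than half, because input tokens dominate the bill.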

Gemini Omni from Google I/O 2026 (May 19). Google's video generation model finally arrived. The headline feature is conversational editing — type or speak changes like "remove the person in the background," "make the lighting warmer," "replace the narrator's voice" — and the model re-renders. It rolls out across the Gemini app, Google Flow, and YouTube. On raw cinematic quality, Omni still trails ByteDance's Seedance 2 (currently Elo 1,269 on Artificial Analysis Video Arena). On workflow integration with the rest of the Google stack, Omni leads. Two days later, Google announced Adobe, Canva, and CapCut integrations into Gemini — the creative-suite play continues.

Cursor Composer 2.5: frontier coding at 1/10 cost (May 18). Cursor officially launched Composer 2.5, built on Moonshot AI's Kimi K2.5 with 25x more synthetic coding tasks and a new targeted RL technique. It scores 79.8% on SWE-Bench Multilingual, on par with Claude Opus 4.7 and GPT-5.5, and 63.2% on CursorBench v3.1. Pricing is $0.50 per million input tokens and $2.50 per million output tokens, roughly one-tenth the per-task cost of the frontier alternatives. Cursor is also training a successor model on the Colossus 2 supercomputer through SpaceXAI, with 10x more compute than Composer 2.5. The New Stack reported the same week that high-velocity teams are now running Cursor, Claude Code, and Codex together as a composable stack: Cursor for daily features, Claude Code for architectural changes, Codex for automated workflows.
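The division of labor in that composable stack can be sketched as a simple routing table. This is only an illustration of the split described above; `TaskKind`, `ROUTING`, and `pick_tool` are hypothetical names, and none of the three tools exposes an interface like this.

```python
from enum import Enum


class TaskKind(Enum):
    """Task categories named in the reported workflow split."""
    DAILY_FEATURE = "daily feature"
    ARCHITECTURAL_CHANGE = "architectural change"
    AUTOMATED_WORKFLOW = "automated workflow"


# Mapping taken from the article's description of the composable stack.
ROUTING = {
    TaskKind.DAILY_FEATURE: "Cursor",
    TaskKind.ARCHITECTURAL_CHANGE: "Claude Code",
    TaskKind.AUTOMATED_WORKFLOW: "Codex",
}


def pick_tool(kind: TaskKind) -> str:
    """Return the tool a team following this split would reach for."""
    return ROUTING[kind]


if __name__ == "__main__":
    for kind in TaskKind:
        print(f"{kind.value}: {pick_tool(kind)}")
```

Keeping the mapping in one table is the point of a composable stack: swapping one tool for another is a single-line change, not a workflow rewrite.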

The strategic read. Four stories, one week, four different layers. Math. Frontier-model rankings. Multimodal generation. Coding tools. The AI market is no longer about one company or one capability. The capability frontier is now distributed — OpenAI for reasoning, Alibaba for cost-efficient frontier, Google for distribution and integration, Cursor for low-cost developer tooling. If you are building on AI, the right move is to bet on the layer, not the lab. Reasoning where reasoning matters. Cost-efficient inference where margin matters. Distribution where the user already lives. And keep the coding workflow composable. The labs that win the year will be the ones that own one layer cleanly — not the ones that try to own all four.
