Four stories from May 18-23, 2026 reset the AI baseline. An AI autonomously cracked an 80-year-old math problem. China's flagship model overtook Gemini 3.5 Flash. Google's video model finally shipped. And the cheapest serious coding model is now roughly one-tenth the cost of the frontier.
OpenAI's autonomous math breakthrough (May 20). OpenAI announced that one of its general-purpose reasoning models autonomously disproved Paul Erdős's 1946 unit-distance conjecture. The model was given only the problem statement and produced the full 125-page proof, with no step-by-step human guidance and no partial proof to complete. The proof uses deep algebraic number theory (Golod-Shafarevich theory and infinite class field towers) to construct an infinite family of point configurations with at least n^(1+δ) unit distances for some fixed δ > 0, a polynomial improvement over the square-grid bound. Fields medalist Tim Gowers called the result "a milestone in AI mathematics." Princeton's Noga Alon called it "an outstanding achievement." A group of external mathematicians (Gowers, Alon, Shankar, Tsimerman) wrote a companion paper. This is the first time an AI system has autonomously solved a central open problem in a major subfield of mathematics.
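For readers who want the quantities behind "polynomial improvement", the bounds can be sketched as below. The symbol u(n), the maximum number of unit distances among n points in the plane, is notation introduced here and not taken from the announcement. The first and last lines are standard background results. The third line is a reading of the n^(1+δ) claim above, and "for infinitely many n" is an assumption based on the phrase "infinite family"; the exact theorem statement is in the proof itself.

```latex
% u(n): maximum number of unit distances determined by n points in the plane.
% (Notation assumed for this sketch; requires amsmath.)
\begin{align*}
u(n) &\ge n^{1 + c/\log\log n}
  && \text{Erd\H{o}s 1946: scaled } \sqrt{n}\times\sqrt{n} \text{ grid, some constant } c > 0 \\
u(n) &\le n^{1 + o(1)}
  && \text{the conjecture reported as disproved} \\
u(n) &\ge n^{1 + \delta} \text{ for infinitely many } n
  && \text{reported construction, fixed } \delta > 0 \\
u(n) &= O\!\left(n^{4/3}\right)
  && \text{Spencer, Szemer\'edi, Trotter 1984}
\end{align*}
```

Because the grid exponent 1 + c/log log n tends to 1, any fixed δ > 0 eventually beats it, which is what makes the improvement polynomial. The known upper bound also means δ cannot exceed 1/3.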
Qwen 3.7 Max: China's frontier landing (May 20-21). Alibaba unveiled Qwen 3.7 Max at the Alibaba Cloud Summit in Hangzhou. It scored 56.6 on the Artificial Analysis Intelligence Index v4.0, ranking 5th globally and #1 among Chinese models, ahead of Gemini 3.5 Flash. It has a 1 million token context window. Benchmarks: GPQA Diamond 92.4 (ahead of Claude Opus 4.6 Max at 91.3), HMMT 2026 97.1 (highest in its comparison group), and SWE-Verified 80.4 (statistically tied with Opus 4.6 Max). Its hallucination rate of 22.9% is the lowest reported among frontier models. API pricing is $2.50 per million input tokens and $7.50 per million output tokens; cached input is discounted 90%, to $0.25 per million. There are two variants: Max (the text-only flagship) and Plus (multimodal, with vision).
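To see what the cached-input discount means for a bill, here is a minimal Python sketch of the arithmetic. The three rates are the ones quoted above. The `Pricing` class, the `estimate_cost` function, the `cached_fraction` parameter, and the token counts are all hypothetical and chosen for illustration; this is not a provider SDK, and actual billing rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Rates in USD per one million tokens."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


# Rates quoted in the article for Qwen 3.7 Max.
QWEN_37_MAX = Pricing(input_per_m=2.50, output_per_m=7.50, cached_input_per_m=0.25)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0) -> float:
    """Estimate the USD cost of one request.

    cached_fraction is the share of input tokens billed at the cached rate
    (an assumption of this sketch, not a documented billing rule).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (fresh * pricing.input_per_m
             + cached * pricing.cached_input_per_m
             + output_tokens * pricing.output_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-context request: 800k input tokens, 20k output tokens.
    cold = estimate_cost(QWEN_37_MAX, 800_000, 20_000)
    warm = estimate_cost(QWEN_37_MAX, 800_000, 20_000, cached_fraction=0.75)
    print(f"no cache: ${cold:.2f}, 75% cached: ${warm:.2f}")
```

With these illustrative numbers the uncached request comes to $2.15 and the 75% cached one to $0.80, so for long-context workloads the cache hit rate matters more than the output price.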
Gemini Omni from Google I/O 2026 (May 19). Google's video generation model finally arrived. The headline feature is conversational editing — type or speak changes like "remove the person in the background," "make the lighting warmer," "replace the narrator's voice" — and the model re-renders. It rolls out across the Gemini app, Google Flow, and YouTube. On raw cinematic quality, Omni still trails ByteDance's Seedance 2 (currently Elo 1,269 on Artificial Analysis Video Arena). On workflow integration with the rest of the Google stack, Omni leads. Two days later, Google announced Adobe, Canva, and CapCut integrations into Gemini — the creative-suite play continues.
Cursor Composer 2.5: frontier coding at 1/10 cost (May 18). Cursor officially launched Composer 2.5, built on Moonshot AI's Kimi K2.5 and trained with 25x more synthetic coding tasks and a new targeted RL technique. It scores 79.8% on SWE-Bench Multilingual, on par with Claude Opus 4.7 and GPT-5.5, and 63.2% on CursorBench v3.1. Pricing is $0.50 per million input tokens and $2.50 per million output tokens, roughly one-tenth the per-task cost of the frontier alternatives. Cursor is also training a successor model on the Colossus 2 supercomputer through SpaceXAI, with 10x more compute than Composer 2.5. The New Stack reported the same week that high-velocity teams are now running Cursor, Claude Code, and Codex together as a composable stack: Cursor for daily features, Claude Code for architectural changes, Codex for automated workflows.
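The composable stack described above amounts to a routing rule: classify the task, then hand it to the matching tool. A minimal Python sketch of that idea follows. The tool-to-task mapping is the one The New Stack reports; the `TaskKind` enum, the `ROUTES` table, and the `pick_tool` function are hypothetical names for illustration, and none of this calls any real Cursor, Claude Code, or Codex API.

```python
from enum import Enum


class TaskKind(Enum):
    """Task categories named in the article (enum itself is hypothetical)."""
    DAILY_FEATURE = "daily feature"
    ARCHITECTURAL_CHANGE = "architectural change"
    AUTOMATED_WORKFLOW = "automated workflow"


# Mapping as reported by The New Stack; values are plain labels, not API clients.
ROUTES = {
    TaskKind.DAILY_FEATURE: "Cursor",
    TaskKind.ARCHITECTURAL_CHANGE: "Claude Code",
    TaskKind.AUTOMATED_WORKFLOW: "Codex",
}


def pick_tool(kind: TaskKind) -> str:
    """Return the tool label a team would reach for, given the task kind."""
    return ROUTES[kind]


if __name__ == "__main__":
    for kind in TaskKind:
        print(f"{kind.value}: {pick_tool(kind)}")
```

Keeping the mapping in one table is the point of "composable": swapping a tool for one task category is a one-line change that leaves the other routes untouched.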
The strategic read. Four stories, one week, four different layers. Math. Frontier-model rankings. Multimodal generation. Coding tools. The AI market is no longer about one company or one capability. The capability frontier is now distributed — OpenAI for reasoning, Alibaba for cost-efficient frontier, Google for distribution and integration, Cursor for low-cost developer tooling. If you are building on AI, the right move is to bet on the layer, not the lab. Reasoning where reasoning matters. Cost-efficient inference where margin matters. Distribution where the user already lives. And keep the coding workflow composable. The labs that win the year will be the ones that own one layer cleanly — not the ones that try to own all four.
Sources
- 1. OpenAI: An OpenAI model has disproved a central conjecture in discrete geometry · May 20, 2026
- 2. explainx.ai: OpenAI solves 80-year Erdős geometry problem: AI autonomously disproves the square grid conjecture · May 21, 2026
- 3. OfficeChai: Alibaba's Qwen 3.7 Max becomes highest-placed Chinese model on Artificial Analysis Index, ahead of Gemini 3.5 Flash · May 21, 2026
- 4. aimadetools: Qwen 3.7 Complete Guide: Alibaba's Strongest AI Model Yet · May 21, 2026
- 5. The Tech Portal: Google introduces Gemini Omni, Gemini 3.5 Flash, AI-powered Search upgrades at I/O 2026 · May 20, 2026
- 6. Cursor: Introducing Composer 2.5 · May 18, 2026
- 7. The New Stack: Cursor, Claude Code, and Codex are merging into one AI coding stack nobody planned · May 22, 2026
- 8. BuildFastWithAI: AI News Today - May 23, 2026: 12 Biggest Stories · May 23, 2026