Meta dropped Llama 4 today in two sizes: a 70B that punches well above its weight, and a 405B that lands within striking distance of GPT-5 on most public benchmarks. Both are released with weights, which is the part that matters.

What you actually get

Two model variants, both Instruct-tuned out of the gate.

Llama 4 70B-Instruct — fits comfortably on a single H100 node at 8-bit quantization, runs on a 4090 at 4-bit. Performance is roughly Claude Haiku 3.5-class on reasoning, slightly above on coding, slightly below on multimodal. It's the most useful 70B that's ever shipped.

Llama 4 405B-Instruct — needs 8x H100 for full-precision inference, or 4x H100 at 8-bit. Eval scores land within 3–5 points of GPT-5 on MMLU, BBH, and HumanEval, and roughly tied on most agent-task evals. It's the first open-weights model that a serious team could deploy as a primary instead of a fallback.

Both ship with 256k context windows. Both support tool calling natively (no fine-tune required). Both have multimodal image input.
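Native tool calling means the model emits structured tool calls in the OpenAI-style schema that most open-weights serving stacks (vLLM, TGI) expose. A minimal sketch of the dispatch side — the tool call here is simulated rather than fetched from a live endpoint, and the tool itself is a stub; only the schema shape follows the common OpenAI-compatible convention:

```python
import json

# Tool schema in the OpenAI-style format most open-weights servers accept.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub implementation; a real tool would hit an actual API.
    return f"Sunny in {city}"

DISPATCH = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> str:
    """Execute one model-emitted tool call and return its result."""
    fn = DISPATCH[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated assistant tool call, in the shape an
# OpenAI-compatible server returns it.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}
print(run_tool_call(tool_call))  # Sunny in Oslo
```

The loop back to the model — appending the tool result as a `tool` role message and re-prompting — is the same regardless of which server you run behind the endpoint.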

Licensing — read this carefully

The Llama 4 community license is not an open-source license in the OSI sense. It's the same family of "open weights, restricted commercial terms" that Llama 3 used, with one important update: any company with more than 700 million monthly active users at launch needs a separate commercial agreement.

In practice, that ceiling affects roughly a dozen companies on earth. For anyone else — including most consulting clients, most startups, most enterprise teams — you can use the weights for commercial work without a separate license. Just include the attribution, don't claim the model is yours, and don't redistribute the raw weights.

The other clause to read: outputs generated by Llama 4 cannot be used to train competing models. This term is enforceable, and Meta has shown it is willing to act on it.

What it costs to run

Real numbers, not marketing numbers.

For the 70B: roughly $0.40 per million input tokens on a self-hosted H100, $0.40 per million output. Comparable hosted providers (Together, Groq, Fireworks) are pricing at $0.50–$0.80 per million on day one. If you're already paying for GPU compute, self-hosting wins. If you're not, hosted is the cheaper default; self-hosting starts to pay off somewhere around 5–10M tokens per month.

For the 405B: self-hosting runs about $1.60 per million input on 8x H100 at full precision, or ~$1.20 at 8-bit quant. Hosted providers will likely land around $3.00 per million input, $5.00 per million output — still well below GPT-5's pricing.
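Plugging those rates into a quick comparison makes the gap concrete. A sketch using the per-token figures above, with a few labeled assumptions: a hypothetical 50M-token monthly workload, a 25% output-token fraction, hosted 70B taken at the midpoint of the day-one range, and the 405B self-hosted output price assumed equal to input (the figures above quote input only):

```python
def monthly_cost(tokens_m: float, price_in: float, price_out: float,
                 out_frac: float = 0.25) -> float:
    """Blended monthly cost in dollars for tokens_m million tokens,
    with out_frac of the volume being output tokens."""
    return tokens_m * ((1 - out_frac) * price_in + out_frac * price_out)

VOL = 50  # assumed workload: 50M tokens per month

# 70B: self-hosted figure from above; hosted at the midpoint
# of the $0.50-$0.80 day-one range.
self_70b = monthly_cost(VOL, 0.40, 0.40)
hosted_70b = monthly_cost(VOL, 0.65, 0.65)

# 405B: 8-bit self-hosted figure, output price assumed equal to
# input; hosted at the projected $3.00 in / $5.00 out.
self_405b = monthly_cost(VOL, 1.20, 1.20)
hosted_405b = monthly_cost(VOL, 3.00, 5.00)

print(f"70B:  self ${self_70b:,.2f}  hosted ${hosted_70b:,.2f}")
print(f"405B: self ${self_405b:,.2f}  hosted ${hosted_405b:,.2f}")
```

At that volume the 405B spread is roughly 3x in self-hosting's favor; the crossover point in practice depends on whether your GPUs sit idle between batches, which the per-token figures above already assume away.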

Where it actually wins

Three places.

On-prem and air-gapped deployments. This is the use case that nothing else fills. If your workload can't leave a private cloud or your own data center, Llama 4 405B is now a credible primary model.

Heavy customization. You can fine-tune the weights, you can run it on your own infra, you can ship a forked version inside a product. The whole point is that it's yours.

Cost-sensitive batch workloads. If you're processing millions of documents per day, the per-token economics of self-hosted Llama 4 against hosted GPT-5 or Claude are decisive. Quality is close enough.

Where it still loses

Coding. The 405B is competitive on HumanEval but visibly behind Claude on real-world coding tasks — the same pattern we saw with Llama 3. If your product is a coding agent, this release does not change your decision.

Agent-loop reliability over more than 8 tool calls is still measurably worse than the frontier hosted models. Meta's own evals show a 6–8 point gap.

Multimodal handling is fine for static images but falls behind on diagrams, charts, and screenshots — the cases that actually matter for real workflows.

What we'd do

If you have GPU capacity and an AI workload that can stay on your infrastructure, run a real bake-off this week against your current model. Llama 4 70B is the easier swap — comparable quality to mid-tier hosted models at meaningfully lower cost.
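A bake-off doesn't need heavy tooling to start: a shared prompt set, one generate function per model, and a single metric. A minimal exact-match sketch — both models are stubbed out here with canned answers for illustration; in a real run the stubs would call your incumbent's API and a local Llama 4 70B server, and the metric would match your workload (rubric scoring, pass@k, whatever you already track):

```python
# Gold set: (prompt, expected answer) pairs drawn from your workload.
GOLD = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of 'hot'?", "cold"),
]

def current_model(prompt: str) -> str:
    # Stub for the incumbent model; replace with a real API call.
    return {"What is 2 + 2?": "4",
            "Capital of France?": "Paris",
            "Opposite of 'hot'?": "warm"}[prompt]

def llama4_70b(prompt: str) -> str:
    # Stub for the candidate; replace with a call to your local server.
    return {"What is 2 + 2?": "4",
            "Capital of France?": "Paris",
            "Opposite of 'hot'?": "cold"}[prompt]

def accuracy(model, dataset) -> float:
    """Fraction of prompts where the model's answer matches exactly."""
    hits = sum(model(q).strip() == a for q, a in dataset)
    return hits / len(dataset)

for name, model in [("current", current_model), ("llama4-70b", llama4_70b)]:
    print(f"{name}: {accuracy(model, GOLD):.2f}")
```

The point is to get a number on your own prompts this week, not to build eval infrastructure; a few hundred held-out examples from production traffic beats any public benchmark for this decision.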

If you don't have GPU capacity, wait two weeks. Hosted pricing on Together, Groq, and Fireworks will compress fast.

If you're a coding shop, this isn't your release. The interesting one for you was GPT-5.5 yesterday, even if the coding picture didn't change much there either.

Sources

  1. Meta AI — Introducing Llama 4: 70B and 405B open-weights releases · Apr 24, 2026
  2. Hugging Face — meta-llama/Llama-4-405B-Instruct model card · Apr 24, 2026
  3. Llama community license — Llama 4 community license terms · Apr 24, 2026