Issue #106 · Thursday, May 21, 2026 · 44 min read · 22 stories

Gemini Omni ships, Anthropic outed for $15B/yr compute

KPMG rolls Claude to 276k staff. Shai-Hulud returns. Trump readies an AI executive order.

Yesterday's I/O 2026 roundup hit the headlines, so today we deep-dive on Gemini Omni, the multimodal model we flagged as a leak in edition #99. Anthropic is paying SpaceX $15 billion a year for GPU access, according to the just-filed S-1; KPMG just rolled Claude out to 276,000 staff; and the Shai-Hulud npm worm hit a third wave, publishing more than 600 malicious package versions in an hour. Trump's draft AI executive order leaked to Axios, asking developers for voluntary early access to frontier models.

NEWS

Gemini Omni Lands: The Multimodal Model From I/O That We Flagged as a Leak in #99

· 5 min read

Yesterday's MacRumors roundup hit the I/O 2026 headlines. The launch worth its own look is Gemini Omni, the multimodal model leaked ahead of the event in edition #99. DeepMind CTO Koray Kavukcuoglu describes it as where Gemini's ability to reason meets the ability to create, starting with video, with images, audio, video and text combinable as inputs in a single pass.

SpaceX's S-1 Exposes the $15 Billion a Year Anthropic Pays for Compute

· 5 min read

The S-1 SpaceX filed with the SEC on Wednesday revealed Anthropic has agreed to pay $1.25 billion per month through May 2029 for GPU access at the Colossus and Colossus II data centres that straddle Tennessee and Mississippi. That is $15 billion a year flowing to a direct AI rival, and a hard number on what frontier compute now costs at the top of the market.

KPMG Embeds Claude Across 276,000 Staff as Anthropic Ships Self-Hosted Sandboxes

· 5 min read

KPMG is rolling Claude out to all 276,000 employees globally and embedding it in Digital Gateway, the platform the firm uses for tax and legal client work. The same day, Anthropic shipped self-hosted sandboxes (public beta) and MCP tunnels (research preview) for Claude Managed Agents, with runtimes from Cloudflare, Daytona, Modal, and Vercel. Enterprise deployments at KPMG's scale need exactly this kind of perimeter control.

Shai-Hulud Returns: Over 600 Malicious npm Versions Published in One Hour, Mostly @antv

· 5 min read

After the TanStack hijack we covered in edition #99 and the earlier Mistral wave, the Shai-Hulud worm hit npm again on May 19, publishing 639 malicious versions across 323 packages in 60 minutes. The @antv visualisation ecosystem took most of the damage. Endor Labs flagged that long-stale libraries like timeago.js and size-sensor were among the victims, since unchanged packages get less scrutiny. Secrets exfiltrate via the Session P2P network.

OpenAI's General-Purpose Reasoner Disproves an 80-Year-Old Erdős Conjecture

· 9 min read

A general-purpose OpenAI model, not a math-specific one, refuted the long-standing belief that square grid constructions are optimal for Erdős's planar unit-distance problem, producing an infinite family of counter-examples with a polynomial improvement. External mathematicians verified the proof. The same day, OpenAI launched Guaranteed Capacity, offering 1-3 year compute commitments to enterprise customers who need predictable access across products.

Axios: Trump's Draft AI Executive Order Wants Early Access to Frontier Models

· 2 min read

The White House is preparing an executive order, possibly out this week, that asks AI developers to inform the government about new releases under a voluntary framework. It has two sections: cybersecurity (Pentagon, hospitals, banks) and covered frontier models. Axios reports the Mythos episode at Anthropic softened the administration's all-in stance on AI, though hardliners want stricter terms.

Cerebras Serves China's Top Open Model at 981 Tokens/Sec, 6.7x the Next-Fastest GPU Cloud

· 1 min read

Cerebras has started running Moonshot's Kimi K2.6, the trillion-parameter Chinese open-weight model we have tracked since January, on Wafer-Scale Engine clusters for enterprise trials. Artificial Analysis measured 981 output tokens per second, 6.7x the next-fastest GPU cloud and 23x the median. A 10,000-token input with 500-token output completes in 5.6 seconds versus 163.7 on the official endpoint. US infrastructure is now the best place to run the leading Chinese open model commercially.
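
The multipliers above are easier to compare once they are turned back into rates. A rough back-of-envelope sketch in Python, using only the figures quoted in this item; the variable names are ours, and the decode-time line assumes output streams at the steady-state rate, which real requests will not match exactly:

```python
# Figures quoted in the item above (Artificial Analysis measurements).
CEREBRAS_TOKENS_PER_SEC = 981.0
SPEEDUP_VS_NEXT_GPU_CLOUD = 6.7
SPEEDUP_VS_MEDIAN = 23.0
CEREBRAS_E2E_SECONDS = 5.6    # 10,000-token input, 500-token output
OFFICIAL_E2E_SECONDS = 163.7  # same request on the official endpoint
OUTPUT_TOKENS = 500


def implied_rate(fast_rate: float, speedup: float) -> float:
    """Rate of the slower provider implied by a 'Nx faster' claim."""
    return fast_rate / speedup


# Throughput the comparison points must have had for the multipliers to hold.
next_fastest_gpu_cloud = implied_rate(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_NEXT_GPU_CLOUD)
median_provider = implied_rate(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_MEDIAN)

# End-to-end speedup on the quoted request.
e2e_speedup = OFFICIAL_E2E_SECONDS / CEREBRAS_E2E_SECONDS

# ASSUMPTION: decoding runs at the steady-state rate for the whole response.
decode_seconds = OUTPUT_TOKENS / CEREBRAS_TOKENS_PER_SEC
```

On these numbers the next-fastest GPU cloud works out to roughly 146 tokens per second, the median to roughly 43, and the end-to-end gap on the quoted request to about 29x.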

Warp's Oz Becomes a Multi-Harness Control Plane: Claude Code, Codex, and Warp Agent in One Cloud

· 5 min read

Warp launched multi-harness orchestration for Oz, so teams can launch, track, and control Claude Code, Codex, and Warp Agent from one cloud. New: automatic multi-agent orchestration with parallel subagents, cross-harness Agent Memory in research preview (the only one that persists across harnesses), Kubernetes self-hosting, and expanded cost and usage controls. The pitch: nobody should bet their future on a single model or harness.

TECHNICAL

How Snapchat Serves a Billion ML Predictions a Second at 100ms Latency

· 13 min read

Snap reported 946 million monthly users in late 2025, with 474 million opening the app daily. Every session runs through a retrieval-and-ranking pipeline that pulls hundreds of candidates from millions of videos, computes dozens of features per user and per candidate, and runs a deep model over every pair, all inside 100 milliseconds. ByteByteGo walks through the system shape that survives at that scale.
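
For readers new to the pattern, here is a toy sketch of the two-stage shape in Python. It is not Snap's system: the function names, the dot-product retriever, the `freshness` feature, and the logistic scorer are all stand-ins we made up to show why a cheap first pass makes the expensive second pass affordable.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Vector, catalog: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: cheap similarity search that cuts the catalog to k candidates."""
    return heapq.nlargest(k, catalog, key=lambda vid: dot(user_vec, catalog[vid]))


def rank(
    user_vec: Vector,
    candidates: List[str],
    catalog: Dict[str, Vector],
    freshness: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Stage 2: a costlier scorer run once per (user, candidate) pair.

    A production ranker would be a deep model over many features; a logistic
    function over two hypothetical features stands in for it here.
    """
    scored = []
    for vid in candidates:
        logit = dot(user_vec, catalog[vid]) + 0.5 * freshness.get(vid, 0.0)
        scored.append((vid, 1.0 / (1.0 + math.exp(-logit))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def recommend(
    user_vec: Vector,
    catalog: Dict[str, Vector],
    freshness: Dict[str, float],
    k: int,
) -> List[str]:
    """Retrieve k candidates, then return them in ranked order."""
    candidates = retrieve(user_vec, catalog, k)
    return [vid for vid, _ in rank(user_vec, candidates, catalog, freshness)]
```

The point of the split is cost: the ranker only ever sees `k` items, so its per-pair expense is bounded no matter how large the catalog grows.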

Railway's Post-Mortem on Eight Hours Offline After Google Cloud Suspended Its Account

· 7 min read

Between 22:20 UTC May 19 and 06:14 UTC May 20, Google Cloud mistakenly suspended Railway's production account, taking the dashboard, API, and control plane offline. The blast radius extended past GCP because Railway's edge proxies depend on a GCP-hosted control plane for routing tables. As caches expired, workloads on Railway Metal and AWS burst-cloud went unreachable too. GitHub then rate-limited their OAuth as reconnect storms hit.
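
The delayed blast radius is the part worth internalising. A minimal Python sketch of the failure mode, assuming a simple TTL cache in front of a control-plane lookup; `RouteCache`, `ControlPlaneDown`, and the TTL are hypothetical names and values for illustration, not Railway's implementation:

```python
from typing import Callable, Dict, Tuple


class ControlPlaneDown(Exception):
    """Raised by the fetch callback when the control plane is unreachable."""


class RouteCache:
    """Edge-proxy routing table that refreshes entries from a control plane."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        # host -> (backend, expiry timestamp)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str, now: float) -> str:
        entry = self._entries.get(host)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no control-plane call needed
        # Expired or never seen: this call raises if the control plane is down,
        # which is the moment an otherwise healthy workload becomes unreachable.
        backend = self._fetch(host)
        self._entries[host] = (backend, now + self._ttl)
        return backend
```

Traffic keeps flowing for as long as entries are fresh, then fails host by host as each one expires, which matches the staggered pattern the post-mortem describes.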

Reuben Brooks: Stop Asking LLMs to Remember Invariants, Put Them in the Substrate

· 12 min read

Broken access control sits at number one on the OWASP Top 10, and it keeps shipping because the rule lives in a prompt, a checklist, or a shared expectation that every future model invocation will remember it. Brooks argues structural backpressure beats incremental gains in model intelligence: encode the invariant in the substrate the agent writes against, not in instructions you hope it follows. He demos Shen-Backpressure, the tool he built to test that bet.
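
To make the idea concrete, here is a small Python sketch of an invariant that lives in the API surface instead of in instructions. It is our illustration, not Shen-Backpressure: `Document`, `TenantScope`, and the in-memory store are hypothetical, and Python's underscore privacy is only a convention, so a real substrate would need a harder boundary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    tenant_id: str
    doc_id: str
    body: str


class TenantScope:
    """The only handle through which calling code reads documents.

    The tenant check sits inside the accessor, so generated code cannot
    forget it: there is no unscoped read path to call by mistake.
    """

    def __init__(self, tenant_id: str, store: Dict[str, Document]) -> None:
        self._tenant_id = tenant_id
        self._store = store

    def get(self, doc_id: str) -> Document:
        doc = self._store.get(doc_id)
        if doc is None or doc.tenant_id != self._tenant_id:
            raise PermissionError(f"{doc_id} is not visible to tenant {self._tenant_id}")
        return doc

    def list_ids(self) -> List[str]:
        return sorted(
            d.doc_id for d in self._store.values() if d.tenant_id == self._tenant_id
        )
```

An agent handed a `TenantScope` can write whatever query logic it likes; the cross-tenant read fails regardless of what the prompt did or did not say.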

Datadog Found 1 in 20 AI Requests Already Fails Silently, and Multi-Agent Systems Make It Worse

· 8 min read

Datadog's April 2026 State of AI Engineering report puts the silent-failure rate for production AI requests at 5%. An ICLR 2026 paper, The Reasoning Trap, found that training agents to reason more carefully produces more wrong tool calls. An OutSystems survey of 1,900 IT leaders showed 96% of firms run agents but only 12% manage them centrally. Hallucination propagation across agent chains is the failure mode nobody is watching.
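
A quick way to see why chains make it worse: if each hop fails silently at the quoted 5% rate, the chance that at least one hop in the chain has failed grows fast. The Python sketch below assumes failures are independent across hops, which is our simplification and not a claim from the Datadog report; correlated failures would change the numbers.

```python
def chain_failure_probability(per_step_failure: float, steps: int) -> float:
    """Probability that at least one of `steps` hops fails.

    ASSUMPTION: each hop fails independently with the same probability.
    """
    if not 0.0 <= per_step_failure <= 1.0:
        raise ValueError("per_step_failure must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return 1.0 - (1.0 - per_step_failure) ** steps
```

Under that assumption, `chain_failure_probability(0.05, 5)` comes out a little above 22%, and ten hops push it to roughly 40%.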

Six New Ettin Rerankers Span 17M to 1B Parameters, Distilled From mxbai-rerank-large

· 6 min read

Tom Aarsen released six new CrossEncoder rerankers on Ettin ModernBERT, sizes 17M to 1B. Training uses pointwise MSE distillation from mxbai-rerank-large-v2 over a LightonAI pre-training and fine-tuning mix. Each is state-of-the-art at its size on MTEB(eng, v2) Retrieval when paired with embeddinggemma-300m. Aarsen bootstrapped the recipe with the train-sentence-transformers Agent Skill shipping in Sentence Transformers v5.5.0, so you can ask Claude Code or Codex to fine-tune one.
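
If pointwise MSE distillation is unfamiliar, the core is small: the student is trained to reproduce the teacher's relevance score for each query-document pair, one pair at a time. A dependency-free Python sketch follows; the linear student and hand-rolled gradient step are stand-ins we chose for brevity, not Aarsen's recipe, which trains transformer CrossEncoders.

```python
from typing import List, Sequence


def mse(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Pointwise distillation loss: mean squared gap to the teacher's scores."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must be the same length")
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / len(student_scores)


def student_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear student over per-pair features."""
    return sum(w * x for w, x in zip(weights, features))


def distill_step(
    weights: Sequence[float],
    batch_features: Sequence[Sequence[float]],
    teacher_scores: Sequence[float],
    lr: float,
) -> List[float]:
    """One gradient-descent step on the pointwise MSE for the linear student."""
    n = len(batch_features)
    grads = [0.0] * len(weights)
    for feats, target in zip(batch_features, teacher_scores):
        err = student_score(weights, feats) - target
        for i, x in enumerate(feats):
            grads[i] += 2.0 * err * x / n
    return [w - lr * g for w, g in zip(weights, grads)]
```

No relevance labels appear anywhere in the loss: the teacher's scores are the only supervision, which is what lets a large reranker's judgement be compressed into much smaller students.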

ANALYSIS

Zitron Returns: AI Is Too Expensive for Anybody but Construction Firms and Nvidia

· 39 min read

Zitron's second appearance after edition #100. His thesis: every AI startup loses millions or billions a year and nobody has worked out how to stop the bleeding. Hyperscalers have committed over $800 billion to infrastructure in three years, with $700 billion more planned for 2026 and another $1 trillion in 2027, meaning AI revenue needs to clear $3 trillion just to break even.

Software Has Entered Its Centaur Era: AI Plus Engineer Beats Either Alone

· 3 min read

Richard Marmorstein draws the parallel to chess. After Deep Blue beat Kasparov in 1997, the best games for years came from centaurs, a human paired with an engine, not from engines alone. He argues software has just entered the same phase. The piece is a careful counter to the 2030 thought experiment where AI has replaced all knowledge work and you are driving TaskRabbit jobs for it.

The Algorithmic Bridge Maps Every US Poll Showing the AI Backlash Is Real

· 17 min read

Alberto Romero pulled together survey data from Gallup, Pew, NBC News, the Washington Post, Change Research, and Marquette Law School to map American sentiment toward AI between 2024 and 2026. Every measure points in the same direction: Americans do not want datacentres near them, do not trust AI in general, and do not like the people building it. The backlash is bipartisan, and AI has become a major political problem.

Paul Kinlan Built the Actual Dataset Behind the Model Half-Life Meme

· 3 min read

Kinlan compiled a TSV of every headline model release from major Western frontier labs (OpenAI, Anthropic, Google, xAI, Meta, Mistral) and Chinese labs (DeepSeek, Qwen, Moonshot, ByteDance, MiniMax) since late 2022, splitting each vendor into actual sub-series (Opus separate from Sonnet, GPT separate from o-series). He computes the median gap between trailing releases and projects the next drop. The data confirms the meme: half-lives are dropping, with caveats.
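
The projection itself is simple enough to sketch. The Python below is one plausible reading of "median gap between trailing releases", not Kinlan's code: the function names and the three-gap window are our assumptions, and you would feed it the release dates for a single sub-series from his TSV.

```python
from datetime import date, timedelta
from statistics import median
from typing import List


def release_gaps(releases: List[date]) -> List[int]:
    """Days between consecutive releases, oldest first."""
    ordered = sorted(releases)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def project_next(releases: List[date], trailing: int = 3) -> date:
    """Latest release plus the median of the most recent `trailing` gaps.

    ASSUMPTION: a three-gap window; the original analysis may use another.
    """
    gaps = release_gaps(releases)
    if not gaps:
        raise ValueError("need at least two releases to compute a gap")
    window = gaps[-trailing:]
    return max(releases) + timedelta(days=round(median(window)))
```

Using the median over a short trailing window keeps one unusually long or short gap from dragging the projection, which matters when a series only has a handful of releases.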

Houda Nait: What Becomes Valuable When Intelligence Is Abundant

· 8 min read

A former OpenAI and Stanford PhD researcher argues the central question is not which jobs AI replaces, it is what becomes valuable when intelligence is cheap. For centuries, capitalism organised itself around the scarcity of expertise. That bottleneck is disappearing fast. The piece reframes the AI conversation away from benchmark races and toward what humans contribute in a world where most decisions can be outsourced to a model.

TOOLS

Roboflow Ships an MCP Server So Claude and Codex Can Build Vision Pipelines

· 4 min read

The Roboflow MCP Server lives at one URL (mcp.roboflow.com), so any MCP-compatible agent picks up the latest capabilities the moment it connects, with no SDK to update or version to pin. Claude, Codex, Cursor and others can now create projects, upload and auto-label images, pull from Roboflow Universe, train and evaluate models, and stand up deployable Workflows without leaving the agent loop.

Superlog (YC P26): Install Observability With One Prompt, Get Bug Fix PRs in Slack

· 1 min read

Run a single npx command and Superlog instruments your services with OTel, adds request spans, queue metrics, and structured error logs, and starts grouping errors into incidents with SEV scores and impact assessments. When something breaks, it prepares a fix PR with a regression test and posts it to Slack. Y Combinator-backed, no lock-in, built for the coding-agent-in-the-loop workflow.

browse.sh: One CLI That Gives Coding Agents Skills, Primitives, and Cloud Sessions

· 6 min read

A browser CLI built to be driven by AI agents, not humans. Add per-site skills (airbnb.com, recreation.gov, ramp.com) so an agent can plan a road trip and book the campsites in one prompt. Drive any page with low-level primitives (click, scroll, type, hover, press), tail network and console in real time, and switch to remote Chromium on Browserbase by prefixing any command with cloud.

Two SKILL.md Files That Turn Claude Code or Codex Into a Distributed Systems Tester

· 9 min read

One skill designs a claim-driven test plan from the system's promises and binds each scenario to an abstract model (register, queue, log, lock, lease, ledger). The other runs the plan and produces a findings report with 9-state verdicts and an explicit SUT-or-harness-or-checker blame classification. Works with Claude Code, Codex, Copilot CLI, Cursor, or Gemini. The reviewer reads two artefacts and ships.
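
To give a flavour of what binding a scenario to an abstract model means, here is a deliberately tiny Python sketch for the register model. It is our own illustration, not taken from the SKILL.md files: it checks a single sequential history only (no concurrency, so it is not a linearizability checker), and `Op` and `first_register_violation` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str             # "write" or "read"
    value: Optional[int]  # value written, or the value the read returned


def first_register_violation(
    history: List[Op], initial: Optional[int] = None
) -> Optional[int]:
    """Index of the first read that disagrees with the register model, else None.

    Model: a register holds one value, and a read returns the latest write.
    """
    current = initial
    for index, op in enumerate(history):
        if op.kind == "write":
            current = op.value
        elif op.kind == "read":
            if op.value != current:
                return index
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
    return None
```

A real harness would record concurrent operations with timestamps and search for a valid ordering; the sequential version just shows the shape of a claim ("reads see the latest write") turned into a checkable model.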