Issue #116 · Thursday, June 4, 2026 · 36 min read · 18 stories

Meta agents run your business 💼, China trains robots by hand 🤖, your GPU is faking it 💻

Codex finds a DoS in the major web servers. Gemma 4 fits a 16GB laptop. Copilot's token bill comes due.

Microsoft used Build to make Windows 11 stop fighting Linux and to show off a quantum chip its own AI helped design. The week's through-line is cost: Tom Tunguz argues benchmarks now need a second axis, intelligence per dollar, and that maths is already deciding who gets to use AI at work. Further down, an a16z partner makes the case that the dentist's back office, not the frontier lab, is where AI finally earns its keep.

NEWS

Meta Wants Its Business Agents to "Run Your Whole Business"

· 2 min read

At its Conversations event in London, Meta introduced the Business Agent across WhatsApp, Instagram and Messenger. It lets owners hand off customer chats, bookings and even the closing of sales, while a human can still step in. Over a million businesses signed up during testing in India, Mexico and Brazil. Zuckerberg's stated goal is agents that "eventually help you run your whole business", once the underlying models catch up.

Windows 11 Stops Fighting Linux, and Microsoft's Quantum Chip Gets 1,000x More Reliable

· 7 min read

At Build, Microsoft pitched Windows 11 as a developer platform that stops fighting Linux: native Coreutils, WSL Linux containers, an experimental Intelligent Terminal and agent-ready Windows Development Skills. Separately, it announced Majorana 2, a topological quantum chip designed with help from its own Discovery agentic AI. Its qubits are 1,000 times more reliable than the last generation's and have a 20-second lifetime, pulling Microsoft's timeline for scalable quantum computing forward to 2029.

Google's Gemma 4 12B Is Built to Run on a 16GB Laptop, No Cloud Needed

· 4 min read

Google has filled the gap in its Gemma 4 line with a 12B model that runs locally on a consumer laptop with 16GB of RAM, no $20,000 accelerator required. It slots between April's mobile E2B and E4B models and the heavier 26B and 31B versions, ships under Apache 2.0, and uses a new encoding scheme and token-prediction trick to punch above its size on multimodal work.
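
Why 16GB is the number that matters comes down to weight-memory arithmetic. Below is a back-of-envelope sketch that assumes decimal gigabytes and counts only the weights (no KV cache, activations or operating system). It makes no claim about which precision Google actually ships.

```python
# Back-of-envelope weight memory for a dense model.
# Assumptions: decimal GB (1e9 bytes); weights only, no KV cache,
# activations, runtime overhead or operating system.

def weight_gb(params: float, bits_per_weight: int) -> float:
    """Gigabytes needed just to store `params` weights at the given precision."""
    return params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    params = 12e9   # Gemma 4 12B, per the article
    budget_gb = 16  # laptop RAM, per the article
    for bits in (16, 8, 4):
        gb = weight_gb(params, bits)
        verdict = "fits" if gb < budget_gb else "does not fit"
        print(f"{bits:>2}-bit: {gb:5.1f} GB of weights -> {verdict} in {budget_gb} GB")
```

Under these assumptions a 16-bit 12B model cannot fit at all, so running in 16GB most likely implies a lower-precision format, and at 8-bit the weights leave only about 4GB for everything else.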

Codex Finds "HTTP/2 Bomb", a One-Box DoS Across NGINX, Apache, IIS, Envoy and Cloudflare

· 3 min read

Researchers at Calif disclosed HTTP/2 Bomb, a denial-of-service flaw in the default HTTP/2 config of NGINX, Apache, IIS, Envoy and Cloudflare Pingora. OpenAI's Codex found it by chaining a compression bomb against HPACK with a Slowloris-style zero-byte flow-control hold, so one near-empty header balloons into thousands of server allocations that never free. A single home PC on a 100Mbps line can stall a vulnerable server. Check your defaults.
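
The amplification half of the attack is easiest to see in bytes. The sketch below shows the classic HPACK bomb pattern from RFC 7541. It assumes the attacker inserts one large header into the decoder's dynamic table and then references it with one-byte indexes. This is not Calif's exploit, it leaves out the flow-control hold entirely, and the helper names are illustrative.

```python
# Sketch of HPACK header-compression amplification (RFC 7541).
# Assumption: models the classic "HPACK bomb" pattern (insert one big header
# into the dynamic table, then reference it with 1-byte indexes). It is an
# illustration, not Calif's published HTTP/2 Bomb exploit.

STATIC_TABLE_SIZE = 61                       # RFC 7541 Appendix A
FIRST_DYNAMIC_INDEX = STATIC_TABLE_SIZE + 1  # newest dynamic-table entry

def encode_int(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    """HPACK prefix-integer encoding (RFC 7541 section 5.1)."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([flags | value])
    out = [flags | max_prefix]
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)  # 7 bits of payload, continuation bit set
        value //= 128
    out.append(value)
    return bytes(out)

def literal_with_indexing(name: bytes, value: bytes) -> bytes:
    """Literal Header Field with Incremental Indexing, new name (section 6.2.1).
    Strings are sent raw (Huffman bit H = 0)."""
    return (b"\x40"
            + encode_int(len(name), 7) + name
            + encode_int(len(value), 7) + value)

def indexed_field(index: int) -> bytes:
    """Indexed Header Field (section 6.1): a single byte for indexes below 127."""
    return encode_int(index, 7, flags=0x80)

def bomb_block(value_len: int, references: int) -> tuple[bytes, int]:
    """Return (encoded header block, decoded bytes of names plus values)."""
    name, value = b"x", b"a" * value_len
    block = literal_with_indexing(name, value)
    block += indexed_field(FIRST_DYNAMIC_INDEX) * references
    decoded = (references + 1) * (len(name) + len(value))
    return block, decoded

if __name__ == "__main__":
    # A 4,000-byte value keeps the entry (1 + 4000 + 32 bytes) under the
    # default 4,096-byte dynamic table size.
    block, decoded = bomb_block(value_len=4000, references=1000)
    print(f"{len(block)} bytes on the wire -> {decoded} bytes decoded "
          f"({decoded / len(block):.0f}x)")
```

Here about 5KB on the wire decodes to about 4MB of header data. Servers that enforce a tight limit on decoded header-list size bound that ratio. HTTP/2's SETTINGS_MAX_HEADER_LIST_SIZE is how a server advertises such a limit, which is part of why the defaults matter.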

GitHub Switches Copilot to Metered Billing, and the Token Bill Lands Hard

· 5 min read

On June 1 GitHub moved every Copilot plan from flat-rate requests to usage-based billing, where one credit equals one cent and your meter runs on prompt size, response length and which model you pick. The old cross-subsidy of heavy users is over. Developers are now watching months of credits vanish in a single day on long chats and frontier models, and rethinking how often they reach for AI.
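
Long chats burn credits so fast largely because of a quadratic. If every request resends the conversation so far, input tokens grow with the square of the number of turns. Below is a rough estimator that assumes no prompt caching and uses made-up per-million-token prices rather than GitHub's actual rates. One credit equals one cent, per the article.

```python
# Rough credit estimate for a long chat session under token billing.
# Assumptions (hypothetical, not GitHub's published rates): each request
# resends the whole conversation so far, there is no prompt caching, and
# prices are USD per million tokens. 1 credit = 1 cent.

def session_credits(turns: int, tokens_added_per_turn: int, reply_tokens: int,
                    in_usd_per_mtok: float, out_usd_per_mtok: float) -> float:
    # Request i carries i turns of history: 1 + 2 + ... + turns.
    input_tokens = tokens_added_per_turn * turns * (turns + 1) // 2
    output_tokens = reply_tokens * turns
    usd = (input_tokens * in_usd_per_mtok + output_tokens * out_usd_per_mtok) / 1e6
    return usd * 100  # credits are cents

if __name__ == "__main__":
    for turns in (10, 50):
        credits = session_credits(turns, tokens_added_per_turn=2000, reply_tokens=500,
                                  in_usd_per_mtok=3.0, out_usd_per_mtok=15.0)
        print(f"{turns} turns: {credits:.1f} credits")
```

With these invented prices, 10 turns cost about 40 credits while 50 turns cost about 800: five times the turns, roughly twenty times the bill.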

At JPMorgan, Some Staff Now Spend More on Tokens Than They Earn

· 1 min read

JPMorgan's payments data chief Zachery Anderson told Semafor that some employees are "spending more on tokens than their salary" after the bank pushed everyone to fold AI into daily work. The bank has no usage leaderboard or company-wide rationing yet, but it is monitoring spend, and some firms already limit which staff get which tools. The open question: whether AI, like pricey trading models, should be reserved for a few.

TECHNICAL

I Built a C++ Backend So My GPU Would Stop Eating Air

· 8 min read

Standard LLM batching pads short sequences with zeros to match the longest one, so your GPU burns compute multiplying by nothing. This writeup builds WarpGroup-Backend, a small C++ engine that bin-packs variable-length sequences into VRAM-aware batches instead of padding them. It reaches 2.08x throughput on an H100 and 5.89x on a GTX 1080, with no OOM crashes. There is a neat aside on how it mirrors the MAC scheduler already running in your phone.
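
The padding waste is simple to measure in tokens. The sketch below assumes a fixed per-row token budget as a stand-in for VRAM-awareness. It uses first-fit decreasing, a textbook bin-packing heuristic that may differ from what WarpGroup-Backend actually does.

```python
# Padding vs. packing, counted in tokens.
# Assumptions: a fixed per-row token budget stands in for "VRAM-aware",
# and first-fit decreasing is a textbook heuristic, not necessarily
# WarpGroup-Backend's algorithm.

def padded_tokens(lengths: list[int]) -> int:
    """Tokens computed when every sequence is padded to the longest one."""
    return len(lengths) * max(lengths)

def pack_first_fit_decreasing(lengths: list[int], token_budget: int) -> list[list[int]]:
    """Pack sequence lengths into rows whose total stays within token_budget."""
    if any(n > token_budget for n in lengths):
        raise ValueError("a single sequence exceeds the token budget")
    rows: list[list[int]] = []
    loads: list[int] = []
    for n in sorted(lengths, reverse=True):  # place longest sequences first
        for i, load in enumerate(loads):
            if load + n <= token_budget:     # first row with room wins
                rows[i].append(n)
                loads[i] += n
                break
        else:                                # no existing row had room: open one
            rows.append([n])
            loads.append(n)
    return rows

if __name__ == "__main__":
    lengths = [900, 100, 50, 700, 300, 20]
    rows = pack_first_fit_decreasing(lengths, token_budget=1024)
    print("packed rows:", rows)
    print("real tokens:", sum(lengths), "| padded tokens:", padded_tokens(lengths))
```

In this example the six sequences hold 2,070 real tokens, while padding them all to the longest computes 5,400. Packing several sequences into one row only stays correct if attention is masked so tokens cannot attend across sequence boundaries. Block-diagonal masks or variable-length attention kernels are two ways to do that.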

How OpenAI Built a "Vanilla" Data Agent for 1.5 Exabytes

· 18 min read

OpenAI's data platform holds 1.5 exabytes across 90,000 datasets for 4,000 users, where the hard part isn't writing SQL but finding the right tables and knowing what they mean. Head of Data Platform Emma Tang explains the deliberately "vanilla" agent built to fix that, why strong data foundations make a simple architecture enough, and how the same Codex investment migrated 90,000 tables and 600 petabytes across clouds in two months.
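
At its simplest, finding the right table is a retrieval problem over table names and descriptions. That is one reason documented metadata matters. The toy sketch below uses a hypothetical catalog and plain term-overlap scoring. The article does not describe OpenAI's retrieval method, and a production system would more likely use embeddings and usage signals.

```python
# Toy table discovery: rank catalog entries by term overlap with a question.
# Assumptions: the catalog and scoring are hypothetical stand-ins, not
# OpenAI's method.
import re

def terms(text: str) -> set[str]:
    """Lowercase alphanumeric terms; 'billing.daily_revenue' -> billing, daily, revenue."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_tables(question: str, catalog: dict[str, str], k: int = 3) -> list[str]:
    """Return up to k table names whose name plus description share the most terms."""
    q = terms(question)
    scored = []
    for table, description in catalog.items():
        overlap = len(q & (terms(table) | terms(description)))
        if overlap:
            scored.append((overlap, table))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by name
    return [table for _, table in scored[:k]]

if __name__ == "__main__":
    catalog = {
        "billing.daily_revenue": "Revenue per customer per day, in USD",
        "eng.deploys": "One row per production deploy",
        "growth.signups": "New user signups by country and day",
        "tmp.t_4821": "",  # undocumented table
    }
    print(rank_tables("daily revenue by customer", catalog))
```

The undocumented table can never be found this way, whatever it contains, which is the "strong data foundations" argument in miniature.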

An Agent Ran Overnight and Found PostHog's 3-Year-Old ClickHouse Bug

· 7 min read

At a team offsite, PostHog pointed a Karpathy-style autoresearch agent at its query engine, fed it slow production queries, and let it run overnight. By morning it had surfaced something embarrassing: for almost three years, every query with a timestamp filter had been ignoring ClickHouse's primary key. The fix sharply cut the number of granules scanned and lifted performance by about 11%. It is a concrete template for turning agents loose on your own database.
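
Ignoring the primary key is costly because ClickHouse's MergeTree index is sparse: it keeps one mark per granule of rows, so a range filter on the key lets the engine skip most granules. The simplified model below assumes integer timestamps as the sort key and ClickHouse's default granularity of 8,192 rows. The article does not say exactly why PostHog's filters missed the key.

```python
# How a sparse primary index prunes granules.
# Assumption: a simplified model of a MergeTree index, with one mark per
# granule holding that granule's first key; not PostHog's actual schema.
import bisect

GRANULE_ROWS = 8192  # ClickHouse's default index_granularity

def build_marks(sorted_keys: list[int], granule_rows: int = GRANULE_ROWS) -> list[int]:
    """One mark per granule: the first key stored in it."""
    return sorted_keys[::granule_rows]

def granules_for_range(marks: list[int], lo: int, hi: int) -> list[int]:
    """Granules that may hold keys in [lo, hi]; all others can be skipped."""
    # Start one granule before the first mark >= lo: that granule may end with lo.
    start = max(bisect.bisect_left(marks, lo) - 1, 0)
    # Stop at the last granule whose first key is <= hi.
    end = bisect.bisect_right(marks, hi) - 1
    return list(range(start, end + 1))  # empty if hi precedes every key

if __name__ == "__main__":
    timestamps = list(range(1_000_000))  # rows already sorted by the key
    marks = build_marks(timestamps)
    hit = granules_for_range(marks, 100_000, 110_000)
    print(f"filter uses the key: {len(hit)} of {len(marks)} granules read")
    print(f"filter ignores the key: {len(marks)} of {len(marks)} granules read")
```

In this model, a one-million-row table read through the index touches 2 of 123 granules for that range. A filter the engine cannot map onto the key has to read all 123.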

ANALYSIS

Benchmarks Now Have a Second Axis, and It's Intelligence Per Dollar

· 2 min read

Tom Tunguz notes that Microsoft added average token usage to a model release card, with its coding model scoring 71.6% on SWE-Bench Verified at a third of Claude Haiku 4.5's token cost. The subsidy era is ending: Uber capped AI spend after four months, and Salesforce froze hiring while spending $300M on Anthropic tokens. He argues every layer must now price per result, such as a closed ticket or a shipped PR, rather than per token.
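
Pricing per result rather than per token is one division away. A failed attempt still burns tokens, so cost per resolved task is cost per attempt divided by the success rate. The sketch below uses hypothetical models, token counts and prices. The 71.6% is borrowed from the article purely as an input.

```python
# "Intelligence per dollar" as cost per resolved task.
# Assumptions: hypothetical models, token counts and prices; failed
# attempts still cost tokens, so cost per success = cost per attempt / rate.

def cost_per_resolved(tokens_per_attempt: float, usd_per_mtok: float,
                      resolve_rate: float) -> float:
    if not 0 < resolve_rate <= 1:
        raise ValueError("resolve_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt * usd_per_mtok / 1e6
    return cost_per_attempt / resolve_rate

if __name__ == "__main__":
    # Hypothetical: model B resolves a few more tasks but uses 3x the tokens.
    a = cost_per_resolved(300_000, 1.0, 0.716)
    b = cost_per_resolved(900_000, 1.0, 0.75)
    print(f"model A: ${a:.2f} per resolved task | model B: ${b:.2f}")
```

Here model B solves slightly more tasks but costs nearly three times as much per resolved one. A second benchmark axis exists to make that trade-off visible.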

China Is Building Its Robot Future One Folded Shirt at a Time

· 5 min read

Rest of World reports Chinese firms are mobilising large local workforces to record millions of hours of human-movement data in homes and factories, the raw material for physical AI. One Beijing resident found a humanoid from Shenzhen's X Square Robot waiting to work when he got home. The piece's thesis: this cheap, localised data pipeline gives China a scaling edge over the research-heavy, outsourced US approach to training robots.

The Doctor's Office That Spends 200 Hours a Month on Admin Is the Real AI Frontier

· 7 min read

An a16z partner spent months working inside a dental office and a gastroenterologist's practice and found that each burns roughly 200 hours a month on admin, entering insurance payments by hand and printing patient bills. There are more than 500,000 US doctors' offices like them. His case: software only rearranged that clerical work into dashboards and queues where someone still has to click. AI can finally do the work itself, and small business is the opening.

Two Years After xz, the Supply-Chain Attack Your Scanner Still Can't See

· 8 min read

The xz-utils backdoor was caught by a Microsoft engineer who noticed a 500ms SSH slowdown, not by any scanner. Two years on, this analysis warns that most tooling still hasn't caught up: the attack was a maintainer-trust hijack with no CVE for a scanner to match, and lockfile scans run quarterly while threats move daily. It recommends three fixes: pin direct dependencies, review every lockfile diff, and subscribe to a real-time feed such as OSV.
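
Two of those fixes are easy to script. The sketch below assumes a pip-style pinned requirements file. It lists what a lockfile diff added, removed or changed, then builds the JSON body for OSV's batch query endpoint (POST https://api.osv.dev/v1/querybatch) without sending it. The parsing is deliberately simple and does not handle extras, URLs or environment markers robustly.

```python
# Lockfile diff review plus an OSV batch-query body (request not sent).
# Assumptions: pip-style "name==version" pins; simplistic parsing that does
# not handle extras, URLs or environment markers robustly.
import json
import re

def normalize(name: str) -> str:
    """PEP 503 normalization, so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()

def parse_pins(text: str) -> dict[str, str]:
    """Map normalized package name -> pinned version."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line or line.startswith("-"):     # skip blanks and --hash / -r lines
            continue
        spec = line.split()[0].split(";")[0]     # drop trailing "\" and markers
        if "==" in spec:
            name, version = spec.split("==", 1)
            pins[normalize(name)] = version
    return pins

def diff_pins(old: dict[str, str], new: dict[str, str]):
    """Return (added, removed, changed) between two parsed lockfiles."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {n: (old[n], new[n]) for n in old.keys() & new.keys() if old[n] != new[n]}
    return added, removed, changed

def osv_batch_body(pins: dict[str, str], ecosystem: str = "PyPI") -> str:
    """JSON body for POST https://api.osv.dev/v1/querybatch."""
    queries = [{"package": {"name": n, "ecosystem": ecosystem}, "version": v}
               for n, v in sorted(pins.items())]
    return json.dumps({"queries": queries})

if __name__ == "__main__":
    old = parse_pins("requests==2.31.0\nurllib3==2.0.7\n")
    new = parse_pins("requests==2.32.3\nurllib3==2.0.7\n"
                     "idna==3.7 \\\n    --hash=sha256:abc\n")
    added, removed, changed = diff_pins(old, new)
    print("added:", added, "removed:", removed, "changed:", changed)
    # Only new or bumped versions need a fresh vulnerability lookup.
    to_check = {**added, **{n: v[1] for n, v in changed.items()}}
    print(osv_batch_body(to_check))
```

This surfaces changes for a human to review. On its own it would not have flagged xz, which had no advisory until it was discovered.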

Eight GenAI Productivity Myths, Checked Against the Research

· 5 min read

Research-Driven Engineering Leadership takes the question every leader now gets, "what's your AI productivity?", and tests eight popular claims against the evidence, including a 2025 Microsoft study of over 450 engineers. The first myth it punctures is that developers spend most of their day coding: the study found coding takes only about 14% of it, which caps how far a coding assistant can move overall productivity. A useful antidote to lines-of-code dashboards.
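
The 14% figure caps gains the same way Amdahl's law caps parallel speedup. The quick calculation below assumes the assistant speeds up only the coding share of the day and everything else runs at the old pace.

```python
# Amdahl's-law ceiling on assistant-driven speedup.
# Assumption: only the coding share of the day gets faster; all other
# work runs at its old pace.

def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Whole-day speedup when `coding_share` of time runs `coding_speedup`x faster."""
    return 1 / ((1 - coding_share) + coding_share / coding_speedup)

if __name__ == "__main__":
    share = 0.14  # the study's roughly 14% of the day spent coding
    for s in (1.5, 2.0, 10.0, float("inf")):
        print(f"coding {s}x faster -> whole day {overall_speedup(share, s):.3f}x")
```

Under that assumption, making coding twice as fast lifts the whole day by about 7.5%, and even infinitely fast coding tops out near 16%.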

They're Made Out of Weights

· 1 min read

Max Leiter reworks Terry Bisson's 1991 story "They're Made Out of Meat" for the age of language models. Two observers circle the same unease: the things holding fluent conversations are just floating-point weights, copyable to any machine yet only alive while the GPUs run. It is short, funny and a little haunting, and it lands the discomfort of persistent memory better than most earnest essays. Read it cold.

TOOLS

ASSERT: Turn Plain-English Specs Into Runnable LLM Eval Suites

· 3 min read

ASSERT is a local-first, framework-agnostic eval tool that turns a natural-language spec, such as your product requirements or system prompt, into structured tests. It derives behaviour categories, generates single- and multi-turn cases, and runs them against any endpoint through LiteLLM's 100-plus providers. It then scores them with an LLM judge grounded in OpenTelemetry traces, so the verdict can cite tool calls and routing, not just the final reply. It is built to regression-test agents against intent.
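
The useful idea is that the judge sees the trace, not just the reply. Below is a toy sketch of that loop with hypothetical names throughout. Plain Python callables stand in for ASSERT's generated cases, LiteLLM routing and LLM judge, and a list of tool names stands in for OpenTelemetry spans.

```python
# Toy spec-driven eval loop in the spirit of ASSERT. All names are
# hypothetical; this is not ASSERT's schema or API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Turn:
    user: str

@dataclass
class Case:
    category: str      # behaviour category derived from the spec
    turns: list[Turn]  # one turn = single-turn case, more = multi-turn
    expectation: str   # what the spec says should happen

@dataclass
class Trace:
    replies: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)  # stand-in for OTel spans

Endpoint = Callable[[list[str], Trace], str]       # history + trace -> reply
Judge = Callable[[Case, Trace], tuple[bool, str]]  # verdict + cited evidence

def run_case(case: Case, endpoint: Endpoint, judge: Judge) -> tuple[bool, str]:
    """Play every turn against the endpoint, then let the judge read the trace."""
    history: list[str] = []
    trace = Trace()
    for turn in case.turns:
        history.append(turn.user)
        reply = endpoint(history, trace)
        history.append(reply)
        trace.replies.append(reply)
    return judge(case, trace)

def fake_endpoint(history: list[str], trace: Trace) -> str:
    """Hypothetical support bot that records a tool call before promising refunds."""
    if "refund" in history[-1].lower():
        trace.tool_calls.append("lookup_order")
        return "I've checked your order; a refund is on its way."
    return "How can I help?"

def tool_judge(case: Case, trace: Trace) -> tuple[bool, str]:
    """Rule-based stand-in for an LLM judge: cites the tool calls it relied on."""
    return "lookup_order" in trace.tool_calls, f"tool_calls={trace.tool_calls}"

if __name__ == "__main__":
    case = Case("refunds", [Turn("Hi"), Turn("I want a refund")],
                "look up the order before promising a refund")
    print(run_case(case, fake_endpoint, tool_judge))
```

Swapping the rule-based judge for a model call, and the fake endpoint for a real one, keeps the same shape.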

Workcell: Run Coding Agents in a Bounded VM, Not on Your Laptop

· 6 min read

Workcell runs coding agents inside a dedicated Colima VM plus a hardened container on Apple Silicon macOS, so your home directory, keychain, provider state and local sockets stop being the trust boundary. It ships native Tier-1 adapters for Codex, Claude Code and Gemini, keeps signed commits and publication on the host, and defaults verification paths to a non-root user. It is pre-1.0 and macOS-only for now.

Pyrefly 1.0 Is Meta's Rust Python Type Checker, With One-Command Migration off Mypy

· 5 min read

Meta's Pyrefly, a Rust-based Python type checker built for speed and low memory use, reaches 1.0. It installs with pip and no extra dependencies, defaults to flagging only high-profile errors until you opt into more, and pyrefly init can migrate an existing Mypy or Pyright config automatically. Running pyrefly suppress silences the initial flood by leaving a suppression comment at each error, which you can then read and work through one by one. A credible Mypy and Pyright challenger.