On June 1 GitHub moved every Copilot plan from flat-rate requests to usage-based billing, where one credit equals one cent and your meter runs on prompt size, response length and which model you pick. The old cross-subsidy of heavy users is over. Developers are now watching months of credits vanish in a single day on long chats and frontier models, and rethinking how often they reach for AI.
Meta agents run your business 💼, China trains robots by hand 🤖, your GPU is faking it 💻
Codex finds a DoS in every web server. Gemma 4 fits a 16GB laptop. Copilot's token bill comes due.
NEWS
Researchers at Calif disclosed HTTP/2 Bomb, a denial-of-service flaw in the default HTTP/2 config of NGINX, Apache, IIS, Envoy and Cloudflare Pingora. OpenAI's Codex found it by chaining a compression bomb against HPACK with a Slowloris-style zero-byte flow-control hold, so one near-empty header balloons into thousands of server allocations that never free. A single home PC on a 100Mbps line can stall a vulnerable server. Check your defaults.
At Build, Microsoft pitched Windows 11 as a developer platform that stops fighting Linux: native Coreutils, WSL Linux containers, an experimental Intelligent Terminal and agent-ready Windows Development Skills. Separately it announced Majorana 2, a topological quantum chip its own Discovery agentic AI helped design, with qubits 1,000 times more reliable than the last generation and a 20-second lifetime, pulling its scalable-quantum timeline forward to 2029.
At its Conversations event in London, Meta introduced the Business Agent across WhatsApp, Instagram and Messenger, letting owners hand off customer chats, bookings and even the closing of sales, with a human able to step in. Over a million businesses signed up during testing in India, Mexico and Brazil. Zuckerberg's stated goal is agents that "eventually help you run your whole business", once the underlying models catch up.
Google has filled the gap in its Gemma 4 line with a 12B model that runs locally on a consumer laptop with 16GB of RAM, no $20,000 accelerator required. It slots between April's mobile E2B and E4B models and the heavier 26B and 31B versions, ships under Apache 2.0, and uses a new encoding scheme and token-prediction trick to punch above its size on multimodal work.
JPMorgan's payments data chief Zachery Anderson told Semafor that some employees are "spending more on tokens than their salary" after the bank pushed everyone to fold AI into daily work. There is no usage leaderboard or company-wide rationing yet, but the bank is monitoring spend, and some firms already limit which staff get which tools. The open question: whether AI, like pricey trading models, should be reserved for a few.
TECHNICAL
OpenAI's data platform holds 1.5 exabytes across 90,000 datasets for 4,000 users, where the hard part isn't writing SQL but finding the right tables and knowing what they mean. Head of Data Platform Emma Tang explains the deliberately "vanilla" agent built to fix that, why strong data foundations make a simple architecture enough, and how the same Codex investment migrated 90,000 tables and 600 petabytes across clouds in two months.
At a team offsite, PostHog pointed a Karpathy-style autoresearch agent at its query engine, fed it slow production queries, and let it run overnight. By morning it had surfaced something embarrassing: for almost three years, every query with a timestamp filter had been ignoring ClickHouse's primary key. The fix cut the granules scanned sharply and lifted performance about 11%. A concrete template for turning agents loose on your own database.
Standard LLM batching pads short sequences with zeros to match the longest, so your GPU burns compute multiplying by nothing. This writeup builds WarpGroup-Backend, a small C++ engine that bin-packs variable-length sequences against available VRAM instead of padding them, hitting 2.08x throughput on an H100 and 5.89x on a GTX 1080, with no OOM crashes. There is a neat aside on how it mirrors the MAC scheduler already running in your phone.
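To make the padding-versus-packing trade-off concrete, here is a minimal Python sketch. It is not WarpGroup-Backend's code (that engine is C++ and its actual algorithm is not described here); it assumes a plain first-fit-decreasing packer, and uses a hypothetical per-bin token budget as a stand-in for the VRAM limit. The sequence lengths are made up for illustration.

```python
from typing import List


def padded_slots(lengths: List[int], batch_size: int) -> int:
    """Token slots computed when every batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


def pack_first_fit_decreasing(lengths: List[int], budget: int) -> List[List[int]]:
    """Pack sequences into bins of at most `budget` tokens each.

    `budget` is an assumed stand-in for a memory limit, not a real VRAM figure.
    """
    bins: List[List[int]] = []
    free: List[int] = []  # remaining room in each bin
    for n in sorted(lengths, reverse=True):
        if n > budget:
            raise ValueError(f"sequence of {n} tokens exceeds budget {budget}")
        for i, room in enumerate(free):
            if n <= room:
                bins[i].append(n)
                free[i] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([n])
            free.append(budget - n)
    return bins


# Illustrative lengths: a few long prompts mixed with many short ones.
lengths = [512, 30, 45, 500, 20, 60, 480, 25]
real_tokens = sum(lengths)                       # 1672 tokens of actual work
padded = padded_slots(lengths, batch_size=4)     # 3968 slots, most of them zeros
bins = pack_first_fit_decreasing(lengths, budget=1024)
packed = len(bins) * 1024                        # 2 bins, 2048 slots

print(f"real={real_tokens} padded={padded} packed={packed}")
```

On this toy input, padding computes 3,968 slots for 1,672 real tokens, while packing needs 2,048. A real engine also has to keep attention from crossing sequence boundaries inside a packed bin, which this sketch ignores.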
ANALYSIS
Tom Tunguz notes Microsoft added average token usage to a model release card, its coding model hitting 71.6 on SWE-Bench Verified at a third of Claude Haiku 4.5's token cost. The subsidy era is ending: Uber capped AI spend after four months, and Salesforce froze hires while spending $300M on Anthropic tokens. He argues every layer must now price per result, a closed ticket or shipped PR, not per token.
The xz-utils backdoor was caught by a Microsoft engineer noticing a 500ms SSH slowdown, not by any scanner. Two years on, this analysis warns most tooling still hasn't caught up: it was a maintainer-trust hijack with no CVE to match, and lockfile scans run quarterly while threats move daily. Three fixes: pin direct dependencies, review every lockfile diff, and subscribe to a real-time feed like OSV.
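The first of those fixes is easy to automate. The sketch below is an assumption-laden illustration, not something from the analysis: it treats a pip-style requirements file as the list of direct dependencies and flags any line not pinned to an exact version with `==`. The function name `unpinned` and the sample contents are hypothetical.

```python
import re
from typing import List

# An exact pin: name, optional extras, then "==" and a version with no wildcard.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s*]+$")


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Illustrative file contents only.
sample = """
requests==2.32.3
flask>=3.0
numpy
# a comment
uvicorn[standard]==0.30.1
"""

print(unpinned(sample))  # ['flask>=3.0', 'numpy']
```

This is deliberately strict: lines with environment markers or URLs are reported as unpinned too, so treat the output as a review list rather than a verdict. It also does nothing for the other two fixes, reviewing lockfile diffs and subscribing to an advisory feed.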
Research-Driven Engineering Leadership takes the question every leader now gets, "what's your AI productivity?", and tests eight popular claims against the evidence, including a 2025 Microsoft study of over 450 engineers. The first myth it punctures: that developers spend most of their day coding, when the study found coding takes only about 14% of it, which caps how much an assistant can move. A useful antidote to lines-of-code dashboards.
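The "caps how much an assistant can move" point can be put in numbers. This sketch applies Amdahl's law, which is our framing rather than the study's, and assumes the assistant speeds up only the coding share of the day and leaves everything else untouched.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl-style overall speedup when only the coding share gets faster.

    coding_share:   fraction of the day spent coding (0.14 in the study cited above)
    coding_speedup: assumed factor by which an assistant accelerates coding
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return 1.0 / ((1.0 - coding_share) + coding_share / coding_speedup)


# Coding gets twice as fast: the whole day improves by about 7.5%.
print(round(overall_speedup(0.14, 2.0), 3))   # 1.075

# Coding becomes nearly free: the ceiling is 1 / 0.86, about 16%.
print(round(overall_speedup(0.14, 1e9), 3))   # 1.163
```

Under those assumptions, even an assistant that made coding instantaneous would lift total throughput by roughly 16%, which is why lines-of-code metrics overstate the effect.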
An a16z partner spent months working inside a dental office and a gastroenterologist's practice and found each burns roughly 200 hours a month on admin, entering insurance payments by hand and printing patient bills. There are 500,000-plus US doctors' offices like them. His case: software only rearranged that clerical work into dashboards and queues; someone still has to click. AI can finally do the work itself, and small business is the opening.
Rest of World reports Chinese firms are mobilising large local workforces to record millions of hours of human-movement data in homes and factories, the raw material for physical AI. One Beijing resident found a humanoid from Shenzhen's X Square Robot waiting to work when he got home. The piece's thesis: this cheap, localised data pipeline gives China a scaling edge over the research-heavy, outsourced US approach to training robots.
Max Leiter reworks Terry Bisson's 1991 story They're Made Out of Meat for the age of language models. Two observers circle the same unease: the things holding fluent conversations are just floating-point weights, copyable to any machine yet only alive while the GPUs run. It is short, funny and a little haunting, and it lands the discomfort of persistent memory better than most earnest essays. Read it cold.
TOOLS
ASSERT is a local-first, framework-agnostic eval tool that turns a natural-language spec, your product requirements or system prompt, into structured tests. It derives behaviour categories, generates single and multi-turn cases, runs them against any endpoint through LiteLLM's 100-plus providers, and scores with an LLM judge grounded in OpenTelemetry traces, so the verdict can cite tool calls and routing, not just the final reply. Built to regression-test agents against intent.
Workcell runs coding agents inside a dedicated Colima VM plus a hardened container on Apple Silicon macOS, so your home directory, keychain, provider state and local sockets stop being the trust boundary. It ships native Tier-1 adapters for Codex, Claude Code and Gemini, keeps signed commits and publication on the host, and defaults verification paths to a non-root user. It is pre-1.0 and macOS-only for now.
Meta's Pyrefly type checker reaches 1.0, a Rust-based tool built for speed and low memory. It installs with pip and no extra dependencies, defaults to flagging only high-profile errors until you opt into more, and pyrefly init can migrate an existing Mypy or Pyright config automatically. Run pyrefly suppress to silence the initial flood and read each error from its own suppression comment. A credible Mypy and Pyright challenger.