Issue #113·Monday, June 1, 2026·42 min read·21 stories

The AI Bill Comes Due; the Open-Source Tool That Cuts It 90%

US killed a China-chip loophole. Men use coding agents 2x women. A 3D MMO that ran on dial-up.

The bill for two years of all-you-can-eat AI is landing on desks this week. One company torched half a billion dollars on Claude in a single month, Amazon pulled the plug on its internal 'tokenmaxxing' leaderboard, and GitHub Copilot's jump to usage pricing has builders doing furious maths. Elsewhere, SoftBank stakes $87 billion on French nuclear power, Liquid AI drops an open-weights model that runs on a laptop, and Okta ships a kill switch for the rogue AI agents companies deploy faster than they secure.

NEWS

Corporate America Starts Rationing AI as the Token Bill Comes Due

· 7 min read

Companies that spent freely to look AI-forward are now metering it: an Uber executive says the firm blew its annual agentic budget by March, and EntelligenceAI found only 18% of coding-token spend reaches shipped products. The all-you-can-eat subsidy is ending for smaller builders too, as GitHub Copilot swaps flat-rate pricing for token-based billing, with some reporting monthly bills jumping from $29 to $750.

One Firm Burned $500M on Claude in a Month; Amazon Axes Its 'Tokenmaxxing' Leaderboard

· 2 min read

An AI consultant told Axios that one client spent half a billion dollars on Claude licences in a single month after giving staff uncapped seats, with some using premium models to check the weather. Separately, Amazon deprecated KiroRank, an internal leaderboard that had staff running up token use just to climb the ranks. The memo from an Amazon SVP: don't use AI just for the sake of using AI.

US Closes the Year-Old Loophole That Fed China Nvidia's Top Chips via Malaysia

· 2 min read

The Commerce Department issued unusual weekend guidance closing a loophole it created a year ago, when it stopped enforcing the Biden-era AI Diffusion rule. The gap let overseas subsidiaries of Chinese firms, many based in Malaysia, buy Nvidia's most advanced Rubin and Blackwell chips and AMD's MI350x without a licence. One supply-chain source estimates hundreds of thousands of chips moved through before the door shut.

SoftBank Bets $87B on French Data Centres, Drawn by a Nuclear Grid the US Can't Match

· 4 min read

SoftBank will commit up to €75 billion ($87 billion) to build five gigawatts of AI data-centre capacity in France, its largest European AI investment, with EDF and Schneider Electric as partners. The first €45 billion phase targets 3.1GW in Hauts-de-France by 2031. The real draw is electricity: France runs about 70% on nuclear, is the world's largest net power exporter, and posts industrial prices under half the UK's.

Liquid AI Ships an 8B Tool-Calling Model With Open Weights That Runs on a Laptop

· 5 min read

Liquid AI released LFM2.5-8B-A1B, an open-weights edge model built for fast, reliable tool calling on consumer hardware. It expands the context window to 128K, scales pretraining from 12T to 38T tokens, and doubles the vocabulary for better non-Latin tokenisation. A reasoning-only model that chains tool calls and fits on an entry-level laptop, it has base and post-trained weights on Hugging Face today.

Okta Builds a Kill Switch for Rogue AI Agents; 92% Deploy Them, 22% Secure Them

· 5 min read

Okta says enterprises are deploying AI agents far faster than they secure them: 92% of executives report moderate or widespread use of autonomous agents, but only 22% have tied identities to those agents. The company is selling a kill switch that severs an agent's access tokens at the authorisation layer. CEO Todd McKinnon says ServiceNow came to Okta asking for exactly that capability.
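To make the idea of severing tokens at the authorisation layer concrete, here is a minimal in-memory sketch. It is not Okta's product or API: the class AuthorisationLayer and its methods are hypothetical names, and a real system would sit behind an identity provider rather than a Python dict. What it shows is why tying every token to an agent identity is the precondition for a kill switch.

```python
import secrets


class AuthorisationLayer:
    """Hypothetical token registry: every token is tied to one agent identity."""

    def __init__(self) -> None:
        self._token_owner: dict[str, str] = {}  # token -> agent id
        self._killed: set[str] = set()

    def issue(self, agent_id: str) -> str:
        # A killed agent cannot obtain fresh tokens.
        if agent_id in self._killed:
            raise PermissionError(f"{agent_id} has been shut off")
        token = secrets.token_hex(16)
        self._token_owner[token] = agent_id
        return token

    def is_allowed(self, token: str) -> bool:
        # Downstream services ask this before honouring a request.
        return token in self._token_owner

    def kill(self, agent_id: str) -> int:
        """Revoke every live token for one agent; return how many were cut."""
        self._killed.add(agent_id)
        revoked = [t for t, a in self._token_owner.items() if a == agent_id]
        for token in revoked:
            del self._token_owner[token]
        return len(revoked)
```

Without the token-to-agent mapping there is nothing for kill() to look up, which is the gap the 22% figure points at.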

Anthropic Finds Male Researchers Use Coding Agents Twice as Often as Women

· 3 min read

Anthropic studied how social scientists use AI and found researchers with typically male names use coding agents like Claude Code more than twice as often as those with female names, a gap that holds within the same disciplines and career levels. Economists lead adoption at 39%, education researchers trail at 4%. Top-25-university researchers use the tools 40% more than peers, and code generation for data analysis is 97% of use.

DuckDuckGo's 'No AI' Search Traffic Climbs as Users Reject Google's AI Overhaul

· 2 min read

DuckDuckGo says visits to its 'No AI' search page more than tripled after Google's May 19 I/O reimagined search around AI suggestions, follow-up questions, and agents. Traffic hit 3x on May 28 and has averaged about 84% above baseline since. The page strips AI answers, the chat interface, and most AI images, and DuckDuckGo is pushing Chrome and Firefox extensions that make it the default.

TECHNICAL

Hitting 3,000 Tokens a Second Per Request on the Datacentre GPUs You Already Own

· 16 min read

kog.ai makes the case that single-request decode speed, not throughput, is what matters for agents, and that it is a memory-bandwidth problem, not a FLOPS one. Standard datacentre GPUs have a far higher decoding ceiling than current inference stacks expose; the bottleneck is software. Co-designing model architecture, runtime, and low-level GPU kernels as one pipeline, they reach 3,000 tokens a second per request on a 2B model.

I Put a £200 Datacentre GPU in My Gaming PC to Run a 27B Model Locally

· 13 min read

Oscar Molnar wanted to run bigger models than his RTX 4080's 16GB allowed, so he bolted a £200 datacentre-grade Tesla V100 into his gaming PC. The writeup traces the whole hack: SXM2-to-PCIe adapters, cooling, NixOS drivers, and the debugging in between. The payoff is 32GB of VRAM running a 27-billion-parameter model at 32 tokens a second, and a clear lesson that bandwidth, not raw compute, is the real local-LLM bottleneck.
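The bandwidth point can be checked with back-of-envelope arithmetic. The sketch below assumes each generated token streams all model weights from GPU memory once, and ignores KV-cache reads and kernel overhead. The 4-bit weight format and the 900 GB/s bandwidth figure are assumptions for illustration, not numbers from the writeup.

```python
def decode_ceiling_tokens_per_s(params_billion: float,
                                bits_per_weight: float,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on single-request decode speed, assuming every
    generated token reads all weights from GPU memory exactly once."""
    weight_gb = params_billion * bits_per_weight / 8  # gigabytes of weights
    return bandwidth_gb_s / weight_gb


# Assumed inputs: 27B parameters (from the writeup), 4-bit weights
# (hypothetical), 900 GB/s memory bandwidth (assumed V100-class figure).
ceiling = decode_ceiling_tokens_per_s(27, 4, 900)
print(f"ceiling: {ceiling:.1f} tokens/s")
```

Under those assumptions the bound comes out at roughly 67 tokens a second, so the reported 32 would be about half the ceiling. Compute throughput never enters the formula.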

Embeddings Aren't Magic: The Predictable Failure Modes of RAG Retrieval

· 7 min read

A RAG system over policy docs dazzles at first, handling paraphrase, typos, even cross-language queries with no synonym table. Two weeks later it tells the expert who wrote the manual 'I couldn't find that', because she searched for the exact term the document uses and the embedding failed to match it. The piece catalogues what embeddings reliably break on (negation, exact reference numbers, internal product codes) and puts the fix in upstream keyword filtering, not a better reranker.
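One way to picture upstream keyword filtering is below: a rough sketch, not the article's implementation. The CODE_LIKE pattern, the function names, and the sample documents are all hypothetical, and toy_similarity is a word-overlap stand-in for a real embedding model so the example runs without dependencies.

```python
import re

# Hypothetical pattern: any token containing a digit, e.g. "POL-4417".
CODE_LIKE = re.compile(r"[A-Za-z0-9-]*\d[A-Za-z0-9-]*")


def exact_terms(query: str) -> list[str]:
    """Pull out identifier-like tokens that must match literally."""
    return [m.group(0).lower() for m in CODE_LIKE.finditer(query)]


def keyword_prefilter(query: str, docs: list[str]) -> list[str]:
    """Keep only docs containing every exact term; fall back to all docs."""
    terms = exact_terms(query)
    if not terms:
        return docs
    hits = [d for d in docs if all(t in d.lower() for t in terms)]
    return hits or docs


def toy_similarity(query: str, doc: str) -> float:
    """Stand-in for embedding similarity: Jaccard overlap of words."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Filter on literal identifiers first, then rank what is left.
    candidates = keyword_prefilter(query, docs)
    ranked = sorted(candidates, key=lambda d: toy_similarity(query, d),
                    reverse=True)
    return ranked[:k]


DOCS = [
    "Travel expenses must be approved by a manager before booking",
    "Policy POL-4471 covers reimbursement of client entertainment",
    "Policy POL-4417 covers reimbursement of relocation costs",
]
print(retrieve("reimbursement under POL-4417", DOCS))
```

The two policy numbers differ by a transposition, the kind of near-miss a similarity score can blur. The literal filter removes the wrong document before ranking starts.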

How 2004 RuneScape Fit a Multiplayer RPG Into 56k Dial-Up

· 8 min read

A 3D world with a couple of thousand players per server, dozens on screen, running in a browser over 5 kilobytes a second. This teardown traces a single click one tile north, byte by byte, from a decompiled 2004 RuneScape client to another player's screen. It is a masterclass in not wasting bytes: tick-based updates, delta encoding, and bit-packing, the techniques that made a 56k MMO feel alive.
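For a feel of how delta encoding and bit-packing combine, here is a toy sketch. It is not the actual RuneScape wire format: the bit widths, the DIRECTIONS table, and the one-tile-per-tick limit are assumptions made for illustration.

```python
class BitWriter:
    """Packs unsigned integers of arbitrary bit width into bytes."""

    def __init__(self) -> None:
        self._acc = 0    # all bits written so far, as one integer
        self._nbits = 0  # how many bits are in use

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        self._acc = (self._acc << width) | value
        self._nbits += width

    def to_bytes(self) -> bytes:
        pad = (-self._nbits) % 8  # zero-pad the final byte
        return (self._acc << pad).to_bytes((self._nbits + pad) // 8, "big")


# Hypothetical table: eight compass steps fit in 3 bits.
DIRECTIONS = {(0, 1): 0, (1, 1): 1, (1, 0): 2, (1, -1): 3,
              (0, -1): 4, (-1, -1): 5, (-1, 0): 6, (-1, 1): 7}


def encode_tick(prev: list[tuple[int, int]],
                curr: list[tuple[int, int]]) -> bytes:
    """Encode one tick as deltas: 1 bit for 'no change', or a 1-bit
    flag plus a 3-bit direction. Assumes at most one tile per tick."""
    writer = BitWriter()
    for (px, py), (cx, cy) in zip(prev, curr):
        delta = (cx - px, cy - py)
        if delta == (0, 0):
            writer.write(0, 1)
        else:
            writer.write(1, 1)
            writer.write(DIRECTIONS[delta], 3)
    return writer.to_bytes()


before = [(3200, 3200), (3210, 3195), (3190, 3222)]
after = [(3200, 3201), (3210, 3195), (3190, 3222)]  # player 0 steps north
print(len(encode_tick(before, after)), "byte(s)")
```

Three players, one of them moving, fit in a single byte here. Sending absolute 16-bit coordinates for each would cost twelve.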

ANALYSIS

AI Dark Output: The Visible Cost of Invisible Output

· 8 min read

SemiAnalysis argues AI's value is becoming 'dark output' that macroeconomic data can't see, much as 1980s statistics missed the computer revolution (Solow: 'You can see the computer age everywhere but in the productivity statistics'). The costs in dollars, watts, gallons, and jobs are easy to count; the output is not. Their warning: 'no productivity bump' isn't evidence AI lacks value, only that the measurement isn't built yet.

Om Malik: At Anthropic's Valuation, You're Buying the Press Release, Not the Financials

· 6 min read

Om Malik got pinged to buy $10 million of Anthropic common stock at an implied $1 trillion valuation, days before the company closed above $900 billion. His point cuts past the usual bubble debate: every number, including ones Anthropic discloses, is so large you cannot tell what is real without an SEC filing. Buying at a trillion is trusting headlines and FOMO-drunk investors to have done diligence they have not.

Build Agents, Not Pipelines

· 11 min read

Goedecke reduces LLM architecture to one real decision: do you express control flow in code (a pipeline) or hand the model tools and let it manage control flow itself (an agent)? He frames it as library versus framework. Pipelines give you control and predictability; agents start faster and flex, but you surrender the deterministic structure. His advice is to choose deliberately rather than reaching for an agent by default.
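The distinction is easier to see side by side. In this sketch the tools and the scripted_model function are hypothetical stand-ins (no real LLM is called); the point is only where the control flow lives.

```python
from typing import Callable, Optional


# Hypothetical tools, stubbed so the example runs offline.
def search(query: str) -> str:
    return f"results for {query!r}"


def summarise(text: str) -> str:
    return f"summary of {text}"


TOOLS = {"search": search, "summarise": summarise}


def pipeline(question: str) -> str:
    """Control flow lives in code: always search, then summarise."""
    return summarise(search(question))


Chooser = Callable[[str, list[str]], Optional[tuple[str, str]]]


def agent(question: str, choose: Chooser, max_steps: int = 5) -> str:
    """Control flow lives in the model: `choose` stands in for an LLM
    call that returns (tool, argument), or None when it is done."""
    state, history = question, []
    for _ in range(max_steps):
        decision = choose(state, history)
        if decision is None:
            break
        tool, arg = decision
        state = TOOLS[tool](arg)
        history.append(tool)
    return state


def scripted_model(state: str, history: list[str]):
    """Deterministic stand-in for a model's tool choices."""
    if not history:
        return ("search", state)
    if history == ["search"]:
        return ("summarise", state)
    return None


print(pipeline("token costs") == agent("token costs", scripted_model))
```

With a scripted chooser the two produce the same answer. Swap in a real model and the agent's path is no longer something you can read off the source, which is the trade the essay describes.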

Why Coding Agents Still Reward Expertise You Can Only Earn by Hand

· 5 min read

Why keep hiring juniors when agents write the code juniors used to? The essay reaches for a half-century-old parallel: 'calculator' was once a human job, killed by the scientific calculator. The tools grew more powerful, but those who direct them well still need hard-won intuition. Coding agents are the same, amplifying judgement you can only build manually, which is why fresh grads struggle while labs still fight over junior talent.

Code Isn't Product

· 5 min read

Mironov names the confusion AI is about to make obvious: shipping code is not the same as creating product value. AI lets teams ship 100x more features, and boards are using that throughput to justify slashing R&D. What isn't scaling is customer attention: prospects don't have 100x the bandwidth to evaluate offers, buying hasn't sped up, and budgets are flat. The bottleneck moves from building to getting anyone to care.

TOOLS

A Netflix Engineer Open-Sourced Headroom to Strip 90% of Wasted Tokens

· 7 min read

Netflix senior engineer Tejas Chopra built Headroom after a $287 Claude Sonnet bill, estimating up to 90% of tokens hitting an LLM are redundant. It compresses tool outputs, logs, files, and RAG chunks before they reach the model, as a library, proxy, or MCP server. Open-source since January and still raw at v0.22, it has saved users an estimated $700,000 and freed 200 billion tokens.
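As a flavour of what redundant tokens can look like in practice, here is a small sketch that collapses runs of identical log lines before a tool output is sent to a model. It is not Headroom's code or API; the function name and the marker format are made up for illustration, and it covers only one narrow kind of redundancy.

```python
from itertools import groupby


def collapse_repeats(text: str) -> str:
    """Replace each run of identical consecutive lines with one copy
    plus a repeat count, leaving unique lines untouched."""
    out = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        out.append(line if count == 1 else f"{line}  [repeated {count}x]")
    return "\n".join(out)


log = "\n".join(["connecting"] + ["retry: timeout"] * 4 + ["connected"])
compact = collapse_repeats(log)
print(compact)
print(f"{len(log)} -> {len(compact)} characters")
```

Character count is used here as a crude proxy; actual token savings depend on the model's tokeniser.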

AISlop Catches the Slop Coding Agents Leave Behind, With No LLM in the Loop

· 7 min read

AISlop scans for the patterns Claude Code, Cursor, Codex, and OpenCode leave behind: narrative comments over self-explanatory code, swallowed exceptions, 'as any' casts, hallucinated imports, dead code, oversized functions. Tests pass, lint passes, the code rots anyway. It runs 40-plus rules across seven languages, scores each change 0 to 100 in under a second, and is deterministic with no model in the runtime path. MIT-licensed, runs via npx.
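To show how such a check can run with no model in the loop, here is a sketch of one rule, swallowed exceptions, written against Python's standard ast module. It is not AISlop's implementation or rule set; the function names are hypothetical and it handles Python source only.

```python
import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` or a bare `...` statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is Ellipsis)


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of except handlers that do nothing."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(
                _is_noop(stmt) for stmt in node.body):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''
try:
    risky()
except Exception:
    pass

try:
    risky()
except ValueError as err:
    log(err)
'''

print(find_swallowed_exceptions(SAMPLE))
```

The same input always yields the same findings, since the check is a plain walk over the syntax tree.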

run-llama Open-Sources liteparse, a Self-Hostable Rust Document Parser

· 1 min read

From the LlamaIndex team, liteparse is a fast, open-source document parser written in Rust, effectively the self-hostable answer to their own paid LlamaParse. It turns PDFs and office docs into clean structured text for RAG and extraction pipelines, with no API calls and no per-page cost. It is a drop-in option when you would rather not ship documents to a hosted parser.

auto-subs Wires On-Device Subtitle Generation Into Resolve, Premiere, and After Effects

· 1 min read

auto-subs generates subtitles entirely on-device and drops them straight into DaVinci Resolve, Premiere, and After Effects, so footage never leaves your machine for a cloud service. It is aimed at editors who want fast, private captioning inside the tools they already use. Written in TypeScript and trending daily, it is a practical privacy-first alternative to upload-based transcription for anyone cutting video locally.