Issue #113 · 42 min read · 21 stories

The AI Bill Comes Due; the Open-Source Tool That Cuts It 90%

US killed a China-chip loophole. Men use coding agents 2x women. A 3D MMO that ran on dial-up.

The bill for two years of all-you-can-eat AI is landing on desks this week. One company torched half a billion dollars on Claude in a single month, Amazon pulled the plug on its internal 'tokenmaxxing' leaderboard, and GitHub Copilot's jump to usage pricing has builders doing furious maths. Elsewhere, SoftBank stakes $87 billion on French nuclear power, Liquid AI drops an open-weights model that runs on a laptop, and Okta ships a kill switch for the rogue AI agents companies deploy faster than they secure.

NEWS

The Commerce Department issued unusual weekend guidance closing a loophole it created a year ago, when it stopped enforcing the Biden-era AI Diffusion rule. The gap let overseas subsidiaries of Chinese firms, many based in Malaysia, buy Nvidia's most advanced Rubin and Blackwell chips and AMD's MI350x without a licence. One supply-chain source estimates hundreds of thousands of chips moved through before the door shut.

SoftBank will commit up to €75 billion ($87 billion) to build five gigawatts of AI data-centre capacity in France, its largest European AI investment, with EDF and Schneider Electric as partners. The first €45 billion phase targets 3.1GW in Hauts-de-France by 2031. The real draw is electricity: France gets about 70% of its power from nuclear, is the world's largest net power exporter, and posts industrial electricity prices under half the UK's.

Liquid AI released LFM2.5-8B-A1B, an open-weights edge model built for fast, reliable tool calling on consumer hardware. It expands the context window to 128K, scales pretraining from 12T to 38T tokens, and doubles the vocabulary for better non-Latin tokenisation. A reasoning-only model that chains tool calls and fits on an entry-level laptop, it has base and post-trained weights on Hugging Face today.

Okta says enterprises are deploying AI agents far faster than they secure them: 92% of executives report moderate or widespread use of autonomous agents, but only 22% have identities tied to them. The company is selling a kill switch that severs an agent's access tokens at the authorisation layer. CEO Todd McKinnon says ServiceNow came to Okta asking for exactly that capability.

Anthropic studied how social scientists use AI and found researchers with typically male names use coding agents like Claude Code more than twice as often as those with female names, a gap that holds within the same disciplines and career levels. Economists lead adoption at 39%, education researchers trail at 4%. Top-25-university researchers use the tools 40% more than peers, and code generation for data analysis is 97% of use.

DuckDuckGo says visits to its 'No AI' search page more than tripled after Google's May 19 I/O reimagined search around AI suggestions, follow-up questions, and agents. Traffic hit 3x on May 28 and has averaged about 84% above baseline since. The page strips AI answers, the chat interface, and most AI images, and DuckDuckGo is pushing Chrome and Firefox extensions that make it the default.

TECHNICAL

Oscar Molnar wanted to run bigger models than his RTX 4080's 16GB allowed, so he bolted a £200 datacentre-grade Tesla V100 into his gaming PC. The writeup traces the whole hack: SXM2-to-PCIe adapters, cooling, NixOS drivers, and the debugging in between. The payoff is 32GB of VRAM running a 27-billion-parameter model at 32 tokens a second, and a clear lesson that bandwidth, not raw compute, is the real local-LLM bottleneck.
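
The bandwidth point can be checked with back-of-envelope arithmetic. The sketch below assumes a dense model whose weights are all read once per generated token, and it ignores KV-cache and activation traffic. The 900 GB/s figure is Nvidia's published peak for the V100 SXM2. The quantisation levels are assumptions for illustration, not details taken from the writeup.

```python
def decode_ceiling_tokens_per_s(params_billion: float,
                                bytes_per_param: float,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on single-request decode speed when every weight
    is streamed from VRAM once per generated token."""
    model_gb = params_billion * bytes_per_param  # 1e9 params * bytes/param = GB
    return bandwidth_gb_s / model_gb


V100_BANDWIDTH_GB_S = 900.0  # published peak HBM2 bandwidth, V100 SXM2

# Assumed quantisation levels (hypothetical, not from the writeup).
four_bit = decode_ceiling_tokens_per_s(27, 0.5, V100_BANDWIDTH_GB_S)
eight_bit = decode_ceiling_tokens_per_s(27, 1.0, V100_BANDWIDTH_GB_S)

print(f"4-bit ceiling: {four_bit:.1f} tok/s")   # 66.7
print(f"8-bit ceiling: {eight_bit:.1f} tok/s")  # 33.3
```

Under these assumptions the reported 32 tokens a second sits within a small multiple of the memory-bound ceiling, which is what a bandwidth-limited workload looks like.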

kog.ai makes the case that single-request decode speed, not throughput, is what matters for agents, and that it is a memory-bandwidth problem, not a FLOPS one. Standard datacentre GPUs have a far higher decoding ceiling than current inference stacks expose; the bottleneck is software. Co-designing model architecture, runtime, and low-level GPU kernels as one pipeline, they reach 3,000 tokens a second per request on a 2B model.

A RAG system over policy docs dazzles at first, handling paraphrase, typos, even cross-language queries with no synonym table. Two weeks later it tells the expert who wrote the manual 'I couldn't find that', because she searched the term the document actually uses and the embedding didn't. The piece catalogues what embeddings reliably break on (negation, exact reference numbers, internal product codes) and puts the fix in upstream keyword filtering, not a better reranker.
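
A minimal sketch of what upstream keyword filtering can look like, not the article's implementation. The code pattern, the document set, and the function names are all hypothetical, and a toy word-count cosine stands in for real embedding similarity.

```python
import math
import re
from collections import Counter

# Hypothetical pattern for internal codes such as "HR-204" or "POL-2024-17".
CODE_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+(?:-\d+)*\b")


def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Toy cosine similarity over word counts, a stand-in for embeddings."""
    ta, tb = tokens(a), tokens(b)
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    codes = CODE_PATTERN.findall(query)
    # Upstream keyword filter: an exact code in the query restricts the
    # candidate set to documents that contain that code verbatim.
    candidates = [d for d in docs if all(c in d for c in codes)] if codes else docs
    if not candidates:  # code appears nowhere, fall back to the full set
        candidates = docs
    return sorted(candidates, key=lambda d: similarity(query, d), reverse=True)[:k]


docs = [
    "HR-204 covers parental leave eligibility and notice periods.",
    "HR-240 covers sabbatical leave eligibility and notice periods.",
    "Expense claims must be filed within 30 days.",
]
print(retrieve("leave eligibility under HR-240", docs, k=1))
```

The design choice is ordering: the exact-match filter runs before any similarity ranking, so two near-identical documents that differ only in their reference number can no longer be confused.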

A 3D world with a couple of thousand players per server, dozens on screen, running in a browser over 5 kilobytes a second. This teardown traces a single click one tile north, byte by byte, from a decompiled 2004 RuneScape client to another player's screen. It is a masterclass in not wasting bytes: tick-based updates, delta encoding, and bit-packing, the techniques that made a 56k MMO feel alive.
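
To make delta encoding and bit-packing concrete, here is a small sketch. It is an illustration of the general technique, not the RuneScape wire format: the one-bit "changed" flag, the three-bit direction table, and all names are assumptions.

```python
class BitWriter:
    """Packs unsigned integers of arbitrary bit width into bytes, MSB first."""

    def __init__(self) -> None:
        self._acc = 0    # all bits written so far, as one integer
        self._nbits = 0  # how many bits have been written

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        self._acc = (self._acc << width) | value
        self._nbits += width

    def to_bytes(self) -> bytes:
        pad = (-self._nbits) % 8  # zero-pad the final byte
        return (self._acc << pad).to_bytes((self._nbits + pad) // 8, "big")


# Hypothetical layout: eight compass directions fit in 3 bits.
DIRECTIONS = {(-1, 1): 0, (0, 1): 1, (1, 1): 2, (-1, 0): 3,
              (1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def encode_tick(moves) -> bytes:
    """moves: one (dx, dy) delta per player for this tick, or None if idle."""
    writer = BitWriter()
    for delta in moves:
        if delta is None:
            writer.write(0, 1)  # 1 bit: nothing changed for this player
        else:
            writer.write(1, 1)  # 1 bit: an update follows
            writer.write(DIRECTIONS[delta], 3)
    return writer.to_bytes()


# One player steps north, two stand still, one steps east.
packet = encode_tick([(0, 1), None, None, (1, 0)])
print(len(packet), packet.hex())  # 2 9300
```

Four players cost ten bits here, where sending absolute coordinates for each would cost several bytes per player. Batching those bits once per tick is what keeps the stream within a dial-up budget.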

ANALYSIS

Mironov names the confusion AI is about to make obvious: shipping code is not the same as creating product value. AI lets teams ship 100x more features, and boards are using that throughput to justify slashing R&D. What isn't scaling is customer attention: prospects don't have 100x the bandwidth to evaluate offers, buying hasn't sped up, and budgets are flat. The bottleneck moves from building to getting anyone to care.

SemiAnalysis argues AI's value is becoming 'dark output' that macroeconomic data can't see, much as 1980s statistics missed the computer revolution (Solow: 'You can see the computer age everywhere but in the productivity statistics'). The costs in dollars, watts, gallons, and jobs are easy to count; the output is not. Their warning: 'no productivity bump' isn't evidence AI lacks value, only that the measurement isn't built yet.

Goedecke reduces LLM architecture to one real decision: do you express control flow in code (a pipeline) or hand the model tools and let it manage control flow itself (an agent)? He frames it as library versus framework. Pipelines give you control and predictability; agents start faster and flex, but you surrender the deterministic structure. His advice is to choose deliberately rather than reaching for an agent by default.
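
The distinction is easier to see side by side. This is a sketch under stated assumptions: `search`, `summarise`, and `fake_model` are hypothetical stand-ins, and a scripted function plays the part of the LLM so the example runs without any API.

```python
from typing import Callable


# Hypothetical tools; real ones would call a search index or a database.
def search(query: str) -> str:
    return f"results for {query!r}"


def summarise(text: str) -> str:
    return f"summary of [{text}]"


TOOLS: dict[str, Callable[[str], str]] = {"search": search, "summarise": summarise}


def run_pipeline(question: str) -> str:
    """Pipeline: control flow lives in code. Always search, then summarise."""
    return summarise(search(question))


def fake_model(history: list[str]) -> tuple[str, str]:
    """Scripted stand-in for an LLM: returns (tool_name, argument),
    or ("finish", answer) when it decides it is done."""
    if len(history) == 1:
        return "search", history[0]
    if len(history) == 2:
        return "summarise", history[-1]
    return "finish", history[-1]


def run_agent(question: str, model=fake_model, max_steps: int = 5) -> str:
    """Agent: the model picks the next tool each turn. The code only
    supplies the loop and a step cap."""
    history = [question]
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))
    raise RuntimeError("agent did not finish within max_steps")


print(run_pipeline("refund policy"))
print(run_agent("refund policy"))
```

Both paths produce the same answer here only because the scripted model happens to choose the same order. In the pipeline that order is guaranteed by the code; in the agent it is whatever the model decides on each turn.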

Om Malik got pinged to buy $10 million of Anthropic common at an implied $1 trillion, days before the company closed above $900 billion. His point cuts past the usual bubble debate: every number, including ones Anthropic discloses, is so large you cannot tell what is real without an SEC filing. Buying at a trillion is trusting headlines and FOMO-drunk investors to have done diligence they have not.

Why keep hiring juniors when agents write the code juniors used to? The essay reaches for a half-century-old parallel: 'calculator' was once a human job, killed by the scientific calculator. The tools grew more powerful, but those who direct them well still need hard-won intuition. Coding agents are the same, amplifying judgement you can only build manually, which is why fresh grads struggle while labs still fight over junior talent.

TOOLS

AISlop scans for the patterns Claude Code, Cursor, Codex, and OpenCode leave behind: narrative comments over self-explanatory code, swallowed exceptions, 'as any' casts, hallucinated imports, dead code, oversized functions. Tests pass, lint passes, the code rots anyway. It runs 40-plus rules across seven languages, scores each change 0 to 100 in under a second, and is deterministic with no model in the runtime path. MIT-licensed, runs via npx.
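
For a sense of what a deterministic, model-free rule looks like, here is a sketch of one check for swallowed exceptions in Python source. It is not AISlop's code or rule set. The function names and the scoring formula are hypothetical, chosen only to show a syntax-tree rule feeding a 0 to 100 score.

```python
import ast


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of except blocks whose body is only `pass` or `...`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        body_is_empty = all(
            isinstance(stmt, ast.Pass)
            or (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and stmt.value.value is Ellipsis)
            for stmt in node.body
        )
        if body_is_empty:
            hits.append(node.lineno)
    return sorted(hits)


def score(source: str, penalty: int = 10) -> int:
    """Hypothetical scoring: start at 100, subtract a fixed penalty per finding."""
    return max(0, 100 - penalty * len(find_swallowed_exceptions(source)))


sample = '''
def load(path):
    try:
        return open(path).read()
    except Exception:
        pass
'''

print(find_swallowed_exceptions(sample), score(sample))  # [5] 90
```

Because the rule walks the syntax tree and never samples from a model, the same input always yields the same findings and the same score.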

From the LlamaIndex team, liteparse is a fast, open-source document parser written in Rust, effectively the self-hostable answer to their own paid LlamaParse. It turns PDFs and office docs into clean structured text for RAG and extraction pipelines, with no API calls and no per-page cost. It is a drop-in option when you would rather not ship documents to a hosted parser.

auto-subs generates subtitles entirely on-device and drops them straight into DaVinci Resolve, Premiere, and After Effects, so footage never leaves your machine for a cloud service. It is aimed at editors who want fast, private captioning inside the tools they already use. Written in TypeScript and trending daily, it is a practical privacy-first alternative to upload-based transcription for anyone cutting video locally.

Netflix senior engineer Tejas Chopra built Headroom after a $287 Claude Sonnet bill, estimating up to 90% of tokens hitting an LLM are redundant. It compresses tool outputs, logs, files, and RAG chunks before they reach the model, as a library, proxy, or MCP server. Open-source since January and still raw at v0.22, it has saved users an estimated $700,000 and freed 200 billion tokens.
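
To illustrate the kind of redundancy at stake, here is a sketch that collapses repetitive log lines before they reach a model. It is not Headroom's algorithm or API. The function name and the digit-masking heuristic are assumptions, and the compression is lossy by design.

```python
import re
from itertools import groupby


def compress_log(text: str) -> str:
    """Collapse runs of consecutive lines that differ only in digits
    into the first line of the run plus a count of the rest."""
    out = []
    # Mask digits so timestamps and counters do not break up a run.
    masked = lambda line: re.sub(r"\d+", "#", line)
    for _, group in groupby(text.splitlines(), key=masked):
        lines = list(group)
        if len(lines) == 1:
            out.append(lines[0])
        else:
            out.append(f"{lines[0]}  [+{len(lines) - 1} similar lines]")
    return "\n".join(out)


log = "\n".join([
    "12:00:01 retry 1 connecting to db",
    "12:00:02 retry 2 connecting to db",
    "12:00:03 retry 3 connecting to db",
    "12:00:04 connected",
])

print(compress_log(log))
```

Four lines become two, and the model still sees that retries happened and how many were dropped. The same idea extends to tool outputs and RAG chunks that repeat boilerplate.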