Issue #60 · 32 min read · 16 stories

GPT-5.4 Mini and Nano, Nvidia's 60-Exaflop AI Factory

AI subscriptions can't cover inference costs. Martin Fowler names a new pattern. A fly brain goes digital.

Two OpenAI model drops yesterday: Mini approaches GPT-5.4 performance at 2x the speed, Nano undercuts Gemini 3.1 Flash-Lite on price. Nvidia's Vera Rubin platform squeezes seven chips into 40-rack PODs running at 60 exaflops. Also: a 10,000-word argument for why AI's subscription model can't cover its own inference bill, and someone uploaded a fruit fly brain to a virtual body that started hunting for sugar.

NEWS
4 stories
1

GPT-5.4 Mini and Nano

OpenAI released two smaller GPT-5.4 variants yesterday. Mini approaches full GPT-5.4 performance on coding and reasoning at 2x the speed, making it a direct competitor to Claude Sonnet for high-volume workloads. Nano is the cheapest model in OpenAI's lineup, priced below Gemini 3.1 Flash-Lite, targeting classification and data extraction tasks where cost per token matters most.

2

Nvidia's 60-Exaflop Vera Rubin POD: Seven Chips, One AI Factory

The Vera Rubin platform integrates seven distinct chips into a single AI factory: the Rubin GPU, Vera CPU, and Groq 3 LPU among them. The system scales to 40-rack PODs delivering 60 exaflops for end-to-end AI workloads, from training through inference. Nvidia positions this as the next step beyond Blackwell for companies building at data centre scale.

3

OpenAI Expands Government Footprint With AWS Deal

OpenAI will use AWS GovCloud and Classified Regions to serve US federal agencies, including classified workloads. The deal gives OpenAI access to government buyers it couldn't previously reach and puts it in direct competition with AWS-backed Anthropic for the same contracts. The partnership was reportedly in the works for months.

4

Boffins Hook Fly Brain Map to Virtual Body, Which Starts Looking for Sugar

Researchers built the first digital simulation of a fruit fly brain connected to a virtual body. The connectome-based model walks, grooms, and responds to simulated sugar without being explicitly programmed to do so. Some experts suggest the behaviour resembles machine learning applied to a brain structure rather than a full biological replica, but the result is striking either way.

TECHNICAL
4 stories
1

Martin Fowler: 'Context Anchoring' Gives AI Coding Sessions Persistent Memory

Fowler's second pattern this week, after 'Supervisory Engineering' in yesterday's edition. This one externalises AI conversation decisions into a living document, like an Architecture Decision Record that updates in real time. It captures reasoning, rejected alternatives, and open questions so context survives beyond the LLM's window and across session boundaries.
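A minimal sketch of the idea, with invented names (this is not Fowler's code): a small structure that captures decisions, rejected alternatives, and open questions, serialised into the next session's prompt so reasoning survives beyond any single context window.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ContextAnchor:
    decisions: list = field(default_factory=list)       # what was chosen, and why
    rejected: list = field(default_factory=list)        # alternatives ruled out
    open_questions: list = field(default_factory=list)  # still unresolved

    def record(self, decision, reasoning):
        self.decisions.append({"decision": decision, "why": reasoning})

    def reject(self, alternative, reasoning):
        self.rejected.append({"alternative": alternative, "why": reasoning})

    def to_prompt(self):
        # Fed into the system prompt of the next session, like a live ADR.
        return json.dumps(asdict(self), indent=2)

anchor = ContextAnchor()
anchor.record("use Postgres for the event store", "team already operates it")
anchor.reject("Kafka as primary store", "adds ops burden for a small team")
print(anchor.to_prompt())
```

The point of the pattern is that the anchor, not the chat transcript, is the durable artefact: each session reads it in and writes updates back.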

2

Meta's AI Agent Doubled Ad Model Accuracy With 5x Less Engineering

Meta's Ranking Engineer Agent (REA) automates hypothesis generation, training, debugging, and iteration for ads ranking models. The result: three engineers now manage eight models, up from two engineers per model previously. REA uses a hibernate-and-wake mechanism for long-running async workflows with human oversight at strategic decision points.

3

How Reddit Migrated Petabyte-Scale Kafka from EC2 to Kubernetes

Reddit moved its entire Kafka fleet to Kubernetes with zero downtime. A DNS facade layer let them run a mixed EC2/Kubernetes cluster simultaneously during migration. They forked Strimzi, moved the control plane to KRaft, and designed every step to be reversible. The key lesson: abstraction layers that let you run old and new infrastructure side by side are worth the upfront cost.
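The DNS facade idea can be sketched in a few lines (hypothetical names, not Reddit's implementation): clients resolve stable broker names, and the facade maps each name to either the old EC2 host or the new Kubernetes host, so brokers move one at a time and every step stays reversible.

```python
# Stable names the clients use -> wherever the broker currently lives.
FACADE = {
    "kafka-0.cluster.local": "ec2-10-0-0-1.internal",  # not yet migrated
    "kafka-1.cluster.local": "broker-1.kafka.svc",     # already on Kubernetes
}

def resolve(name):
    return FACADE[name]

def migrate(name, k8s_host):
    FACADE[name] = k8s_host   # flip one broker's record to the new cluster

def rollback(name, ec2_host):
    FACADE[name] = ec2_host   # the reverse flip makes the step reversible

migrate("kafka-0.cluster.local", "broker-0.kafka.svc")
print(resolve("kafka-0.cluster.local"))  # broker-0.kafka.svc
```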

4

Simon Willison: How Coding Agents Use Subagents to Manage Context Limits

Subagents are fresh LLM instances dispatched for specific goals with their own clean context window. Claude Code's 'Explore' agent is the working example: it maps a codebase without polluting the parent agent's context. Willison walks through parallel, specialist, and review subagents, explaining when each pattern helps and when over-fragmentation hurts.
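The core mechanic can be sketched like this (with a stubbed `call_llm`, not any real agent's API): the parent dispatches a fresh instance that sees only its goal, and only the compact result flows back into the parent's context, not the exploration transcript.

```python
def call_llm(messages):
    # Stand-in for a real model call; returns a canned summary here.
    return f"summary of: {messages[-1]['content']}"

def run_subagent(goal):
    # Fresh context: the subagent sees only its goal, not the parent's history.
    return call_llm([{"role": "user", "content": goal}])

parent_context = [{"role": "user", "content": "add OAuth to the API"}]
# An 'Explore'-style subagent maps the codebase in its own context window.
finding = run_subagent("map the auth-related modules in this repo")
# The parent absorbs one compact message, not thousands of tokens of browsing.
parent_context.append({"role": "assistant", "content": finding})
```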

ANALYSIS
4 stories
1

Why AI's Subscription Model Can't Cover the Cost of Inference

A 10,000-word argument that the current AI boom rests on unsustainable economics. The core claim: subscription pricing fundamentally can't cover inference costs, and financial reporting from major labs is too opaque to tell how bad the gap is. The Uber comparison is sharp: both subsidise usage to build market share, but AI has no clear path to sustainable unit economics. Data centre investments look increasingly speculative.
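The structural problem reduces to simple arithmetic (all numbers below are invented for illustration, not the essay's figures): a flat subscription loses money as soon as a heavy user's token volume pushes per-user inference cost past the fixed price.

```python
def monthly_margin(price, tokens_used, cost_per_million_tokens):
    # Flat subscription revenue minus variable inference cost for one user.
    return price - tokens_used / 1e6 * cost_per_million_tokens

# A light user is profitable; a heavy user on the same plan is deeply not.
print(monthly_margin(20, 500_000, 15))     # 12.5
print(monthly_margin(20, 10_000_000, 15))  # -130.0
```

Usage-based pricing fixes the per-user sign but, as the essay argues, undercuts the growth story that the subscription model was chosen to tell.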

2

Stratechery Interviews Jensen Huang on CPUs for Agents, Hybrid Models, and Selling Chips to China

Ben Thompson's post-GTC interview covers a lot of ground. Huang explains why Nvidia sees CPUs as critical for agentic workflows, not just GPUs, and how Nvidia's Nemotron 3 uses a hybrid transformer-SSM architecture for both intelligence and efficiency. The conversation also gets into competition dynamics under the China export cap and Huang's frustration with doomers influencing Washington policy.

3

OpenAI Has New Focus (on the IPO)

Om Malik reads OpenAI's rapid product announcements (Sora, Atlas, hardware) as narrative-building for investors, not users. The real race is to public markets before Gulf sovereign wealth fund attention shifts elsewhere. Anthropic's Claude Code revenue growth is the benchmark everyone's watching as both companies position for IPOs.

4

Every Layer of Review Makes You 10x Slower

The bottleneck in software development isn't coding speed; it's review and approval layers that compound latency at roughly 10x per stage. AI coding tools accelerate the writing step but don't touch the coordination overhead that actually slows delivery. The argument: reduce review stages and give small teams end-to-end ownership of quality instead of stacking more approval gates on faster code.
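The compounding claim is easy to check back-of-envelope (the 10x factor is the article's rough figure; the hours are invented): removing one review stage dwarfs any gain from writing code faster.

```python
def delivery_time(coding_hours, review_stages, factor=10):
    # Each stage multiplies end-to-end latency by roughly `factor`.
    return coding_hours * factor ** review_stages

print(delivery_time(4, 3))  # 4000: baseline with three review stages
print(delivery_time(2, 3))  # 2000: coding twice as fast, still slow
print(delivery_time(4, 2))  # 400: dropping one stage beats faster coding
```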

TOOLS
4 stories
1

Antfly: Text, Vector, and Graph Search Over Multimodal Data in a Single Go Binary

A distributed search engine in Go that handles text, images, audio, and video in one system. Antfly auto-generates embeddings, builds graph edges, and ships with built-in RAG agents for retrieval-augmented generation. It also includes a Postgres extension (pgaf) for existing database integration and React components for building search UIs.

2

NVIDIA OpenShell: Define What Your AI Agents Can Access With YAML Policies

A sandboxed runtime for autonomous AI agents from NVIDIA. Declarative YAML policies control what agents can touch: file access, network activity, and credential usage. Currently in alpha, it runs on a K3s Kubernetes cluster in Docker with GPU acceleration and works with multiple LLM providers. Designed for the problem of agents running unsupervised on real infrastructure.

3

LangChain's Open SWE Brings Stripe-Style Internal Coding Agents to Open Source

An open-source framework for building internal coding agents, modelled on what Stripe, Ramp, and Coinbase built in-house. Open SWE provides isolated cloud sandboxes, subagent orchestration, and integrations with Slack, Linear, and GitHub out of the box. Built on LangGraph and Deep Agents, the idea is that teams can customise the components rather than building from scratch.

4

Terminal Sessions on an Infinite 2D Canvas, GPU-Rendered in Rust

Horizon lets you manage terminal sessions as panels on a boundless 2D surface you can pan and zoom. Organise by project, use workspaces for grouping, and navigate with a minimap. It has native integration with Claude Code and Codex, plus built-in Git status monitoring, and it's GPU-rendered in Rust for performance.