Issue #91 · 40 min read · 20 stories

$130B AI Capex Sets a Record. OpenAI Explains the Goblins.

Plus: Nature finds warm LLMs fail more, Linux 7.0 wrecks PostgreSQL, and 44 Rust CVEs

Amazon, Google, Microsoft, and Meta combined for $130.65 billion in Q1 capital expenditure on AI data centres, setting another record. OpenAI published a technical post-mortem explaining why GPT-5.5 keeps talking about goblins, tracing the behaviour to a reward signal leak from personality training. A Nature study found that training LLMs for warmth reduces accuracy by 10–30 percentage points.
NEWS

Amazon, Google, Microsoft, and Meta reported a combined $130.65 billion in Q1 2026 capital expenditure, largely on AI data centres. That figure is three times the cost of the Manhattan Project and 71% higher than the same quarter last year. Meta raised its full-year forecast to $125–145 billion, up from $115–135 billion. Only the wealthiest companies on the planet can afford to lead this race.

NVIDIA released Nemotron 3 Nano Omni, an open 30B-parameter multimodal model that processes text, images, audio, video, and documents in a single pass. The mixture-of-experts design activates only 3 billion parameters per inference, delivering 9x higher throughput than comparable open omni-modal models. It tops six leaderboards for document intelligence and video understanding. Palantir, Foxconn, and Oracle are among early adopters.
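The sparse-activation trick is top-k expert routing: a gating network scores every expert, but only the best few run per token. A minimal sketch in plain Python (the 8-expert, top-2 shape is an illustrative assumption, not Nemotron's actual configuration):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(gate_logits, k=2):
    """Pick the k experts with the highest gate scores and
    renormalise their weights so they sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 8 experts, only 2 active per token: most expert parameters stay idle,
# which is how a 30B model can run with ~3B active parameters.
chosen = route_top_k([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```

Because the unchosen experts never execute, throughput scales with active parameters rather than total parameters.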

Starting with GPT-5.1, OpenAI’s models developed an escalating habit of mentioning goblins, gremlins, and other creatures in responses. The root cause was the “Nerdy” personality’s reward signal, which scored creature metaphors higher in 76.2% of training datasets. That bias transferred to non-Nerdy contexts through RL and SFT data reuse, creating a feedback loop across model generations. OpenAI retired the personality and filtered creature-words from training data.

Mistral released Workflows in public preview, a production-grade orchestration layer built on Temporal for running multi-step AI processes at scale. The architecture separates orchestration from execution, keeping customer data inside the customer's own perimeter while the control plane runs in the cloud. The engine already runs millions of daily executions. Over 40% of agentic AI projects are projected to fail by 2027 due to operational complexity, and Workflows targets that gap directly.

AWS announced GPT-5.4 and Codex on Amazon Bedrock, ending the forced choice between AWS infrastructure and OpenAI models. The bigger signal is the silicon underneath. OpenAI committed to roughly 2 gigawatts of Trainium capacity, while Anthropic locked in over $100 billion across Trainium2 through Trainium4. Eight days apart, the two announcements confirm AWS custom chips are becoming the default substrate for frontier AI inference.

TECHNICAL

A researcher captured ChatGPT’s ad delivery and merchant tracking from live mobile traffic. The backend injects structured ad units into the SSE response stream while the model is still generating. Each ad carries four Fernet-encrypted click tokens linking impressions to merchant-side conversions. On the merchant side, a tracking SDK called OAIQ runs in the browser and reports product views back to OpenAI. The system appears contextual, not behavioural.
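Injecting structured units into a server-sent-events stream is easy to see in miniature: SSE is just `event:`/`data:` lines separated by blank lines, so the backend can interleave ad events with token deltas. A sketch (the event names and payload fields are invented for illustration, not OpenAI's actual wire format):

```python
import json

def parse_sse(stream_text):
    """Split a raw SSE body into (event, data) pairs."""
    events = []
    for block in stream_text.strip().split("\n\n"):
        event, data = "message", []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data)))
    return events

# A token delta and an injected ad unit interleaved in one stream.
raw = (
    "event: delta\ndata: {\"text\": \"Here are some laptops\"}\n\n"
    "event: ad_unit\ndata: {\"merchant\": \"example\", \"click_token\": \"gAAAA...\"}\n\n"
)
ads = [json.loads(d) for e, d in parse_sse(raw) if e == "ad_unit"]
```

The client simply filters on event type, which is why the injection is invisible unless you capture the raw stream as the researcher did.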

Addy Osmani lays out the engineering required for agents that maintain progress across many sessions, sandboxes, and days. The core challenges are persistence, recovery, and verification outside the model’s context window. The post covers the “Ralph loop” pattern for structured handoffs, workspace hygiene between sessions, and why declaring “task complete” prematurely is the most common agent failure mode. Practical enough to build with today.
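The handoff idea reduces to persisting verifiable state outside the context window and only marking work done when an external check passes. A minimal sketch of the pattern (file layout and field names are my own, not Osmani's):

```python
import json
import os

STATE_FILE = "agent_state.json"

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"done": [], "todo": ["write tests", "fix bug", "update docs"]}

def save_state(state):
    # Write atomically so a crash mid-save cannot corrupt the handoff.
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_FILE)

def run_session(state, verify):
    """Do one unit of work; mark it done only if verification passes."""
    if not state["todo"]:
        return state
    task = state["todo"][0]
    if verify(task):                 # external check, not the model's say-so
        state["done"].append(state["todo"].pop(0))
    save_state(state)
    return state

state = load_state()
state = run_session(state, verify=lambda task: True)
```

The verify hook is the point: "task complete" is decided by a test run or a lint pass, never by the agent's own assertion.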

Dwarkesh Patel’s blackboard lecture with Reiner Pope, CEO of chip startup MatX, walks through the equations governing LLM training and inference. Pope derives what the labs are doing from public API prices, chip specs, and scaling laws. The session covers batch size optimisation, KV cache arithmetic, and why inference economics differ sharply from training. Few people understand the full stack from silicon to model architecture as well as Pope.
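The KV cache arithmetic in the lecture is easy to reproduce from first principles; a back-of-envelope sketch (the model shape below is a generic Llama-like configuration I have assumed, not one Pope derives):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Keys + values (factor of 2), per layer, per token, fp16/bf16 by default."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# A 70B-class model with grouped-query attention: 80 layers,
# 8 KV heads of dim 128, serving a batch of 32 at 8k context.
gib = kv_cache_bytes(80, 8, 128, 8192, 32) / 2**30
```

That works out to 80 GiB of cache alone, before weights, which is why batch size and context length, not FLOPs, dominate inference economics.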

An AWS engineer discovered PostgreSQL throughput halved on a 96-vCPU Graviton4 machine after upgrading to Linux 7.0. Profiling showed 55% of CPU time stuck in a single spinlock. The root cause: Linux 7.0 removed PREEMPT_NONE, the kernel’s “never interrupt” scheduler mode that database servers relied on. The article traces the regression from kernel scheduling theory through PostgreSQL’s buffer management to the patch that restored performance.

Canonical disclosed 44 CVEs in uutils, the Rust reimplementation of GNU coreutils shipping in Ubuntu since 25.10. None were caught by the borrow checker, clippy, or cargo audit. The largest cluster involves TOCTOU race conditions on file paths, where Rust’s standard library resolves paths from scratch on every syscall. The findings prompted Canonical to keep cp, mv, and rm as GNU binaries in the 26.04 LTS.
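The TOCTOU class is language-independent: checking a path and then operating on it are separate syscalls, and an attacker can swap a symlink in between. A sketch of the racy and safe shapes in Python (the same pattern as the uutils bugs, though those are in Rust):

```python
import os
import tempfile

def write_racy(path, data):
    """TOCTOU: the check and the open resolve the path independently,
    so a symlink swapped in between them is silently followed."""
    if not os.path.islink(path):          # time of check
        with open(path, "w") as f:        # time of use
            f.write(data)

def write_safe(path, data):
    """Resolve once: O_NOFOLLOW makes the open itself refuse symlinks,
    collapsing check and use into a single syscall."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(data)

d = tempfile.mkdtemp()
target = os.path.join(d, "out.txt")
write_safe(target, "hello")
```

The borrow checker cannot see this bug because nothing is wrong with the memory; the race lives in the filesystem, between two perfectly safe calls.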

Apple researchers introduced Sonata, a lightweight adapter that predicts whether a query needs extended reasoning before the model starts thinking. The adapter uses self-consistency across multiple reasoning paths as its training signal, learned offline from hidden representations during prefill. Tested across Qwen, GPT-OSS, and InternLM models, Sonata reduced thinking tokens by 20–80% with no accuracy loss, or improved accuracy by up to 5% at the same token budget.
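The self-consistency signal can be stated simply: sample several reasoning paths and treat the agreement rate of their final answers as a proxy for whether extended thinking is needed. A sketch of that signal (the threshold and the sample answers are illustrative; Sonata's actual pipeline learns from hidden states, not answer strings):

```python
from collections import Counter

def agreement(answers):
    """Fraction of sampled reasoning paths that land on the modal answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

def needs_thinking(answers, threshold=0.8):
    # High agreement across samples means the model is already confident,
    # so spending extra thinking tokens is unlikely to help.
    return agreement(answers) < threshold

easy = ["42", "42", "42", "42", "41"]   # 80% agreement
hard = ["7", "12", "7", "9", "3"]       # 40% agreement
```

Routing only the low-agreement queries into extended reasoning is what lets the adapter cut thinking tokens without giving up accuracy.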

ANALYSIS

Most AI infrastructure spending targets memory bandwidth, with SK Hynix and Micron sold out of 2026 HBM capacity. But diffusion models are emerging for text and reasoning workloads, not just image generation, processing all tokens simultaneously rather than sequentially. That shifts the bottleneck from memory bandwidth to raw compute. Google has already split its 8th-generation TPUs into separate training and inference lines in anticipation.

Researchers trained five language models to produce warmer responses and measured a consistent trade-off. Warm models showed substantially higher error rates on consequential tasks, with accuracy drops of 10–30 percentage points. The effect worsened when users expressed vulnerability, exactly the conditions where warmth-trained models are deployed most often. The findings challenge the assumption that personality tuning is cosmetic rather than functional.

Running the Holistic Agent Leaderboard cost roughly $40,000 for 21,730 agent rollouts across 9 models and 9 benchmarks. A single GAIA benchmark run on a frontier model costs $2,829 before caching. Exgentic measured a 33x cost spread across agent configurations on identical tasks, isolating scaffold choice as a first-order cost driver. For small models, evaluation compute now exceeds pretraining compute across the full development cycle.
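The headline numbers imply a steep per-rollout cost; a quick check of the arithmetic (the average and the $0.50 baseline below are my own illustrations derived from the totals in the piece):

```python
total_cost = 40_000                      # USD for the full leaderboard run
rollouts = 21_730
per_rollout = total_cost / rollouts      # average cost per agent rollout

# With a 33x spread across scaffolds, identical tasks can differ wildly:
# if the cheapest configuration averages $0.50 per rollout, the most
# expensive averages 33x that for the same work.
cheapest = 0.50
most_expensive = cheapest * 33
```

At roughly $1.84 per rollout on average, scaffold choice moves real money long before model choice does.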

A diabetes researcher sent 13 food photos to four AI models over 500 times each, totalling 26,904 queries at the lowest randomness settings. Every model returned different carbohydrate estimates for the same photo. Claude Sonnet 4.6 clustered tightly with 2.4% median variation, while Gemini 2.5 Pro spanned 55g to 484g on a single paella photo. That 429g spread translates to 42.9 units of insulin at standard dosing.
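The dosing maths behind that claim is worth making explicit; a sketch using an insulin-to-carb ratio of 1 unit per 10 g (a common textbook ratio used here for illustration, not a value from the study):

```python
def insulin_units(carbs_g, ratio_g_per_unit=10):
    """Mealtime bolus for a given carb count at a fixed insulin-to-carb ratio."""
    return carbs_g / ratio_g_per_unit

low, high = 55, 484          # Gemini's carb estimates for the same paella photo
spread_g = high - low        # grams of disagreement on one meal
dose_spread = insulin_units(high) - insulin_units(low)
```

A 429 g spread becomes a 42.9-unit difference in the computed dose, which is the gap between under-treating and a dangerous overdose.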

TOOLS

SyncVibe opens a room, shares an invite code, and lets developers code together with their own AI agents. Each participant’s Claude, Codex, or Gemini runs locally on their machine through MCP. The relay forwards chat between participants, and @mentioning a teammate’s AI assigns it tasks visible to the whole room. Your repo never leaves your laptop, your agent runs with your API keys, and only the chat is shared.

Researchers from UIUC and Snowflake created AutoSP, a compiler within DeepSpeed that converts standard transformer training code into multi-GPU sequence-parallel code with minimal changes. Users import the library and compile. The tool automates input partitioning, communication collectives, and compute overlap for both forward and backward passes. It extends maximum trainable context length with little runtime overhead versus hand-written baselines.

This Chrome extension uses Transformers.js to run Google’s Gemma 4 model entirely in the browser with no external API calls. It searches across open tabs, queries browsing history semantically, and summarises the current page through natural language commands. The developer declares that no data is collected or transferred. At 5.77 MB and 100 users so far, it is early but demonstrates what on-device browser AI looks like in practice.

Facebook Research open-sourced Sapiens 2, a family of high-resolution vision transformers pretrained on 1 billion human images. The models achieve state-of-the-art performance on pose estimation, body-part segmentation, surface normal estimation, and pointmap estimation. Architectures range from 0.1B to 5B parameters, all using patch size 16 at 1024x768 resolution. The standalone backbone file is self-contained and can be dropped directly into any project.

Meta built a command-line tool for its Marketing API that covers campaign creation, performance analytics, catalogue management, and conversion tracking without custom code. Output comes in three formats: table for humans, JSON for piping to jq, and tab-separated for shell tools like awk and cut. The CLI supports unattended operation via exit codes and environment variables, making it usable by both developers and AI agents in CI/CD pipelines.