Issue #70 · 30 min read · 15 stories

Claude Code's Source Leaked + The Subprime AI Crisis

Apple's edge in the RAM crisis. Qwen3.5-Omni codes from speech. Plus Nvidia's $2B Marvell play.

Anthropic accidentally shipped Claude Code's full source in a .map file, revealing anti-distillation fake tools, frustration detection via regex, and an unreleased autonomous agent mode. Also today: Ed Zitron draws a parallel between AI spending and the subprime mortgage crisis, Apple finds opportunity in the global RAM shortage, and Qwen3.5-Omni writes code from speech without being trained to. Quick note: yesterday's edition wasn't emailed, but you can read it here.
NEWS

Alibaba released Qwen3.5-Omni, an omnimodal model that processes text, images, audio, and video across 256K-token contexts. The unexpected finding: it writes code from spoken instructions and video input without explicit training on that task, suggesting emergent behaviour from natively multimodal pre-training. It claims state-of-the-art results across 215 audio benchmarks and supports 74 languages for speech. No public weights this time.

Nvidia invested $2 billion in Marvell Technology, connecting one of the two dominant custom ASIC design houses to its proprietary NVLink Fusion interconnect. Marvell designs custom chips for AWS (Trainium), Microsoft, and Google, all intended as alternatives to Nvidia GPUs. Under the deal, Marvell provides custom XPUs and NVLink-compatible networking while Nvidia supplies Vera CPUs and ConnectX NICs. The move pulls a key competitor deeper into Nvidia's ecosystem.

AI data centres now consume 70% of high-end memory production. Consumer RAM prices jumped 50% in Q4 2025 with another 40-50% rise expected this quarter, and SK Hynix warns shortages could last until 2030. Apple's unified memory architecture let it ship the $600 MacBook Neo with 8GB of RAM that doesn't feel compromised. ASUS's CFO called it "a shock to the entire market."

Anthropic accidentally shipped a .map file with its Claude Code npm package, exposing the full readable source. The standout finding: an anti-distillation system that injects fake tool definitions to poison training data scrapers. The code also reveals frustration detection via regex, an "undercover mode" where the AI conceals its nature, and KAIROS, an unreleased autonomous agent mode. This is Anthropic's second accidental exposure in a week.
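The leaked source isn't reproduced here, but the anti-distillation idea itself is simple to sketch: mix decoy tool definitions in with the real ones so that anyone scraping transcripts for training data picks up poisoned entries. Everything below — tool names, structure, shuffling — is illustrative, not taken from the leaked code.

```python
import random

# Real tools the model actually uses (illustrative names).
REAL_TOOLS = [
    {"name": "read_file", "description": "Read a file from disk."},
    {"name": "run_command", "description": "Execute a shell command."},
]

# Decoy tools that exist only to poison scrapers (also illustrative).
DECOY_TOOLS = [
    {"name": "quantum_sync", "description": "Synchronise the quantum cache."},
    {"name": "hyperlink_weave", "description": "Weave hyperlinks into the session."},
]

def tool_manifest(seed=None):
    """Return the real tools with decoys shuffled in.

    The model is told out of band which entries to ignore; a scraper
    distilling from transcripts cannot tell them apart.
    """
    rng = random.Random(seed)
    manifest = REAL_TOOLS + DECOY_TOOLS
    rng.shuffle(manifest)
    return manifest

manifest = tool_manifest(seed=0)
print([t["name"] for t in manifest])
```

The trade-off is context cost: every decoy burns tokens on each request, which suggests Anthropic judged the distillation threat worth paying for.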

TECHNICAL

Nango used an OpenCode agent to create over 200 API integrations across Google Calendar, Drive, Sheets, HubSpot, and Slack. Each integration that previously took an engineer a week now takes 15 minutes and under $20 in tokens. Their key lesson: agents are fast at implementation but demand strict post-completion verification. They also found that letting agents run freely at first, then debugging root causes, outperformed micromanaging each step.

Gergely Orosz interviews Philip Kiely from inference startup Baseten on the engineering discipline behind serving LLMs in production. The deep dive covers five key approaches: quantisation, speculative decoding, caching, parallelism (tensor and expert), and disaggregation (separating prefill from decode). Open models like Kimi 2.5 are driving demand because companies can now tune inference performance themselves instead of depending on closed-model providers.
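Speculative decoding, one of the five techniques the interview covers, can be sketched in a few lines: a cheap draft model proposes several tokens, and the expensive target model verifies them in one pass, accepting the agreed-upon prefix. The "models" below are lookup tables and acceptance is exact-match; real systems compare probability distributions and accept or reject by sampling.

```python
def speculative_step(draft, target, context, k=4):
    """Draft proposes k tokens; target verifies them.

    Returns the tokens actually accepted: the agreed prefix plus the
    target's own token at the first disagreement.
    """
    # Phase 1: draft model speculates k tokens ahead.
    proposal, ctx = [], list(context)
    for _ in range(k):
        tok = draft(tuple(ctx))
        proposal.append(tok)
        ctx.append(tok)

    # Phase 2: target model checks each proposed token in order.
    accepted, ctx = [], list(context)
    for tok in proposal:
        expected = target(tuple(ctx))
        if tok == expected:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(expected)  # target overrides the draft
            break
    return accepted

# Tiny deterministic "models": the draft agrees with the target
# everywhere except after "the quick", where it guesses wrong.
TARGET = {(): "the", ("the",): "quick", ("the", "quick"): "brown",
          ("the", "quick", "brown"): "fox"}
DRAFT = dict(TARGET)
DRAFT[("the", "quick")] = "red"

target = lambda ctx: TARGET.get(ctx, "<eos>")
draft = lambda ctx: DRAFT.get(ctx, "<eos>")

print(speculative_step(draft, target, ()))
```

The win is latency: when draft and target mostly agree, several tokens land per expensive target-model pass instead of one.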

Rahul Garg, writing on Fowler's site, proposes that AI instructions should be treated as infrastructure, not tips shared on Slack. Two developers on the same team using the same tool produce different results because their prompts encode different standards. The fix: version, review, and share instructions for generation, refactoring, security, and reviews as shared artifacts. Fowler's second appearance in recent editions, after "Context Anchoring" in Issue 60.

Meta built DrP, a platform where engineers turn investigation expertise into software components that run automatically, get tested through code review, and improve over time. It now operates across 300 teams and executes 50,000 automated analyses daily. The insight: runbooks go stale and tribal knowledge walks out when people leave, but executable diagnostics stay current because they break visibly when systems change. DrP treats debugging as infrastructure, not documentation.
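DrP's internals aren't public, but the "executable diagnostic" idea can be sketched: a runbook step like "check the queue depth before blaming the consumer" becomes a function that fails loudly when the system it inspects changes shape. The metric name and threshold below are made up for illustration.

```python
def check_queue_backlog(metrics, threshold=1000):
    """Return a finding string, or raise loudly if the input shape changed."""
    if "queue_depth" not in metrics:
        # A stale runbook would silently mislead here; code breaks visibly,
        # which is the property the DrP piece highlights.
        raise KeyError("metric 'queue_depth' missing: diagnostic needs updating")
    depth = metrics["queue_depth"]
    if depth > threshold:
        return f"backlog: queue depth {depth} exceeds {threshold}; check consumers"
    return f"ok: queue depth {depth} within limits"

print(check_queue_backlog({"queue_depth": 4200}))
print(check_queue_backlog({"queue_depth": 12}))
```

Because it's code, the check can go through review, run on a schedule, and be improved like any other component, which is the shift from documentation to infrastructure.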

ANALYSIS

Vercel shared an internal talk making the case that green CI is no longer proof of safety. Agent-generated code passes tests and follows conventions but has zero production context. It doesn't know your Redis is near capacity or that a feature flag rollout will change downstream load. Their proposal: "self-driving deployments" with continuous validation, automated rollbacks, and executable guardrails that encode the production knowledge agents lack.
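Vercel didn't publish code, but an executable guardrail in their sense might look like this: production knowledge ("our Redis is near capacity") encoded as a pre-deploy check rather than living in someone's head. The metric source, numbers, and threshold are hypothetical.

```python
def redis_headroom_guardrail(used_bytes, max_bytes, min_headroom=0.2):
    """Block the deploy unless at least min_headroom of memory is free.

    Returns (allowed, reason) so the deploy pipeline can gate on it.
    """
    free = 1 - used_bytes / max_bytes
    if free < min_headroom:
        return False, f"blocked: only {free:.0%} Redis headroom (< {min_headroom:.0%})"
    return True, f"ok: {free:.0%} Redis headroom"

ok, msg = redis_headroom_guardrail(used_bytes=9_000_000_000,
                                   max_bytes=10_000_000_000)
print(ok, msg)
```

A green test suite can't catch this failure mode because the constraint lives in production state, not in the code under test, which is exactly the gap the talk identifies.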

Clifford Ressel argues the AI coding discourse is watching the wrong scoreboard. Every task has two costs: implementation (Ci) and verification (Cv). When verification is cheap, delegation pays off, but when it requires production context or implicit constraints, agents create "verification debt" that compounds silently. His framework maps tasks on a Ci/Cv grid to decide what to delegate, what to pair-program, and where to invest in verification infrastructure.
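Ressel's grid reduces to a small decision function. The cutoff and mode labels below are illustrative; the article defines the two axes, not these exact thresholds.

```python
def delegation_quadrant(ci, cv, cutoff=0.5):
    """Map a task's implementation cost (ci) and verification cost (cv),
    each normalised to [0, 1], to a working mode."""
    if cv <= cutoff:
        # Cheap verification: delegation pays off regardless of ci.
        return "delegate" if ci > cutoff else "delegate (quick win)"
    # Expensive verification: unchecked delegation compounds verification debt.
    return "pair-program" if ci > cutoff else "do it yourself"

print(delegation_quadrant(ci=0.9, cv=0.1))  # hard to write, easy to check
print(delegation_quadrant(ci=0.8, cv=0.9))  # hard to write, hard to check
```

The asymmetry is the point: Ci falls as agents improve, but Cv only falls if you invest in verification infrastructure, so the grid's dangerous quadrant grows over time.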

Ed Zitron draws a parallel between the AI spending bubble and the 2008 financial crisis. His argument: companies are locked into AI infrastructure contracts with escalating costs, much like adjustable-rate mortgages that reset upward. Revenue from AI products hasn't kept pace with the capital committed, and the gap between what's been promised to investors and what's been delivered keeps widening. A follow-up to his earlier analysis of data centre overcapacity.

Jack Dorsey and Roelof Botha argue that hierarchy exists because humans can only manage 3-8 direct reports, and AI changes that. Block is restructuring around "intelligence layers" where AI systems route information and pre-compute decisions, replacing middle management functions that have existed since the Prussian military invented the role. The piece traces organisational design from Roman legions to corporate org charts.

TOOLS

Coasts is a CLI tool for running multiple isolated development environments on a single machine using Git worktrees and Docker Compose. Each agent gets its own stack with networking, volumes, and local observability. It's agnostic to AI providers and agent harnesses, works offline, and requires no hosted service. Check out one environment at a time to bind canonical ports, or use dynamic ports to monitor any worktree's progress.

pg_textsearch brings BM25 relevance-ranked search to Postgres with a clean syntax: ORDER BY content <@> 'search terms'. It offers configurable BM25 parameters (k1, b), works with Postgres text search configurations for multiple languages, and uses Block-Max WAND optimisation for fast top-k queries. Parallel index builds handle large tables. Production-ready at v1.0.0, covering PostgreSQL 17 and 18.
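To see what the k1 and b knobs do, here is the textbook BM25 formula in plain Python: k1 controls how quickly repeated term occurrences saturate, b controls how strongly long documents are penalised. This is a generic sketch of the scoring function, not pg_textsearch's implementation.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one document (a list of tokens) against a query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)
        if tf == 0:
            continue
        # Inverse document frequency: rare terms weigh more.
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        # b scales the length penalty; k1 caps term-frequency growth.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score

corpus = [["fast", "text", "search"],
          ["postgres", "text", "search", "with", "bm25"],
          ["cooking", "pasta"]]
query = ["bm25", "search"]
ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus), reverse=True)
print(ranked[0])
```

Optimisations like Block-Max WAND exist to avoid computing this score for every document while still returning the exact top-k.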

Scotty is an SSH task runner that lets you define deploy scripts in plain bash, execute them on remote servers, and watch every step with beautiful terminal output. Features include mid-deploy pausing, Blade and bash task formats, and real-time output streaming. Designed as a modern replacement for Laravel Envoy with better control over execution flow. Drop a Scotty.sh file in your repo and deploy from your terminal.