Issue #103 · 46 min read · 23 stories

Amazon Pours $200B Into AI; Uber's $10B Waymo Divorce

SpaceX pulls its IPO forward to June 11. Scott Alexander on sigmoid copium. Agent finds 18-year-old NGINX RCE.

Bloomberg got Jassy on the record: $200B of capex this year, $50B into OpenAI, $13B into Anthropic with a $20B option, and 60,000 corporate jobs gone in five years. SpaceX pulls its IPO forward to June 11 at a $1.75T valuation, while Uber publicly trashes Waymo and spends $10B+ on Rivian, Lucid, and Nuro to own its own fleet. Scott Alexander dismantles 'all exponentials become sigmoids' as AI-skeptic cope, and Tom Tunguz reads Anthropic's $10B months alongside Datadog's 80/20 AI ARR.

NEWS

Reuters reports SpaceX has accelerated its public debut. The IPO could price as soon as June 11, with trading on the Nasdaq under ticker SPCX, raising up to $75 billion at a $1.75 trillion valuation and eclipsing Saudi Aramco as the largest IPO ever. SpaceX carries more than 90% of Western payload to orbit, with 10,000 Starlink satellites flying. Musk's proposed governance gives him near-unchecked control through super-voting Class B shares.

Datadog released Toto 2.0 on Hugging Face, spanning 4M to 2.5B parameters with open weights. It tops the BOOM, GIFT-Eval, and TIME benchmarks, sits on the Pareto frontier for quality versus size, and shows no saturation at 2.5B. It is 7x more parameter-efficient than Toto 1.0 and runs faster at inference. Trained only on observability and synthetic data, it still leads on general-purpose forecasting.

Uber has committed more than $10 billion to autonomous vehicles even as Waymo still operates on its platform in Austin and Atlanta. The mix includes a $500M equity stake in Lucid (now 11.5%) and a commitment to buy 35,000 Lucid Gravity SUVs, plus deals with Rivian and Nuro. Khosrowshahi keeps calling AV-only operators 'less reliable' while the CTO posts attack videos of Waymo on X.

Bloomberg Businessweek profiles Jassy five years into the CEO job. He has put $50B into OpenAI, $13B into Anthropic with a $20B option, and committed $200B in 2026 capex on AI data centres, custom chips, and satellites. He has also cut 60,000 corporate roles, killed Go stores and Amazon Fresh, and pitched Annapurna's Trainium as the way to undercut Nvidia. Insiders say the bottleneck on big decisions is Jassy himself.

TECHNICAL

Three words broke the author's eval system: 'be specific and detailed'. A hallucinated answer about 'context engineering invented at MIT in 1987' scored 0.525, above the passing threshold. The fix is to split faithfulness into two signals: attribution (where in the source the claim comes from) and specificity (how detailed it is). High specificity plus low attribution is the signature of a confident hallucination that a single score misses every time.
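A minimal sketch of the two-signal idea, not the author's scorer. The `FaithfulnessScore` type, the thresholds, the averaging in `blended`, and the example values are all hypothetical, chosen only to show how a detailed but unsupported answer can pass a single blended score while the split check flags it.

```python
from dataclasses import dataclass


@dataclass
class FaithfulnessScore:
    attribution: float  # 0..1, how much of the answer is traceable to the source
    specificity: float  # 0..1, how detailed the answer is


def blended(score: FaithfulnessScore) -> float:
    # Assumed single-score baseline: a plain average of the two signals.
    return (score.attribution + score.specificity) / 2


def is_confident_hallucination(
    score: FaithfulnessScore,
    min_attribution: float = 0.5,   # hypothetical threshold
    min_specificity: float = 0.7,   # hypothetical threshold
) -> bool:
    # Detailed answer with little support in the source.
    return score.specificity >= min_specificity and score.attribution < min_attribution


# Hypothetical answer: very detailed, almost nothing traceable to the source.
answer = FaithfulnessScore(attribution=0.1, specificity=0.95)
print(round(blended(answer), 3), is_confident_hallucination(answer))
```

Keeping the two numbers separate means the gate can fail an answer on attribution alone, however detailed it is.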

Jang walks through AlphaGo's primitives (search, learning from experience, and self-play) and shows how MCTS sidesteps the credit-assignment problem that haunts policy-gradient RL on LLMs. Where naive RL has to figure out which of 100k+ tokens earned the reward, MCTS hands you a strictly better action every move. Jang also kicks off an autoresearch loop on his project, which becomes a useful test of which research tasks LLMs can already automate.

Six hours of autonomous scanning, four confirmed remote memory corruption issues in NGINX, and a working RCE proof-of-concept against any server that uses rewrite and set directives. The critical bug (CVE-2026-42945, CVSS 9.2) is an unpropagated is_args flag during rewrite-and-set sequences that causes an undersized buffer allocation and a write past the heap boundary. The original mistake was introduced in 2008.

GitHub ran a focused accessibility agent over its own front-end changes and resolved 68% of the issues it caught. The top five problems were structural clarity for assistive tech, control naming, important-announcement awareness, text alternatives, and keyboard focus order. The pattern is interesting: not a general-purpose agent told to 'do accessibility', but a narrow-purpose agent reviewing PRs with one job, one rubric, and a measurable resolution rate.

Anthropic ships the first piece of its 'Claude Code at scale' series. The patterns hold across million-line monorepos, decades-old legacy stacks, dozens of microservices, and unlikely languages like C, C++, C#, Java, and PHP. The key design choice is no codebase index: Claude traverses the filesystem and greps the way an engineer would, because at large scale embedding pipelines can't keep up with active teams and the index is always stale.
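To make the no-index approach concrete, here is a rough sketch of walking a tree and grepping it on demand. This is not how Claude Code is implemented; `grep_tree`, `SKIP_DIRS`, and the default extension filter are hypothetical names picked for illustration.

```python
import os
import re
from typing import Iterator, Tuple

# Hypothetical skip list: directories an engineer would not grep through.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def grep_tree(
    root: str,
    pattern: str,
    extensions: Tuple[str, ...] = (".py",),
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
```

Every query reads the files as they are right now, so there is nothing to go stale. The cost is a full scan per search.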

TRMNL signed an annual ShipHero contract for warehouses in Georgia and Berlin. On March 31 it stopped working. After weeks of being ghosted by support and watching $12 postage cost $140, the founder screenshotted both ShipHero portals, fed them to Claude Design, and rebuilt the entire fulfillment stack for $100 in tokens. The new system handles holds, notes, pack flow, and courier purchase. What worked: matching the team's existing UI muscle memory.

Stop trying to enforce policy through prompts that the model might forget. Dabit walks through hook lifecycle points (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop, SessionEnd) where deterministic handlers inject context, block unsafe actions, validate results, and emit audit logs. Use prompts for guidance. Use hooks for behaviour that has to run every time. The model still picks how to implement a change. Hooks enforce the rules that don't depend on memory.
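A minimal sketch of a deterministic PreToolUse handler. It assumes the hook receives a JSON event on stdin with `hook_event_name`, `tool_name`, and `tool_input` fields, and that exit code 2 blocks the action; those details are assumptions to check against the hook documentation for your agent. The blocklist and function names are hypothetical.

```python
import json
import sys
from typing import Tuple

# Hypothetical policy: shell command fragments that must never run.
BLOCKED_SUBSTRINGS = ("rm -rf /", "git push --force")


def decide(event: dict) -> Tuple[bool, str]:
    """Return (allowed, reason) for one hook event."""
    if event.get("hook_event_name") != "PreToolUse":
        return True, ""
    if event.get("tool_name") != "Bash":
        return True, ""
    command = event.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            return False, f"blocked by policy: command contains {needle!r}"
    return True, ""


def main() -> int:
    # Assumed contract: event JSON on stdin, exit code 2 means "block".
    event = json.load(sys.stdin)
    allowed, reason = decide(event)
    if not allowed:
        print(reason, file=sys.stderr)
        return 2
    return 0
```

To run this as a hook script, end the file with `sys.exit(main())`. The check is plain code, so it fires on every matching tool call whether or not the model remembers the rule.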

ANALYSIS

In April 2026, Salesforce announced Headless 360 with Benioff's line: 'No browser required. The API is the UI.' Translation: we ship the substrate, you assemble the software. The Works on My Machine essay argues every enterprise vendor will do some version of this in the next eighteen months. The work that solutions-engineering firms used to do is moving into the buyer's org, so every company gets its own forward-deployed engineering function whether it planned for one or not.

ACX dismantles the favourite AI-skeptic line. Yes, every exponential eventually becomes a sigmoid. No, that does not let you wave the AI trajectory away by drawing one in MS Paint. Alexander walks through the Sigmoid Misidentification Hall of Fame, including UN birth-rate projections that have been wrong for two decades. 'No process can keep growing forever' is technically true but rarely load-bearing for the specific case you care about.

Tunguz returns this week with concrete inference-cycle numbers, not just charts. Anthropic booked $9-10 billion in consecutive months. Google Cloud is growing 63% at an $80B run rate. At Datadog, 6,500 customers sending AI integration data, 20% of the base, now drive 80% of ARR. His read for any pre-AI software company: either resell inference or index your business to customers buying huge volumes of it.
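The Datadog figures imply more than they state. A quick back-of-envelope check, taking '20% of the base' and '80% of ARR' at face value (the function name is hypothetical, and the outputs are inferences, not numbers Tunguz or Datadog reported):

```python
from typing import Tuple


def concentration(ai_customers: int, share_of_base: float, share_of_arr: float) -> Tuple[float, float]:
    """Return (implied total customers, average ARR per AI customer vs per other customer)."""
    total_customers = ai_customers / share_of_base
    # Average ARR per customer in each group, relative to the overall average.
    ai_avg = share_of_arr / share_of_base
    other_avg = (1 - share_of_arr) / (1 - share_of_base)
    return total_customers, ai_avg / other_avg


base, ratio = concentration(6_500, 0.20, 0.80)
print(round(base), round(ratio, 1))
```

On those inputs the base works out to roughly 32,500 customers, and the average AI-integration customer is worth about 16 times the average customer outside that group.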

John Gruber pushes back on Steven Levy's Wired piece arguing Apple's next CEO needs a killer AI product. The Apple playbook, Gruber argues, has never been to ship a technology. The iPod was about music, not MP3 files. The iPhone defined the mobile era, but Apple skipped social media and was fine. Whether agents replace app-store interactions remains genuinely uncertain; the panic about Apple being 'behind' already assumes the conclusion.

A founder asked Doshi whether to expand surface area or sharpen the existing product after two incumbents entered her space. He told her to throw out the question. Wide-vs-deep, platform-vs-point, horizontal-vs-vertical: framings that feel strategic but mostly let smart people sound smart without referencing a real customer. The real question is which specific feature gets the person whose call you took yesterday to actually buy.

A year ago Goedecke explicitly didn't use AI to write whole PRs in areas he knew. Now he starts every change by asking an agent and usually pushes after a single editing pass. The shift came when agents stopped getting derailed mid-task and started recovering from their own mistakes. He runs the GitHub Copilot CLI for tens of sessions a day. It takes him about thirty seconds to decide whether to keep an agent's first draft.

TOOLS

Your best Claude Code users have figured out skills, MCP configs, and slash commands that work. That knowledge sits on their machines. sx packages those assets into a team-wide vault so new developers inherit the playbook on day one, with scoped installs by org, team, repo, path, or person. Works with Claude Code, Cursor, Copilot, Gemini, Kiro, plus claude.ai and chatgpt.com through a relay.

A local debugger that streams every token, tool call, and span from your agent into a Vite UI as it happens, no polling. Then Claude Code reads the traces, writes evals against your codebase, and loops: run, observe failure, fix the code, re-run, until every assertion passes. Compatible with the Vercel AI, OpenAI Agents, Anthropic, Claude Agent, LangChain, LangGraph, CrewAI, and Mastra SDKs. Single-binary install.

A code-search library that gives agents the exact snippets they need without scanning whole files. Index and query a full repo end-to-end in under a second, with ~200x faster indexing and ~10x faster queries than a code-specialised transformer, at 99% of its retrieval quality. NDCG@10 of 0.854. Runs on CPU, no GPU, no API keys. Drop in as an MCP server or call from bash through AGENTS.md.
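For readers who want to know what the NDCG@10 figure measures, here is a small sketch of the metric in its linear-gain form. It is not the library's evaluation code, and it makes one simplifying assumption: the ideal ranking is built from the retrieved list only, not from every relevant item in the corpus.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    # Discounted cumulative gain: relevance at rank r is divided by log2(r + 1).
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """relevances[i] is the graded relevance of the result at rank i + 1."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # nothing relevant was retrieved
    return dcg_at_k(relevances, k) / ideal


# A perfectly ordered list scores 1.0; relevant hits pushed down the list score lower.
print(ndcg_at_k([3, 2, 1, 0]), ndcg_at_k([0, 1, 2, 3]))
```

A score of 0.854 therefore means the ranked snippets capture most of the gain an ideal ordering would, averaged over the benchmark's queries.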

A stealth Chromium build with source-level fingerprint patches, so your browser-agent flows don't get blocked by Cloudflare, PerimeterX, and the rest of the bot-detection stack. Drop-in Playwright replacement. 11.7k stars on GitHub, climbing fast with 1,286 added in a day. Useful when you are scraping behind detection, running headless agents at scale, or testing whether your own bot mitigations actually work.

A TUI issue tracker that runs inside any Git repo with no central service, no Jira contract, and no Linear seat. Vim navigation, event-sourced state on isolated worktrees and state branches, conflict-aware sync that converges instead of merging. Plus an MCP server so coding agents can open, move, and close tickets the same way they read code. Two commands to install and run inside an existing repo.
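'Event-sourced state' that 'converges instead of merging' can be pictured as replaying an ordered log. The sketch below is a generic illustration, not the tracker's actual data model: the `Event` fields, the event kinds, and the assumption that `(seq, replica)` is unique across replicas are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    seq: int          # logical clock value (assumed unique per replica)
    replica: str      # which clone produced the event
    ticket_id: str
    kind: str         # "opened", "moved", or "closed"
    value: str = ""   # title for "opened", target status for "moved"


def replay(events: Iterable[Event]) -> Dict[str, dict]:
    """Rebuild ticket state by applying events in a deterministic total order."""
    state: Dict[str, dict] = {}
    for ev in sorted(events, key=lambda e: (e.seq, e.replica)):
        if ev.kind == "opened":
            state[ev.ticket_id] = {"title": ev.value, "status": "todo"}
        elif ev.kind == "moved" and ev.ticket_id in state:
            state[ev.ticket_id]["status"] = ev.value
        elif ev.kind == "closed" and ev.ticket_id in state:
            state[ev.ticket_id]["status"] = "closed"
    return state


# Two replicas sync their logs in either order and end up with the same board.
a = [Event(1, "r1", "T-1", "opened", "Fix login"), Event(3, "r1", "T-1", "moved", "doing")]
b = [Event(2, "r2", "T-2", "opened", "Write docs"), Event(4, "r2", "T-1", "closed")]
print(replay(a + b) == replay(b + a))
```

State is always derived from the sorted union of events, so two clones that exchange logs reach the same result without a textual merge.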

Vercel Labs shipped Zero, a systems programming language pitched at AI agents rather than humans. The design centres on explicit effects, predictable memory, and structured compiler output so agents can reason about what their code will actually do. It compiles to native binaries with built-in dependency graphing, size analysis, and route mapping. Companion to the vercel-labs/zero-native Zig desktop shell from last week. Not stable yet, open for feedback.