Issue #63 · 28 min read · 14 stories

Musk's $25B TERAFAB + 455 companies had fake audits

OpenAI races to 8,000 staff, a 397B model on a laptop, and LocalStack goes paid

Musk announced TERAFAB, a $25B joint venture between Tesla, SpaceX, and xAI to build chips in-house and break free from TSMC dependency. A leaked audit database exposed 455 companies with near-identical SOC 2 certifications. Plus Daniel Miessler on why the "humans are special" narrative is dangerous.
NEWS

Tesla, SpaceX, and xAI are forming a $20-25B joint venture called TERAFAB to consolidate chip design and packaging under one roof in Austin, Texas. At full capacity, the facility would match 70% of TSMC's global output. AI5 chips will power Tesla vehicles and Optimus robots while D3 chips handle SpaceX's satellite constellation, with 80% of production directed toward space.

OpenAI is racing to double its headcount after Anthropic captured 73% of first-time enterprise AI spending, up from 50% earlier this year. The hiring push centres on building out the Frontier agent platform and funding acquisitions. OpenAI projects $25B in 2026 revenue against Anthropic's $19B, but the enterprise gap has clearly spooked leadership.

Someone built a searchable database of 533 leaked SOC 2 and ISO 27001 reports from auditor Delve, covering 455 companies including Replit, Ramp, Harness, and Temporal. The reports were 99.8% identical boilerplate, indicating Delve sold certifications as a service without conducting real audits. If your vendor's compliance badge came from Delve, it meant nothing.
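The 99.8% figure is easy to reproduce in spirit. A minimal sketch using Python's difflib on two invented report excerpts — the texts below are hypothetical stand-ins, not the leaked reports:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two report texts (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical excerpts: identical boilerplate with only the company name
# swapped -- the pattern the leaked database exposed across 455 companies.
boilerplate = (
    "The organisation maintains controls over access management, "
    "change management, and incident response. "
) * 10
report_a = "Company: Acme Corp\n" + boilerplate
report_b = "Company: Beta Inc\n" + boilerplate

print(round(similarity(report_a, report_b), 3))  # near 1.0 despite different issuers
```

Two genuinely independent audits would diverge in scope, findings, and exceptions; ratios this high across a whole portfolio are the tell.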

TECHNICAL

A pure C and Metal engine runs Qwen3.5-397B-A17B on a MacBook Pro with 48GB RAM, hitting 4.4 tokens per second at 4-bit quantisation while streaming 209GB from SSD. No Python, no frameworks. The engine uses SSD-based expert streaming and FMA-optimised dequantisation kernels. The whole thing was built by a human and AI pair in 24 hours.
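The engine itself is pure C and Metal, but the arithmetic a 4-bit dequantisation kernel performs can be illustrated in NumPy. The signed-nibble scheme and per-block scales below are generic assumptions, not the engine's actual on-disk format:

```python
import numpy as np

def dequantize_q4(packed: np.ndarray, scales: np.ndarray, block: int = 32) -> np.ndarray:
    """Unpack 4-bit weights (two per byte) and rescale per block.

    packed: uint8 array, each byte holding two 4-bit values.
    scales: one float16 scale per block of `block` weights.
    """
    lo = (packed & 0x0F).astype(np.int8) - 8      # low nibble, centred on zero
    hi = (packed >> 4).astype(np.int8) - 8        # high nibble
    q = np.stack([lo, hi], axis=-1).reshape(-1)   # interleave into the weight stream
    q = q.reshape(-1, block)
    # The scale step: weight = scale * q. A real FMA kernel fuses this
    # multiply into the matmul accumulate rather than materialising weights.
    return (q * scales[:, None].astype(np.float32)).reshape(-1)
```

At 4 bits plus scale overhead, 397B parameters land near the 209GB the engine streams, which is why SSD-resident experts are the only way this fits next to 48GB of RAM.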

Sebastian Raschka compiled a 45-entry gallery of LLM attention mechanisms, each with a visual model card. It covers MHA, MQA, GQA, and the newer variants appearing in prominent open-weight architectures. A downloadable poster version is available. If you've lost track of how attention has splintered since the original transformer, this is a useful single-page reference.
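As a refresher on what separates the three base variants, here is a NumPy sketch where the only knob is the number of KV heads; the shapes and causal masking are standard, and the sketch is illustrative rather than taken from Raschka's gallery:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).

    n_kv_heads == n_heads     -> MHA (every query head has its own KV)
    n_kv_heads == 1           -> MQA (all query heads share one KV head)
    1 < n_kv_heads < n_heads  -> GQA (query heads share KV heads in groups)
    """
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=0)   # each KV head serves `group` query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    mask = np.triu(np.full((seq, seq), -np.inf), 1)   # causal: attend to j <= i
    scores = scores + mask
    scores -= scores.max(axis=-1, keepdims=True)      # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

The point of the newer variants is the KV cache: fewer KV heads means proportionally less cache memory at inference time, at some cost in expressiveness.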

Three patterns kill agentic RAG systems. Retrieval thrash happens when the agent keeps searching without converging on an answer. Tool storms cascade retries until budgets are gone. Context bloat fills the window with low-signal content that drowns out what matters. The root cause across all three is the same: missing budgets, weak stopping rules, and zero observability.
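A minimal sketch of what "budgets and stopping rules" can look like in code; the caps and the relevance floor are illustrative numbers, not recommendations from the post:

```python
from dataclasses import dataclass

@dataclass
class Budget:
    """Hard caps that address all three failure modes."""
    max_searches: int = 5           # retrieval thrash: stop re-querying
    max_tool_retries: int = 2       # tool storms: cap cascading retries
    max_context_tokens: int = 8000  # context bloat: don't drown the window
    used_searches: int = 0
    context_tokens: int = 0

    def allow_search(self) -> bool:
        """Stopping rule: refuse further retrieval once the budget is spent."""
        return self.used_searches < self.max_searches

    def admit(self, chunk_tokens: int, score: float, floor: float = 0.3) -> bool:
        """Admit a chunk only if it is high-signal and fits the window budget."""
        if score < floor:  # drop low-signal content instead of stuffing it in
            return False
        if self.context_tokens + chunk_tokens > self.max_context_tokens:
            return False
        self.context_tokens += chunk_tokens
        return True
```

Observability is the third leg: log every denied search and rejected chunk, or you will never know which budget the agent is hitting.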

Claude Code skills guide LLM behaviour, but that behaviour is inherently unpredictable. This post walks through tracing skill execution with MLflow, writing judge checks against the traces, and refining the skill based on failing judges. The pattern mirrors test-driven development: define what correct looks like first, then iterate until the skill passes.
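Stripped of the MLflow specifics, the loop the post describes looks roughly like this; run_skill, the recorded steps, and both judge criteria are hypothetical stand-ins:

```python
def run_skill(prompt: str, trace: list) -> str:
    """Stand-in for invoking a Claude Code skill, recording each step it takes.

    In a real setup these steps would be captured as an MLflow trace.
    """
    trace.append({"step": "plan", "detail": f"received: {prompt}"})
    trace.append({"step": "edit", "detail": "wrote changelog entry"})
    return "## Changelog\n- Added feature X"

def judge_has_heading(output: str, trace: list) -> bool:
    """Judge 1: the skill's output starts with a markdown heading."""
    return output.startswith("##")

def judge_single_edit(output: str, trace: list) -> bool:
    """Judge 2: the skill made exactly one edit step (no flailing)."""
    return sum(1 for s in trace if s["step"] == "edit") == 1

# The TDD-style loop: run the skill, collect the trace, evaluate every
# judge, then refine the skill text until all judges pass.
trace: list = []
output = run_skill("add a changelog entry", trace)
results = {j.__name__: j(output, trace) for j in (judge_has_heading, judge_single_edit)}
```

The judges play the role of unit tests: they encode "what correct looks like" before you start iterating on the skill itself.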

ANALYSIS

Daniel Miessler draws on 25 years at Apple, Robinhood, and HP to argue that knowledge work is decomposable into tasks, and most tasks are repeatable patterns AI already performs well. The "humans are special" narrative breaks down when you examine what the work actually consists of. His core warning: complacency about your role's durability is the real risk.

Vibe coding gives an illusion of precision that collapses at scale. Dan Shipper's text editor went viral then crashed because live collaboration is, in practice, brutally hard to specify in natural language. Programming sharpens thinking through iteration, forcing you to confront edge cases English specs gloss over. AI accelerates the craft but does not eliminate the need for it.

Nathan Lambert at Interconnects argues that the popular framing of recursive self-improvement oversimplifies what's actually happening. Two or three labs are consolidating into an oligopoly, and coding assistants are transforming research workflows. But gains plateau quickly outside code and CLI tasks, and it remains unclear which other domains will see the same compounding.

Simon Willison fed the last 1,000 comments from individual HN accounts into an LLM with the prompt "profile this user." The results were, in his words, startlingly effective. The experiment shows how much a model can infer about someone from public comment history alone, something Willison himself describes as mildly dystopian.
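The summary doesn't include Willison's exact harness, but the mechanics are simple to sketch: pull a user's comments from the public HN Algolia search API and prepend the profiling instruction. The URL parameters and field names below match that API; the rest, including the LLM call it omits, is an assumption:

```python
import json
from urllib.request import urlopen

def fetch_comments(username: str, n: int = 1000) -> list[str]:
    """Pull a user's latest comments via the public HN Algolia search API."""
    url = (f"https://hn.algolia.com/api/v1/search_by_date"
           f"?tags=comment,author_{username}&hitsPerPage={min(n, 1000)}")
    with urlopen(url) as resp:
        hits = json.load(resp)["hits"]
    return [h.get("comment_text", "") for h in hits]

def build_profile_prompt(comments: list[str]) -> str:
    """Assemble the 'profile this user' prompt from raw comment history."""
    joined = "\n---\n".join(c for c in comments if c)
    return f"profile this user based on their comments:\n\n{joined}"
```

That the whole pipeline fits in a dozen lines is part of what makes the result unsettling: anyone can run it against anyone's public history.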

TOOLS

Atomic takes Markdown notes and builds a semantic knowledge graph by auto-chunking, embedding, tagging, and linking content by similarity. It includes wiki synthesis with citations, a spatial canvas for visual exploration, and agentic RAG chat. Supports OpenRouter and Ollama for model backends. Available on desktop, server, and iOS.
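The "linking by similarity" step is the heart of that pipeline. A sketch of the idea with plain cosine similarity; the threshold and pairwise scheme are illustrative assumptions, not Atomic's actual linking rules:

```python
import numpy as np

def link_by_similarity(embeddings: np.ndarray, threshold: float = 0.8) -> list[tuple[int, int]]:
    """Link every pair of note chunks whose cosine similarity clears a threshold.

    embeddings: (n_chunks, dim) array, one vector per chunk.
    """
    norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norms @ norms.T                       # pairwise cosine similarity
    edges = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                edges.append((i, j))            # an edge in the knowledge graph
    return edges
```

At scale a real implementation would use an approximate index rather than the O(n²) pairwise loop, but the graph it produces is the same idea.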

LocalStack went paid in March 2026: usage gated behind auth tokens, CI support dropped, and security patches frozen. Floci stepped in as an MIT-licensed alternative with 24 ms startup vs 3.3 s, 13 MiB idle memory vs 143 MiB, and a 90 MB image vs 1 GB. It passes all 408 SDK tests and supports services LocalStack Community never did, including Cognito, RDS, and API Gateway v2.
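For most codebases, switching emulators is a configuration change: the AWS SDK just needs to point at a different endpoint. A sketch with boto3; port 4566 is LocalStack's convention, and whether Floci listens on the same port is an assumption to check against its docs:

```python
import boto3

# Point the AWS SDK at a local emulator instead of real AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # the only line that changes per emulator
    region_name="us-east-1",
    aws_access_key_id="test",              # emulators accept dummy credentials
    aws_secret_access_key="test",
)
s3.create_bucket(Bucket="dev-bucket")
```

Keeping `endpoint_url` in one config value makes emulator swaps, and eventual pointing at real AWS, a one-line change.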

Grafeo posted the fastest results on the LDBC Social Network Benchmark while supporting six query languages: GQL, Cypher, Gremlin, GraphQL, SPARQL, and SQL/PGQ. It handles both LPG and RDF models, includes HNSW vector search, and provides ACID transactions with MVCC. The Rust core ships with bindings for Python, Node, Go, C#, Dart, and WASM.