Issue #96 · 38 min read · 19 stories

Anthropic Grabs SpaceX GPUs, Commits $200B to Google

Murati testifies Altman lied about safety. DeepSeek hits $45B. Goldman says FOMO drives AI capex.

Anthropic signed deals with SpaceX and Google Cloud totalling $200 billion in committed compute, then doubled Claude rate limits across the board. Mira Murati told a court under oath that Sam Altman lied about safety clearance for an OpenAI model. Goldman Sachs published new research arguing the trillion-dollar AI infrastructure buildout is driven more by competitive insecurity than demonstrated returns.

NEWS

Anthropic signed a deal for all 220,000 GPUs at SpaceX's Colossus 1 data centre, adding 300+ megawatts of capacity within the month. The company also reportedly committed $200 billion to Google Cloud over five years for chips and infrastructure. CEO Dario Amodei attributed the compute scramble to 80-fold annualised growth in Q1. Claude Code and API rate limits have been doubled across all paid tiers, effective immediately.

Nvidia is partnering with Corning to build three new manufacturing plants in North Carolina and Texas dedicated to optical technologies for AI data centres. The deal gives Nvidia the right to invest up to $3.2 billion in Corning and includes warrants for 15 million shares. The factories will create at least 3,000 jobs and increase Corning's US optical manufacturing capacity tenfold. Corning shares climbed 12% on the announcement.

SGLang, the open-source inference engine with 25,000+ GitHub stars and 400,000+ GPUs in deployment, has raised a $100 million seed round led by Accel and co-led by Spark Capital. The project has incorporated as RadixArk and is positioning itself as core infrastructure for serving frontier models. SGLang's radix attention and continuous batching have made it a default choice for teams running large-scale inference workloads.

Chinese AI lab DeepSeek is in talks for its first venture capital round, with the valuation surging from $20 billion to $45 billion in weeks. The round is reportedly led by China's state chip investment fund, with Tencent and Alibaba also in talks. Founder Liang Wenfeng opted to raise in order to offer employees shares and stem researcher poaching. DeepSeek has been optimised to run on Huawei chips.

In a video deposition during the Musk v. Altman trial, former OpenAI CTO Mira Murati testified under oath that Sam Altman falsely told her a new AI model had been cleared to skip the deployment safety board. She confirmed with general counsel Jason Kwon that Altman's claim did not match reality, and routed the model through the board anyway. Murati also said Altman undermined her ability to manage the organisation.

Google employees are dogfooding an AI agent codenamed Remy that runs inside the Gemini app and integrates across Google services, according to an internal document seen by Business Insider. Remy is described as a proactive assistant that monitors for relevant events, handles multi-step tasks, and learns user preferences over time. The timing aligns with Google I/O later this month, where agents are expected to be a centrepiece.

DeepMind's drug discovery spinoff Isomorphic Labs has released its Drug Design Engine, a system that moves past protein structure prediction into end-to-end drug candidate generation. The engine combines molecular generation, property prediction, and synthesis planning into a single pipeline. Isomorphic says the approach has already produced candidates in active pharmaceutical partnerships, representing over a decade and roughly $6 billion worth of deals signed to date.

TECHNICAL

Meta's recommendation systems replicate user embeddings for every candidate item during inference, wasting memory bandwidth at scale. In-Kernel Broadcast Optimization (IKBO) eliminates this by fusing broadcast logic directly into interaction kernels, so replicated tensors never materialise. The approach achieved a 4x speedup on the Linear Compression kernel and a 6.4x throughput gain on Flash Attention on H100 GPUs. IKBO is deployed across Meta's full recommendation funnel.
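
The replication IKBO eliminates can be illustrated in NumPy (shapes and names here are illustrative, not Meta's actual kernels): the naive path writes an (N, dim) copy of the user embedding to memory, while the fused path broadcasts the (dim,) vector inside the kernel and never materialises the replica.

```python
import numpy as np

# Hypothetical shapes: one user embedding scored against many candidate items.
num_candidates, dim = 4096, 128
rng = np.random.default_rng(0)
user = rng.standard_normal(dim).astype(np.float32)                   # (dim,)
items = rng.standard_normal((num_candidates, dim)).astype(np.float32)  # (N, dim)

# Naive approach: materialise the user embedding once per candidate.
# This (N, dim) buffer is the memory-bandwidth waste IKBO targets.
replicated = np.tile(user, (num_candidates, 1))
scores_naive = (replicated * items).sum(axis=1)

# Fused/broadcast approach: the (dim,) vector is broadcast inside the
# interaction computation, so the replica is never written to memory.
scores_fused = items @ user

assert np.allclose(scores_naive, scores_fused, atol=1e-3)
```

The two paths are numerically identical; the only difference is whether the replicated tensor ever touches memory.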

Research into hallucination detection across multi-agent LLM pipelines found that retrieval quality is the single most reliable predictor of degraded output. When retrieval breaks down, the model does not flag uncertainty. It extrapolates from incorrect context with the same fluency it applies to correct outputs. The practical takeaway: optimise embedding models, chunking strategy, hybrid search, and re-ranking before scaling the LLM.
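
One standard technique behind the hybrid-search recommendation is reciprocal rank fusion, which merges keyword and vector rankings without needing comparable scores. A minimal sketch with toy document IDs (not from the research itself):

```python
# Reciprocal rank fusion (RRF): each document's fused score is the sum of
# 1 / (k + rank) across every ranking it appears in.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # keyword search order
vector_hits = ["doc1", "doc9", "doc3"]  # embedding search order
fused = rrf([bm25_hits, vector_hits])   # doc1 first: high in both lists
```

Documents ranked well by both retrievers float to the top, which is exactly the behaviour a re-ranker then refines.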

GitHub's engineering team explains why traditional CI breaks down for agentic workflows. When Copilot Agent Mode takes a different path than the recorded script, the test fails even though the outcome is correct. Their Trust Layer validates essential outcomes rather than rigid execution paths, using lightweight assertions that check what happened without prescribing how. The approach is designed for CI pipelines where agents interact with UIs, browsers, and IDEs.
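
The shape of an outcome-based assertion can be sketched as follows (the helper here is hypothetical, not GitHub's actual Trust Layer API): it validates what the agent produced and says nothing about the steps taken.

```python
import json
import pathlib

def assert_outcome(workdir: pathlib.Path):
    """Validate essential outcomes of an agent run, not its execution path."""
    # Essential outcome 1: the agent produced a valid package manifest.
    manifest = json.loads((workdir / "package.json").read_text())
    assert "name" in manifest and "version" in manifest
    # Essential outcome 2: a test script exists, however the agent added it.
    assert "test" in manifest.get("scripts", {})
    # Deliberately no assertions about which commands the agent ran, which
    # files it opened, or in what order: the path is free to vary per run.
```

A recorded-script test would fail the moment the agent chose a different route; this check passes for any route that lands on the same end state.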

Google's AI and Infrastructure team deployed specialised multi-agent systems to migrate production ML models from TensorFlow to JAX. Single-agent coding assistants failed at this scale because they lost context across thousands of lines, hallucinated APIs, and could not maintain mathematical equivalence across files. The multi-agent approach splits the migration into coordinated subtasks with scoped context. Sundar Pichai highlighted the 6x speedup at Google Cloud Next.
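
Maintaining mathematical equivalence across a migration reduces to a harness like the following (illustrative, not Google's tooling): run old and new implementations on identical random inputs and require agreement within floating-point tolerance. Two structurally different but equal softmaxes stand in for the original and migrated code.

```python
import numpy as np

def check_equivalence(old_fn, new_fn, shape, trials=50, atol=1e-5, seed=0):
    """Require old and new implementations to agree on random inputs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(shape).astype(np.float32)
        np.testing.assert_allclose(old_fn(x), new_fn(x), atol=atol)

def softmax_old(x):
    # Direct formulation: exponentiate, then normalise.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_new(x):
    # Log-sum-exp formulation: same maths, different structure.
    m = x.max(axis=-1, keepdims=True)
    lse = m + np.log(np.exp(x - m).sum(axis=-1, keepdims=True))
    return np.exp(x - lse)

check_equivalence(softmax_old, softmax_new, shape=(8, 16))
```

Scoping each subtask to a check like this is what lets parallel agents migrate files independently without drifting numerically.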

Databricks' monitoring infrastructure tripled in a year to 5 billion active timeseries ingesting 10 trillion samples daily. Off-the-shelf solutions could not handle the cardinality explosion from serverless and AI workloads across 70 cloud regions. The team rebuilt on a custom Thanos fork with self-healing, self-scaling regional stacks that require minimal on-call intervention. The post details the architectural decisions that made the system sustainable at this scale.

ANALYSIS

Anant Jain warns that engineers are becoming the human equivalent of ChatGPT wrapper startups. They prompt Claude Code for a plan, push 50 PRs a day, paste agent output into Slack, and struggle to justify architectural decisions when pressed. The failure mode is subtle because the output is good enough that problems surface slowly. Jain argues the defensible skill is now taste and sharp judgement, not output volume.

Simon Willison coined the distinction between vibe coding (not looking at the code) and agentic engineering (responsible AI-assisted development). In a new podcast appearance, he admitted the boundary is blurring in his own work. He finds himself approving agent-generated diffs without fully reading them, relying on test suites and gut checks instead of line-by-line review. The convergence is uncomfortable precisely because the output quality has gotten good enough to mask the shift.

Two Goldman Sachs research teams independently examined the AI infrastructure buildout and reached the same conclusion: spending is outpacing returns. Head of global equity research James Covello updated his 2024 "Too Much Spend, Too Little Benefit?" thesis, noting that companies have collectively committed over a trillion dollars to AI infrastructure while measurable productivity gains remain thin. The firm argues FOMO and competitive insecurity are stronger motivators than demonstrated returns.

Lee explores what LLMs' uncanny ability to identify authors from unpublished prose reveals about implicit knowledge. ChatGPT correctly identified him from a 2012 essay but could not explain how. Lee argues this exemplifies a broader pattern: LLMs excel at pattern matching across massive corpora but lack the causal reasoning and hypothesis generation that define scientific thinking. The gap between recognising patterns and understanding why they hold is larger than benchmarks suggest.

TOOLS

The Claude API now includes a memory tool that lets agents store and retrieve information through a client-side file directory. Agents can create, read, update, and delete memory files that persist between conversations, pulling context on demand instead of loading everything upfront. This is the key primitive for just-in-time retrieval in long-running workflows where loading all context at once would overwhelm the window. Storage and security remain under developer control.
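
On the client side, the tool reduces to handling file commands against a local directory. A minimal sketch (the command shapes loosely follow Anthropic's memory tool, but verify against the current API docs before relying on them):

```python
import pathlib

MEMORY_ROOT = pathlib.Path("./memories")

def handle_memory_command(cmd: dict) -> str:
    """Execute a memory tool-use command against a local file directory."""
    rel = cmd["path"].removeprefix("/memories").lstrip("/")
    path = MEMORY_ROOT / rel
    # Security stays under developer control: reject path traversal.
    if not path.resolve().is_relative_to(MEMORY_ROOT.resolve()):
        raise ValueError("path escapes memory root")
    if cmd["command"] == "create":
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(cmd["file_text"])
        return f"created {cmd['path']}"
    if cmd["command"] == "view":
        return path.read_text()
    if cmd["command"] == "delete":
        path.unlink()
        return f"deleted {cmd['path']}"
    raise ValueError(f"unsupported command: {cmd['command']}")
```

Because the files live on the developer's side, persistence, encryption, and access policy are ordinary filesystem decisions rather than API features.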

Luma Labs released UNI-1.1, an image generation API with dedicated reasoning and generation endpoints. The model interprets intent before rendering, so the first pass is typically usable without retries. It handles up to nine reference images per turn, supports multilingual rendering, and costs less than half as much as comparable APIs. SDKs are available for Python, JavaScript, and Go, with production-grade rate limits from day one.

Point designlang at any URL and it reads the design system off the live DOM, emitting 17+ files including DTCG tokens, Tailwind config, shadcn themes, Figma variables, typed React component stubs, and paste-ready prompts for v0, Lovable, and Cursor. It also captures layout patterns, responsive behaviour across four breakpoints, hover and focus states, and WCAG contrast scores. Run npx designlang clone to generate a working Next.js starter from any site.