GPT-5.5 Instant Ships. Apple Opens iOS to Claude.
Coinbase replaces managers with AI pods. Huawei hits $12B as Nvidia exits China.

Huawei projects $12 billion in AI processor revenue for 2026, a 60% jump from last year, driven by orders from Alibaba, ByteDance, and Tencent. The surge followed DeepSeek V4's optimisation for Huawei's Ascend architecture rather than Nvidia's CUDA ecosystem. Nvidia CEO Jensen Huang confirmed his company's Chinese AI accelerator market share has collapsed to zero. Morgan Stanley estimates China's domestic AI chip market could reach $67 billion by 2030.
OpenAI updated ChatGPT's default model to GPT-5.5 Instant, available to all users. Internal evaluations show 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance, with a 37.3% reduction on conversations users had flagged for errors. The update also improves image analysis, STEM questions, and web search decisions. Hundreds of millions of daily users get a meaningfully more accurate model without changing plans.
Anthropic released ten agent templates for financial services covering pitchbooks, KYC screening, and month-end close. They ship as Claude Cowork plugins, Claude Code skills, and Managed Agents cookbooks. Claude Opus 4.7 leads Vals AI's Finance Agent benchmark at 64.37%. The launch caps a 48-hour push that included a $1.5 billion joint venture with Blackstone and Goldman Sachs to embed Claude in mid-market companies.
Staff at Google DeepMind's London headquarters voted to join two unions, citing the company's ties to the US military. The vote came after Google signed a deal giving the Pentagon access to its AI models in classified settings, despite a letter from 600+ employees urging against it. Workers are demanding commitments against weapons development, surveillance technology, and stronger whistleblower protections. If Google refuses voluntary recognition, the unions will seek UK arbitration.
Miami startup Subquadratic launched with a model architecture it says scales linearly with context length. The company claims 92.1% needle-in-a-haystack retrieval at 12 million tokens, an 83% score on MRCR v2 (beating GPT-5.5's 74%), and 82.4% on SWE-bench. Independent verification is pending, and the claims are large for a seed-stage company. If they hold up, a 50-million-token window is planned next.
Apple plans to open Apple Intelligence to rival AI models in iOS 27 this fall, Bloomberg reports. Users would choose which model powers Siri, writing tools, and image generation. The current ChatGPT-only integration requires a clunky extra step and has seen poor adoption. The shift signals Apple is positioning itself as a platform for competing AI models rather than building its own.
The Trump administration is exploring government oversight of frontier AI models before they reach the public, the New York Times reports. The policy shift follows cybersecurity concerns raised by Anthropic's Mythos model, which demonstrated capabilities that alarmed national security officials. An executive order is being drafted to establish safety standards and review procedures. If implemented, it would mark the first time the US required pre-release vetting of commercial AI systems.
Coinbase CEO Brian Armstrong laid off 700 people and restructured the company around flat hierarchies capped at five layers. Pure managers are out. In their place: player-coaches and AI-native pods where one-person teams direct agents doing the work of engineers, designers, and product managers. "We are not just reducing headcount," Armstrong wrote. "We're fundamentally changing how we operate: rebuilding Coinbase as an intelligence, with humans around the edge aligning it."
RAG retrieved the right document. The LLM still contradicted it. This Python library catches five failure patterns: numeric contradictions, fake citations, negation flips, answer drift, and confident-but-ungrounded responses. Three healing strategies fix bad answers before users see them. It runs in pure Python in under 50 ms, with no external APIs or LLM judges. The deeper problem: RAG hallucination is worse than standard LLM hallucination because users have every reason to trust the answer.
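The library's API isn't shown here, but the core idea behind one of its detectors, numeric contradiction, can be sketched: extract the numbers an answer asserts and flag any that never appear in the retrieved source. A minimal sketch; the function names are hypothetical, not the library's.

```python
import re

def _numbers(text: str) -> set[str]:
    """Extract numeric tokens (integers, decimals, percentages)."""
    return set(re.findall(r"\d+(?:\.\d+)?%?", text))

def numeric_contradiction(source: str, answer: str) -> set[str]:
    """Return numbers the answer asserts that never appear in the
    retrieved source: the simplest signal of a numeric contradiction."""
    return _numbers(answer) - _numbers(source)

source = "Revenue grew 12% to $4.2 billion in Q3."
good = "Q3 revenue rose 12% to $4.2 billion."
bad = "Q3 revenue rose 21% to $4.2 billion."

numeric_contradiction(source, good)  # set() -> grounded
numeric_contradiction(source, bad)   # {'21%'} -> flag for healing
```

A real detector would also normalise units and formats ("4.2 billion" vs "4,200 million"), but even this naive set difference catches the case where the model invents a figure outright.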
Reflex benchmarked two approaches to letting an AI agent operate the same admin panel: a vision agent driving the UI through screenshots, and an API agent calling HTTP endpoints directly. Same Claude Sonnet, same task, same dataset. The API agent completed it in 8 calls. The vision agent could not finish at all, failing on pagination and multi-step cross-entity lookups. When adjusted for comparable tasks, the vision path cost 45x more per successful operation.
Sprig tracks 15 billion visitors, 75 billion user attributes, and 1.3 trillion events. They started on Postgres (Aurora), hit scale limits, moved read-heavy workloads to Redis and ClickHouse, then hit those walls too. ScyllaDB replaced both layers, cutting read latency 4x while handling 20,000-40,000 events per second. The talk walks through each migration phase, what broke at each scale threshold, and why the team chose a wide-column store over further Postgres optimisation.
Eugene Yan lays out a system for making AI collaboration compound over time. Every finished artefact becomes context for the next session. Every correction updates a config that reduces future errors. The framework covers five principles: provide good context through organised directories, encode your taste as config files and lint rules, make verification easy, delegate bigger tasks progressively, and close the loop. None of it is AI-specific. It is how you onboard any collaborator.
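One way to implement the "close the loop" principle, where every correction updates a config that reduces future errors, is a standing conventions file that gets prepended to each new session. A hypothetical sketch, not Yan's actual tooling:

```python
from pathlib import Path
import tempfile

def record_correction(conventions: Path, rule: str) -> None:
    """Append a correction as a standing rule, so the next session
    starts with it as context instead of repeating the mistake."""
    with conventions.open("a", encoding="utf-8") as f:
        f.write(f"- {rule}\n")

def session_context(conventions: Path) -> str:
    """Build the preamble for the next AI session from accumulated rules."""
    rules = conventions.read_text(encoding="utf-8") if conventions.exists() else ""
    return "Follow these standing conventions:\n" + rules

with tempfile.TemporaryDirectory() as d:
    conv = Path(d) / "conventions.md"
    record_correction(conv, "Prefer British spelling (optimise, artefact).")
    record_correction(conv, "Keep functions under 40 lines.")
    print(session_context(conv))
```

The point of the pattern is that corrections compound: each one is paid for once and then applied automatically to every future task, exactly as it would be with a human collaborator.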
Om Malik maps how AI is reshaping internet infrastructure from the inside out. AI now generates 77 exabytes per month, makes up 20% of total network traffic, and processes over 100 trillion tokens daily. But the real shift is not consumer traffic. It is massive east-west data flows between hyperscale data centres, where custom silicon and private fibre give a handful of companies control over AI compute pricing and availability.
Ben Goertzel applies the "bootleggers and Baptists" framework to AGI regulation. Safety advocates supply the moral legitimacy for strict rules, while large AI labs quietly benefit from compliance costs that smaller competitors and open-source projects cannot absorb. He draws the parallel to self-driving car regulation, where Waymo, Tesla, and Uber backed the SELF DRIVE Act because self-certification at scale is a structural advantage. The prediction: AGI regulation will function as incumbency protection.
Google AI's product lead Vikas Kansal argues the traditional freemium model breaks down when every free interaction burns GPU cycles. You have to give away enough magic for users to reach the aha moment, but that magic is expensive to serve. Kansal proposes three gating strategies: usage intensity (tiered model access and context windows), outcomes (charging for automated multi-step tasks), and compute-heavy modalities like real-time simulations. The framework comes from building Google's AI subscription bundle.
Privacy researcher Alexander Hanff discovered Chrome is downloading Gemini Nano's 4 GB weights file to user devices without consent. The file lives in OptGuideOnDeviceModel, reinstalls itself if deleted, and surfaces no opt-out UI. Hanff draws a parallel to Anthropic's recent silent installation of a Native Messaging bridge across seven Chromium browsers. At Chrome's scale, the push could generate 6,000 to 60,000 tonnes of CO2-equivalent emissions depending on how many devices receive it.
Three-quarters of the Fortune 500 run on SAP, but making that software work costs several times more than the licences. Total IT services spend hits $1.8 trillion annually, with the largest SAP migrations taking 3-5 years and costing $100-500 million. Roughly 70% fail. The a16z piece argues AI is finally making this work automatable: understanding undocumented schemas, interpreting custom code, and mapping business processes that exist only in the heads of consultants who left years ago.
Google engineer Addy Osmani draws a line between cognitive offloading (delegating tasks while keeping judgment) and cognitive surrender (accepting AI outputs without forming your own view). A Wharton study found participants accepted wrong AI answers 73% of the time, with confidence increasing despite half the answers being deliberately incorrect. Engineers are especially vulnerable: approving 600-line PRs on vibes, accepting debug fixes without understanding root causes. The antidote is forming expectations before reviewing output.
AG2 (rebranded from Microsoft's AutoGen) provides a foundational framework for building multi-agent AI systems. It handles agent communication, task orchestration, and tool integration out of the box. Since the rebrand the project has accumulated 4,500 stars and sees active development across 600+ forks. Built in Python, it supports both conversational and programmatic agent patterns for building systems where multiple AI agents collaborate on complex tasks.
An open-source security scanner that runs 23 specialised agents in parallel across your codebase. It catches secrets, injections, XSS, supply chain attacks, LLM-specific threats (prompt injection, MCP tool poisoning, memory poisoning), and AI-generated code anti-patterns. Every proposed fix includes a diff preview and rollback. Works offline with no signup required. Integrates into CI/CD pipelines with SARIF output and configurable severity thresholds.
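The scanner's CI/CD integration rests on its SARIF output and severity thresholds. The gate itself is easy to picture: parse the SARIF report and fail the build if any result meets the configured floor. A hypothetical sketch against the SARIF 2.1.0 result levels ("none", "note", "warning", "error"); the report contents are invented.

```python
SEVERITY = {"none": 0, "note": 1, "warning": 2, "error": 3}

def violations(sarif: dict, threshold: str = "warning") -> list[str]:
    """Collect SARIF results at or above a severity threshold,
    the kind of gate a CI pipeline would fail the build on."""
    floor = SEVERITY[threshold]
    hits = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            level = result.get("level", "warning")  # SARIF default
            if SEVERITY.get(level, 2) >= floor:
                hits.append(result["message"]["text"])
    return hits

report = {
    "version": "2.1.0",
    "runs": [{"results": [
        {"level": "error", "message": {"text": "hardcoded AWS secret"}},
        {"level": "note", "message": {"text": "unpinned dependency"}},
    ]}],
}

violations(report, threshold="warning")  # ['hardcoded AWS secret']
```

Raising or lowering `threshold` is what "configurable severity" means in practice: the same report can block a release pipeline on errors only while a nightly job reviews every note.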
A hands-on workshop that strips Karpathy's nanoGPT down to essentials and scales to a 10M parameter model you can train in a single session. You write every piece yourself: tokeniser, transformer architecture with embeddings and attention, training loop, and text generation. Works on Apple Silicon, NVIDIA GPUs, or CPU automatically. No ML experience required, just comfort reading Python. The result generates Shakespeare-like text from a model you built from scratch.
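The first piece the workshop has you write, the tokeniser, is small enough to sketch here. nanoGPT's Shakespeare setup uses a character-level scheme: one integer id per distinct character in the corpus. A minimal version, assuming that same scheme:

```python
class CharTokenizer:
    """Character-level tokeniser of the kind nanoGPT uses for the
    Shakespeare dataset: one integer id per distinct character."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char
        self.vocab_size = len(chars)

    def encode(self, text: str) -> list[int]:
        return [self.stoi[c] for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("To be, or not to be")
tok.decode(tok.encode("to be"))  # round-trips to 'to be'
```

Everything downstream in the workshop (embeddings, attention, the training loop) operates on the integer ids this produces, which is why the tokeniser comes first.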
An open-source CLI that syncs your dirty checkout to cloud compute, runs your test suite, and streams output back. One command: crabbox run followed by pnpm test. Behind it sits a Go binary on your laptop, a Cloudflare Worker broker managing lease state, and managed runners on Hetzner or AWS EC2. Supports SSH to existing macOS and Windows targets. Designed for AI coding agents that need to run tests on infrastructure you do not want on your local machine.