Issue #112 · Friday, May 29, 2026 · 40 min read · 20 stories

Claude Opus 4.8 Ships; You Can't Manage 20 Agents at Once

Anthropic passes OpenAI at $965B. 10,000 agents hunt Ramp's own bugs. Robinhood opens to agents.

Cognition raised over $1 billion at a $26B valuation, with its Devin agent now at $492M in run-rate revenue. OpenAI and Thrive showed how they built a tax agent that rewrites itself through Codex, ElevenLabs put out Music v2 with API prices halved, and CrowdStrike helped take down a botnet that spent two years hunting open-source developers. There's also a $9.99 Mac app that scans your AI chat logs for leaked keys.

NEWS

Anthropic Ships Claude Opus 4.8 With Dynamic Workflows and a Cheaper Fast Mode

· 7 min read

Anthropic upgraded its flagship to Claude Opus 4.8, just 41 days after 4.7 and at the same price. Users on claude.ai can now control how much effort Claude spends on a task, Claude Code gains a Dynamic Workflows feature that coordinates hundreds of parallel subagents for codebase-scale migrations, and fast mode runs at 2.5x speed for a third of the previous cost. Early testers say it flags its own uncertainty more readily.

Anthropic Closes $65B at a $965B Valuation, Passing OpenAI

· 4 min read

Anthropic raised $65 billion in Series H funding led by Altimeter, Dragoneer, Greenoaks, and Sequoia, valuing it at $965 billion post-money and pushing it past OpenAI as the most valuable AI startup. Run-rate revenue crossed $47 billion earlier this month, up sharply since February's Series G. The round folds in $15 billion of previously committed hyperscaler money and funds compute, safety research, and product expansion.

Cognition Raises $1B at $26B as Devin Hits $492M Run-Rate Revenue

· 3 min read

Cognition raised over $1 billion at a $26 billion valuation led by Lux Capital, General Catalyst, and 8VC. Its AI software engineer Devin now runs at $492 million in run-rate revenue, with enterprise usage up more than tenfold since the start of the year. The company points to outcomes like Mercedes-Benz cutting an eight-month legacy modernisation project down to eight days, and customers including Citi, Goldman Sachs, and the US Army.

Robinhood Opens Trading and Credit to AI Agents Over MCP

· 6 min read

Robinhood launched Agentic Trading and an Agentic Credit Card, letting customers connect their own AI agents to trade stocks and spend through Robinhood's MCP servers. Agents get a dedicated trading account kept separate from the main portfolio, with built-in safety controls and a real-time activity feed. The equities beta works with agents built on tools like Claude or Cursor.

CrowdStrike and Google Take Down the Glassworm Botnet Targeting Open-Source Developers

· 3 min read

CrowdStrike, working with Google and Shadowserver, disrupted Glassworm, a botnet that spent two years pushing malware to open-source software developers and stealing their passwords. The operation went after attackers who compromise individual developer machines to reach the wider supply chain. As CrowdStrike put it, adversaries are no longer just targeting products; they are targeting the developers who build them, because one compromised workstation can cascade into thousands of downstream organisations.

ElevenLabs Launches Music v2 and Halves Its API Pricing

· 3 min read

ElevenLabs released Music v2, its latest generative music model, with better vocals, instrumentation, and arrangement across genres plus improved multilingual support. It powers three products: ElevenMusic for listening and remixing, ElevenAPI for embedding music generation into your own product, and ElevenCreative for downloadable tracks for ads and video. The company also cut pricing by up to 50% on the API and up to 40% for Creative self-serve customers.

YouTube Will Auto-Detect and Label Photorealistic AI Video

· 2 min read

YouTube is moving past creator self-disclosure and will start automatically applying a label when its systems detect significant photorealistic AI use in a video. It is also making the existing AI-content labels more prominent for viewers. The shift matters for anyone publishing AI-generated or heavily edited footage, since the platform will now flag it whether or not the uploader discloses the AI use themselves.

Meta Hunts for Revenue Beyond Ads to Justify a $145B Capex Bill

· 4 min read

Meta still makes almost all its money from advertising, yet the stock is down on the year while capital expenditure climbs toward $125 to $145 billion. Sherwood reports the company is now pushing into subscriptions and enterprise, and even floating a cloud business, to find revenue that justifies the spend. It is a real shift for a firm whose asset-light software model long funded itself on ads alone.

TECHNICAL

Ramp Pointed 10,000 Coding Agents at Its Own Backend and Found 7 Novel High-Severity Bugs

· 1 min read

Ramp pointed roughly 10,000 coding agents at its backend over an eight-hour run, each told to find one vulnerability, then used thousands more agents to deduplicate and reproduce the results. Seven confirmed high-severity bugs survived, all novel and missed by prior pen tests, bug bounties, and AI scans. The pipeline is model-agnostic: cheaper open-weight models like Kimi K2.6 and DeepSeek V4 Pro still surfaced real high-severity issues.
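The deduplicate-then-reproduce step can be pictured with a small sketch. This is not Ramp's pipeline: the `Finding` fields, the `fingerprint` normalisation, and the sample reports are all hypothetical, chosen only to show how many single-bug reports might collapse into a short confirmed list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent_id: int
    endpoint: str     # hypothetical field: where the agent says the bug lives
    bug_class: str    # hypothetical field: e.g. "idor", "sqli"
    reproduced: bool  # hypothetical field: did a follow-up agent reproduce it?


def fingerprint(f: Finding) -> tuple:
    """Normalise a report so near-identical findings share one key."""
    endpoint = f.endpoint.strip().lower().rstrip("/")
    return (endpoint, f.bug_class.strip().lower())


def triage(findings: list) -> list:
    """Group reports by fingerprint; keep groups with at least one reproduction."""
    groups = defaultdict(list)
    for f in findings:
        groups[fingerprint(f)].append(f)
    return sorted(
        key for key, group in groups.items() if any(f.reproduced for f in group)
    )


# Made-up reports: two agents describe the same bug, one report never reproduces.
reports = [
    Finding(1, "/api/cards/", "IDOR", False),
    Finding(2, "/api/cards", "idor", True),
    Finding(3, "/api/users", "sqli", False),
]
confirmed = triage(reports)
print(confirmed)
```

Keying on a normalised fingerprint is the simplest possible choice; a real system would likely need fuzzier matching, since agents rarely describe the same bug in the same words.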

OpenAI and Thrive Built a Tax Agent That Improves Itself Through Codex

· 1 min read

OpenAI forward-deployed engineers and Thrive Holdings built Tax AI for Crete's network of 30-plus accounting firms, processing 7,000 returns this season at up to 97% accuracy and saving practitioners about a third of their prep time. It measurably improves itself: the share of returns reaching 75% field completion climbed from 25% to 86% in six weeks, via a loop that turns practitioner corrections into production traces, then targeted evals, then scoped Codex tasks.

Why torch.compile Is So Fast: A Walkthrough of Kernel Fusion

· 5 min read

Calling torch.compile can make a PyTorch model run up to 10x faster. Without compilation the GPU launches a separate kernel for every operation, paying launch overhead and writing intermediate results to memory each time. The Inductor compiler fuses dependent operations into single Triton kernels, keeping data in fast registers and cutting memory traffic and launch costs. The walkthrough illustrates this with a concrete vertical-fusion example.
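To make the memory-traffic argument concrete, here is a toy model in plain Python. It is not what Inductor generates and it measures nothing on a GPU: the three-operation chain and the `memory_ops` counter are assumptions made for illustration. In real code the entry point is simply wrapping a model or function with `torch.compile`.

```python
import math


def unfused(xs: list) -> tuple:
    """Three separate 'kernels': each reads a full list and writes a full list."""
    memory_ops = 0
    a = [x * 2.0 for x in xs]      # kernel 1: read xs, write a
    memory_ops += 2 * len(xs)
    b = [v + 1.0 for v in a]       # kernel 2: read a, write b
    memory_ops += 2 * len(xs)
    c = [math.tanh(v) for v in b]  # kernel 3: read b, write c
    memory_ops += 2 * len(xs)
    return c, memory_ops


def fused(xs: list) -> tuple:
    """One 'kernel': intermediates live in a local variable, the register analogue."""
    out = []
    for x in xs:
        v = x * 2.0
        v = v + 1.0
        out.append(math.tanh(v))
    memory_ops = 2 * len(xs)       # read xs once, write out once
    return out, memory_ops


data = [0.0, 0.5, 1.0, 1.5]
slow_result, slow_ops = unfused(data)
fast_result, fast_ops = fused(data)
print(slow_ops, fast_ops)
```

Both versions compute the same values; the fused one touches "memory" a third as often and would need one launch instead of three, which is the effect the article attributes to vertical fusion.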

Databricks on Why Serving 125 Trillion Tokens a Month Is a Reliability Problem

· 7 min read

Databricks serves more than 125 trillion tokens a month, and this engineering post argues that at that scale LLM inference is a reliability problem before it is a throughput one. It walks through the failure modes of spiky agent traffic on unreliable GPU fleets, including the blast radius when a single node goes down in disaggregated prefill and decode setups, and how the team holds p95 latency under load.

ANALYSIS

The Orchestration Tax: You Can't Manage 20 Agents at Once

· 1 min read

Addy Osmani argues that spinning up AI agents is cheap, but closing the loop is not: every agent's output still routes through one serial reviewer: you. Borrowing the global interpreter lock (GIL) and Amdahl's Law as analogies, he shows that adding agents past your review rate just deepens a queue of unmerged work and lowers your standards. The fix is to treat attention as the scarce resource and scale your fleet to what you can actually review.
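The Amdahl's Law point can be checked with a few lines of arithmetic. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n); the 80% parallel fraction, the per-agent output rate, and the review rate are made-up numbers, not figures from the article.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and workers >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def review_backlog(agents: int, outputs_per_agent_hour: float,
                   reviews_per_hour: float, hours: float) -> float:
    """Unreviewed outputs left after `hours` with a single serial reviewer."""
    produced = agents * outputs_per_agent_hour * hours
    reviewed = min(produced, reviews_per_hour * hours)
    return produced - reviewed


# Assumed numbers: 80% of the work parallelises, each agent ships one
# change per hour, and the human can review five changes per hour.
for n in (5, 20):
    print(n, round(amdahl_speedup(0.8, n), 2), review_backlog(n, 1.0, 5.0, 8.0))
```

Under these assumptions, going from 5 to 20 agents lifts the theoretical speedup from about 2.8x to about 4.2x, while the unreviewed queue after an eight-hour day grows from 0 to 120 changes.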

How to Get Off the AI Code-Generation Treadmill

· 12 min read

With 42% of committed code now AI-assisted and roughly 29% of it merged without manual review, InfoWorld describes a treadmill: generate fast, then bolt on ever more checks to catch the slop. Its fix is an AI assembly model that shrinks the surface area needing review in the first place, rather than scaling guardrails linearly with every new AI-built feature.

How to Evaluate AI Agents Without the Over-Engineered Eval Stack

· 8 min read

Raindrop's founder argues agent evaluation has gotten needlessly complicated, with vendors selling strategies that diverge from what actually works in production. Drawing on building agents for companies like Framer, Clay, and Vercel, the guide favours raising the reliability floor in critical workflows over chasing benchmark scores: do error analysis, teach agents to say "I don't know", define golden cases, inspect trajectories locally, and lean on code-aware offline evals.

Frontier LLMs Disagree on 67% of Real-World Fact-Checks

· 22 min read

Lenz.io ran 1,000 real-world factual claims past a panel of five frontier models and found that on 67% of them, at least one model dissented from the majority verdict. The takeaway for anyone building evals or using an LLM as a judge: a single model is not ground truth. Treating agreement as a signal and disagreement as a flag for human review beats trusting one confident answer.
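A panel-of-judges check along these lines takes only a few lines. This is a minimal sketch, not Lenz.io's methodology: the verdict labels and the three sample claims are invented, and "dissent" is simplified to "the panel is not unanimous".

```python
from collections import Counter
from typing import Optional


def majority_verdict(verdicts: list) -> Optional[str]:
    """Return the verdict held by more than half the panel, else None."""
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count > len(verdicts) / 2 else None


def needs_human_review(verdicts: list) -> bool:
    """Flag any claim on which the panel is not unanimous."""
    return len(set(verdicts)) > 1


def dissent_rate(panel_results: list) -> float:
    """Fraction of claims where at least one model disagrees with the rest."""
    flagged = sum(needs_human_review(v) for v in panel_results)
    return flagged / len(panel_results)


# Invented verdicts from a five-model panel on three claims.
panel = [
    ["true", "true", "true", "true", "true"],
    ["true", "true", "true", "true", "false"],
    ["true", "false", "mixed", "true", "false"],
]
print([majority_verdict(v) for v in panel], dissent_rate(panel))
```

Returning `None` when no verdict clears half the panel keeps a 2-2-1 split from being reported as a confident answer.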

Why 70+ AI Models Keep Writing the Exact Same Sentence

· 7 min read

Drawing on the NeurIPS Artificial Hivemind paper, this piece shows how more than 70 language models collapse toward identical phrasing and metaphors, with DeepSeek-V3 and GPT-4o independently producing the same "Elevate your iPhone with our" line. The argument for marketers and builders: letting LLMs draft brand copy unedited flattens every brand into one default voice, so the editing pass is where any distinctiveness now has to come from.

TOOLS

Sieve Scans Your AI Coding Chats for Leaked API Keys

· 4 min read

Sieve is a $9.99 Mac app that scans the chat histories of Claude Code, Cursor, Copilot, Windsurf, and Codex, plus your .env files, for secrets you pasted or that turned up in autocomplete. It can redact found keys straight from VS Code's SQLite databases, stores rotated values in the macOS Keychain behind Touch ID, and ships a local MCP server so Claude Code can check for exposures without seeing raw values.
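The core idea of scanning text for key-shaped strings and redacting them can be sketched with regular expressions. This says nothing about how Sieve itself works: the two patterns, the `[REDACTED:...]` marker, and the sample log are assumptions, and the AWS value is the placeholder key from AWS's own documentation.

```python
import re

# Illustrative patterns only; a real scanner would need many more,
# plus entropy checks to cut false positives.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sk_style_token": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}


def find_secrets(text: str) -> list:
    """Return (pattern_name, matched_text) pairs found in the text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        hits.extend((name, m.group(0)) for m in pattern.finditer(text))
    return hits


def redact(text: str) -> str:
    """Replace every match with a marker naming the pattern that fired."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


# Fake chat-log excerpt; neither value is a live credential.
chat_log = (
    "user: here is my config\n"
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
    "API_TOKEN=sk-" + "a" * 24 + "\n"
)
print(find_secrets(chat_log))
print(redact(chat_log))
```

Redaction limits future exposure, but any key that has already reached a chat history should still be rotated.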

Cate: A Spatial, Infinite-Canvas IDE for Code, Terminals, and Agents

· 7 min read

Cate is an open-source desktop IDE built around an infinite canvas instead of stacked tabs. You arrange code, terminals, browsers, documents, and AI agents spatially in a per-project workspace, and the layout, including working directories, comes back exactly as you left it across sessions. Instead of cycling through a dozen scattered terminals and agent windows, each project gets one persistent canvas you navigate by position.

bumblebee: Perplexity's Single-Binary Scanner for Vulnerable Packages on Dev Machines

· 5 min read

bumblebee is an open-source tool from Perplexity that inventories the packages, extensions, and developer-tool metadata on macOS and Linux dev machines. When a supply-chain advisory names a compromised package and version, you can instantly check which of your laptops actually have a match across npm, pnpm, PyPI, Go, and MCP configs. It ships as a single static Go binary with no dependencies, built for fast incident response.
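The advisory-matching step reduces to a set lookup once each machine has an inventory. The sketch below is not bumblebee's code or data format: the `Package` record, the host names, and the `left-padder` package are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    ecosystem: str  # e.g. "npm", "pypi", "go"
    name: str
    version: str


def affected_machines(inventory: dict, advisory: Package) -> list:
    """Return hosts whose inventory holds the exact package and version."""
    return sorted(host for host, pkgs in inventory.items() if advisory in pkgs)


# Hypothetical fleet inventory: host name -> set of installed packages.
inventory = {
    "laptop-ana": {Package("npm", "left-padder", "1.4.2"),
                   Package("pypi", "requests", "2.32.0")},
    "laptop-ben": {Package("npm", "left-padder", "1.4.1")},
    "laptop-cy": {Package("npm", "left-padder", "1.4.2")},
}

# Hypothetical advisory naming one compromised release.
advisory = Package("npm", "left-padder", "1.4.2")
print(affected_machines(inventory, advisory))
```

Matching on the full (ecosystem, name, version) triple keeps a same-named package in another ecosystem, or an unaffected version, from raising a false alarm.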