Issue #72·Friday, April 3, 2026·30 min read·15 stories

Google Open-Sources Gemma 4 + Cursor 3 Goes Agent-First

OpenAI's first media acquisition, the Axios npm hijack, and why chatbots are unsafe at any speed.

Google released Gemma 4 under Apache 2.0, putting frontier-class open models on everything from phones to workstations. Cursor launched its agent-first v3 to challenge Claude Code and Codex, while OpenAI bought TBPN in its first media acquisition. Meanwhile, someone quietly hijacked the Axios npm package and installed a trojan on every machine that ran npm install.

NEWS

Two-Person Company Hits $1B Revenue Using AI Workflows

· 1 min read

MedVi reportedly crossed $1 billion in annual revenue with a team of two, according to the New York Times. The medical technology company achieved that scale by automating its core workflows with AI. If the numbers hold up, MedVi would be among the most revenue-per-employee efficient companies ever built, a concrete data point for the "one-person billion-dollar company" thesis Sam Altman floated in 2024.

OpenAI Acquires TBPN, Its First Media Company Purchase

· 4 min read

OpenAI bought TBPN, the daily tech talk show hosted by John Coogan and Jordi Hays that pulls in $30M+ annually. The show will keep its brand and editorial independence but report to Chris Lehane, OpenAI's head of global affairs. Fidji Simo framed the deal as creating "a space for constructive conversation about the changes AI creates." It is OpenAI's first acquisition of a media company.

Cursor 3 Goes Agent-First to Compete With Claude Code and Codex

· 7 min read

Cursor launched version 3, an agent-first redesign that lets developers manage multiple AI agents working across a codebase simultaneously. The update positions Cursor against Claude Code and OpenAI's Codex in what's becoming a three-way race for the AI coding tool market. Cursor's bet is that developers want to orchestrate agents from inside their IDE rather than switching to a terminal.

Google Releases Gemma 4 Under Apache 2.0, Its First Truly Open Model Family

· 6 min read

Google launched Gemma 4, four open models (2B to 31B parameters) built on Gemini 3 tech and released under Apache 2.0 for the first time. The 31B dense model ranks #3 on Arena AI's text leaderboard, outcompeting models 20x its size. Edge variants run on phones and Raspberry Pi with native vision and audio. All models ship with function calling, structured JSON output, and context windows up to 256K.

Arcee Ships Trinity-Large-Thinking Under Apache 2.0, Claims Strongest Open Model Outside China

· 5 min read

Arcee released Trinity-Large-Thinking, a reasoning model that adds a thinking step before responding to improve multi-turn tool calling and context coherence. The model served 3.37 trillion tokens on OpenRouter in its first two months and ranks as the most-used open model in the US on that platform. Released under Apache 2.0, Arcee positions it as the strongest open weight model built outside China.

TECHNICAL

Reverse Engineering Claude Code's Request Signing

· 7 min read

Every Claude Code request includes a cch hash in its billing header. Get it wrong and features like fast mode are rejected. Researchers reverse-engineered the mechanism from the compiled Bun binary before the source leak made it visible: an xxHash64 of the request body, combined with a SHA-256 derived version suffix. The write-up details MITM interception, binary extraction, and runtime tracing.

MoE Architecture: Why Frontier Models Got Cheaper to Run

· 7 min read

Dense models activate all parameters on every token. Mixture of Experts activates a small subset. Mixtral 8x7B, for example, routes each token to 2 of 8 expert networks, leaving the rest idle. The result is GPT-4-class performance at a fraction of the inference cost. This explainer covers the architecture, the router mechanism, and the trade-offs in memory and fine-tuning complexity that come with sparse activation.

Harness Engineering for Coding Agents

· 7 min read

Birgitta Böckeler at Thoughtworks defines "harness" as everything around a coding agent except the model itself, then narrows it for practical use. The framework distinguishes feedforward controls (instructions, context) from feedback controls (tests, linters, self-review) and maps them to three regulation categories: maintainability, architecture fitness, and behaviour. A practical taxonomy for builders who want coding agents that work with less supervision.

ANALYSIS

Et Tu, Agent? Did You Install the Backdoor?

· 8 min read

Someone hijacked the Axios npm package, which gets 100 million weekly downloads, by compromising a maintainer account and adding a single malicious dependency. The payload detected your OS, installed a remote access trojan, executed it, and deleted itself. a16z argues these attacks are accelerating because AI agents pull dependencies at machine speed with minimal human review. The dependency graph is now an attack surface, not just a convenience.

Chatbots: Unsafe at Any Speed

· 7 min read

Jeffrey Snover, the inventor of PowerShell, applies Ralph Nader's car safety argument to AI. General-purpose chatbots have an infinite goal space, making safety a philosophical impossibility rather than an engineering problem. Snover points to Microsoft's Tay as proof: the chatbot mirrored its environment because it had no defined perimeter. His prescription is "Chatbots for X," constrained systems where safety becomes an engineering problem with tractable solutions.

Willison: We've Passed the AI Inflection Point

· 4 min read

Simon Willison tells Lenny Rachitsky that November 2025 was when AI coding agents crossed from "mostly works" to "actually works." He now writes 95% of his code from his phone and is mentally exhausted by 11am. His key warnings: mid-career engineers face more disruption than juniors, prompt injection remains unsolved, and "dark factories" where AI handles its own QA are coming.

The Hidden Technical Debt of Agentic Engineering

· 22 min read

Individual agents are easy to build. Production agent systems are not. This piece maps seven infrastructure blocks that surround the agent code: integrations, context lake, agent registry, measurement, human-in-the-loop, governance, and orchestration. The framing echoes Google's 2015 paper on ML technical debt, where the ML code was a tiny box surrounded by massive infrastructure. Every team building agents will recognise the pattern.

How Microsoft Vaporised a Trillion Dollars: An Azure Insider's Account

· 7 min read

A former Azure Core engineer who worked on the Boost offload card and network accelerator details how complacency and misaligned engineering decisions eroded trust in Azure. The account traces how internal misjudgment, including attempting to port a massive Windows stack to a low-power accelerator, contributed to Microsoft nearly losing OpenAI as a customer and weakened US government confidence in the platform.

TOOLS

Strands Agents: Write Code, Not Pipelines

· 6 min read

Built from production systems inside Amazon, Strands Agents lets you define tools as functions, write a system prompt, and let the agent loop handle execution. No workflow graphs or step definitions needed. The framework ships middleware to intercept and steer agent loops, native multi-agent composition, and modular skills. Available for Python and TypeScript with 6,000+ GitHub stars and any model provider.

AMD's Lemonade Runs LLMs Locally With a One-Minute Install

· 2 min read

Lemonade is an open-source local LLM server from AMD that handles chat, vision, image generation, transcription, and speech through a single OpenAI API-compatible endpoint. The C++ backend is 2MB, installs in one minute, and auto-configures for your GPU and NPU. It supports multiple models running simultaneously across Windows, Linux, and macOS, with integrations for Open WebUI, n8n, Continue, and GitHub Copilot.