MedVi reportedly crossed $1 billion in annual revenue with a team of two, according to the New York Times. The medical technology company achieved that scale by automating its core workflows with AI. If the numbers hold up, MedVi would be among the most revenue-per-employee efficient companies ever built, a concrete data point for the "one-person billion-dollar company" thesis Sam Altman floated in 2024.
Google Open-Sources Gemma 4 + Cursor 3 Goes Agent-First
OpenAI's first media acquisition, the Axios npm hijack, and why chatbots are unsafe at any speed.
Cursor launched version 3, an agent-first redesign that lets developers manage multiple AI agents working across a codebase simultaneously. The update positions Cursor against Claude Code and OpenAI's Codex in what's becoming a three-way race for the AI coding tool market. Cursor's bet is that developers want to orchestrate agents from inside their IDE rather than switching to a terminal.
OpenAI bought TBPN, the daily tech talk show hosted by John Coogan and Jordi Hays that pulls in $30M+ annually. The show will keep its brand and editorial independence but report to Chris Lehane, OpenAI's head of global affairs. Fidji Simo framed the deal as creating "a space for constructive conversation about the changes AI creates." It is OpenAI's first acquisition of a media company.
Arcee released Trinity-Large-Thinking, a reasoning model that adds a thinking step before responding to improve multi-turn tool calling and context coherence. The model served 3.37 trillion tokens on OpenRouter in its first two months and ranks as the most-used open model in the US on that platform. Released under Apache 2.0, Arcee positions it as the strongest open weight model built outside China.
Google launched Gemma 4, four open models (2B to 31B parameters) built on Gemini 3 tech and released under Apache 2.0 for the first time. The 31B dense model ranks #3 on Arena AI's text leaderboard, outcompeting models 20x its size. Edge variants run on phones and Raspberry Pi with native vision and audio. All models ship with function calling, structured JSON output, and context windows up to 256K.
Dense models activate all parameters on every token. Mixture of Experts activates a small subset. Mixtral 8x7B, for example, routes each token to 2 of 8 expert networks, leaving the rest idle. The result is GPT-4-class performance at a fraction of the inference cost. This explainer covers the architecture, the router mechanism, and the trade-offs in memory and fine-tuning complexity that come with sparse activation.
Every Claude Code request includes a cch hash in its billing header. Get it wrong and features like fast mode are rejected. Researchers reverse-engineered the mechanism from the compiled Bun binary before the source leak made it visible: an xxHash64 of the request body, combined with a SHA-256 derived version suffix. The write-up details MITM interception, binary extraction, and runtime tracing.
Birgitta Böckeler at Thoughtworks defines "harness" as everything around a coding agent except the model itself, then narrows it for practical use. The framework distinguishes feedforward controls (instructions, context) from feedback controls (tests, linters, self-review) and maps them to three regulation categories: maintainability, architecture fitness, and behaviour. A practical taxonomy for builders who want coding agents that work with less supervision.
Simon Willison tells Lenny Rachitsky that November 2025 was when AI coding agents crossed from "mostly works" to "actually works." He now writes 95% of his code from his phone and is mentally exhausted by 11am. His key warnings: mid-career engineers face more disruption than juniors, prompt injection remains unsolved, and "dark factories" where AI handles its own QA are coming.
Individual agents are easy to build. Production agent systems are not. This piece maps seven infrastructure blocks that surround the agent code: integrations, context lake, agent registry, measurement, human-in-the-loop, governance, and orchestration. The framing echoes Google's 2015 paper on ML technical debt, where the ML code was a tiny box surrounded by massive infrastructure. Every team building agents will recognise the pattern.
Someone hijacked the Axios npm package, which gets 100 million weekly downloads, by compromising a maintainer account and adding a single malicious dependency. The payload detected your OS, installed a remote access trojan, executed it, and deleted itself. a16z argues these attacks are accelerating because AI agents pull dependencies at machine speed with minimal human review. The dependency graph is now an attack surface, not just a convenience.
Jeffrey Snover, the inventor of PowerShell, applies Ralph Nader's car safety argument to AI. General-purpose chatbots have an infinite goal space, making safety a philosophical impossibility rather than an engineering problem. Snover points to Microsoft's Tay as proof: the chatbot mirrored its environment because it had no defined perimeter. His prescription is "Chatbots for X," constrained systems where safety becomes an engineering problem with tractable solutions.
A former Azure Core engineer who worked on the Boost offload card and network accelerator details how complacency and misaligned engineering decisions eroded trust in Azure. The account traces how internal misjudgment, including attempting to port a massive Windows stack to a low-power accelerator, contributed to Microsoft nearly losing OpenAI as a customer and weakened US government confidence in the platform.
Lemonade is an open-source local LLM server from AMD that handles chat, vision, image generation, transcription, and speech through a single OpenAI API-compatible endpoint. The C++ backend is 2MB, installs in one minute, and auto-configures for your GPU and NPU. It supports multiple models running simultaneously across Windows, Linux, and macOS, with integrations for Open WebUI, n8n, Continue, and GitHub Copilot.
Built from production systems inside Amazon, Strands Agents lets you define tools as functions, write a system prompt, and let the agent loop handle execution. No workflow graphs or step definitions needed. The framework ships middleware to intercept and steer agent loops, native multi-agent composition, and modular skills. Available for Python and TypeScript with 6,000+ GitHub stars and any model provider.