Issue #114 · 40 min read · 20 stories

Anthropic IPO, Nvidia AI PCs, Meta's AI gets played

Alphabet raises $80B for AI, $10B from Buffett. Perplexity lets agents script the search stack.

The coding-model race got loud overnight: xAI's grok-build, JetBrains' open Mellum2 and MiniMax's million-token M3 all landed, while Microsoft and Google scrambled to catch Claude Code and Codex. A worm slipped into Red Hat's npm packages and harvested CI secrets straight past 2FA. Further down, Nathan Lambert maps where open and closed models split, Steve Yegge calls time on the technical interview, and someone runs Gemma 4 on a decade-old Xeon with no GPU.

NEWS

At Computex, Jensen Huang unveiled the RTX Spark superchip, an Arm Grace CPU fused with a Blackwell GPU and 128GB of unified memory, shipping this year in 30-plus laptops and 10 desktops from Microsoft, Dell, HP, Lenovo and MSI. Above it sits the DGX Station, a GB300 deskside machine with 784GB of coherent memory and 20 petaflops of AI compute that runs trillion-parameter models locally. Huang billed the line as the first reinvention of the PC in 40 years.

Anthropic said it confidentially filed a draft S-1 with the SEC on Monday, setting up one of the largest AI public offerings yet. The move puts it ahead of rival OpenAI, which is readying its own filing, and follows Anthropic's $65 billion round at a $965 billion valuation. The share count and price have not been set, and a listing could come later this year, pending SEC review.

Alphabet is raising $80 billion in equity to fund its AI infrastructure buildout, one of the largest such raises by a hyperscaler. The package includes a $40 billion at-the-market share programme starting next quarter, $30 billion in underwritten offerings and mandatory convertibles, and a $10 billion investment from Warren Buffett's Berkshire Hathaway. The Berkshire money is the surprise, a rare big-tech bet from an investor long sceptical of the sector.

xAI put grok-build-0.1, a model trained specifically for agentic coding, into public beta through its API. It targets web development and debugging and supports MCP. It is the same model behind the Grok Build CLI, served at over 100 tokens per second and priced at $1 per million input tokens and $2 per million output tokens. It also runs in Cursor, OpenClaw and OpenCode, and is available on OpenRouter and Vercel AI Gateway.
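
To make the pricing concrete, here is a back-of-the-envelope cost estimator using the quoted rates. It assumes the "over 100 tokens per second" figure refers to output generation and treats it as a floor, so the time estimate is an upper bound. The example token counts are hypothetical, and any caching or volume discounts are ignored.

```python
# Quoted list prices for grok-build-0.1. Speed is "over 100 tokens per second",
# assumed here to mean output generation, so time estimates are upper bounds.
INPUT_USD_PER_MILLION = 1.00
OUTPUT_USD_PER_MILLION = 2.00
MIN_OUTPUT_TOKENS_PER_SECOND = 100


def session_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a session at list prices, ignoring any discounts."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


def max_generation_seconds(output_tokens: int) -> float:
    """Upper bound on generation time if throughput stays above the quoted floor."""
    return output_tokens / MIN_OUTPUT_TOKENS_PER_SECOND


# Hypothetical agent session: 2M input tokens, 100k output tokens.
print(f"${session_cost_usd(2_000_000, 100_000):.2f}")  # $2.20
print(max_generation_seconds(100_000))  # 1000.0
```

In this hypothetical session, input tokens account for most of the bill even though output costs twice as much per token, because agents tend to send far more context than they generate.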

StepSecurity found multiple packages in the @redhat-cloud-services npm scope shipping malware that fires on every install, before any application code runs. The payload is a multi-stage credential harvester that sweeps GitHub Actions, AWS, GCP, Azure, Kubernetes, Vault, npm and CircleCI tokens, hidden under three layers of obfuscation in a 4.2MB install script. It is a self-propagating worm that uses stolen npm tokens and npm's bypass_2fa parameter to republish backdoored versions.
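
npm runs a package's preinstall, install and postinstall scripts automatically during installation, which is how a payload like this fires before any application code runs. The sketch below is a minimal audit script, not StepSecurity's tooling. It walks an installed node_modules tree, lists every package that declares one of those hooks, and then narrows the list to the affected scope. The function names, and the example package used in the test, are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Tuple

# npm runs these lifecycle scripts automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")
SUSPECT_SCOPE = "@redhat-cloud-services"

Finding = Tuple[str, str, str]  # (package name, hook, command)


def find_install_scripts(node_modules: Path) -> List[Finding]:
    """List every installed package that declares an install-time script."""
    findings: List[Finding] = []
    for manifest in sorted(node_modules.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # unreadable or malformed manifest: skip it
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        name = str(data.get("name", manifest.parent))
        for hook in INSTALL_HOOKS:
            if hook in scripts:
                findings.append((name, hook, str(scripts[hook])))
    return findings


def in_scope(findings: List[Finding], scope: str) -> List[Finding]:
    """Keep only findings whose package name belongs to the given npm scope."""
    return [f for f in findings if f[0].startswith(scope + "/")]


if __name__ == "__main__":
    root = Path("node_modules")
    if root.is_dir():
        for name, hook, command in in_scope(find_install_scripts(root), SUSPECT_SCOPE):
            print(f"{name}: {hook} -> {command}")
```

A hit does not prove compromise, since many legitimate packages build native code at install time, but it shows exactly which commands ran. Installing with npm's --ignore-scripts flag stops these hooks from running at all. The trade-off is that packages which genuinely need them will break.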

JetBrains open-sourced Mellum2, a 12-billion-parameter coding model aimed at the infrastructure layer of agentic systems: routing, retrieval pipelines, sub-agent tasks and private on-premises deployment. The team calls it a focal model, fast and specialised rather than competing with frontier models on breadth. Unlike its proprietary 4B predecessor, Mellum2 is open from day one, built for teams that want to run inference on hardware they control.

Nvidia's Cosmos 3 is now on Hugging Face, the first open omni-model that combines world generation, physical reasoning and action generation in one system instead of separate pipelines. Built on a Mixture-of-Transformers architecture, it comes in Super and Nano sizes with model cards, Diffusers integration, post-training scripts and open synthetic datasets. It is aimed at builders working on robotics, autonomous vehicles and smart spaces.

M3, from Chinese lab MiniMax, is the lab's first model with a one-million-token context window and native multimodality, built around a new MiniMax Sparse Attention design. It is tuned for software engineering, terminal-based tool use and agentic web browsing across multi-turn sessions. As an open-weight model with frontier-level coding, it takes direct aim at the proprietary leaders, and it is already callable on Vercel AI Gateway as minimax/minimax-m3.

TECHNICAL

When you upgrade an RL training loop to let a model call tools mid-rollout, the curves get weird and you eventually hit a shape-mismatch error. The usual cause is a broken Token-In, Token-Out invariant: you parsed the response for tool calls, then re-tokenised the conversation, and the round trip silently changed the tokens. The gradient then lands on a sequence the model never sampled. The post lays out two fixes.
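
A toy reproduction shows why the round trip breaks. The sketch below uses a made-up six-token vocabulary rather than a real tokenizer. In it, the string "ab" has two valid spellings, and re-encoding decoded text always picks the canonical one. The second function shows one common way to keep the invariant: keep the sampled token IDs verbatim, append only the tokenised tool output, and mask tool tokens out of the loss. It illustrates the idea and is not necessarily either of the post's two fixes.

```python
# Toy vocabulary in which "ab" has two valid spellings: [2] or [0, 1].
VOCAB = {0: "a", 1: "b", 2: "ab", 3: "<tool>", 4: "</tool>", 5: "42"}
# Longest pieces first, so encode() does greedy longest-match.
PIECES = sorted(VOCAB.items(), key=lambda item: len(item[1]), reverse=True)


def decode(ids):
    return "".join(VOCAB[i] for i in ids)


def encode(text):
    """Greedy longest-match tokenizer: always picks the canonical spelling."""
    ids, pos = [], 0
    while pos < len(text):
        for tid, piece in PIECES:
            if text.startswith(piece, pos):
                ids.append(tid)
                pos += len(piece)
                break
        else:
            raise ValueError(f"untokenizable text at position {pos}")
    return ids


def retokenized_trajectory(sampled, tool_output):
    """Broken: round-trips through text, so the sampled IDs can change."""
    return encode(decode(sampled) + tool_output)


def tito_trajectory(sampled, tool_output):
    """Token-in, token-out: sampled IDs are kept verbatim.

    Only the tool output is tokenized and appended. The loss mask is 1 on
    model-sampled tokens and 0 on tool tokens the model did not generate."""
    tool_ids = encode(tool_output)
    ids = list(sampled) + tool_ids
    mask = [1] * len(sampled) + [0] * len(tool_ids)
    return ids, mask


# The policy happened to sample "a" and "b" as two tokens inside a tool call.
sampled = [3, 0, 1, 4]
print(retokenized_trajectory(sampled, "42"))  # [3, 2, 4, 5]: [0, 1] collapsed to [2]
print(tito_trajectory(sampled, "42"))         # ([3, 0, 1, 4, 5], [1, 1, 1, 1, 0])
```

The re-tokenised trajectory has four positions where the rollout produced five. That is the shape mismatch, and the log-probs recorded at sampling time no longer line up with the tokens being trained on.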

This is a build log for running a quantised Gemma 4 on a recycled server that has no business doing it: a single 2016 Intel Xeon E5-2620 v4, 128GB of slow DDR4 and no GPU. Off-the-shelf Ollama and stock llama.cpp either can't run the model or don't expose enough knobs to tune it. The author pairs the model's multi-token-prediction (MTP) drafters with a verifier in a speculative-decoding setup to wring usable speed from ancient hardware.
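
Pairing a drafter with a verifier is speculative decoding. The sketch below shows the greedy variant. A cheap draft model proposes k tokens, and the expensive target model checks them, keeping the longest agreeing prefix plus one token of its own. The output is therefore identical to plain greedy decoding from the target. The "models" here are toy Python functions standing in for Gemma 4 and its MTP drafters. The build's actual acceptance rule and runtime settings are not described, so treat this only as an illustration of the technique.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification rounds).

    A real engine scores all k proposals in one batched forward pass per
    round; this toy calls the target once per position instead."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        # 1. The cheap drafter guesses k tokens ahead.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. The verifier keeps the longest prefix it agrees with.
        ctx = list(seq)
        for tok in proposal:
            expected = target(ctx)
            ctx.append(expected)  # equals tok if accepted, else the correction
            if expected != tok:
                break
        else:
            ctx.append(target(ctx))  # all k accepted: verifier adds a bonus token
        seq = ctx
        rounds += 1
    return seq[len(prompt):len(prompt) + n], rounds


target = lambda s: (s[-1] * 3 + 1) % 17  # stand-in for the big model
print(speculative_decode(target, target, [1], 12)[1])  # 3 rounds with a perfect drafter
```

The saving comes from rounds. With a perfect drafter and k=4, twelve tokens need three verification rounds instead of twelve sequential target steps. On memory-bound hardware like an old Xeon, one batched verification pass costs roughly the same as a single-token step. A poor drafter never changes the output; it only reduces how many tokens each round yields.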

A security researcher detailed how attackers hijacked high-profile Instagram accounts, including the Obama White House account, with almost no effort. The attacker needs only the target's username, connects through a VPN near the victim's city, then tells Meta's AI support bot the account has been hacked and asks it to send verification codes to an attacker-controlled email address. The bot complies, bypassing 2FA entirely.

Trustpilot rebuilt its review-intelligence stack on fine-tuned Gemma models, processing millions of user reviews in a real-time streaming pipeline under tight latency and cost constraints. The Google Cloud writeup walks through extracting metadata, running named-entity recognition and other enrichment from messy human text at high volume. It is a concrete pattern for teams moving core ML from bespoke models to fine-tuned open weights without blowing the latency budget.

ANALYSIS

After 35 years conducting technical interviews, Steve Yegge argues the format is on its last legs, not because anyone fixed it but because it is collapsing on its own. He traces how it was broken long before he learned the trade, survived every band-aid, and is now giving way as AI reshapes what hiring needs to test. He is clear that there is no silver-bullet replacement, only a set of uncomfortable options.

Nathan Lambert argues the open-versus-closed balance of power is mostly economic: whether users keep paying large margins for the best closed models. Coding agents are the first proof they will, now that net output is clearly higher past the Opus 4.5 and Codex 5.2 thresholds. The flip side is a slow decay of the labs' API businesses, as they roll their best models out later to protect token supply.

AI systems are writing descriptions of your company and products, and buyers read those instead of clicking through. The numbers are stark: 60% of Google searches now end without a click, 77% on mobile, and click-through drops by 58% when an AI Overview appears. More than 73% of brands ranking on Google's first page get zero mentions in AI answers. Gartner expects half of organic traffic to be gone by 2028.

Anthropic has pulled ahead in generative AI largely on the strength of Claude Code, and OpenAI pivoted toward enterprise with Codex to compete. Now Google and Microsoft are leaning on their balance sheets and cloud businesses to lure developers, with Microsoft lining up coding announcements at this week's Build conference after Google's developer-conference push in May. One analyst calls competing here absolutely critical for their growth.

TOOLS

OpenJarvis is an open-source framework from Stanford's Hazy Research and Scaling Intelligence labs for building personal AI agents that run on your own hardware. The pitch is local-first by default: models run on your machine and the cloud is optional, with energy, cost and latency tracked alongside accuracy as part of the labs' Intelligence Per Watt work. Version 1.0 arrives with built-in Ollama support and a one-line install.

Strands is an open-source SDK, built from production systems inside Amazon, for assembling your own agent harness rather than accepting a framework's defaults. You define tools as plain Python functions and attach hooks that fire on events, so you can cancel a tool call when an agent tries to save a report without source citations. It carries more than 6,600 GitHub stars and installs from pip or npm.
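
The hook pattern is easy to see in miniature. The sketch below is plain Python written for illustration and is not the Strands API. The Harness class, the BeforeToolCall event and its cancel_reason field, and the save_report tool are all hypothetical names. The sketch registers tools as ordinary functions and lets a hook inspect each call before it runs, cancelling save_report when no sources are passed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class BeforeToolCall:
    """Event fired before a tool runs; a hook may set cancel_reason to block it."""
    tool_name: str
    args: Dict[str, Any]
    cancel_reason: Optional[str] = None


Hook = Callable[[BeforeToolCall], None]


@dataclass
class Harness:
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    before_tool_hooks: List[Hook] = field(default_factory=list)

    def tool(self, fn: Callable[..., str]) -> Callable[..., str]:
        """Register a plain Python function as a tool under its own name."""
        self.tools[fn.__name__] = fn
        return fn

    def call_tool(self, name: str, **args: Any) -> str:
        event = BeforeToolCall(name, args)
        for hook in self.before_tool_hooks:
            hook(event)
            if event.cancel_reason is not None:
                # Hand the reason back to the agent instead of running the tool.
                return f"cancelled: {event.cancel_reason}"
        return self.tools[name](**args)


harness = Harness()


@harness.tool
def save_report(text: str, sources: List[str]) -> str:
    return f"saved report citing {len(sources)} source(s)"


def require_citations(event: BeforeToolCall) -> None:
    """Block save_report calls that carry no source citations."""
    if event.tool_name == "save_report" and not event.args.get("sources"):
        event.cancel_reason = "report has no source citations"


harness.before_tool_hooks.append(require_citations)
print(harness.call_tool("save_report", text="Q3 summary", sources=[]))
```

Returning the cancellation reason to the agent, rather than raising an exception, gives the model a chance to add citations and retry.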

You use Claude, ChatGPT and Cursor for different things, but your context does not move between them, so you keep re-explaining yourself. Second Brain is a self-hosted memory layer you connect to each tool: you tell it things once, and it recalls them by meaning rather than exact wording. Unlike built-in app memory, it lives in your own Cloudflare account and is reachable over MCP, a CLI and Obsidian.
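
Recall by meaning typically means embedding search. Each memory is stored with a vector from an embedding model, and a query returns the memories whose vectors point in the most similar direction. The sketch below shows only that retrieval step. Hand-written three-dimensional vectors stand in for real embeddings, and MemoryStore is a hypothetical name. The project's actual storage and embedding choices on Cloudflare are not described here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 for the same direction, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory stand-in for a vector-backed memory layer."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def remember(self, text: str, vector: List[float]) -> None:
        self.items.append((text, vector))

    def recall(self, query_vector: List[float], k: int = 1) -> List[str]:
        ranked = sorted(self.items,
                        key=lambda item: cosine(item[1], query_vector),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


# Toy vectors; in practice these would come from an embedding model.
store = MemoryStore()
store.remember("I prefer TypeScript for new services", [0.9, 0.1, 0.0])
store.remember("My dog is called Rex", [0.0, 0.2, 0.9])
# A query like "which language do I like?" would embed near the first memory.
print(store.recall([0.8, 0.3, 0.1]))  # ['I prefer TypeScript for new services']
```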

Perplexity introduced Search as Code, which lets models reach into its search stack rather than consume finished results. Its Agentic Search SDK exposes retrieval, ranking, filtering, fanouts and rendering as primitives that an agent assembles in sandboxed code, replacing the function-call and MCP loops that limit most agents. It is live in the Agent API and Computer, and beat rivals by 2.5x on the new WANDR wide-research benchmark.
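
The toy pipeline below, which runs over an in-memory corpus, illustrates the difference between making one tool call per step and composing primitives in code. The functions retrieve, fanout, filter_docs, rank and render are hypothetical stand-ins named after the primitives in the announcement. They are not the Agentic Search SDK's API, and the keyword matching is deliberately naive. The point is that an agent can write the whole fan-out, filter, rank and render sequence as one short program run in a sandbox, instead of paying a model round trip for each step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Doc:
    url: str
    text: str
    year: int


# Hypothetical corpus standing in for a real search index.
CORPUS = [
    Doc("https://example.com/gpu", "GPU prices and GPU supply", 2025),
    Doc("https://example.com/cpu", "CPU inference on old servers", 2021),
    Doc("https://example.com/npu", "NPU and GPU benchmarks", 2024),
]


def retrieve(query: str) -> List[Doc]:
    """Naive keyword retrieval: keep docs containing any query term."""
    terms = query.lower().split()
    return [d for d in CORPUS if any(t in d.text.lower() for t in terms)]


def fanout(queries: List[str]) -> List[Doc]:
    """Run several queries and merge the results, de-duplicated by URL."""
    seen, merged = set(), []
    for q in queries:
        for d in retrieve(q):
            if d.url not in seen:
                seen.add(d.url)
                merged.append(d)
    return merged


def filter_docs(docs: List[Doc], min_year: int) -> List[Doc]:
    return [d for d in docs if d.year >= min_year]


def rank(docs: List[Doc], query: str) -> List[Doc]:
    """Order by total occurrences of query terms (stable on ties)."""
    terms = query.lower().split()
    return sorted(docs, key=lambda d: sum(d.text.lower().count(t) for t in terms),
                  reverse=True)


def render(docs: List[Doc], limit: int = 3) -> str:
    return "\n".join(f"- {d.url} ({d.year})" for d in docs[:limit])


def research(queries: List[str], rank_query: str, min_year: int) -> str:
    """The kind of short program an agent might write and run in one step."""
    return render(rank(filter_docs(fanout(queries), min_year), rank_query))


print(research(["gpu", "inference"], rank_query="gpu", min_year=2022))
```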