Google's I/O 2026 keynote shipped Gemini 3.5 Flash to billions today across Search, Apps, and APIs, with Pichai pitching it as frontier intelligence with action. Gemini Omni and Omni Flash bring single-model any-input-to-any-output generation, starting with video. Antigravity 2.0 lands as the agent-first dev platform. Gemini 3.5 Pro arrives next month, with Android XR Glasses and AI Mode in Search rounding out the agentic push.
Google I/O ships Gemini 3.5 Flash; Karpathy joins Anthropic
SpaceX buys Cursor 30 days post-IPO, Polymarket opens private markets, antirez fixes EDIT tools.
NEWS
Andrej Karpathy joined Anthropic this week, launching a new pre-training group focused on using Claude itself to accelerate pretraining research. The hire signals Anthropic is now the magnet for the small pool of researchers capable of advancing the frontier. Karpathy, who coined 'vibe coding' and recently described himself as being in 'a state of AI psychosis', said the next few years at the LLM frontier will be especially formative.
Polymarket launched event contracts on private company milestones, with Nasdaq Private Market as exclusive resolution data provider. Live markets include OpenAI hitting a $1 trillion-plus IPO before 2027 and Anthropic crossing $500 billion this year, plus a head-to-head on whether Anthropic outvalues OpenAI in 2026. NPM is publishing its valuation feed free for the first time, giving retail a derivative-style way into the AI lab boom.
Willison unpacks the Gemini 3.5 Flash drop: it skipped the preview tag and went straight to GA, at 3x the price of Gemini 3 Flash Preview and 6x that of Gemini 3.1 Flash-Lite. The model ships with a 1M-token input limit, a January 2025 knowledge cutoff, and a new Interactions API that mirrors OpenAI Responses for server-side history. Pricing now sits within reach of Gemini 3.1 Pro, which 3.5 Pro will presumably be priced above next month.
Bloomberg reports SpaceX plans to close its Cursor acquisition 30 days after the June 12 IPO, putting the deal on track for July. The plan extends the $60 billion option SpaceX took on Cursor in November and reportedly carries a $10 billion breakup fee if the IPO timeline slips. The acquisition would hand Musk a frontier coding agent inside SpaceX one month after he becomes a public-market AI bidder.
TECHNICAL
Lasso disclosed multiple vulnerabilities in NemoClaw, Nvidia's sandboxed environment for running OpenClaw, showing that container isolation alone cannot stop prompt injection from rewriting an agent's instructions or smuggling data out. The attacks bypass static detection filters and persistently alter an agent's identity, because execution paths are determined by the text the agent reads. The lesson: sandboxes harden the host, not the agent's decision-making.
FutureHouse published Robin in Nature: a multi-agent system that integrates literature-search agents with data-analysis agents to close the loop between hypothesis generation, experiment proposal, result interpretation, and updated hypotheses. The team used it to identify therapeutic targets in experimental biology, which the paper describes as a semi-autonomous approach to scientific discovery. It is the first system that has automated all four stages.
A mechanistic-interpretability study of Qwen 3.5-9B finds nation-state content filtering lives in a small, identifiable circuit: three directions in layers 11 to 20 decide 'is this PRC-sensitive', 'should I refuse', and 'deflect or propagandise'. The factual knowledge is already in the base model, sitting under the alignment overlay. Around layer 24 the verdict commits in Chinese tokens even on English prompts. Steer one axis correctly and the model gives up the facts.
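To make 'steer one axis' concrete, here is a minimal sketch of the two generic operations usually meant by direction steering: removing a hidden state's component along a direction, and pushing the state along it. This is not the study's code. The vectors are toy values, the function names are hypothetical, and a real experiment would apply the same arithmetic to a model's residual-stream activations at the chosen layers.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    d = unit(direction)
    coeff = dot(hidden, d)
    return [h - coeff * x for h, x in zip(hidden, d)]


def steer(hidden, direction, alpha):
    """Shift `hidden` by `alpha` units along `direction`."""
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]


# Toy 2-d example: the hypothetical 'refuse' direction is the second axis.
hidden_state = [3.0, 4.0]
refuse_direction = [0.0, 2.0]

ablated = ablate(hidden_state, refuse_direction)
steered = steer(hidden_state, refuse_direction, 2.0)
```

After ablation the state has no component left along the chosen direction, which is the sense in which a single axis can be switched off without touching the rest of the representation.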
antirez builds a coding agent for his DS4 project on a local model and notices the EDIT tool everyone uses forces the LLM to re-emit the old text verbatim, fine on frontier APIs and brutal on token-poor local inference. He sketches an alternative: line numbers plus a CRC32 checksum on each line, so the agent says 'change line 22 if its CRC matches'. Compare-and-swap (CAS) without the token bill.
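The line-number-plus-checksum idea can be sketched in a few lines. This is an illustration under assumptions, not antirez's implementation: the function names, the 'number:crc| text' display format, and the choice to hash each line's UTF-8 bytes are all hypothetical. Only zlib.crc32 is a real standard-library call.

```python
import zlib


def line_crc(line: str) -> str:
    """CRC32 of one line's UTF-8 bytes, as 8 hex digits."""
    return f"{zlib.crc32(line.encode('utf-8')):08x}"


def render_for_model(lines: list[str]) -> str:
    """Show each line as 'number:crc| text' so the model can cite both."""
    return "\n".join(
        f"{number}:{line_crc(text)}| {text}"
        for number, text in enumerate(lines, start=1)
    )


def apply_edit(lines: list[str], line_no: int, expected_crc: str, new_text: str) -> bool:
    """Replace line `line_no` only if its current CRC matches `expected_crc`.

    Returns True when the edit was applied and False when the checksum is
    stale, meaning the file changed since the model last read it.
    """
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"line {line_no} is out of range")
    if line_crc(lines[line_no - 1]) != expected_crc:
        return False
    lines[line_no - 1] = new_text
    return True


source = ["a = 1", "b = 2"]
crc_seen_by_model = line_crc(source[1])

first_try = apply_edit(source, 2, crc_seen_by_model, "b = 3")
# Reusing the old checksum now fails, because line 2 has changed.
second_try = apply_edit(source, 2, crc_seen_by_model, "b = 4")
```

The model emits only a line number, a short checksum, and the new text, so the cost of an edit no longer grows with the length of the text being replaced.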
ANALYSIS
Sean Escriva on why mature team workflows and agentic frameworks disagree on almost everything that matters. Kanban is pull-based and trusts the worker to choose well; agent lifecycles are operator-initiated because the issue body becomes an attack surface the moment any agent can grab any ticket. Planning, review, and trust each break the same way. The collision is coming whether platform teams notice or not.
Anthropic committed $100 million to a partner network and PwC will certify 30,000 staff on Claude. OpenAI stood up DeployCo with $4 billion to send forward-deployed engineers on-site. None of that looks like a token business. It's vendors recognising lock-in has moved from the model layer to the orchestration layer, where workflows, governance, and identity don't migrate when the API does.
The 2026 default for letting an agent talk to a system is to install an MCP server, and this piece argues it's the wrong shape for many workloads. Dedicated tool menus let the agent pick from a list; flexible CLIs make it figure out how to compose pieces, which used to be the hard part and isn't anymore. Today's models handle --help, and the context-window tax of long tool lists has only gone up.
Nesbitt catalogues 26 ways an open source project can die: ghost maintainers, corporate orphans, captured maintainers, protestware, registry orphans whose source repo URLs 404. He notes 1.7% of npm packages and 4% of Packagist now point at a repo that isn't there. The piece extends Weekend at Bernie's, his earlier point that critical infrastructure quietly rots while every recency-based health score still rates it green.
TOOLS
watchmen sits behind Claude Code, Codex, and pi.dev, silently mining your sessions and writing skill bundles plus CLAUDE.md and AGENTS.md so the next session is smarter than the last. Skills follow you across agents on the same repo: switch from Claude Code to Codex mid-development and the context carries over. Local storage, your own API key for the LLM analysis runs, 16-week tool-error tracking on each project.
Asymptote Labs open-sourced Beacon, an endpoint telemetry agent that captures activity from Claude Code, Codex CLI, Gemini CLI, OpenCode, Factory Droid, Claude Cowork, and Cursor, normalises it into endpoint events, and forwards to Wazuh, Elastic, Splunk HEC, or a customer-managed SIEM. Built for security and IT teams that need visibility into what coding agents are doing on developer laptops. MDM deployable via Jamf or Fleet.
Forge is reliability middleware for self-hosted LLM tool-calling. Rescue parsing, retry nudges, step enforcement, and VRAM-aware context compaction take Ministral 3 8B Instruct Q8 from 53% to 86.5% on the 26-scenario eval suite, and to 76% on the hardest tier. Three modes: a WorkflowRunner for full-lifecycle loops, drop-in middleware for your own orchestrator, and an OpenAI-compatible proxy that sits in front of llama-server or Ollama.
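For readers unfamiliar with the terms, here is a rough sketch of what 'rescue parsing' and a 'retry nudge' could look like. It is not Forge's API: the function names, the expected {"name", "arguments"} shape, and the nudge wording are all assumptions made for illustration, and the `generate` callable stands in for whatever sends a prompt to the local model.

```python
import json


def rescue_parse(text):
    """Find the first JSON object in `text` that looks like a tool call.

    Small models often wrap the call in chatter, so we try decoding at
    every '{' instead of requiring the whole reply to be valid JSON.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            candidate, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue
        if isinstance(candidate, dict) and "name" in candidate and "arguments" in candidate:
            return candidate
    return None


def call_with_retries(generate, prompt, max_attempts=3):
    """Ask the model for a tool call, nudging it after each unparseable reply."""
    nudge = '\nReply with only a JSON object of the form {"name": ..., "arguments": {...}}.'
    current_prompt = prompt
    for _ in range(max_attempts):
        call = rescue_parse(generate(current_prompt))
        if call is not None:
            return call
        current_prompt = prompt + nudge
    return None


# Fake model: rambles on the first attempt, then wraps the call in chatter.
replies = iter([
    "I think I should list the files first.",
    'Sure: {"name": "ls", "arguments": {"path": "."}} Hope that helps!',
])
prompts_seen = []


def fake_generate(prompt):
    prompts_seen.append(prompt)
    return next(replies)


tool_call = call_with_retries(fake_generate, "List the files.")
```

In this toy run the first reply contains no usable call, the second prompt carries the nudge, and the call is recovered from the surrounding text.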
Chrome shipped DevTools for Agents 1.0, a stable MCP-style surface that lets Claude Code, Cursor, or any coding agent inspect a live page: console messages, network requests, DOM state, performance traces. The pitch is that AI coding tools are powerful at writing code but disconnected from its execution, generating complex apps without observing their behaviour. This closes the loop. Works with any agent that speaks MCP.
Hassan El Mghari and Together AI shipped Hallmark, a design skill that refuses to produce the AI-looking UI every model defaults to. It picks one of twenty-one macrostructures, dresses it in one of twenty-two themes, and runs sixty-five slop-test gates plus a pre-emit self-critique before returning. Four verbs cover default build, audit existing code, redesign with a different structure, and study design DNA from screenshots or URLs.