Issue #99 · 36 min read · 18 stories

AI Found Its First Zero-Day. TanStack Got Hijacked.

Thinking Machines ships its first model. Samsung's $20B strike. Hyperscalers head to $700B.

Google said a criminal hacking group used AI to discover a previously unknown software flaw and tried to weaponise it, the first publicly documented zero-day found this way. Hours later, TanStack confirmed that 42 of its npm packages had been hijacked in a separate supply-chain attack that pulled an OIDC token straight out of GitHub Actions runner memory. Also in this edition: Thinking Machines Lab released its first model, Samsung's $20B chip strike is 10 days out, and hyperscalers said on Q1 calls that 2026 capex tops $700B, with 2027 going higher.
NEWS

Google said it caught a criminal hacking group attempting a widespread cyberattack that relied on an AI model to discover a previously unknown software bug. It is the first publicly documented case of AI being used to find a real zero-day, with Google calling it "a taste of what's to come." The model handled discovery, weaponisation, and the attempted attack before the flaw was disclosed.

Samsung and its largest labour coalition are in government-mediated talks with 10 days remaining before an 18-day general strike at the world's biggest memory chip operation. The union wants uncapped performance bonuses, potentially $400K per worker, after SK hynix set the precedent. A settlement could shave Samsung's operating profit by up to 12%, and a walkout would land directly on HBM production for the AI memory market.

OpenAI is acquiring UK consultancy Tomoro to form OpenAI Deployment Company, a standalone $4B+ unit of Forward Deployed Engineers tasked with helping enterprises actually realise value from its models. The pitch: stop inexperienced consultants from souring buyers on AI. McKinsey, Bain, and Capgemini have all committed capital, and OpenAI's FDEs will sit alongside their engagements rather than competing for them directly.

Ford launched Ford Energy, a subsidiary that will manufacture 20 GWh of US-assembled battery energy storage systems annually for utilities, data centres, and large industrial customers. The new unit converts excess EV battery capacity, including assets from the cancelled $11.4B BlueOval SK joint venture, into grid-scale BESS as AI data-centre demand reshapes the market for stationary power.

A revised Gemini interface surfaced over the weekend, exposing a model card reading "Create with Gemini Omni: meet our new video model, remix your videos, edit directly in chat." Reddit screenshots suggest an accidental rollout or limited A/B test, with a new usage-limits tab and fast credit burn hinting at metered pricing. Early outputs trail ByteDance's Seedance 2 on cinematic quality but reportedly lead on prompt adherence.

TanStack published its postmortem for the 11 May supply-chain attack that pushed 84 malicious versions across 42 packages, including react-router (12M weekly downloads). Attackers chained a pull_request_target workflow with GitHub Actions cache poisoning, then pulled an OIDC token from the Actions runner process memory to publish directly. External researchers caught it in 20 minutes. Anyone who installed affected packages that day should rotate AWS, GCP, Vault, GitHub, npm, and SSH credentials.
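Before rotating anything, it helps to know whether the compromised packages ever landed in your tree. A minimal sketch of checking a `package-lock.json` (v2/v3 format) against an advisory list; the `AFFECTED` names here are placeholders, not the real advisory:

```python
import json

# Hypothetical sketch: scan a package-lock.json for packages named in a
# supply-chain advisory, to decide whether credential rotation applies.
# AFFECTED is illustrative only; use the vendor's published list.
AFFECTED = {"react-router", "@tanstack/react-query"}  # placeholder names

def flag_installed(lockfile_text: str) -> list[str]:
    """Return installed name@version strings matching the affected set."""
    lock = json.loads(lockfile_text)
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # In lockfile v2/v3, keys look like "node_modules/react-router".
        name = path.rpartition("node_modules/")[2]
        if name in AFFECTED:
            hits.append(f"{name}@{meta.get('version', '?')}")
    return hits

example = json.dumps({
    "packages": {
        "": {"name": "my-app"},
        "node_modules/react-router": {"version": "9.9.9"},
        "node_modules/left-pad": {"version": "1.3.0"},
    }
})
print(flag_installed(example))  # -> ['react-router@9.9.9']
```

Cross-check any hits against the advisory's version list; a clean scan of the lockfile alone doesn't prove a transient install never happened.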

TECHNICAL

Asim Manizada describes an overengineered, self-orchestrating team of LLM agents, some deliberately "drunk" through prompt and temperature games, that has discovered 20+ CVEs over a few months. The haul includes CVE-2026-31432 and CVE-2026-31433, two remote, unauthenticated out-of-bounds writes in the Linux kernel's ksmbd. The harness explores documentation-code mismatches and new bug classes, so expect more LLM-found CVEs in network-reachable services.

Mira Murati's lab released a research preview of TML-Interaction-Small, a 276B-parameter MoE with 12B active parameters trained from scratch for real-time multimodal collaboration. A multi-stream micro-turn design (200ms chunks) handles interruptions, simultaneous speech, and visual interjections natively. The lab claims combined state-of-the-art on intelligence (Audio MultiChallenge 43.4%) and interactivity (FD-bench v1.5 77.8%) with 0.40-second turn-taking latency.
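The micro-turn idea is easiest to see in code. A toy sketch, not TML's implementation, of consuming audio in fixed 200ms chunks so the model can react mid-utterance instead of waiting for end-of-turn (the 16kHz sample rate is our assumption):

```python
# Illustrative sketch of micro-turn chunking: input is processed in fixed
# 200 ms windows, giving the model a chance to interject every chunk.
SAMPLE_RATE = 16_000           # assumed; the preview does not publish this
CHUNK_MS = 200
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # 3200 samples per micro-turn

def micro_turns(samples: list[int]):
    """Yield successive 200 ms chunks; a real system would interleave
    user audio, model audio, and visual events stream-by-stream."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        yield samples[start:start + CHUNK_SAMPLES]

one_second = [0] * SAMPLE_RATE
chunks = list(micro_turns(one_second))
print(len(chunks), len(chunks[0]))  # -> 5 3200
```

Five decision points per second is what makes interruptions and simultaneous speech tractable; a turn-based model gets at most one.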

A New Stack essay separates what builders mean by "agent memory" into five capabilities (persistence, selection, compression, retrieval, and update) that have to work together. Most production systems solve retrieval and call it done, which is how the support agent in the lede had no idea on Wednesday that it had promised the user a refund on Monday. Idempotency and workflow state machines don't cover this.
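The five capabilities can be sketched as one interface; this is our own minimal shape, not the essay's code, and the method names are ours:

```python
import json, time

# Hedged sketch of the five memory capabilities working together.
class AgentMemory:
    def __init__(self):
        self._facts = {}  # persistence: backed by a durable store in real use

    def update(self, key, value):
        # update: new information overwrites stale beliefs, not just appends
        self._facts[key] = {"value": value, "t": time.time()}

    def select(self, keys):
        # selection: decide which facts are worth carrying into context
        return {k: v for k, v in self._facts.items() if k in keys}

    def compress(self, selected):
        # compression: fit the selection into a bounded context budget
        return json.dumps({k: v["value"] for k, v in selected.items()})

    def retrieve(self, key):
        # retrieval: the one capability most systems actually ship
        entry = self._facts.get(key)
        return entry["value"] if entry else None

mem = AgentMemory()
mem.update("refund_promised", True)       # Monday's session
mem.update("refund_promised", "issued")   # Wednesday's session sees the update
print(mem.retrieve("refund_promised"))    # -> issued
```

The essay's refund bug lives in the `update` step: a retrieval-only system would happily surface Monday's stale promise.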

ANALYSIS

A strategybreakdowns essay reframes Perplexity Computer as an operating system for high-leverage knowledge work rather than a search product. Users describe a workflow and Perplexity handles model choice, deployment, and stack. The author built a "Reddit Radar" content discovery tool across two evenings of five short conversations; the first post it produced hit 200K+ impressions on LinkedIn. The strategic bet is owning the 10x knowledge worker.

Joshua Leigh treats the chat-prompt paradigm as a structural regression in interaction design. After forty years of moving toward direct manipulation through pointing, dragging, and touch, AI tooling reverted to the blinking cursor: type what you want and hope you chose the right words. For visual and spatial work that lossy text hand-off discards information and discoverability. The fix is rebuilding visual intent into AI surfaces.

Oracle 26ai, SQL Server 2025, MongoDB Atlas, and Postgres via pgvector all ship native vector support now, which thins the argument for a dedicated vector store in most enterprise builds. Keeping vectors next to business data avoids glue code, sync lag, and consistency drift. Pinecone, Weaviate, and Milvus aren't doomed, but the "every AI app needs a separate vector DB" pitch no longer holds up.
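The co-location argument in miniature: when embeddings live in the same rows as the records, similarity search is just another query over your data, with no second store to sync. A toy brute-force version (pgvector's `<->` operator performs an equivalent distance computation inside Postgres):

```python
import math

# Sketch of vectors stored next to business data: one table, no sync pipeline.
rows = [
    {"id": 1, "title": "invoice", "emb": [1.0, 0.0]},
    {"id": 2, "title": "refund",  "emb": [0.0, 1.0]},
    {"id": 3, "title": "receipt", "emb": [0.9, 0.1]},
]

def l2(a, b):
    # Euclidean distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, k=2):
    # Brute-force k-NN over the rows; an index (HNSW/IVF) replaces this at scale
    return sorted(rows, key=lambda r: l2(r["emb"], query))[:k]

print([r["title"] for r in nearest([1.0, 0.0])])  # -> ['invoice', 'receipt']
```

A dedicated vector store earns its keep at the scale where brute force and single-node indexes fall over, which is exactly the point of the piece: most enterprise builds never get there.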

Sean Goedecke argues that software engineering's run as a 40-year compounding career was a fortunate coincidence. Until 2024 the only way to learn the craft was doing the craft, which let coding hobbies parlay into lifelong work. AI changes that arithmetic, and even if it erodes long-term skill development, engineers are still obliged to use it the way construction workers are obliged to lift heavy objects.

Tanay Jaipuria's earnings roundup: Amazon $200B, Microsoft $190B, Google $180-190B, Meta $125-145B, combining to north of $700B in 2026 capex. Google's CFO told the call 2027 will increase "significantly," and Pichai conceded Google's cloud revenue would have been higher if it could meet demand. Memory, specifically HBM, came up across calls as the binding supply-side choke point.
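The arithmetic behind "north of $700B", using the figures as reported (Google and Meta are ranges, so the combined total is a range too):

```python
# Sanity check on the roundup's combined capex figure (all values in $B).
capex_low  = {"Amazon": 200, "Microsoft": 190, "Google": 180, "Meta": 125}
capex_high = {"Amazon": 200, "Microsoft": 190, "Google": 190, "Meta": 145}

low, high = sum(capex_low.values()), sum(capex_high.values())
print(f"${low}B-${high}B")  # -> $695B-$725B
```

Strictly, only the upper half of that range clears $700B; the midpoint of roughly $710B is what supports the headline number.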

Ben Thompson reads Cerebras raising its IPO range to $150-160 a share, with 30M shares marketed, as the visible marker of a structural shift in AI compute. The GPU era was a parallel-pixel inheritance from graphics; the inference era will be heterogeneous, with wafer-scale, custom silicon, and specialised inference chips each taking a slice. Agents are going to need a lot of compute, and not all of it will come from Nvidia.

TOOLS

Adam Miller's open-source Claude Code plugin runs up to seven parallel sub-agent lenses (correctness, security, UX) through a dedup pass, a cheap-then-deep validation gate, and an optional Opus cross-cut. A `--ensemble` flag adds a Codex CLI pass. `:fix` and `:walkthrough` batch-accept pre-computed auto-fixes. The author reports it catches more real bugs than the built-in /review, /ultrareview, CodeRabbit, Greptile, and Codex review, though that comparison is anecdotal (n=1).
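The pipeline shape described above, sketched in a few lines; this is our reconstruction of the described flow, not the plugin's code, and every name here is hypothetical:

```python
# Hypothetical sketch: merge findings from parallel lenses, collapse
# duplicates, then run a cheap filter before the expensive validator.
def dedup(findings):
    seen, out = set(), []
    for f in findings:
        key = (f["file"], f["line"], f["rule"])
        if key not in seen:
            seen.add(key)
            out.append(f)
    return out

def validate(findings, cheap, deep):
    # cheap-then-deep gate: the expensive check only runs on survivors
    return [f for f in findings if cheap(f) and deep(f)]

lenses = [  # two lenses flagging overlapping findings
    [{"file": "a.py", "line": 3, "rule": "sql-injection", "score": 0.9}],
    [{"file": "a.py", "line": 3, "rule": "sql-injection", "score": 0.8},
     {"file": "b.py", "line": 7, "rule": "typo", "score": 0.2}],
]
merged = dedup([f for lens in lenses for f in lens])
kept = validate(merged, cheap=lambda f: f["score"] > 0.5, deep=lambda f: True)
print([f["rule"] for f in kept])  # -> ['sql-injection']
```

The ordering is the point: dedup before validation means the deep (slow, expensive) pass runs once per unique finding, not once per lens that raised it.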

The Codex Chrome extension lets Codex drive your signed-in Chrome profile for tasks on Salesforce, Gmail, LinkedIn, or internal tools, anything that needs authenticated browser state. Localhost and unauthenticated pages stay in the in-app browser. Codex can switch between plugins, Chrome, and the in-app browser per task. Treat page content as untrusted context and review before allowing Codex to continue.

Anthropic shipped agent view in Claude Code, a list interface for managing multiple parallel sessions from one place. Pressing the left arrow from any session, or running `claude agents`, opens an overview with each session's status, last response, and timestamp. `/bg` backgrounds the current session; `claude --bg [task]` launches one directly into the background. Useful for dispatching parallel sub-agents and supervising long-running jobs.