Issue #115 · 36 min read · 18 stories

Codex builds live sites 🧑‍💻, autofill leaks passwords 🔑, video that directs itself 🎬

Microsoft ships its own models. Trump eyes frontier models early. Tokens top a startup's payroll.

GitHub shipped a desktop app for running coding agents in parallel, each in an isolated session you can inspect and merge. Sophos caught a ransomware kit built with Cursor and Claude Opus, while Florida became the first state to sue OpenAI and name Altman personally. And Martin Scorsese has film Twitter in meltdown over his use of AI for storyboards.

NEWS

At Build, Microsoft moved to loosen its OpenAI reliance with its first proprietary models: MAI-Code-1-Flash for coding and MAI-Thinking-1, a reasoning model, both pitched on low token cost and run on its own Azure infrastructure. It also introduced Scout, an always-on, OpenClaw-based agent in Microsoft 365 that builds persistent memories and skills as you give it feedback. Together they push Microsoft to compete with OpenAI, Anthropic and Google at more layers of the stack.

Sophos found a ransomware toolkit whose payloads were built with help from Cursor and Claude Opus agents across coding, analysis and revision, with some agents tasked to scrape security research for bypass techniques. The malware was tested against Sophos, CrowdStrike and Microsoft defences and uses Cobalt Strike, a Telegram-based command channel and a Cloudflare Worker to hide its backend. The researchers stress the workflow stayed entirely human-driven.

Mercor CEO Brendan Foody said the $10 billion startup spends more on tokens for its internal agents than on employee pay, calling it 'pretty incredible' on the 20VC podcast. Mercor, which helps OpenAI and Anthropic train models through human experts, runs agents across recruiting, accounting and fraud detection, and has done over 5 million AI-assisted interviews. Foody bets the average enterprise will spend more on compute than headcount within five years.

Florida Attorney General James Uthmeier filed an 83-page lawsuit against OpenAI and Sam Altman personally, the first state case seeking to hold an AI CEO liable for user harm. It argues ChatGPT behaves like a companion rather than a tool, without meaningful guardrails or parental oversight, and ties it to a mass shooting, encouraged suicides and minors becoming addicted. The filing builds on a criminal investigation into a Florida State University shooting.

OpenAI expanded Codex past software with Sites, which turns an agent's work product into a hosted, interactive website instead of a local file, with Wix, Replit, Lovable and Figma as launch partners. It also shipped six job-specific plug-ins covering data analytics, sales, equity investing and more. Codex now has 5 million weekly users, and knowledge workers, already a fifth of them, are growing three times faster than developers.

Trump signed an executive order asking AI companies to give the federal government access to frontier models up to 30 days before release, on a voluntary basis, so it can benchmark their advanced cyber capabilities. The government would help choose the trusted partners that get early access. The order explicitly bars any mandatory licensing or preclearance, and Trump signed it privately, weeks after scrapping a planned ceremony with tech CEOs.

GitHub launched the Copilot app, a desktop control centre for directing several coding agents at once. A single My Work view shows active sessions, issues, pull requests and background automations across connected repositories, each agent running in an isolated environment you can inspect, redirect, test and merge. GitHub frames it as a response to commits nearly doubling to 1.4 billion a month. It is in technical preview for paid Copilot plans.

Martin Scorsese told the New York Times he uses Black Forest Labs' image generation to storyboard his films, and signed on as a partner and adviser to the startup last year. The film community erupted, one writer calling it a stain on his name and a journalist tweeting 'I feel like I'm going to throw up.' Scorsese, an architect of New Hollywood and a noted film preservationist, kept his own praise of the tech guarded.

TECHNICAL

Security researcher RyotaK at GMO Flatt found a flaw in Claude Code's GitHub Actions workflow that lets an attacker compromise any repository using it, including Anthropic's own. A common misconfiguration around allowed_non_write_users means even a command like gh issue view can be abused to exfiltrate secrets, and chaining workflows enables full repository compromise. Variants were exploited in the wild before the fix landed. If you use the workflow, audit your config and run logs.
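
For the audit step, a minimal sketch of a first pass: scan a repository's workflow files for the allowed_non_write_users setting named above. The setting name comes from the item; the function names and the plain-text matching are assumptions made for illustration, and a hit only tells you where to start reading, not whether the workflow is exploitable.

```python
from pathlib import Path

# Setting name taken from the item above; everything else here is illustrative.
RISKY_KEY = "allowed_non_write_users"


def find_risky_lines(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mention the setting, skipping comment lines."""
    hits = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        if RISKY_KEY in stripped:
            hits.append((number, stripped))
    return hits


def audit_repo(repo_root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan .github/workflows under repo_root and report files that mention the setting."""
    report: dict[str, list[tuple[int, str]]] = {}
    workflows = Path(repo_root) / ".github" / "workflows"
    if not workflows.is_dir():
        return report
    for path in sorted(workflows.iterdir()):
        if path.suffix in {".yml", ".yaml"}:
            hits = find_risky_lines(path.read_text(encoding="utf-8"))
            if hits:
                report[path.name] = hits
    return report
```

Calling audit_repo(".") from a checkout returns a mapping of workflow file names to the lines worth reviewing. It does not look at run logs, which still need checking by hand.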

Meta's Shah Rahman pushes back on the 'everyone is an engineer now' story. With AI writing most new code at Google, OpenAI and Anthropic, he argues teams are shipping more bugs, incidents and technical debt than two years ago, not less. Real AI-native work, he says, rests on context engineering, spec-driven development, critical verification and disciplined problem decomposition, plus security guardrails that are no longer optional.

A reflected HTML injection bug, locked down by a Content-Security-Policy strict enough to kill any script, still leaked saved passwords. The trick is Chrome's password autofill, which fills credentials into any matching email-and-password form regardless of where it submits. An attacker plants their own form, lets Chrome fill it, and exfiltrates the result through Referer header behaviour, no JavaScript needed. The writeup walks the full one-click attack against a hardened login page.
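
One way to see why a script-blocking policy leaves forms alone: in CSP, form submission targets are governed by the separate form-action directive, and default-src is not a fallback for it. The sketch below parses a policy and reports whether form-action is set. The example policy is hypothetical, and the item does not say whether form-action alone would have stopped this particular chain, so treat it as a related hardening check rather than the writeup's fix.

```python
def parse_csp(header_value: str) -> dict[str, list[str]]:
    """Split a Content-Security-Policy header value into {directive: [sources]}."""
    policy: dict[str, list[str]] = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        name = tokens[0].lower()
        # A repeated directive is ignored after its first occurrence.
        policy.setdefault(name, tokens[1:])
    return policy


def restricts_form_targets(header_value: str) -> bool:
    """True if the policy sets form-action; default-src does not cover it."""
    return "form-action" in parse_csp(header_value)


# Hypothetical policy: blocks all script, says nothing about forms.
strict_scripts = "default-src 'none'; script-src 'none'; style-src 'self'"
print(restricts_form_targets(strict_scripts))
print(restricts_form_targets(strict_scripts + "; form-action 'self'"))
```

The first call prints False and the second prints True: the script lockdown and the form restriction are independent settings.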

Doubleword wanted AMD's MI300X for cheaper inference: 192GB of HBM3 against the H100's 80GB, comparable FP8 compute, roughly half the list price, and rentable on demand while H100 prices have climbed 40% in five months. The catch is software. As of May 2026, running vLLM with DeepSeek-V4-Flash on MI300X simply doesn't work. This worklog traces the FP8 dialect mismatch and the other detours they hit getting it running.
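
The memory gap is easiest to feel as a GPU count. A back-of-envelope sketch using the two HBM figures quoted above: the 300 GB weight size and the 20% overhead for cache and activations are made-up inputs, not numbers from the worklog, and real deployments also depend on parallelism layout and kernel support.

```python
import math

# Per-GPU HBM capacities quoted in the item above.
HBM_GB = {"MI300X": 192, "H100": 80}


def gpus_needed(model_gb: float, gpu: str, overhead: float = 0.2) -> int:
    """Minimum GPU count to hold the weights plus a fractional overhead (assumed 20%)."""
    required = model_gb * (1 + overhead)
    return math.ceil(required / HBM_GB[gpu])


# Hypothetical 300 GB of weights; not a figure from the worklog.
for name in HBM_GB:
    print(name, gpus_needed(300, name))
```

With these inputs the sketch reports 2 MI300X cards against 5 H100s, which is the kind of difference that makes the software pain worth enduring.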

ANALYSIS

Ethan He built xAI's Grok Imagine in three months after leading NVIDIA's Cosmos world model. On Latent Space he argues video models get most of their intelligence from LLMs, not from training on video, so the next leap won't be a sharper Sora but a video agent that plans, generates, edits and iterates across a whole creative task. Generative media, he reckons, will follow the path coding already did.

Charity Majors watched a conference talk that sold vibe coding as effortless, backlogs cleared and year-long rewrites done in weeks, while she knew colleagues at the same company were months into cleaning up the wreckage. Her worry isn't the tech, it's that enthusiasts and sceptics have retreated into camps and now talk about each other as caricatures rather than to each other. Both feel a real existential threat, and she sees a way to close the gap.

For years visual AI has been judged on pixels: how good the final image or video looks. This piece argues the more useful frontier is code-native generation, where the model produces the SVG, React component, Blender script or USD scene that renders the output rather than the pixels directly. The source of truth becomes a structured artefact you can edit, version and wire into a real production workflow, which pixel-native diffusion models can't match.
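
A toy illustration of the difference, assuming nothing about any particular model: the scene lives as structured data, SVG is just one rendering of it, and an edit is a field change rather than a regenerated image. The Circle type and render_svg function are hypothetical names for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Circle:
    cx: int
    cy: int
    r: int
    fill: str


def render_svg(shapes: list[Circle], width: int = 100, height: int = 100) -> str:
    """Render the structured scene to an SVG string."""
    body = "".join(
        f'<circle cx="{s.cx}" cy="{s.cy}" r="{s.r}" fill="{s.fill}"/>' for s in shapes
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f"{body}</svg>"
    )


scene = [Circle(50, 50, 40, "red")]
# The edit touches one field of the source of truth; nothing is re-sampled.
edited = [replace(scene[0], fill="blue")]
print(render_svg(edited))
```

Because the scene is plain data, it can be diffed and versioned like any other source file, which is the workflow property the piece is pointing at.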

TOOLS

nanoclaw is a lightweight take on OpenClaw that runs each agent in a container for isolation, aimed at the security worries that come with an unrestrained personal agent. It connects to WhatsApp, Telegram, Slack, Discord and Gmail, keeps memory across runs, and handles scheduled jobs, all built directly on Anthropic's Agents SDK. The TypeScript project is trending hard, past 29,000 stars, a timely option as Microsoft pushes OpenClaw-style agents into the mainstream.

surya is an open-source document toolkit that goes past plain OCR to handle layout analysis, reading order and table recognition across more than 90 languages. It's the kind of ingestion layer you reach for when feeding messy PDFs and scans into a RAG pipeline or a document agent. Written in Python and maintained by the Datalab team, it's trending on GitHub with north of 20,000 stars.

TinyFish open-sourced Bigset, a multi-agent system that takes a sentence, infers a schema, and sends agents to build a structured dataset from live web pages. An orchestrator does breadth-first discovery and dispatches sub-agents that each research one row under a six-call budget, attach source URLs, and leave fields blank rather than fabricate. It deduplicates, exports to CSV or XLSX, and reruns on a schedule. AGPL-3.0, self-hosted via Docker.
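
The per-row discipline described above can be sketched in a few lines. This is not Bigset's code: the Row type, research_row function and lookup callback are hypothetical, and only the six-call budget, the attached source URLs and the leave-it-blank rule come from the item.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A lookup takes (entity, field_name) and returns (value, source_url) or None.
Lookup = Callable[[str, str], Optional[tuple[str, str]]]

CALL_BUDGET = 6  # per-row budget mentioned in the item


@dataclass
class Row:
    entity: str
    values: dict[str, Optional[str]] = field(default_factory=dict)
    sources: dict[str, str] = field(default_factory=dict)
    calls_used: int = 0


def research_row(entity: str, schema: list[str], lookup: Lookup) -> Row:
    """Fill one row field by field, stopping when the call budget is spent."""
    row = Row(entity=entity, values={name: None for name in schema})
    for name in schema:
        if row.calls_used >= CALL_BUDGET:
            break  # remaining fields stay None rather than being guessed
        row.calls_used += 1
        found = lookup(entity, name)
        if found is not None:
            # Every filled value carries the URL it came from.
            row.values[name], row.sources[name] = found
    return row
```

In a real system the lookup would be an agent's web call; here any function with that signature works, which also makes the budget logic easy to test with a stub.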