Issue #86 · 38 min read · 19 stories

Mythos Finds 271 Firefox Bugs. Google's Code Is 75% AI-Generated.

OpenAI ships workspace agents, Qwen 3.6-27B beats its own 397B model, npm worm spreads

Anthropic's Mythos found 271 security vulnerabilities in Firefox 150 before release, a 12x jump over Opus in a single model generation. Google dropped the biggest number at Cloud Next: 75% of its new code is now AI-generated, up from 50% last fall, alongside new TPU 8t and 8i chips built for agentic workloads. OpenAI launched Codex-powered workspace agents that keep working after you close the tab, and Alibaba's Qwen 3.6-27B is beating models 15x its size on coding benchmarks.
NEWS

Mozilla CTO Bobby Holley reports that early access to Anthropic's Mythos Preview helped identify 271 security vulnerabilities in Firefox 150 before release. For context, Anthropic's Opus found just 22 bugs analysing Firefox 148 last month, a 12x jump in a single model generation. Holley called it the moment "defenders finally have a chance to win, decisively" in the security arms race.

OpenAI introduced workspace agents in ChatGPT, an evolution of GPTs designed for team workflows. Powered by Codex, these agents handle multi-step tasks like qualifying leads, drafting follow-ups, and preparing reports. They run in the cloud, keep working when you close the tab, and respect organisational permissions. Available in research preview for Business, Enterprise, and Edu plans.

Microsoft ported the entire TypeScript compiler from TypeScript to Go, delivering 10x faster type-checking, parsing, and project builds. The migration was a methodical port, not a rewrite, so type-checking semantics are identical to TypeScript 6.0. Bloomberg, Canva, Figma, Google, Linear, Notion, Slack, and Vercel have been running pre-release builds for months. Despite the "beta" label, Microsoft says you can start using it in production today.

A new supply chain attack is spreading through the npm ecosystem by stealing developer credentials, including npm publish tokens, then injecting malicious code into other packages from compromised accounts. The worm targets packages used in AI agent tooling and database operations. Socket and StepSecurity identified 16 compromised Namastex Labs packages so far. If you use any of them, rotate every secret immediately.
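Before rotating secrets, it helps to confirm whether any affected package is in your dependency tree. A minimal sketch of such a check, scanning an npm v7+ `package-lock.json` (the package names here are hypothetical placeholders; substitute the actual list from the Socket/StepSecurity advisory):

```python
import json

# Hypothetical stand-ins for the advisory's package names.
COMPROMISED = {
    "@namastex-example/agent-tools",
    "@namastex-example/db-utils",
}

def find_compromised(lockfile_path: str) -> list[str]:
    """Return dependencies in a package-lock.json that match the advisory list."""
    with open(lockfile_path) as f:
        lock = json.load(f)
    hits = []
    # npm v7+ lockfiles list every installed package under "packages",
    # keyed by a path like "node_modules/<name>" (possibly nested).
    for path in lock.get("packages", {}):
        name = path.split("node_modules/")[-1]
        if name in COMPROMISED:
            hits.append(name)
    return sorted(set(hits))
```

Run it against each repository's lockfile; any hit means the machine that installed it should be treated as compromised, not just the project.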

Sundar Pichai's Cloud Next keynote landed three big numbers: 75% of Google's new code is now AI-generated (up from 50% last fall), and the company shipped TPU 8t for large-scale training and TPU 8i for low-latency inference. The eighth-generation chips are designed specifically for agentic workloads. Google also announced the Gemini Enterprise Agent Platform for managing thousands of agents across organisations.

Alibaba released Qwen 3.6-27B, a dense open-weight model that outperforms the previous flagship Qwen 3.5-397B (a 397B MoE with 17B active parameters) across all major coding benchmarks. The entire model fits in 55GB unquantised or 17GB quantised, making it runnable on consumer hardware. Simon Willison tested the Q4 quantisation on his laptop and got strong SVG generation at 25 tokens per second. An open-weight 27B model matching what previously required 397B parameters is a significant compression milestone.

TECHNICAL

Everyone is staring at Gemma 4's benchmark numbers, but the actual design shift is deeper. Google shipped two entirely different architectures under one family name: E2B/E4B for phones (where DRAM is scarce and flash is abundant) and 26B/31B for servers (where the constraints flip). Phones use per-layer embeddings to compress memory at the cost of compute. Servers use interleaved local-global attention to save compute at the cost of memory.
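The server-side trade-off is easiest to see in the attention masks. A minimal sketch of an interleaved local-global stack, where most layers attend only within a sliding window (small KV cache) and periodic layers see the full causal prefix; the 4:1 ratio and window size are illustrative assumptions, not Gemma 4's actual configuration:

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int, window: int = 4,
                   global_every: int = 4) -> np.ndarray:
    """Boolean causal attention mask for one layer of an interleaved stack.

    Local layers attend only to the last `window` positions, so their
    KV cache is bounded; every `global_every`-th layer attends to the
    whole causal prefix to propagate long-range information.
    """
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i
    if layer_idx % global_every == global_every - 1:
        return causal                     # global layer: full prefix
    return causal & (i - j < window)      # local layer: sliding window
```

With mostly-local layers, KV-cache memory grows with the window rather than the sequence length, which is exactly the compute-for-memory flip the paragraph describes relative to the phone models.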

Ring built billion-scale semantic video search on Amazon RDS for PostgreSQL with pgvector, serving sub-second queries across 4 continents and 9 AWS regions. The counterintuitive decision: they skipped traditional vector indexes entirely. Instead, they partition tables by user and run brute-force parallel scans, achieving 100% recall while keeping ingestion fast. The system processes 2 billion new embeddings daily across 150+ TB.
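The core of the no-index approach can be sketched in a few lines: once a table is partitioned per user, an exact scan over that partition is small enough to be fast and, by construction, has 100% recall. A toy version with NumPy (dimensions and k are illustrative; the real system does this inside PostgreSQL with pgvector):

```python
import numpy as np

def exact_top_k(partition: np.ndarray, query: np.ndarray, k: int = 10):
    """Brute-force cosine search over one user's partition.

    Skipping the ANN index avoids recall loss and index maintenance at
    ingest time; partitioning keeps each scan small enough for
    sub-second latency.
    """
    # Normalise so a dot product equals cosine similarity.
    p = partition / np.linalg.norm(partition, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = p @ q
    top = np.argsort(-sims)[:k]  # highest similarity first
    return top, sims[top]
```

The trade is explicit: scan cost grows linearly with partition size, so the design only works because no single user's partition is large, while ingestion of 2 billion embeddings a day stays a plain append.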

A Rust panic in a Cloudflare Worker could poison the entire Wasm instance, taking down sibling requests and sometimes bricking the Worker for minutes. The root cause was in wasm-bindgen, which had no built-in recovery semantics. Cloudflare built panic=unwind support so destructors run properly, then added abort recovery to prevent re-execution after fatal errors. Both fixes are now upstreamed to wasm-bindgen.

You ask a model to fix an off-by-one error. It fixes it, then rewrites the entire function, renames variables, and adds validation nobody asked for. This research quantifies the problem using Levenshtein distance and added cognitive complexity. GPT-5.4 with high reasoning is the worst offender. The key insight: over-editing is invisible to test suites because the code still passes. Only code review catches it, and bloated diffs make review harder.
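A simplified version of the metric is easy to reproduce. This sketch compares the model's edit distance against the minimal distance the requested fix actually needed (the paper's setup also tracks added cognitive complexity, which is omitted here; `over_edit_ratio` is an illustrative name, not the paper's):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def over_edit_ratio(original: str, requested_fix: str, model_output: str) -> float:
    """How far the model strayed beyond the minimal requested change.

    A ratio near 1.0 means the edit was about as small as the fix
    required; larger values indicate gratuitous rewriting.
    """
    minimal = levenshtein(original, requested_fix) or 1
    return levenshtein(original, model_output) / minimal
```

An off-by-one fix might need a two-character edit; a model that also renames variables and adds validation can easily post a ratio in the tens, even though every test still passes.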

ANALYSIS

Om Malik surveys the AI coding gold rush: SpaceX bidding $60B for Cursor, Sergey Brin coming out of retirement to lead a DeepMind coding strike team, Claude Code revenue hitting a $2.5B run-rate that doubled since January. Meanwhile, Meta is recording employee keystrokes and screenshots to train agents that automate their jobs. The people who build software are building the machines that replace them.

Elena Verna argues the traditional Ideal Customer Profile is breaking down. As agents interact with products via MCP on behalf of users, your "user" may never actually touch your product. That changes everything from onboarding flows to pricing to what makes a product defensible. Speed and output reliability replace visual polish. Friction and habituation, the old moats, stop working when the buyer is a model.

Zvi Mowshowitz's deep analysis of Anthropic's model welfare work suggests something went wrong with Opus 4.7. The model reports high welfare scores in structured interviews but may be exhibiting trained sycophancy rather than genuine well-being. Low-level patches and shallow methods were "seen right through" by the model. Zvi argues the parallels to the broader alignment problem are obvious and urges Anthropic to move beyond metric-driven welfare checks.

Digital Native's Rex Woodbury argues that as AI makes software infinitely producible, the bottleneck shifts from engineering to editorial judgement. Software is becoming more like culture and less like infrastructure, rewarding specificity and point of view over standardisation. Andreessen's "Mexican standoff" between PMs, designers, and coders is real, but designers hold the strongest hand because taste is the hardest skill to automate.

A PM with 15 years of enterprise experience builds an HVAC calculator app solo and discovers something uncomfortable: he'd been insulated from user impact his entire career. Designers caught readability issues. CSMs absorbed friction. Contracts kept users locked in. Building alone removed every buffer. The feedback loop collapsed to: someone downloads your app and either keeps using it or doesn't.

TOOLS

Google Labs open-sourced the DESIGN.md specification from their Stitch app. The format lets you define design rules, colour semantics, spacing tokens, and accessibility constraints in a single file that AI agents can parse. Instead of guessing intent, an agent knows exactly what a colour is for and validates choices against WCAG rules. Export once, import across projects and tools.
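The WCAG validation step is well defined even without the spec in hand: WCAG 2.x gives exact formulas for relative luminance and contrast ratio. A minimal checker an agent could run against a DESIGN.md colour choice (the function names are our own; only the formulas and thresholds come from WCAG):

```python
def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG 2.x relative luminance of an sRGB colour (components 0-255)."""
    def channel(c: int) -> float:
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio between two colours, from 1:1 up to 21:1."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def passes_aa(fg, bg, large_text: bool = False) -> bool:
    """WCAG 2.1 AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

This is the difference between an agent guessing that grey-on-white "looks fine" and knowing that #777 on white comes out at roughly 4.48:1, just under the AA bar.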

Jan is a TypeScript-based desktop app that runs large language models entirely on your local machine. No cloud, no API keys, no data leaving your device. With 42,000 GitHub stars and 75 new ones today, it's the most popular open-source option for teams that need AI capabilities but can't send data to external servers. Think ChatGPT's interface with local-only inference.

Assign a Linear issue to Broccoli and it plans, implements, and opens a pull request while you sleep. The open-source agent runs on your own Google Cloud (Cloud Run + Secret Manager), so no data leaves your tenancy. It uses Claude and Codex for code generation and review, pushes fix commits when asked, and deploys in about 30 minutes with one config file.

An NVIDIA engineer built a working VLA demo with Gemma 4 on a Jetson Orin Nano Super. You speak, Parakeet transcribes, Gemma decides whether it needs to look through the webcam to answer, and Kokoro reads the response back. No keyword triggers, no hardcoded logic. The model decides on its own whether visual context is needed. The full pipeline runs locally on a device that costs less than a month of cloud API bills.
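The "model decides for itself" routing can be sketched as a two-pass loop: answer from text alone, and only capture a frame if the model asks for one. The function names below are hypothetical stand-ins for the Parakeet/Gemma/Kokoro components, not their real APIs, and the sentinel-token convention is our own illustrative assumption:

```python
def needs_visual_context(reply: str) -> bool:
    """The model signals that it must look through the camera by emitting
    a sentinel token instead of relying on keyword triggers."""
    return reply.strip() == "<look>"

def handle_utterance(question, generate, capture_frame):
    """One turn of the pipeline: transcription in, spoken answer out.

    `generate` and `capture_frame` are injected stubs for the VLM and
    the webcam; first pass is text-only, second pass adds a frame only
    if the model requested it.
    """
    reply = generate(question, image=None)
    if needs_visual_context(reply):
        reply = generate(question, image=capture_frame())
    return reply
```

Keeping the camera out of the first pass matters on a device this small: most turns never pay the cost of vision encoding, which is part of how the whole loop stays local.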