Back to archive
Issue #69··28 min read·14 stories

Vulnerability Research Is Cooked + SpaceX's $1.75T IPO

AI exploit dev arrives as a step function, Codex ships plugins, and the ladder is missing rungs.

Thomas Ptacek argues AI exploit development will arrive as a step function, not gradually, and vulnerability research as we know it is ending. SpaceX is preparing the largest IPO in history at $1.75 trillion, while the WSJ digs into the decade-long feud between Dario Amodei and Sam Altman that's shaping how the industry approaches safety, competition, and going public. Alasdair Allan's QCon London talk adds data showing AI succeeds at the exact tasks that used to train junior engineers, raising questions about where the next generation of senior engineers comes from.
NEWS

SpaceX is preparing the largest IPO in history, targeting a $75 billion raise at a $1.75 trillion valuation. For context, the entire US IPO market only raised more in two of the past ten years. The company going public is effectively a new conglomerate after Musk merged xAI and X into SpaceX, making past financial performance largely irrelevant to investors. Thirty percent of the offering is reportedly reserved for individual investors, triple the norm.

The WSJ traces the personal and strategic rift between Sam Altman and Dario Amodei back to their time building OpenAI together. Amodei has privately compared the Altman-Musk legal battle to "Hitler and Stalin" and called a $25 million pro-Trump donation by Greg Brockman "evil." Anthropic's internal brand strategy positions itself as the "healthy alternative" to OpenAI, a framing rooted as much in personal wounds over power and credit as in genuine safety disagreements.

TECHNICAL

Qwen's qwen3-coder-next scores 6.75% on first-try function calling for complex API schemas. AutoBe's response: don't fight the failure rate, build a compiler-validated loop around it. Their harness uses type schemas, AST validation, and self-healing loops to achieve 100% compilation success, even with small models. The insight is that structured output doesn't need perfect generation if you can verify and correct deterministically.

Before launching Comet, Perplexity hired Trail of Bits to audit its AI-powered browser. Using their TRAIL threat model, the security firm demonstrated four prompt injection techniques that could extract users' private data from Gmail through the browser's AI assistant. The vulnerabilities stem from treating external web content as trusted input to the LLM. Recommendations: ML-specific threat modelling, clear instruction boundaries, systematic red-teaming, and least-privilege for agents.

After Anthropic published their multi-agent architecture for autonomous coding, Nathan Delacretaz replicated it in the open with compound-agent. Same benchmark: build a browser-based Digital Audio Workstation. Anthropic's version took 3h50m on Claude Opus for $125. The open-source harness decomposed the project into 18 dependency-ordered tasks with multi-model review, producing a working DAW with subtractive synth, piano roll sequencing, and DSP processing.

A Vercel engineer spun up 8 coding agents from his phone before bed, each targeting a different part of Turborepo's Rust codebase. By morning, 3 of 8 had produced shippable wins. But the real gains came from mixing agent output with traditional profiling and Vercel Sandboxes for isolated testing. Result: Turborepo's task graph computation dropped from 10 seconds to near-instant on a 1,000-package monorepo.

Instead of building 256 separate models for every language pair, Roblox trained a single unified transformer using Mixture of Experts. It handles real-time chat translation across 70 million daily users at 100 milliseconds per request and over 5,000 chats per second. The engineering challenge wasn't building a model that could translate. It was building a system that translates at conversation speed without degrading the user experience.

ANALYSIS

AI has dissolved the barriers to making things. The question is no longer "can we build this?" but "how do we know if it's any good?" Joshua Leigh argues taste isn't aesthetic preference but disciplined contextual judgement, developed through practice and sharpened by honest reckoning with what works. As production cost approaches zero, the capacity to evaluate becomes the scarce skill that separates good work from noise.

Security researcher Thomas Ptacek argues AI exploit development will arrive as a step function, not a slow burn. Within months, pointing an agent at a source tree and typing "find me zero days" will produce substantial results. The economics collapse when the tedious parts of vulnerability research, tracing inputs across program internals, understanding font rendering quirks, become trivially automatable. Ptacek sees this as locked in, with consequences for the internet's security model.

At KubeCon Europe 2026, Kelsey Hightower pushed back on the idea that AI makes open source expendable. If companies won't contribute to and maintain open source, they have no chance with AI, he argued. Hightower challenged developers to stop waiting for perfect AI tools and start building with what exists now. His framing: everyone is a junior engineer when it comes to AI, which means the playing field is temporarily level.

Alasdair Allan's QCon London talk examines an underexplored consequence of AI coding tools. The tasks AI handles best, debugging, boilerplate, small feature work, are precisely what trained junior engineers to become senior ones. Google reports 25% AI-generated code internally, Microsoft 30%, and Copilot sees roughly 30% acceptance rates. The ladder isn't just missing rungs. It's missing the process that created the people who built it.

TOOLS

LlamaIndex open-sourced the local parsing core behind LlamaParse as a standalone CLI and TypeScript library. LiteParse extracts layout-aware text from PDFs, Office documents, and images with zero Python dependencies, running entirely locally. It's optimised for agent workflows where speed matters more than perfect table extraction. For complex document intelligence requiring structured output, LlamaParse remains the cloud option.

AGENTS.md doesn't scale. lat.md replaces it with a graph of interconnected markdown files using wiki links, where sections reference each other and link to specific code locations. Agents search the graph instead of grepping the codebase, and a CLI enforces referential consistency so documentation never drifts from code. Test specs can require backlinks from test files, turning the knowledge graph into a coverage tracker.

OpenAI's Codex now supports plugins that bundle reusable skills, app integrations for Gmail, Slack, and Google Drive, and MCP servers into composable workflows. Install a plugin and Codex can read your email, pull from Drive, or summarise Slack channels. The plugin model mirrors what Claude Code does with skills and MCP, but packaged as a curated directory with one-click installs. Existing permission settings still apply.