Issue #82 · 32 min read · 16 stories

Codex Clicks Around Your Mac. Feds Queue Up for Mythos.

Physical Intelligence robots improvise. Ukraine takes a trench with drones. CoreWeave lands $7B.

OpenAI shipped the biggest Codex update yet, with background computer use, an in-app browser, image generation, and persistent memory. The White House reversed its Anthropic stance and queued Mythos for six federal agencies. Same day: Physical Intelligence's π0.7 robots started improvising untaught tasks and Ukrainian drones captured a Russian position without a soldier on the ground.
NEWS

On Ukraine's Arms Makers' Day, Zelenskyy confirmed the first officially recognised seizure of enemy terrain using only unmanned platforms. A combat stack of reconnaissance, FPV kamikaze, and armed ground robots forced a Russian surrender with no Ukrainian boots on the assault. It signals a doctrinal shift: autonomous systems can now handle the most dangerous initial phase of combined arms.

Quant trading firm Jane Street signed a $6 billion cloud deal with CoreWeave and took a $1 billion equity stake, locking in access to Nvidia's next-generation Vera Rubin architecture. The move signals how financial firms are now directly investing in the compute infrastructure their AI models depend on, bypassing hyperscaler queues. The deal also underscores how tight specialised AI compute supply has become.

OpenAI released a major Codex update giving it background computer use. Codex now sees, clicks, and types with its own cursor while you work in other apps on the same Mac, with multiple agents running in parallel. The app also gets an in-app browser, native image generation, and persistent memory across sessions. More than three million developers use Codex weekly.

An OMB memo reveals the White House is preparing to distribute Anthropic's held-back Mythos model to Defense, Treasury, Commerce, DHS, Justice, and State. OMB is building protections to allow federal use in the coming weeks. The move reverses earlier friction between Anthropic and the Trump administration. Mythos remains restricted from public release under Project Glasswing, which already gave preview access to Nvidia, Microsoft, Google, Apple, and JPMorgan.

Physical Intelligence published research showing its new π0.7 robotics model can perform tasks it was never explicitly trained on, a capability the company's own researchers admit surprised them. The model shows compositional generalisation, combining skills across contexts rather than memorising each task. If the claim holds, robotic AI may be approaching an LLM-style inflection point where capabilities compound faster than the training data would predict.

Google shipped three Gemini releases this week. Gemini 3.1 Flash TTS adds granular audio tags for expressive speech across 70+ languages, watermarked with SynthID. The Gemini app is now native on macOS 15+ with an Option+Space screen-share shortcut. And Gemini Robotics-ER 1.6 improves spatial reasoning and can read instruments like pressure gauges for embodied agents.

TECHNICAL

MacMind is a complete transformer neural network written entirely in HyperTalk on a Macintosh SE/30, with 1,216 parameters learning the bit-reversal permutation. It has token embeddings, positional encoding, self-attention, scaled dot-product scores, cross-entropy loss, and full backpropagation. Option-click any button to read the actual math. The project shows that the same process that trains GPT-4 can run on a 68000 processor from 1987.
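The heart of that claim is small enough to sketch. Here is the scaled dot-product attention step in NumPy, my own illustration of the same arithmetic MacMind spells out in HyperTalk, not the project's source:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over each row of scores
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# toy example: 3 tokens, head dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
assert out.shape == (3, 4)
```

Nothing here needs hardware newer than 1987 except the vectorisation; written as nested loops, it is the same math at HyperTalk speed.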

A new arXiv paper introduces TREX, a multi-agent system that runs the full LLM training pipeline: requirement analysis, literature search, data recipe prep, strategy formulation, training, and evaluation. Two modules called Researcher and Executor collaborate, modelling multi-round experiments as a search tree to reuse results across iterations. On a new 10-task fine-tuning benchmark derived from real scenarios, TREX consistently optimises target-task performance.
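The search-tree reuse idea can be sketched with a hypothetical data structure (not the authors' code): each node caches one experiment's config and result, so an iteration that branches off an already-explored config is a cache hit rather than a fresh training run.

```python
class ExperimentNode:
    """One node per experiment config; children are follow-up variations."""
    def __init__(self, config, result=None):
        self.config = config      # e.g. {"lr": 1e-4, "data_mix": "v1"}
        self.result = result      # cached evaluation score, if already run
        self.children = []

    def child_with(self, **overrides):
        """Return the child matching these overrides, running the
        experiment only if no cached branch already exists."""
        merged = {**self.config, **overrides}
        for c in self.children:
            if c.config == merged:
                return c          # cache hit: reuse the earlier result
        node = ExperimentNode(merged, result=run_experiment(merged))
        self.children.append(node)
        return node

calls = []
def run_experiment(config):
    calls.append(config)          # stand-in for an actual training run
    return len(str(config))       # placeholder "score"

root = ExperimentNode({"lr": 1e-4, "data_mix": "v1"}, result=0.0)
a = root.child_with(lr=3e-4)
b = root.child_with(lr=3e-4)      # same branch requested twice
assert a is b and len(calls) == 1 # ...but the experiment ran once
```

The paper's contribution is in how Researcher and Executor decide which branch to expand next; the tree itself is just this kind of memoised structure.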

Three weeks after an internal knowledge base shipped, a compliance query returned a confident but incomplete answer. The exception clause was in the source, ingested, and embedded. Retrieval never surfaced it because the chunk boundary split the rule from its qualification. The author breaks down why semantic chunking, overlap, and late-chunking strategies all leak different classes of information, and how to structure chunks around intent rather than length.
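The failure mode is easy to reproduce. A toy sketch (hypothetical policy text and chunkers, not the author's code): a fixed character budget separates a rule from its exception, while a pass that keys on qualifier words keeps them in one retrievable unit.

```python
def fixed_chunks(sentences, max_len):
    """Greedy chunking by character budget, ignoring meaning."""
    chunks, current, size = [], [], 0
    for s in sentences:
        if current and size + len(s) > max_len:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(s)
        size += len(s)
    if current:
        chunks.append(" ".join(current))
    return chunks

def intent_chunks(sentences):
    """Attach a trailing qualifier ('except', 'unless', 'provided that')
    to the sentence it modifies, regardless of length."""
    chunks = []
    for s in sentences:
        qualifier = s.lower().startswith(("except", "unless", "provided that"))
        if chunks and qualifier:
            chunks[-1] = chunks[-1] + " " + s
        else:
            chunks.append(s)
    return chunks

policy = [
    "All vendor contracts above $50,000 require a second approver.",
    "Except renewals of existing contracts, which need only one approver.",
]

assert len(fixed_chunks(policy, max_len=70)) == 2  # rule and exception split
assert len(intent_chunks(policy)) == 1             # kept together
```

A query embedding that matches the rule chunk alone retrieves a confident, wrong answer; keeping the qualification in the same chunk is what makes it retrievable at all.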

The article makes the case that enterprise AI code generation needs the two-pass compiler pattern: first pass parses intent into a deterministic intermediate representation, second pass emits the final code from that IR. The approach replaces 'prompt twice, get two different programs' with 'prompt twice, compile to the same IR.' The framing borrows from 1990s-era language design to impose determinism on probabilistic LLM outputs.
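A minimal sketch of the pattern, with a hypothetical endpoint IR standing in for a real one: pass one canonicalises phrasing into a small deterministic IR (in production, the LLM call; here a stub), pass two is pure template expansion, so paraphrased prompts compile to identical code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EndpointIR:
    """Deterministic intermediate representation of the user's intent."""
    verb: str       # normalised HTTP method
    path: str       # route template
    returns: str    # response type name

def parse_intent(request: str) -> EndpointIR:
    """Pass 1: normalise wording so paraphrases map to one IR.
    (Stub standing in for the LLM call.)"""
    words = request.lower()
    verb = "GET" if ("fetch" in words or "get" in words) else "POST"
    return EndpointIR(verb=verb, path="/users/{id}", returns="User")

def emit(ir: EndpointIR) -> str:
    """Pass 2: pure template expansion -- no model, no randomness."""
    return (
        f"@app.route('{ir.path}', methods=['{ir.verb}'])\n"
        f"def handler(id):\n"
        f"    return as_{ir.returns.lower()}(lookup(id))"
    )

# Two paraphrases of the same intent compile to byte-identical code.
a = emit(parse_intent("Fetch a user by id"))
b = emit(parse_intent("Get the user with the given id"))
assert a == b
```

The determinism all lives in pass two: once two prompts land on the same IR, the emitted program cannot differ, which is exactly the guarantee 'prompt twice' lacks.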

ANALYSIS

The xz-utils backdoor was caught by accident when one Microsoft engineer noticed SSH logins were half a second slower than they should be. Since then, Shai-Hulud compromised 500 npm packages, and LLMs are lowering the barrier for attackers to submit patient, credible patches over multi-year windows. The post lays out a practical framework for pinning dependencies, monitoring maintainer turnover, and auditing ingress to compiled artefacts rather than just source code.
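One piece of that framework, pinning artefacts by content hash rather than version string, fits in a few lines. The lockfile format and file names below are illustrative, not a specific tool's; the point is that a swapped binary fails the check even when its version number looks right.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify_lockfile(lockfile_path):
    """Compare each pinned artefact's hash to the file on disk;
    return the mismatched paths (empty list means the tree is clean)."""
    pins = json.loads(Path(lockfile_path).read_text())
    return [p for p, expected in pins.items() if sha256_of(p) != expected]

d = Path(tempfile.mkdtemp())
art = d / "libfoo.so"                       # illustrative artefact name
art.write_bytes(b"original build")
lock = d / "pins.json"
lock.write_text(json.dumps({str(art): sha256_of(art)}))

assert verify_lockfile(lock) == []          # clean tree
art.write_bytes(b"backdoored build")        # simulate a swapped artefact
assert verify_lockfile(lock) == [str(art)]  # caught by hash, not version
```

This is the 'ingress to compiled artefacts' half of the audit: the xz payload lived in release tarballs, not the repository, which is exactly where version-only pinning is blind.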

HBM allocation by SK Hynix, Samsung, and Micron has squeezed consumer DRAM so hard that phones, laptops, consoles, and medical equipment face shortages through 2028. The post proposes that the US should consider CXMT, China's leading DRAM maker, as a supply-side pressure valve for consumer memory while still banning it from defence and hyperscaler use. The tradeoff: accept some dependency in exchange for market stability.

Interconnects' Nathan Lambert argues the stable equilibrium between open and closed models will not be uniform across capability areas, and anyone predicting full convergence is misreading the forces. Supply is dictated by economics: which business strategies sustain open releases. Demand is already there from individuals, organisations, and sovereigns. Distillation, regulation, and compute access each push the capability gap in different directions.

TOOLS

Cloudflare launched AI Search (formerly AutoRAG) as a single primitive for agent retrieval. It does hybrid semantic and keyword matching in parallel, comes with built-in storage and vector index, and lets you spin up search instances per agent or per customer at runtime from a Worker. The ai_search_namespaces binding removes the need for R2 buckets, external vector DBs, and manual pipeline wiring.

Libretto is a toolkit from Saffron Health for building deterministic browser automations. It hands a coding agent a live Chromium instance plus a CLI to inspect pages, capture network traffic, replay recorded user actions, and debug broken workflows interactively. The team built it to maintain healthcare software integrations, and open-sourced it so other teams do not have to reinvent the work.

Vercel took Workflows to general availability, extending framework-defined infrastructure to long-running systems. Since the October 2025 beta, it has handled 100 million runs and 500 million steps across 1,500+ customers. Deep AI SDK integration supports infinitely long durable agents that hold state and handle external events. The SDK is now also available in Python beta, beyond the original TypeScript codebase.