Issue #36 · 24 min read · 12 stories

Musk: Cheapest AI Compute Moves to Space in 3 Years

New LLM serving framework ships, agentic CI for dev automation, plus how to test agent performance.

Elon Musk stated yesterday that the cheapest place to host AI compute will be in space within three years. The projection is ambitious, but it points to how extreme compute demands have become and the lengths companies will go to for efficiency. Builders also got a new high-performance serving framework for LLMs, along with practical guides for implementing agentic CI and evaluating agent performance.

NEWS
4 stories

AI Solves First Unsolved Math Conjecture

The AI system AxiomProver autonomously solved Fel's open conjecture on syzygies of numerical semigroups. It generated a formal, self-verified proof in the Lean theorem prover with no human input. This is the first time an AI has independently resolved an unsolved theoretical mathematics research problem.

2

Agent Eval Tool Adds Statistical Rigor

agentrial is an open-source Python library that evaluates AI agents by running them multiple times to generate confidence intervals for pass rates, cost, and latency. It tracks real costs, attributes step-level failures, and includes regression detection for CI/CD integration.
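To make the statistical idea concrete, here is a minimal sketch of a confidence interval for a pass rate measured over repeated agent runs. It uses the Wilson score interval, which is an assumption on our part: the summary above does not say which interval method agentrial uses, and the function name `wilson_interval` is hypothetical, not part of that library.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate observed over repeated runs.

    Illustrative only: this is not agentrial's API. z=1.96 corresponds to
    roughly 95% confidence.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= passes <= runs:
        raise ValueError("passes must be between 0 and runs")

    p_hat = passes / runs
    denom = 1 + z**2 / runs
    center = (p_hat + z**2 / (2 * runs)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / runs + z**2 / (4 * runs**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(passes=8, runs=10)
    print(f"pass rate 0.80, interval [{low:.3f}, {high:.3f}]")
```

With only 10 runs the interval is wide, which is the practical argument for running an agent many times before comparing two versions of it.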

3

1M Token Context Arrives in Claude Opus 4.6

Claude Opus 4.6 ships with a 1 million token context window in beta. The model improves coding skills, sustains agentic tasks longer, and operates more reliably with large codebases. It also posts state-of-the-art results on Terminal-Bench 2.0 and Humanity's Last Exam.

TECHNICAL
4 stories
1

AI Agent Discovers Netty Zero-Day

An AI security agent found a critical zero-day vulnerability (CVE-2025-59419) in Netty's SMTP codec. This flaw allows SMTP command injection by exploiting newline handling, bypassing email security protocols like SPF and DKIM. The agent also autonomously generated a patch, which Netty maintainers accepted.
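For readers unfamiliar with this class of bug, the following is a minimal Python sketch of how unchecked line breaks can turn one SMTP command into several. It is not Netty's code and not the accepted patch; the function names and the example address are hypothetical, and the guard shown is only one common mitigation.

```python
def naive_rcpt_command(address: str) -> bytes:
    """Vulnerable pattern: interpolates the address without checking for CR/LF."""
    return f"RCPT TO:<{address}>\r\n".encode("ascii")


def build_rcpt_command(address: str) -> bytes:
    """Safer pattern: refuse any address that contains a carriage return or line feed."""
    if "\r" in address or "\n" in address:
        raise ValueError("address contains a line break; refusing to build command")
    return f"RCPT TO:<{address}>\r\n".encode("ascii")


if __name__ == "__main__":
    # Hypothetical attacker-controlled input that smuggles an extra command.
    malicious = "a@example.com>\r\nDATA\r\nSubject: spoofed"

    # The naive builder emits several CRLF-terminated lines instead of one.
    for line in naive_rcpt_command(malicious).split(b"\r\n"):
        print(line)

    try:
        build_rcpt_command(malicious)
    except ValueError as exc:
        print("rejected:", exc)
```

Because SMTP treats each CRLF-terminated line as a separate command, the server sees the injected lines as if the client had sent them deliberately.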

2

Agentic CI Automates Judgment Tasks

GitHub introduces "Continuous AI," an extension of CI where AI agents handle tasks requiring judgment, not just deterministic rules. These agents can ensure documentation matches code, generate reports, update translations, detect dependency drift, and write tests, all defined via natural language within guardrails.

3

AI Agents Build Linux-Compiling C Compiler

Anthropic used 16 Claude Opus instances to autonomously build a C compiler capable of compiling the Linux kernel. The experiment highlights the challenges of long-running agent teams, which need effective test harnesses and parallel work management and still run into limitations like context window pollution.

4

Four Pillars for Production Agent Evaluation

This article describes a practical method for evaluating agentic AI systems, focusing on four pillars: Task Success, Tool Usage Quality, Reasoning Coherence, and Cost-Performance Trade-offs. It details three evaluation approaches (Automated, Human, Hybrid) and highlights building a reliable pipeline from a golden dataset.

ANALYSIS
4 stories
1

Bakusevych's Framework: Score UI Tasks for AI Delegation

The AI Delegation Matrix offers a framework to decide which UI tasks to assign to AI or humans. It scores tasks on Automation Suitability (risk, reversibility) and ROI (frequency, data readiness), then maps them to three control modes: Human-Led, Assist, or Delegate.
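As a rough sketch of how such a matrix could be applied, the Python below scores a task on the two axes and maps it to one of the three control modes. The 1-to-5 scales, the averaging formulas, and the thresholds are all assumptions made for illustration; the summary above does not give the framework's actual scoring rules.

```python
from dataclasses import dataclass


@dataclass
class UITask:
    name: str
    risk: int            # assumed scale: 1 (low risk) to 5 (high risk)
    reversibility: int   # assumed scale: 1 (hard to undo) to 5 (easy to undo)
    frequency: int       # assumed scale: 1 (rare) to 5 (constant)
    data_readiness: int  # assumed scale: 1 (no usable data) to 5 (clean data)


def automation_suitability(task: UITask) -> float:
    """Higher is better: low risk and high reversibility both raise the score."""
    return ((6 - task.risk) + task.reversibility) / 2


def roi(task: UITask) -> float:
    """Higher is better: frequent tasks with ready data pay back more."""
    return (task.frequency + task.data_readiness) / 2


def control_mode(task: UITask, delegate_at: float = 4.0, assist_at: float = 2.5) -> str:
    """Map both scores to a control mode using hypothetical thresholds."""
    s, r = automation_suitability(task), roi(task)
    if s >= delegate_at and r >= delegate_at:
        return "Delegate"
    if s >= assist_at and r >= assist_at:
        return "Assist"
    return "Human-Led"


if __name__ == "__main__":
    tasks = [
        UITask("autofill shipping form", 1, 5, 5, 4),
        UITask("draft support reply", 3, 4, 4, 3),
        UITask("delete user account", 5, 1, 1, 3),
    ]
    for task in tasks:
        print(f"{task.name}: {control_mode(task)}")
```

Requiring both scores to clear a threshold keeps a frequent but risky task from being delegated on ROI alone.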

2

Musk: Space Cheapest for AI Compute in 3 Years

Elon Musk predicts space will offer the lowest cost for AI data centers within 36 months. He cites abundant solar power, fewer regulations, and the immense energy and chip manufacturing challenges on Earth as drivers. The piece gives context for why future large-scale AI deployments might shift to orbital data centers for energy and cost advantages.

3

Om Malik: Embedded AI, Not Frontier Models, Drives Value

Om Malik argues that AI's true value comes from "embedded intelligence" within existing workflows, not standalone frontier models. He points to examples like Claude for Excel and Adobe Photoshop, where AI augments user capabilities without requiring new tools or interfaces.

4

Griffith: AI Chat 'Brain Dumps' Are New Literary Form

Dave Griffith argues AI's "share chat" feature creates a new literary form: the "brain dump." This medium transmits the AI's reasoning, not just conclusions, offering a transparent view of its thought processes. He notes this "cognitive voyeurism" carries risks of manipulation, despite its potential for deeper understanding.

TOOLS
2 stories
2

AI Pentesting Framework Integrates 45 Tools

Zen-AI-Pentest is an AI-powered penetration testing framework that integrates over 45 security tools like Nmap and SQLMap with AI agents for autonomous decision-making. It includes safety features such as sandboxed execution and private IP blocking, plus CI/CD integration.
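A safety feature like private IP blocking can be approximated with Python's standard `ipaddress` module. The sketch below is an assumption about what such a guard might look like, not Zen-AI-Pentest's implementation; the function name `is_allowed_target` is hypothetical, and rejecting hostnames outright is a simplification.

```python
import ipaddress


def is_allowed_target(target: str) -> bool:
    """Return True only for literal IP addresses outside non-public ranges.

    Hostnames are rejected here for simplicity; a real tool would need to
    resolve them and check every resulting address.
    """
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        return False  # not a literal IPv4/IPv6 address

    blocked = (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
    return not blocked


if __name__ == "__main__":
    for candidate in ["10.0.0.5", "192.168.1.1", "127.0.0.1", "8.8.8.8", "example.com"]:
        print(candidate, "allowed" if is_allowed_target(candidate) else "blocked")
```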