MIT Technology Review got an exclusive interview with OpenAI chief scientist Jakub Pachocki about the company's new grand challenge: a fully automated AI researcher. The timeline is an 'AI research intern' by September that handles specific problems autonomously, scaling to a full multi-agent system by 2028. Pachocki points to Codex as an early prototype, saying the company already has 'most of what it needs' to build a research lab in a data centre.
Claude Controls Your Mac + OpenAI Plans AI Intern by Sept
OpenClaw security risks surface. Amazon's Trainium chips win over major AI labs. Plus, what the future of work looks like.
I couldn't send yesterday's edition due to tech issues. Catch up on the March 23 edition here.
Anthropic shipped computer use for Claude on Mac. OpenAI is courting PE firms with 17.5% guaranteed returns to push its tools across their portfolios. And a study of 2.4 million workers found 96% of enterprise permissions sit dormant, which is fine until an AI agent inherits them all.
Anthropic released a research preview that lets Claude type, click, and navigate applications on your Mac. If a connector isn't available for the task, Claude falls back to raw keyboard and mouse control. It pairs with Dispatch, so you can trigger tasks from your phone. It's available to Pro and Max subscribers on macOS, and Anthropic warns against using it with apps that handle sensitive data.
Both OpenAI and Anthropic are forming joint ventures with PE firms to deploy engineers who customise AI models across portfolio companies, per a Reuters exclusive. OpenAI is sweetening its pitch with preferred equity carrying a guaranteed 17.5% minimum return plus downside protection. Anthropic is running the same playbook (I covered their Blackstone JV in issue #58) but without the return guarantee. The JV structure keeps deployment costs off both companies' books ahead of potential IPOs. Not everyone is buying it: Thoma Bravo walked away, questioning the long-term profit profile.
TechCrunch got an exclusive tour of Amazon's Trainium chip lab, where 1.4 million chips are now deployed across three generations. Anthropic's Claude runs on over a million Trainium2 chips, and the new Trainium3 promises up to 50% lower inference costs than Nvidia GPUs. Lab director Kristopher King said Bedrock, which handles the majority of inference traffic on Trainium2, 'could be as big as EC2 one day.'
DSPy has 4.7M monthly downloads versus LangChain's 222M, yet companies using it in production, including JetBlue, Databricks, and Sephora, consistently report faster model testing and better maintainability. The adoption gap comes down to unfamiliar abstractions, not capability. Every team goes through the same five stages: raw API calls, then structured outputs, then retries, then evaluation loops, then composability. By stage five, you've reinvented DSPy, just worse.
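To make stages two to four concrete, here is a minimal hand-rolled sketch of what teams typically build before reaching for DSPy: a structured-output call with validation, a retry loop, and an evaluation loop. This is not DSPy's API. The model is replaced by a hypothetical `scripted_llm` stub that returns canned replies, and the helper names are illustrative assumptions.

```python
import json
from typing import Callable, Iterable, Tuple

def structured_call(llm: Callable[[str], str], prompt: str,
                    required: set, max_retries: int = 3) -> dict:
    """Stages 2-3: demand JSON, check required fields, retry on bad output."""
    last_error = "no attempts made"
    for _ in range(max_retries):
        raw = llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        if isinstance(data, dict) and required.issubset(data):
            return data
        last_error = f"missing fields in {raw!r}"
    raise ValueError(f"gave up after {max_retries} attempts ({last_error})")

def evaluate(program: Callable[[str], str],
             examples: Iterable[Tuple[str, str]]) -> float:
    """Stage 4: score a pipeline against labelled examples."""
    examples = list(examples)
    correct = sum(program(q) == expected for q, expected in examples)
    return correct / len(examples)

def scripted_llm(responses):
    """Hypothetical stand-in for a model API: returns canned replies in order."""
    replies = iter(responses)
    return lambda prompt: next(replies)

# Usage: the first reply is malformed, so the retry loop asks again.
llm = scripted_llm(["Sure! The answer is 42.", '{"answer": "42"}'])
result = structured_call(llm, "What is 6 x 7? Reply as JSON.", {"answer"})
```

Every one of these pieces (signatures for outputs, retry policy, metric-driven evaluation) is something DSPy provides as a reusable abstraction, which is the article's point about stage five.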
A luxury mechanic was losing thousands per month from hundreds of missed calls a week. His brother built Axle, a voice-based AI receptionist using a RAG pipeline with MongoDB Atlas, Voyage AI embeddings, and Claude for grounded responses. The system connects to a real phone line via Vapi, handles pricing questions accurately ('$45 conventional, $75 synthetic'), and escalates unknowns to a callback flow.
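The grounding-plus-escalation logic can be sketched in a few lines. This is a toy under loud assumptions: a bag-of-words counter stands in for Voyage AI embeddings, an in-memory list stands in for MongoDB Atlas, the documents and the 0.2 similarity threshold are hypothetical, and the Claude call is omitted (the retrieved text is returned directly).

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; the real system stores this in MongoDB Atlas.
DOCS = [
    "Oil change pricing: $45 conventional, $75 synthetic.",
    "Opening hours: Monday to Friday, 8am to 6pm.",
    "We service European luxury vehicles only.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for Voyage AI vectors."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def answer(question: str, threshold: float = 0.2) -> str:
    q = embed(question)
    score, best = max((cosine(q, embed(d)), d) for d in DOCS)
    if score < threshold:
        # Nothing relevant retrieved: escalate instead of letting the model guess.
        return "ESCALATE: schedule a callback"
    # A real pipeline would pass `best` to Claude as grounding context.
    return best

print(answer("How much is an oil change?"))
print(answer("Do you sell tyres?"))
```

The design choice worth copying is the threshold: when retrieval finds nothing relevant, the system routes to a human rather than generating an answer with no source behind it.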
Artificial Genius built what they call a 'third generation' approach to language models using Amazon Nova on SageMaker. Instead of generating responses probabilistically, their system uses the model's learned knowledge strictly for input comprehension, then applies a deterministic layer to produce outputs. The method targets finance and healthcare where reproducibility matters. Unlike lowering temperature to zero, their instruction tuning removes output probabilities entirely.
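Artificial Genius's deterministic layer isn't described in enough detail to reproduce, so the sketch below only illustrates the baseline they contrast against: temperature scaling. Temperature zero still ranks tokens by a probability distribution; it just always takes the top one (greedy decoding). The logits are made-up values.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; T=0 is treated as the greedy limit."""
    if temperature == 0:
        # Limit T -> 0: all probability mass on the argmax token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.1, 0):
    print(t, softmax_with_temperature(logits, t))
```

Lowering the temperature sharpens the distribution toward one token, but the output is still chosen from model probabilities, which is exactly the step their approach claims to remove.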
A16z argues software companies face a binary choice: accelerate revenue growth by 10+ points with AI-native products within 12-18 months, or rebuild for 40%+ true operating margins including stock comp. Everything between those paths is no-man's land. The piece urges CEOs to find their five highest-leverage people regardless of seniority and put them in charge of process-capture sprints and documentation infrastructure.
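The 'including stock comp' qualifier matters because stock-based compensation is a real cost that adjusted margins often exclude. A quick worked example with hypothetical figures (not from the a16z piece):

```python
def true_operating_margin(revenue: float, op_income_ex_sbc: float, stock_comp: float) -> float:
    """Operating margin after counting stock-based compensation as an expense."""
    return (op_income_ex_sbc - stock_comp) / revenue

# Hypothetical company: $500M revenue, $250M adjusted operating income (50%),
# $60M stock comp. Counting stock comp drops the margin to 38%, below the 40% bar.
margin = true_operating_margin(500.0, 250.0, 60.0)
print(f"{margin:.0%}")
```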
Oren Etzioni pushes back on both AI displacement doomers and the Citadel Securities argument that a 'compute-cost ceiling' naturally brakes automation. His counterpoint: inference costs are falling roughly 10x per year, so any ceiling is really a speed bump. He also dismantles the 'people will just want more stuff' defence, noting it only works if displaced workers retain enough income to buy that stuff.
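The speed-bump argument is compounding arithmetic. Assuming the roughly 10x annual decline simply continues (an extrapolation, not a guarantee), a cost ceiling that blocks a task today stops binding after log base 10 of the overshoot, in years. The dollar figures below are hypothetical.

```python
import math

def projected_cost(cost_today: float, years: float, annual_decline_factor: float = 10.0) -> float:
    """Cost after `years` if prices fall by `annual_decline_factor` each year."""
    return cost_today / annual_decline_factor ** years

def years_until_below(cost_today: float, ceiling: float, annual_decline_factor: float = 10.0) -> float:
    """Years until cost drops under a ceiling, under the same constant-decline assumption."""
    if cost_today <= ceiling:
        return 0.0
    return math.log(cost_today / ceiling) / math.log(annual_decline_factor)

print(projected_cost(100.0, 3))      # a $100 task costs about $0.10 after three years
print(years_until_below(50.0, 5.0))  # 10x over budget today clears in about a year
```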
Hex's CEO shared data showing AI agents now create more dashboard components than humans do. Mintlify, which builds developer doc platforms, confirmed that agents read documentation more often than people in many cases. The takeaway isn't just about optimising for ChatGPT citations. As OpenClaw and personal agents intermediate more browsing, the real question is what it means to write when AI is the primary reader.
Strange Loop Canon argues that when companies have more AI agents than employees, management shifts from first-person control to real-time strategy. Just as Waymo and Tesla built world models for autonomous driving, businesses will need simulated environments of their operations for 'what if' scenario planning. The concept already exists wherever environments are expensive and constrained: factories, power grids, airspace, and warehouses.
Research from Oso and Cyera analysed 2.4 million workers and 3.6 billion permissions: humans exercise just 4% of what they're granted. That dormant 96% was never a problem because people are slow and distracted. AI agents are neither. When an agent inherits a user account, it inherits the full permission surface and will attempt to use whatever privileges it holds.
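One way to avoid an agent inheriting the dormant 96% is to scope it to the permissions the human actually exercised. The sketch below assumes usage logs exist and uses hypothetical permission names; the study itself doesn't prescribe this method, and production systems would enforce it in a dedicated authorization layer.

```python
def dormant_share(granted: set, exercised: set) -> float:
    """Fraction of granted permissions the human never used."""
    return len(granted - exercised) / len(granted)

def agent_scope(granted: set, exercised: set) -> set:
    """Least privilege: give the agent only what was actually exercised."""
    return granted & exercised

def authorize(scope: set, permission: str) -> bool:
    """Deny anything outside the agent's scope, even if the human holds it."""
    return permission in scope

# Hypothetical user: 25 grants, 1 ever used, mirroring the 4% / 96% split.
granted = {f"perm_{i}" for i in range(25)}
exercised = {"perm_0"}
scope = agent_scope(granted, exercised)
print(dormant_share(granted, exercised), authorize(scope, "perm_7"))
```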
OpenClaw can control your files, terminal, browser, Gmail, Slack, and home automation. That power comes with serious security holes: third-party skills run untrusted code with full system access, prompt injection can hijack sessions, and overprivileged API tokens enable data exfiltration. Federico Viticci burned through 180M tokens of Anthropic API usage in weeks. The author recommends containerisation, strict token scoping, and treating every third-party skill like an untrusted npm package.
The Obsidian CEO released a set of agent skills that let AI coding agents interact with Obsidian vaults. Agents can read and write Markdown notes, query Bases (Obsidian's structured data layer), manipulate JSON Canvas diagrams, and execute CLI commands. The repo hit 16,000 stars with 453 new stars in a single day, making it one of the fastest-growing Obsidian projects on GitHub.
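JSON Canvas is Obsidian's open file format for canvases: a top-level object with `nodes` and `edges` arrays. This sketch builds a minimal `.canvas` document following that spec as we understand it; it does not use the skills repo itself, and the node ids, text, and `Notes/Findings.md` path are hypothetical.

```python
import json

def make_canvas() -> dict:
    """Two nodes (a text card and a linked note) joined by a labelled edge."""
    nodes = [
        # Every node needs id, type, x, y, width, height.
        {"id": "idea", "type": "text", "text": "Research question",
         "x": 0, "y": 0, "width": 250, "height": 60},
        # File nodes point at a file inside the vault (hypothetical path).
        {"id": "note", "type": "file", "file": "Notes/Findings.md",
         "x": 350, "y": 0, "width": 250, "height": 60},
    ]
    edges = [{"id": "e1", "fromNode": "idea", "toNode": "note", "label": "supported by"}]
    return {"nodes": nodes, "edges": edges}

# An agent would write this string to a file such as "Research.canvas".
print(json.dumps(make_canvas(), indent=2))
```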
No framework, no database, no vector store. Agent Kernel gives AI coding agents persistent memory using three files in a Git repo: AGENTS.md (the kernel), IDENTITY.md (who the agent is), and KNOWLEDGE.md (an index of what it knows). It works with Claude Code, Codex, Cursor, and Windsurf. Memory splits into knowledge (mutable facts about the world) and notes (append-only session logs).
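The knowledge/notes split maps neatly onto two file operations: overwrite for mutable facts, append for session logs. This is a hypothetical sketch, not Agent Kernel's code; only the three top-level filenames come from the project, while the `knowledge/` and `notes/` directories, the index format, and the function names are assumptions. Committing to Git is left to the agent or user.

```python
from datetime import date
from pathlib import Path
from typing import Optional

KERNEL_FILES = ("AGENTS.md", "IDENTITY.md", "KNOWLEDGE.md")

def load_context(repo: Path) -> str:
    """Concatenate the kernel files an agent would read at session start."""
    parts = []
    for name in KERNEL_FILES:
        path = repo / name
        if path.exists():
            parts.append(f"# {name}\n{path.read_text()}")
    return "\n\n".join(parts)

def set_fact(repo: Path, topic: str, content: str) -> None:
    """Knowledge is mutable: overwrite the topic file, keep the index current."""
    kdir = repo / "knowledge"  # hypothetical layout
    kdir.mkdir(exist_ok=True)
    (kdir / f"{topic}.md").write_text(content)
    index = repo / "KNOWLEDGE.md"
    entry = f"- knowledge/{topic}.md\n"
    existing = index.read_text() if index.exists() else ""
    if entry not in existing:
        index.write_text(existing + entry)

def append_note(repo: Path, text: str, day: Optional[date] = None) -> None:
    """Notes are append-only: new lines go at the end, nothing is rewritten."""
    ndir = repo / "notes"  # hypothetical layout
    ndir.mkdir(exist_ok=True)
    day = day or date.today()
    with open(ndir / f"{day.isoformat()}.md", "a") as f:
        f.write(text.rstrip("\n") + "\n")
```

Because everything is plain Markdown in Git, history of the mutable knowledge files is still recoverable from commits, even though the working copy only holds the latest version.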