Issue #54 · Tuesday, March 10, 2026 · 24 min read · 12 stories

Popular AI Eval Tool Promptfoo Joins OpenAI

Yann LeCun raises $1B; Karpathy details autonomous ML agents. Plus, Uber Eats' AI search architecture.

Promptfoo, a tool many builders use to test LLM quality, was acquired by OpenAI yesterday, consolidating a key evaluation layer under a major model provider. Separately, AMI, the startup Yann LeCun co-founded, raised over $1 billion for his vision of AI that understands the physical world, a massive wager on a long-term research direction. Andrej Karpathy also detailed how autonomous agents can drive ML experimentation, offering a blueprint for accelerating scientific discovery.

▲NEWS

2 stories

AI Safety & Evaluation Platform Promptfoo Acquired by OpenAI

OpenAI acquired Promptfoo, a platform for systematically testing AI applications. Promptfoo's security, evaluation, and compliance technology will be integrated into OpenAI's model and infrastructure layers. Promptfoo itself will remain open source and will continue to support a range of AI models.


World Models Startup Raises $1B, Co-Founded by LeCun

Yann LeCun co-founded Advanced Machine Intelligence (AMI), which has raised over $1 billion to build AI "world models." LeCun argues that current LLMs lack the understanding of the physical world needed for human-level intelligence. AMI plans to develop systems with persistent memory, reasoning, and planning, targeting enterprise applications in manufacturing and robotics, and has made an open-source commitment.

⚙TECHNICAL

4 stories

AI Agents Autonomously Optimize ML Training Code

Andrej Karpathy's "autoresearch" project uses AI agents to conduct machine learning research autonomously. Guided by a `program.md` prompt, agents iteratively edit the `train.py` training script, tuning hyperparameters and trying architecture changes. Each experiment runs for a fixed 5 minutes, and success is measured by validation bits per byte (lower is better).
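To make the loop concrete, here is a toy Python sketch. It assumes the agent keeps an edit only when validation bits per byte improves and otherwise discards it. The agent's code edits are replaced by random learning-rate rescaling, and `run_experiment`, `propose_edit`, and `research_loop` are hypothetical names; a toy loss surface stands in for a real 5-minute training run. `bits_per_byte` shows the standard conversion from summed cross-entropy in nats to bits per byte.

```python
import math
import random


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over a validation text into bits per byte."""
    return total_nll_nats / (math.log(2) * num_bytes)


def run_experiment(config: dict) -> float:
    """Hypothetical stand-in for one fixed-budget training run that returns val bits/byte.

    A toy loss surface (minimum near lr = 3e-4) replaces real training so the loop runs anywhere.
    """
    return 1.0 + (math.log10(config["lr"]) - math.log10(3e-4)) ** 2


def propose_edit(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent's edit to train.py: rescale the learning rate."""
    candidate = dict(config)
    candidate["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    return candidate


def research_loop(config: dict, iterations: int, seed: int = 0):
    """Greedy keep-or-revert search: an edit survives only if val bits/byte improves."""
    rng = random.Random(seed)
    best_config, best_bpb = config, run_experiment(config)
    history = [best_bpb]
    for _ in range(iterations):
        candidate = propose_edit(best_config, rng)
        bpb = run_experiment(candidate)
        if bpb < best_bpb:  # lower bits per byte is better
            best_config, best_bpb = candidate, bpb
        # Otherwise the edit is discarded (the analogue of reverting train.py).
        history.append(best_bpb)
    return best_config, best_bpb, history


best_config, best_bpb, history = research_loop({"lr": 1e-2}, iterations=50)
print(f"best lr={best_config['lr']:.2e}, val bpb={best_bpb:.4f}")
```

Because an edit only survives if it lowers the metric, the best score recorded in `history` never gets worse; the trade-off is that greedy selection on one validation set can gradually overfit to it.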

Uber Eats Rebuilds Search with Qwen LLMs and Semantic Embeddings

Uber Eats replaced keyword search with a semantic system built on fine-tuned Qwen LLMs in a two-tower architecture: one tower encodes queries and the other encodes items into numerical embeddings that capture meaning and intent. To scale to billions of items, Uber uses Approximate Nearest Neighbor (ANN) search with HNSW indexes, plus pre-filters that narrow the candidate set. Matryoshka Representation Learning (MRL) reduces embedding storage costs, and dual embedding columns make continuous updates easier.
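A minimal, pure-Python sketch of query-time retrieval in a two-tower setup. Everything here is an assumption for illustration: embeddings are hard-coded toy vectors rather than Qwen tower outputs, `city` is a hypothetical pre-filter field, and brute-force exact scoring stands in for an ANN index like HNSW. The `dims` option shows Matryoshka-style truncation, which is only meaningful when the embedding model was trained with MRL.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    city: str        # hypothetical attribute used as a pre-filter
    embedding: list  # would come from the item tower in a real system


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def prepare(v, dims=None):
    """Unit-normalize; with `dims`, first truncate Matryoshka-style to a shorter prefix."""
    return normalize(v[:dims] if dims else v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query_emb, items, city, k=2, dims=None):
    # 1) Pre-filter: shrink the candidate set before any vector math.
    candidates = [it for it in items if it.city == city]
    # 2) Score by cosine similarity (dot product of unit vectors).
    #    Exact brute-force scoring stands in for an ANN index such as HNSW.
    q = prepare(query_emb, dims)
    scored = sorted(candidates, key=lambda it: dot(q, prepare(it.embedding, dims)), reverse=True)
    return [it.item_id for it in scored[:k]]


items = [
    Item("ramen", "sf", [0.9, 0.1, 0.0, 0.2]),
    Item("pho", "sf", [0.8, 0.3, 0.1, 0.0]),
    Item("pizza", "sf", [0.0, 0.9, 0.4, 0.1]),
    Item("udon", "nyc", [0.95, 0.05, 0.0, 0.1]),
]
query = [1.0, 0.2, 0.0, 0.1]  # pretend output of the query tower for "noodle soup"

print(search(query, items, "sf"))          # ['ramen', 'pho']
print(search(query, items, "sf", dims=2))  # ['ramen', 'pho']
```

Note that "udon" never appears for an "sf" query even though it is the closest vector: the pre-filter removes it before scoring, which is also what keeps the vector search cheap at scale.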

GTM Agent Boosts Sales Conversion 250%

LangChain built a go-to-market (GTM) agent to automate sales outreach. The agent triggers on each new lead, researches account and contact data from Salesforce, Gong, LinkedIn, and the web, then drafts a personalized outreach email for a rep to approve. LangChain reports that this increased lead-to-opportunity conversion by 250% and saved reps significant time. The system uses LangChain's Deep Agents for orchestration and learns from rep edits through a human-in-the-loop process.
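One simple way to "learn from rep edits" is to store each draft alongside the rep's final version and feed recent pairs back into the next prompt as examples. The Python sketch below assumes that approach; it is not LangChain's implementation and does not use the Deep Agents API. `research`, `build_prompt`, `FeedbackStore`, and the fake LLM are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Lead:
    name: str
    company: str


@dataclass
class FeedbackStore:
    """Stores (agent draft, rep-approved final) pairs so later drafts can learn from them."""
    pairs: list = field(default_factory=list)

    def record(self, draft: str, final: str) -> None:
        if draft != final:  # only edited drafts carry a correction signal
            self.pairs.append((draft, final))

    def recent(self, n: int = 3) -> list:
        return self.pairs[-n:]


def research(lead: Lead) -> dict:
    """Hypothetical stand-in for CRM, call-notes, LinkedIn, and web lookups."""
    return {"signal": f"{lead.company} recently posted data engineering roles"}


def build_prompt(lead: Lead, context: dict, examples: list) -> str:
    lines = [
        f"Write a short outreach email to {lead.name} at {lead.company}.",
        f"Research notes: {context['signal']}",
    ]
    for draft, final in examples:  # past rep edits act as few-shot guidance
        lines.append(f"A rep previously rewrote:\n{draft}\nas:\n{final}")
    return "\n".join(lines)


def handle_new_lead(lead, store, llm, rep_review):
    """Triggered per new lead: research, draft, human review, then record the edit."""
    prompt = build_prompt(lead, research(lead), store.recent())
    draft = llm(prompt)
    final = rep_review(draft)  # human-in-the-loop: rep approves as-is or edits
    store.record(draft, final)
    return final


def fake_llm(prompt: str) -> str:
    return "Hi there, interested in a demo?"


def rep(draft: str) -> str:
    return draft.replace("Hi there", "Hi Ana")


store = FeedbackStore()
print(handle_new_lead(Lead("Ana", "Acme"), store, fake_llm, rep))
# Hi Ana, interested in a demo?
```

After this run, the next lead's prompt includes Ana's rewrite as an example, so the model sees how reps actually want drafts to read without any retraining.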

Securing Agentic Workflows in CI/CD: GitHub's Approach

GitHub details the security architecture of its Agentic Workflows, which addresses the risks of running autonomous AI agents in CI/CD. Its layered approach emphasizes defense in depth and never trusting agents with secrets. Strategies include isolating agents in containers with restricted network access, routing LLM communication through an API proxy, and using a "safe outputs" subsystem that vets every agent-generated write before it is applied to the repository.
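The "safe outputs" idea can be sketched as a vetting step: the agent only proposes writes as data, and a separate checker decides which ones may be applied. This Python sketch is hypothetical throughout (the write kinds, per-run limits, and secret markers are invented for illustration) and does not reflect GitHub's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedWrite:
    kind: str    # e.g. "add_comment", "create_issue" (hypothetical kinds)
    target: str
    body: str


# Hypothetical policy: which write kinds are allowed, and at most how many per run.
POLICY = {"add_comment": 3, "create_issue": 1}
# Crude secret-leak check; a real system would use a proper scanner.
SECRET_MARKERS = ("ghp_", "BEGIN PRIVATE KEY")


def vet(proposals):
    """Split agent-proposed writes into approved and rejected, never applying anything itself."""
    approved, rejected, used = [], [], {}
    for p in proposals:
        if used.get(p.kind, 0) >= POLICY.get(p.kind, 0):
            rejected.append((p, "kind not allowed or over limit"))
        elif any(marker in p.body for marker in SECRET_MARKERS):
            rejected.append((p, "possible secret in output"))
        else:
            used[p.kind] = used.get(p.kind, 0) + 1
            approved.append(p)
    return approved, rejected


proposals = [
    ProposedWrite("add_comment", "pull/12", "Tests pass on all platforms."),
    ProposedWrite("push_file", "main", "rewrite CI config"),
    ProposedWrite("create_issue", "repo", "leaked token ghp_example"),
    ProposedWrite("create_issue", "repo", "Flaky test in test_io.py"),
]
approved, rejected = vet(proposals)
print([p.kind for p in approved])    # ['add_comment', 'create_issue']
print([why for _, why in rejected])  # ['kind not allowed or over limit', 'possible secret in output']
```

In this design the agent never needs write credentials; only the approved list would be handed to a separate, privileged step that performs the writes.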

◈ANALYSIS

4 stories

a16z: ChatGPT Holds Lead Amidst Global AI App Splinter

The sixth edition of a16z's report on the top 100 generative AI consumer apps, based on January 2026 data, shows ChatGPT still dominant. Gemini and Claude are growing quickly in both paid subscribers and app integrations. The report notes a global splintering of AI usage, with Western tools largely absent in China and Russia; DeepSeek bridges some of this divide. AI is also being built into existing consumer products, not just AI-native apps.

One Take: AI-Generated Code Creates 'Cognitive Debt'

A new analysis identifies "Cognitive Debt": the gap in a team's understanding of the AI-generated code it ships. This debt shows up as slower reviews, harder debugging, and longer onboarding. The proposed remedy is for engineers to keep a deep understanding of core domain code while letting AI handle outer layers such as configuration.

Thesis: AI Makes 10x Engineers The Baseline

Nikunj Kothari argues that AI has reset productivity standards, making the "10x engineer" a common reality. Startups and individuals are adopting AI tools quickly, while larger corporations lag. The piece suggests that those who leverage AI become "conductors" of agents, while those who resist risk falling behind.

A Framework: 8 Levels of Agentic Engineering Progress

A new framework breaks AI-assisted software development into 8 distinct levels, tracking progression from basic IDE integration to complex teams of autonomous agents. The article highlights techniques like context engineering, multimodal capabilities, and automated feedback loops, and argues that Level 7 offers significant leverage for builders today.

⚒TOOLS

2 stories

UI Annotations Feed AI Code Agents

Agentation lets you annotate UI elements with notes, then generates structured context such as CSS selectors and file paths. This context feeds into AI tools like Claude Code, giving agents precise targets for debugging and development. Agents can also reply to annotations, turning them into a conversational feedback loop.
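A rough idea of how an annotation can become precise agent context: walk from the annotated element up the DOM to build a CSS selector, then bundle it with the note and a source file. This Python sketch models the DOM with a hypothetical `Node` class; `annotation_context` and its output format are assumptions, not Agentation's actual schema, and the source file path (`src/Card.tsx`, also hypothetical) is passed in rather than discovered.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality avoids recursive comparisons through parent links
class Node:
    tag: str
    classes: list = field(default_factory=list)
    elem_id: Optional[str] = None
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def css_selector(node: Node) -> str:
    """Build a selector by walking up to the root (or to the nearest element with an id)."""
    parts = []
    while node is not None:
        if node.elem_id:
            parts.append(f"#{node.elem_id}")
            break  # ids are assumed unique, so no need to climb further
        part = node.tag + "".join(f".{c}" for c in node.classes)
        if node.parent is not None:
            same_tag = [c for c in node.parent.children if c.tag == node.tag]
            if len(same_tag) > 1:  # disambiguate among same-tag siblings
                part += f":nth-of-type({same_tag.index(node) + 1})"
        parts.append(part)
        node = node.parent
    return " > ".join(reversed(parts))


def annotation_context(node: Node, note: str, source_file: str) -> dict:
    """Package a UI note as structured context an agent can act on."""
    return {"selector": css_selector(node), "note": note, "file": source_file}


root = Node("main")
root.add(Node("div"))
card = root.add(Node("div", classes=["card"]))
button = card.add(Node("button", classes=["primary"]))
print(annotation_context(button, "Button overlaps price on mobile", "src/Card.tsx"))
# {'selector': 'main > div.card:nth-of-type(2) > button.primary',
#  'note': 'Button overlaps price on mobile', 'file': 'src/Card.tsx'}
```

The `:nth-of-type` suffix is what turns a vague "the card button" into a selector that matches exactly one element, which is the kind of precise target the agent needs.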

Free AI PR Review Tool Adds Autofix

Devin Review is a free, no-signup tool for reviewing GitHub pull requests. It includes autofix suggestions, smart diff organization, copy/move detection, and codebase-aware chat. To use it, replace 'github' with 'devinreview' in any GitHub PR URL.
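The URL swap is easy to script. A blind `str.replace("github", "devinreview")` would also rewrite a repository named, say, `github-tools`, so this hypothetical helper changes only the hostname, assuming `github.com` maps to `devinreview.com` as the replacement rule implies.

```python
from urllib.parse import urlsplit, urlunsplit


def to_devin_review(pr_url: str) -> str:
    """Rewrite a GitHub PR URL by swapping the github.com host for devinreview.com."""
    parts = urlsplit(pr_url)
    if parts.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {pr_url}")
    segments = parts.path.strip("/").split("/")
    # Expect /<owner>/<repo>/pull/<number>[/...]
    if len(segments) < 4 or segments[2] != "pull" or not segments[3].isdigit():
        raise ValueError(f"not a pull request URL: {pr_url}")
    # Only the host changes, so owner and repo names containing "github" stay intact.
    return urlunsplit((parts.scheme, "devinreview.com", parts.path, parts.query, parts.fragment))


print(to_devin_review("https://github.com/acme/github-tools/pull/42"))
# https://devinreview.com/acme/github-tools/pull/42
```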