Issue #55 · 26 min read · 13 stories

Gaming GPUs Top Open LLM Leaderboard

Docker ships an AI agent builder. Meta buys a social network for agents. Plus, quantifying eval noise.

A builder topped the HuggingFace Open LLM Leaderboard yesterday using just two gaming GPUs, proving that cutting-edge performance doesn't always demand a data center. The win comes as Docker launches its own AI agent builder, pushing execution further into the developer stack. Meanwhile, Meta acquired Moltbook, a social network for AI agents, suggesting a future where agents need dedicated social infrastructure.

NEWS
4 stories
1

Workspace AI Creates Content from User Files

Google is updating Gemini in Workspace to pull relevant information from user files, emails, and the web for content generation. Gemini can now draft documents, build spreadsheets with data analysis, design presentations, and answer questions using Drive content. These features are rolling out to Google AI Ultra and Pro subscribers.

2

Media Revenue: YouTube Projected to Outpace Disney in 2025

YouTube is projected to become the world's largest media company in 2025, with an estimated $62 billion in revenue, surpassing Disney's media business. The growth comes from strong ad revenue and a subscription business, including YouTube TV, which is approaching the subscriber counts of the largest pay-TV providers. YouTube's $500-560 billion valuation reflects its scale as a distributor and its investment in AI tools, which creators are already using to produce content faster and more cheaply.

3

AI Agent Social Network Acquired by Meta

Meta acquired Moltbook, a social network designed for AI agents. Moltbook's founders will join Meta's Superintelligence Labs, signaling Meta's interest in talent and technology for autonomous AI systems. This acquisition underscores the industry's rapid investment and competition in agentic systems, a key frontier for real-world tasks.

4

AI Researchers Back Anthropic vs. Pentagon

Over 30 employees from OpenAI and Google, including Jeff Dean, filed an amicus brief supporting Anthropic in its lawsuit against the US government. They argue the Pentagon's "supply-chain risk" label for Anthropic hurts US AI competitiveness and chills professional debate.

TECHNICAL
5 stories
1

Trivedy: Agent Harnesses Add State, Tools to LLMs

Vivek Trivedy unpacks the "agent harness," the essential code turning an LLM into a functional agent. A harness adds durable state, tool execution, and real-time knowledge, which models lack. Key primitives include filesystems, bash/code execution, sandboxes for safety, and memory for continual learning.
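The pattern Trivedy describes can be sketched as a small loop: the harness owns the state and the tools, and the model only chooses the next action. A minimal illustration of that loop (all names here are hypothetical, not Trivedy's code):

```python
import subprocess

# Tool registry the harness exposes to the model (illustrative tools only).
TOOLS = {
    "bash": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=30
    ).stdout,
    "read_file": lambda path: open(path).read(),
}

def run_agent(task, call_llm, max_steps=10):
    """Loop: ask the model for an action, execute the tool, feed the result back."""
    memory = [{"role": "user", "content": task}]   # durable state the bare model lacks
    for _ in range(max_steps):
        action = call_llm(memory)                  # e.g. {"tool": "bash", "arg": "ls"}
        if action.get("tool") == "done":
            return action.get("arg")               # the agent's final answer
        result = TOOLS[action["tool"]](action["arg"])
        memory.append({"role": "tool", "content": result})
```

In a production harness the `bash` tool would run inside a sandbox and `memory` would be summarized or persisted, which is where Trivedy's remaining primitives come in.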

2

31% of Uber's Code Now AI-Authored

Uber's internal AI stack now authors 31% of its codebase, with 92% of developers using agents monthly. The company uses specialized agents for tasks like code review and testing, shifting workflows from single-threaded IDE work to orchestrating parallel agents. This strategy aims to eliminate toil, freeing engineers for creative work, despite rising AI costs and the need for token optimization.

3

Infra Noise Skews Agent Benchmarks by 6 Points

Anthropic found that infrastructure noise, such as variation in allocated CPU and RAM, can skew agentic coding benchmark scores by up to 6 percentage points. This margin often exceeds the differences between top models. Providing more resource headroom reduces errors and allows agents to solve harder problems. Anthropic recommends specifying both guaranteed resource allocation and hard kill thresholds for stable results in agent evals.
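The recommendation amounts to fixing the resource envelope per trial. A hedged sketch of one way to do that on Linux, pinning a hard memory ceiling and a wall-clock timeout around each agent run (the limits are illustrative, not Anthropic's numbers):

```python
import resource
import subprocess
import sys

MEM_BYTES = 4 * 1024**3   # hard kill threshold: 4 GiB of address space
TIMEOUT_S = 600           # hard kill threshold: 10 minutes of wall clock

def _limit_memory():
    # Applied in the child process just before exec; the OS enforces the ceiling.
    resource.setrlimit(resource.RLIMIT_AS, (MEM_BYTES, MEM_BYTES))

def run_trial(cmd):
    """Run one agent trial under a fixed resource envelope."""
    return subprocess.run(
        cmd,
        preexec_fn=_limit_memory,
        timeout=TIMEOUT_S,
        capture_output=True,
        text=True,
    )
```

The guaranteed-allocation half of the advice lives on the scheduler side, e.g. container CPU/memory requests, so every trial also starts from the same baseline rather than whatever the host happens to have free.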

4

Embed Agentic Execution in Apps with Copilot SDK

GitHub's Copilot SDK provides direct embedding of multi-step AI agent workflows into applications, moving beyond text-only interactions. It grounds agent execution in structured runtime contexts using protocols like MCP, and agents can then perform autonomous software actions outside the IDE. The SDK makes AI execution a programmable layer within your product.

5

72B LLM Tops Leaderboard by Duplicating Layers

A new method, 'LLM Neuroanatomy,' helped a 72B parameter model top the HuggingFace Open LLM Leaderboard. It works by duplicating existing middle layers without retraining, running on consumer GPUs. This suggests LLMs form functional reasoning circuits, enabling performance gains on consumer hardware without costly retraining.
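The underlying trick, deepening a model by repeating a contiguous span of middle blocks (sometimes called depth up-scaling or a passthrough self-merge), can be sketched in a few lines; this shows the general pattern, not the 'LLM Neuroanatomy' code itself:

```python
def duplicate_middle_layers(layers, start, end, repeats=2):
    """Return a deeper stack in which layers[start:end] appears `repeats` times.

    No weights are retrained: in a real model the copied blocks reuse the
    original blocks' parameters, e.g. entries of a transformer's layer list.
    """
    return layers[:start] + layers[start:end] * repeats + layers[end:]

# Toy 4-block "model" whose middle two blocks are doubled: 4 blocks -> 6 blocks.
stack = duplicate_middle_layers(["b0", "b1", "b2", "b3"], start=1, end=3)
```

Because the new blocks are copies, the only costs are memory and inference latency, which is why a run like this fits on consumer GPUs.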

TOOLS
4 stories
1

Python Library Scrapes Web for AI Data, RAG

Crawlee-Python is a web scraping and browser automation library for Python, designed to gather data for AI applications like LLMs and RAG. It extracts HTML, PDFs, JPGs, and PNGs from websites, integrating with tools like Playwright and BeautifulSoup. The library supports both headful and headless modes, plus proxy rotation for reliable data extraction.
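Under the Playwright and BeautifulSoup integrations, the core of any such crawler is an extract step over fetched HTML. A generic standard-library sketch of that step (this is not Crawlee's own API, which wraps extraction together with request queuing, retries, and proxy rotation):

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collect outgoing links and visible text from one fetched page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href"]

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

extractor = PageExtractor()
extractor.feed('<p>RAG corpus</p><a href="/docs">docs</a>')
```

The collected text chunks would then be cleaned and embedded for a RAG index, while the links feed the crawler's request queue.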

2

Agent Trains LLMs Overnight on Single GPU

Karpathy's `autoresearch` system lets an AI agent autonomously conduct LLM experiments on a single GPU. It modifies a `train.py` file based on instructions, trains for a fixed 5 minutes, and iterates to improve a model using validation metrics. This is a self-contained demonstration of autonomous AI research.
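The pattern is a simple outer loop of edit, train, evaluate. A hedged sketch of that loop, not Karpathy's actual `autoresearch` code (the function names and the metric-parsing convention are assumptions):

```python
import subprocess
import sys

def run_training(script="train.py"):
    """Run one fixed-budget training and parse the validation loss it prints
    on its final output line (a convention assumed here for illustration)."""
    out = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return float(out.stdout.strip().splitlines()[-1])

def research_loop(propose_edit, evaluate, rounds=5):
    """Edit the script, retrain, and keep the best validation loss seen."""
    best = evaluate()                 # baseline before any edits
    for _ in range(rounds):
        propose_edit()                # agent rewrites train.py per its instructions
        best = min(best, evaluate())  # lower validation loss is better
    return best
```

The fixed 5-minute training budget is what makes the loop tractable on one GPU: each hypothesis costs the same small amount, so the agent can afford many iterations.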

3

Go Agent Builder Ships From Docker

Docker released `docker/docker-agent`, an open-source builder and runtime for AI agents. Written in Go, it provides tools and infrastructure for constructing and managing agents within the Docker ecosystem. For builders, it offers a standardized approach to agent development and deployment using familiar Docker infrastructure.

4

Run BitNet b1.58 Locally on Modest Hardware

A guide details how to run the BitNet b1.58 model locally, even on modest hardware. It covers installing the `bitnet.cpp` implementation, downloading the 2B GGUF model, and running it as a local inference server or in interactive chat mode. The guide also shows connecting to the local server using the OpenAI Python SDK.
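Once the server is up, its OpenAI-compatible endpoint can be exercised with nothing but the standard library; the host, port, and model name below are illustrative, and the OpenAI Python SDK reaches the same endpoint by setting `base_url="http://localhost:8080/v1"`:

```python
import json
from urllib import request

# Illustrative local endpoint for a bitnet.cpp server (port may differ per setup).
BASE = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "bitnet-b1.58-2b",  # model name is illustrative
    "messages": [{"role": "user", "content": "Summarize 1.58-bit quantization."}],
}

def ask(url=BASE, body=payload):
    """POST one chat request and return the assistant's reply text."""
    req = request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask())  # requires the local inference server to be running
```

Because the endpoint follows the OpenAI chat-completions shape, any client built for hosted models can be pointed at the local BitNet server unchanged.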