Issue #43 · 20 min read · 10 stories

OpenAI Nears $850B Valuation

OpenRouter's COO: 1T agent tokens per day. Plus: the path to ubiquitous AI, and ggml.ai joins Hugging Face.

OpenAI reportedly moved to finalize an $850 billion valuation deal over the weekend, signaling continued investor confidence in foundation models. OpenRouter's COO shared data revealing that builders now generate a trillion agent tokens daily, highlighting the sheer scale of real-world agent deployments. This comes as new analysis details the technical path to truly ubiquitous AI, targeting 17,000 tokens per second.

NEWS
6 stories

Plugins Enable External Tool Access for Cursor Agents

Cursor launched a plugin system, allowing its agents to connect with external tools and expand their knowledge base. Partners including Amplitude, AWS, and Stripe offer plugins covering the product development lifecycle. Users can discover and install these from the Cursor Marketplace or create their own using primitives like skills and subagents.

LLM Testing Platform Auto-Generates Tests, Red-Teams Agents

Rhesis is an open-source platform for LLM and agentic application testing. It auto-generates tests from requirements, simulates conversational flows, and conducts adversarial testing with Garak. The platform evaluates using 60+ RAGAS and DeepEval metrics and offers OpenTelemetry-based tracing.

Funding Round Nears for OpenAI at $850B+ Valuation

OpenAI is reportedly finalizing a funding round that could raise over $100 billion at an $850 billion+ valuation, with major tech players like Amazon, SoftBank, Nvidia, and Microsoft expected to invest. The round signals continued investor confidence in the AI market; it is relevant both to founders who are fundraising and to builders tracking how major platforms like OpenAI pursue profitability (e.g., through ads in the free ChatGPT tier).

Zig-Powered AI Assistant Fits in 678KB

NullClaw is a new AI assistant infrastructure built entirely in Zig, resulting in a 678 KB static binary. It runs on ~1 MB RAM, starts in under 2 milliseconds, and works on most CPUs. The system includes provider, channel, and tool support, with an architecture designed for security and modularity, making it suitable for deployment on $5 hardware and microcontrollers.

2-8x LLM Throughput Gains with Terradev CLI v3.6.2

Terradev CLI v3.6.2 ships with automatic vLLM workload optimization, achieving 2-8x LLM throughput gains. The update adds GitOps automation for Kubernetes, one-click HuggingFace Spaces integration, and MoE cluster templates for large models. It also integrates cost optimizations like FlashInfer and LMCache, supports multi-LoRA serving, and includes a training orchestration pipeline for multi-node GPU clusters.

llama.cpp Devs Join Hugging Face for Local AI

The team behind llama.cpp is now part of Hugging Face. This move aims to boost local AI development, integrating llama.cpp with Hugging Face's transformers library and improving inference accessibility. Projects remain open-source and community-led.

TECHNICAL
4 stories

Custom Silicon Delivers 17K Tokens/Sec for Llama 3.1 8B

Taalas unveiled "Hardcore Models": custom silicon built for specific AI models, designed to overcome inference latency and cost barriers. Their first product, a hard-wired Llama 3.1 8B, achieves 17,000 tokens/second. This lets developers explore real-time AI applications previously impractical due to latency and cost.

Builder's Take: Rust, Fine-Tuning Improve Agent Success

One builder shares nine observations from a year of building AI agent systems: prioritize state-of-the-art models for unpredictable inputs, fine-tune smaller models for well-defined tasks, lean on strongly typed languages like Rust to catch errors early, and assemble a "braintrust" of agents that critique each other's work.
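
The "braintrust" idea can be sketched as a small review loop. A hypothetical illustration in TypeScript; the critic roles and rules below are invented here, not taken from the article:

```typescript
// Hypothetical "braintrust" sketch: several critic agents review a draft,
// and the work is approved only when no critic objects.
type Critic = { role: string; review: (draft: string) => string | null };

const critics: Critic[] = [
  // Each critic returns an objection string, or null if satisfied.
  { role: "security", review: (d) => (d.includes("eval(") ? "uses eval" : null) },
  { role: "style", review: (d) => (d.length > 500 ? "too long" : null) },
];

function braintrust(draft: string): { approved: boolean; objections: string[] } {
  const objections = critics
    .map((c) => {
      const issue = c.review(draft);
      return issue ? `${c.role}: ${issue}` : null;
    })
    .filter((o): o is string => o !== null);
  return { approved: objections.length === 0, objections };
}

// A draft that should fail the security critic.
const verdict = braintrust("const x = eval(userInput);");
```

In practice each `review` would itself be an LLM call with a role-specific prompt; the fixed rules here just stand in for those calls.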

Why LLMs Lack Compiler's Semantic Closure

This article explains 'semantic closure' as a core difference between compilers and LLMs. Compilers verify their outputs against formal specifications and report explicit errors. LLMs, by contrast, generate outputs from statistical patterns with no internal verification of correctness, even when they decode deterministically.
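
The distinction can be shown with a toy sketch (the spec and function names below are invented for illustration, not from the article): a semantically closed pipeline gates its output on a formal check, while an unverified one returns whatever it generated.

```typescript
// Toy illustration: a spec maps an output to null (valid) or an error message.
type Spec = (output: string) => string | null;

const mustBeJson: Spec = (out) => {
  try {
    JSON.parse(out);
    return null;
  } catch {
    return "not valid JSON";
  }
};

// Compiler-like path: emission is gated by verification against the spec.
function checkedEmit(candidate: string, spec: Spec): string {
  const err = spec(candidate);
  if (err !== null) throw new Error(`rejected: ${err}`);
  return candidate;
}

// LLM-like path: the candidate passes through unchecked,
// even when generation itself is deterministic.
function statisticalEmit(candidate: string): string {
  return candidate;
}

const bad = '{"a": 1,'; // truncated JSON
const unverified = statisticalEmit(bad); // emitted as-is
let rejected = false;
try {
  checkedEmit(bad, mustBeJson);
} catch {
  rejected = true; // the checked path reports an explicit error
}
```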

Code Mode Gives Agents Full API with 1k Tokens

Cloudflare’s new "Code Mode" technique gives AI agents access to its entire API using only about 1,000 tokens. Instead of reading thousands of individual endpoint descriptions, agents write and execute JavaScript against a typed SDK, interacting through `search()` and `execute()` functions. This cuts token consumption for API interactions by 99.9%, and Cloudflare has open-sourced the SDK for other builders.
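
The pattern can be sketched as follows (all names here are illustrative stand-ins, not Cloudflare's actual SDK): the agent sees only two functions, discovers endpoints on demand, and calls them from code.

```typescript
// Illustrative Code Mode sketch. Both functions are stubbed in-memory
// so the example is self-contained; a real SDK would hit live APIs.
type ApiDoc = { id: string; summary: string };

const docs: ApiDoc[] = [
  { id: "dns.records.list", summary: "List DNS records for a zone" },
  { id: "workers.deploy", summary: "Deploy a Worker script" },
];

// search(): find endpoint docs matching a query, instead of preloading
// every endpoint description into the prompt.
function search(query: string): ApiDoc[] {
  const q = query.toLowerCase();
  return docs.filter((d) => d.summary.toLowerCase().includes(q));
}

// execute(): invoke one endpoint by id with parameters.
function execute(id: string, params: Record<string, unknown>): { endpoint: string; status: string } {
  // A real implementation would dispatch an authenticated API call here.
  return { endpoint: id, status: "ok" };
}

// Code an agent might emit: discover the endpoint, then call it.
const hit = search("dns")[0];
const result = execute(hit.id, { zone: "example.com" });
```

The token saving comes from the agent pulling in only the endpoint docs it needs, when it needs them, rather than carrying full tool schemas in context.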

ANALYSIS
2 stories

Opinion: AI Code Speed Creates 'Comprehension Debt'

A Hackernoon article argues that AI tools accelerate code implementation but expose human comprehension as the new bottleneck. Rapid code generation outpaces a team's ability to understand and maintain the code, leading to "comprehension debt." This encourages adding more features and complexity than necessary, creating fragile systems.

Agent Tool Calls Jump to 25% (OpenRouter Data)

Data from OpenRouter, processing a trillion tokens daily, shows agent tool call rates surged from under 5% to over 25% in the last year. This signals a shift to production, with reasoning tokens now comprising 50% of model output. Teams often use frontier models for planning and smaller models for tool execution.

TOOLS
1 story

LLM Agent Auto-Generates Repo Docs & CI/CD

OSA (Open Source Advisor), a Python library, uses LLM agents to generate README files, create code documentation (docstrings), set up CI/CD pipelines, and organize repository structures for open-source projects. This tool supports various LLM providers and can generate pull requests with suggested improvements for any repository.