Issue #2 · 26 min read · 13 stories

Your LLM Agents Keep Failing Basic Instructions? Here's Why.

Claude plugin for coding context, sales teams replaced by agents, and xAI's 2 GW power needs

Yesterday, Unsloth released a tool for 2x faster LLM fine-tuning, using 70% less VRAM. This directly impacts iteration speed and compute spend for builders. Separately, new research explains why LLMs miss basic prompt instructions, a critical lesson for designing agentic systems that actually work.

NEWS
6 stories

2026 IPOs: Anthropic, OpenAI, SpaceX

Analysts expect 2026 to see major AI IPOs, with Anthropic and OpenAI potentially debuting publicly. Together, these listings could surpass the combined size of hundreds of prior US offerings. SpaceX is also reportedly in the mix, alongside three Chinese tech giants planning $1B+ Hong Kong IPOs.

2

Building AI Legacy: Beyond Wealth Accumulation

This article argues that in a post-Singularity future, builders should prioritize contributing to broad prosperity or creating lasting work over accumulating personal wealth. It frames the current AI era as a unique historical moment to shape a positive future for humanity.

3

xAI's Colossus Supercomputer Nears 2 GW Power

xAI is expanding its Colossus supercomputer with a third building, "MACROHARDRR," aiming for nearly two gigawatts of power and over one million GPUs. This multi-billion dollar investment boosts xAI's training capacity, but the massive energy demand is forcing the company to build its own natural gas plant.

4

Optimal Small LLM: 3.8x Faster, More Factual

At the 70M-parameter scale, many transformer variants cluster tightly in performance; training recipe and data dominate. New research shows a diffusion model, Dhara-70M, sacrifices 1.33% accuracy (on specific benchmarks) for a 3.8x throughput boost and superior factuality (measured by a specific metric).

5

Gemini 3 Flash Crushes GPT-5.2 Pro and Opus 4.5 on Reading Comprehension

Gemini 3 Flash (68.5%) destroys every premium model on the Misdirected Attention benchmark, including GPT-5.2 Pro (61.7%) and Opus 4.5 (60.7%). The benchmark tests whether models actually read your prompt or just vibe-match patterns. Even top models only score ~50-68% on simple perturbations like "five dead people vs one living person." Translation: your AI features will break on edge cases more than you think, and paying for reasoning tokens won't fix models that don't comprehend the question.
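One way to check your own features for this failure mode is to pair each familiar prompt with a perturbed twin and score both. The sketch below is a minimal harness under stated assumptions: `Case`, `accuracy`, `pattern_matcher`, and the expected labels are all hypothetical and illustrative, not the Misdirected Attention benchmark's actual cases or scoring.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    prompt: str
    expected: str  # illustrative label, not the benchmark's answer key


def accuracy(model: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases where the normalized model answer equals the label."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(model(c.prompt).strip().lower() == c.expected for c in cases)
    return hits / len(cases)


def pattern_matcher(prompt: str) -> str:
    # Stand-in "model" that reacts to the familiar framing and ignores details.
    return "pull" if "trolley" in prompt.lower() else "unknown"


CASES = [
    # Control: the familiar framing.
    Case("A trolley will hit five living people unless you pull a lever "
         "that diverts it toward one living person. Pull or don't pull?",
         "pull"),
    # Perturbation: the five people are already dead.
    Case("A trolley will hit five dead people unless you pull a lever "
         "that diverts it toward one living person. Pull or don't pull?",
         "don't pull"),
]

print(accuracy(pattern_matcher, CASES))  # 0.5: the perturbed case is missed
```

Swapping `pattern_matcher` for a wrapper around a real model call lets the same cases run against production prompts; exact-match scoring is crude, so free-text answers would need a normalization step first.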

6

Instagram CEO: Don't Trust Online Content Anymore

Instagram CEO Adam Mosseri declared the 'intimate feed' dead, stating that you 'can’t necessarily trust what you see anymore' due to AI. This marks a fundamental shift in digital authenticity assumptions, requiring new verification UX.

TECHNICAL
2 stories
2

Build Your Own Deep Learning Library with NumPy

A new book teaches how to build a deep learning library from scratch, focusing on an autograd engine and layer modules using only NumPy. You'll create a custom library capable of training models such as MNIST classifiers, simple CNNs, and simple ResNets.
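For a feel of what an autograd engine does, here is a minimal reverse-mode sketch. It is not the book's code: the `Value` class is hypothetical, and it uses plain Python floats instead of NumPy arrays to stay dependency-free. The idea carries over, though: each operation records its inputs and a local gradient rule, and `backward()` replays those rules in reverse topological order.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical sketch)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad into the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the
        # output back to the leaves.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)  # 7.0 3.0 2.0 1.0
```

A NumPy version would store arrays in `data` and `grad` and would also have to sum gradients over broadcast dimensions, which this scalar sketch sidesteps.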

ANALYSIS
1 story
1

SaaStr Founder Swaps 10 Sales Reps for 20 AI Agents

SaaStr founder Jason Lemkin states, 'We replaced our team of 10 Sales Execs with 20 AI Agents, managed by 1.2 humans.' He discusses effective AI tools for sales and predicts most SDRs/BDRs will be obsolete within a year.

TOOLS
4 stories
2

Claude-Mem Gives Claude Persistent Coding Memory

Claude-mem is a Claude Code plugin that captures tool calls, diffs, and summaries during coding sessions. It compresses this data using Claude's agent-sdk and injects relevant context into future sessions, creating a persistent memory. Because it records session data, add filters, allowlists, or redaction to keep secrets out of the stored context.
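As a rough illustration of that secret hygiene, the sketch below combines a tool allowlist with regex redaction before anything is stored. Everything here is an assumption for illustration: the event shape (`tool` and `payload` keys), the `ALLOWED_TOOLS` set, and both patterns are hypothetical and are not part of claude-mem's API.

```python
import re

# Hypothetical patterns; tune them to the secret formats you actually use.
KEY_VALUE = re.compile(
    r"\b(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # AWS-style access key ID shape

# Hypothetical allowlist: only events from these tools are kept at all.
ALLOWED_TOOLS = {"Read", "Edit", "Grep"}


def redact(text: str) -> str:
    """Mask values that look like secrets, keeping the key name for context."""
    text = KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return AWS_KEY_ID.sub("[REDACTED]", text)


def filter_events(events):
    """Drop events from tools outside the allowlist and redact the rest."""
    return [
        {**event, "payload": redact(event["payload"])}
        for event in events
        if event["tool"] in ALLOWED_TOOLS
    ]


print(redact("export API_KEY=abc123"))  # export API_KEY=[REDACTED]
```

Regex redaction only catches secrets that match a known shape, so an allowlist that limits what gets captured in the first place is the stronger of the two controls.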

3

NVIDIA Delivers Reproducible DL Examples

NVIDIA's DeepLearningExamples repository offers deep learning scripts organized by model, with reproducible accuracy and performance. Pick one model, run the reference script on your target GPU, capture throughput and accuracy, and use the result as a baseline for optimization and vendor comparisons.
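Capturing a baseline can be as simple as timing repeated steps and recording the median. The helper below is a generic sketch, not part of the NVIDIA repository: `measure_throughput`, `speedup`, and `step_fn` are hypothetical names, and `step_fn` stands in for whatever runs one training or inference step in your setup.

```python
import statistics
import time


def measure_throughput(step_fn, batch_size, warmup=3, iters=10):
    """Time `step_fn` and return median step time and samples per second.

    On a GPU, `step_fn` must block until the device work has finished
    (for example by synchronizing), otherwise only launch time is measured.
    """
    for _ in range(warmup):  # warm-up runs are discarded
        step_fn()
    durations = []
    for _ in range(iters):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    # Guard against a zero reading on clocks with coarse resolution.
    median_s = max(statistics.median(durations), 1e-9)
    return {"median_step_s": median_s, "samples_per_s": batch_size / median_s}


def speedup(baseline, candidate):
    """Throughput ratio of a candidate run over the recorded baseline."""
    return candidate["samples_per_s"] / baseline["samples_per_s"]


# Stand-in workload: replace the sleep with a real training or inference step.
baseline = measure_throughput(lambda: time.sleep(0.01), batch_size=32)
print(baseline)
```

Using the median rather than the mean keeps one slow outlier step from skewing the baseline; store the result alongside the accuracy number, GPU model, and software versions so later comparisons are like for like.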

4

Tasker Automates Desktop Tasks with AI

Tasker is a free, open-source desktop agent for browser automation. It records actions or takes plain English descriptions, uses AI to adapt to website changes, and runs locally for privacy. It is still brittle on JavaScript-heavy apps, CAPTCHAs, 2FA, and anti-bot measures.