Issue #67 · 28 min read · 14 stories

Leaked: Anthropic's Mythos Model Is a 'Step Change'

IPO talks at $60B, companies ditch frontier APIs, plus Harvey hits $11B and the LiteLLM attack response.

Editor's Take

Here's an interesting trend I'm noticing. Companies are moving away from pure API reliance on frontier models. Instead they're taking open-weight models (many of them Chinese), post-training them on their own data, and shipping purpose-built AI that's cheaper and often better at their specific use case.

Cursor built Composer 2 on Kimi 2.5, post-trained on their own user data, and they're deploying improved checkpoints every five hours. Intercom launched Fin Apex 1.0, a custom support model trained on years of their data that they claim outperforms frontier models at one-fifth the cost. Pinterest, Airbnb and Notion are doing similar things quietly, and the HuggingFace founder says hundreds of other companies are doing the same without announcing it.

I don't think this means Claude or OpenAI's GPT models are going anywhere. If you're a smaller company without much proprietary data, frontier APIs are still the move. But if you're sitting on years of domain-specific data and operating at scale, training your own model on an open-source foundation is starting to look like a competitive advantage. Better cost structure, less dependency on a handful of providers, and a model that's deeply good at your specific thing.

This take was inspired by my friend Harrison Shalhoub's writing on the same trend; worth a read.

NEWS

Legal AI startup Harvey closed $200 million at an $11 billion valuation, a 3.5x jump in one year. Sequoia co-led the round for the third time, alongside Singapore's GIC. The company has now raised over $1 billion total. Harvey's AI agents handle contract analysis, due diligence, and litigation for more than 1,300 customers across 60 countries.

Sources tell The Information that Anthropic executives have discussed going public as early as Q4 2026. Bankers competing to underwrite the offering expect it to raise more than $60 billion. That would place it among the largest tech IPOs ever. The timing aligns with Anthropic's rapid enterprise growth and the Mythos leak suggesting a significant capability jump is imminent.

Around 3,000 unpublished assets from Anthropic's content management system were accidentally made public, revealing Claude Mythos (codenamed Capybara), a new model tier above Opus. Anthropic confirmed it represents a "step change" in performance, with dramatically higher scores on coding, reasoning, and cybersecurity benchmarks. The company flagged the model as "far ahead of any other AI model in cyber capabilities", raising concerns about offensive use.

TECHNICAL

A three-tier walkthrough of how vector databases work, from basic similarity search to production indexing. Level one covers how embeddings convert unstructured data into searchable vectors. Level two introduces metadata filtering and hybrid search for real queries. Level three gets into the algorithms that make it scale: HNSW for graph-based search, IVF for partitioning, and product quantisation for compressing high-dimensional vectors.
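The level-one idea can be shown in a few lines. Below is a minimal sketch of brute-force cosine similarity search over toy embeddings; the vectors, the `top_k` helper, and the 4-dimensional corpus are all made up for illustration (real embedding models produce hundreds to thousands of dimensions, and HNSW/IVF exist precisely to avoid this O(N) scan at scale):

```python
import math

# Toy corpus "embeddings": 5 documents, 4 dimensions each.
# Real systems would use numpy/faiss and model-generated vectors.
docs = [
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.1, 0.3, 0.0],
    [0.0, 0.7, 0.1, 0.5],
    [0.9, 0.0, 0.2, 0.1],
    [0.2, 0.8, 0.1, 0.3],
]

def cosine(a, b):
    """Cosine similarity: dot product of the two vectors over their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=3):
    """Brute-force nearest neighbours: score every document, keep the best k.
    This exact scan is what HNSW and IVF approximate without visiting
    every vector."""
    scored = sorted(
        ((cosine(query, d), i) for i, d in enumerate(corpus)),
        reverse=True,
    )
    return [(i, score) for score, i in scored[:k]]

query = [0.15, 0.85, 0.05, 0.25]
results = top_k(query, docs)
for idx, score in results:
    print(f"doc {idx}: similarity {score:.4f}")
```

Levels two and three layer onto this same core: metadata filtering narrows `corpus` before scoring, and product quantisation compresses each vector so far more of them fit in memory.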

Katie Parrott at Every.to examines four apps where AI agents orchestrate functionality using only basic tools like file I/O and web search. The pattern that emerged: simpler tools yield more creative AI combinations, and safety rules work best when embedded directly into the tools rather than the orchestration layer. The trade-off is less predictability in speed and cost, but more flexibility as capabilities improve.

Following last week's LiteLLM supply chain attack, the developer who first spotted it published their full Claude Code conversation transcript. What began as investigating a frozen laptop escalated into discovering credential-stealing malware, a fork bomb, and Kubernetes lateral movement code hidden in litellm_init.pth. From first symptom to published disclosure took 72 minutes. The post argues AI tooling has fundamentally sped up not just malware creation but also detection.
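The reason a `.pth` file is such an attractive hiding spot: CPython's `site` module executes any line in a site-packages `.pth` file that begins with `import` at interpreter startup, before your own code runs. A minimal audit sketch of that mechanism, from the defender's side (the `audit_pth_files` helper is my own, not from the disclosure):

```python
import site
from pathlib import Path

def audit_pth_files():
    """List .pth files whose lines start with 'import'.

    CPython runs such lines at interpreter startup, which is the
    mechanism a file like litellm_init.pth can abuse. A hit here is
    not proof of malware (some legitimate packages use it), but every
    entry is code that executes before your script does.
    """
    findings = []
    dirs = list(site.getsitepackages()) + [site.getusersitepackages()]
    for sp in dirs:
        sp_path = Path(sp)
        if not sp_path.is_dir():
            continue
        for pth in sorted(sp_path.glob("*.pth")):
            for line in pth.read_text(errors="replace").splitlines():
                if line.startswith("import "):
                    findings.append((str(pth), line[:80]))
    return findings

for path, snippet in audit_pth_files():
    print(f"{path}: {snippet}")
```

Running this on a clean environment typically surfaces a handful of benign entries (e.g. editable installs); anything obfuscated or unexpected deserves the kind of investigation described above.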

ANALYSIS

Alex Karp's blunt prediction: vocational skills and neurodivergence are the two paths to career security in the AI era. Skilled trades are hard to automate, and different cognitive wiring fosters the kind of unconventional thinking AI cannot replicate. Palantir runs a dedicated Neurodivergent Fellowship, actively recruiting people who think differently. Gartner expects a fifth of Fortune 500 sales teams to recruit neurodivergent talent by 2027.

Coding agents generate PRs fast, but enterprise validation is still slow. If each change needs 30 minutes in a shared staging environment and an agent-assisted dev pushes 5-6 PRs a day, most of the day is queue management. The bottleneck is no longer writing code but proving it works across a distributed dependency graph. Validation needs to move inside the development loop, not sit at the end of it.
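The arithmetic behind that claim is easy to check. A back-of-envelope sketch using the article's numbers (the team size and single shared environment are my assumptions):

```python
# Staging capacity vs. agent-assisted PR volume.
STAGING_MINUTES_PER_PR = 30     # from the article
PRS_PER_DEV_PER_DAY = 6         # from the article (upper end of 5-6)
DEVS_SHARING_STAGING = 4        # assumption: one shared env per small team
WORKDAY_MINUTES = 8 * 60

demand = STAGING_MINUTES_PER_PR * PRS_PER_DEV_PER_DAY * DEVS_SHARING_STAGING
utilisation = demand / WORKDAY_MINUTES

print(f"Staging demand: {demand} min/day vs {WORKDAY_MINUTES} min available")
print(f"Utilisation: {utilisation:.0%}")  # anything over 100% is a growing queue
```

Even a four-person team overruns a single staging environment by 50% a day, which is why the piece argues validation has to move inside the development loop rather than sit at the end of it.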

Dave Friedman names the metric that matters for AI valuations: not the capability gap between open and closed models (which is compressing), but the monetizable spread: the subset of that gap that someone will actually pay a premium for. That number is declining faster than raw capabilities. If open models reach "good enough" for most enterprise use cases, the current multiples on frontier lab equity start looking fragile.

Cursor built Composer 2 on Moonshot's Kimi 2.5 and deploys improved checkpoints every five hours using real-time reinforcement learning from user interactions. Intercom's Fin Apex 1.0 reportedly beats GPT-5.4 and Claude Sonnet 4.6 on support metrics at a fifth of the cost. Airbnb and Pinterest use Alibaba's Qwen. The HuggingFace founder estimates hundreds of companies are doing this quietly.

TOOLS

An open-source research agent that runs locally on your machine. Point it at a topic and it searches papers via AlphaXiv, pulls web results through Perplexity, reads full-text sources, and writes cited drafts. It can also run experiments in sandboxed Docker containers, bursting to Modal or RunPod for GPU compute. Workflows include deep research, literature reviews, paper audits, and automated replication.

An open-source AI gateway at 122KB that routes requests to over 1,600 language, vision, audio, and image models. Handles automatic retries, fallbacks, load balancing, and guardrails, processing over 10 billion tokens daily in production. The 2.0 release also ships Portkey Models, an open-source database tracking pricing for 2,300+ models across 40 providers, useful for teams optimising LLM spend.

A TypeScript framework for running multiple Claude Code instances as a coordinated team rather than isolated agents. It structures multi-agent workflows so different agents can handle different parts of a codebase simultaneously with shared context. Already at 12,000+ GitHub stars with 576 new stars per day, making it one of the fastest-growing Claude Code tools right now.

A SaaSpocalypse Survival Scanner that answers the question every SaaS founder is quietly dreading: can your product be replaced by a Claude Code skill? Enter your product and it evaluates whether a Markdown file with the right prompts could replicate your core functionality. It's part existential crisis, part entertainment, and unfortunately for some, part reality check.