Issue #108 · 36 min read · 18 stories

Nvidia Concedes China to Huawei; Agents Still Can't Code

GitHub breached via poisoned VS Code extension; $9B for spy agencies; Mythos 1 spotted heading for Claude Code

Jensen Huang told CNBC that Nvidia has "largely conceded" China's AI chip market to Huawei, even as revenue surged 85% to $81.6 billion. George Hotz spent six months coding with agents and concluded they produce slop that gets harder to detect, not software that gets closer to correct. And TeamPCP breached GitHub itself through a poisoned VS Code extension, putting 3,800 internal repos up for sale on BreachForums.

NEWS

Jensen Huang told CNBC that Nvidia has "largely conceded" China's AI chip market to Huawei after US export restrictions locked it out. Revenue still surged 85% to $81.6 billion, and the company unveiled an $80 billion buyback. But the Chinese market that once drove a fifth of data centre revenue is gone, with Huang telling investors to "expect nothing" on approvals to sell there again.

A group called TeamPCP breached GitHub itself through a poisoned VS Code extension, accessing 3,800 internal repositories now listed for sale on BreachForums. Ars Technica reports the group has run about 20 waves of supply chain attacks, tampering with over 500 packages and tools. The Laravel Lang localisation packages were also hit in a separate campaign that rewrote Git tags across 700+ versions.

Governor Newsom signed a first-in-the-nation executive order directing state agencies to prepare for AI-driven job displacement. The order calls for exploring severance standards, employment insurance for displaced workers, universal basic capital concepts, and expanded training programmes. California is also building early-warning systems to track hiring and payroll trends before mass layoffs hit.

The White House approved a secret $9 billion request for cutting-edge chips that the CIA and NSA need to run the latest AI models on classified systems. The funding targets Nvidia Grace Blackwell infrastructure. Separately, the NSA is finalising a classified contract with Anthropic to keep using Claude Mythos, even after the Pentagon flagged Anthropic as a supply chain risk.

TestingCatalog spotted traces of a new model called Mythos 1 with a claude-mythos-1-preview label being prepared for Claude Code and a new Claude Security product. The Security side is getting a dashboard for discovered vulnerabilities with historical charts and triage. Anthropic's Project Glasswing update confirmed Mythos-class models could reach general availability once safeguards are in place. Claude Opus 4.8 is also being readied.

Google CEO Sundar Pichai sat down with Hard Fork after I/O to discuss the growing public backlash against AI. A recent NYT/Siena poll found 35% of respondents now view AI as "mostly bad" versus just 16% "mostly good," and graduates have been booing mentions of AI at commencements across the country. Pichai outlined Google's bet that useful products will overcome the scepticism.

TECHNICAL

A Towards Data Science piece tackles the cost problem of moving agentic AI from demo to production. Two techniques stand out. "Early Commitment" has the agent classify the problem type upfront and set constraints before exploring, cutting wasted tokens on dead-end paths. "Deterministic Replay" caches successful execution paths so repeat tasks skip the reasoning entirely, dropping token costs for repetitive workflows.
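To make the second technique concrete, here is a minimal sketch of a replay cache in Python. It is an illustration under stated assumptions, not the article's implementation: `ReplayCache`, `task_key`, `plan_with_llm`, and `execute` are hypothetical names, and the problem type is assumed to come from an upfront classification step like the one "Early Commitment" describes.

```python
import hashlib
import json
from typing import Callable

# Hypothetical sketch: none of these names come from the article.
Plan = list[str]  # an ordered list of tool-call steps


def task_key(problem_type: str, params: dict) -> str:
    """Stable cache key: the committed problem type plus normalised parameters."""
    payload = json.dumps({"type": problem_type, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReplayCache:
    """Stores plans that succeeded so repeat tasks can skip the reasoning step."""

    def __init__(self) -> None:
        self._plans: dict[str, Plan] = {}
        self.llm_calls = 0  # counts how often the (expensive) planner ran

    def run(
        self,
        problem_type: str,
        params: dict,
        plan_with_llm: Callable[[str, dict], Plan],
        execute: Callable[[Plan], bool],
    ) -> Plan:
        key = task_key(problem_type, params)
        cached = self._plans.get(key)
        if cached is not None and execute(cached):
            return cached  # replay: no model tokens spent
        # Cache miss, or the cached plan no longer works: fall back to the model.
        self.llm_calls += 1
        plan = plan_with_llm(problem_type, params)
        if execute(plan):
            self._plans[key] = plan  # only successful paths are cached
        return plan
```

The design choice worth noting is that a cached plan is re-executed and verified rather than trusted blindly, so a stale path falls back to fresh reasoning instead of failing silently.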

The New Stack argues that traditional CI pipelines are too slow for coding agents that iterate in seconds. Their proposed fix is a new primitive called "plans": small, agent-authored, end-to-end checks that run inside the agent's session against a real integration environment. The goal is collapsing the inner loop (local, fast, mocked) and outer loop (CI, slow, real) into a single validation path.

Cursor's engineering team shares a year of lessons from building cloud agents. The biggest finding: when a cloud agent produces worse output than a local one, the cause is almost always an incomplete development environment, not the model. Failures don't surface as errors. They show up as subtle quality degradation you might blame on the model when the real fix is giving the agent better tooling, dependencies, and network access.
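One way to act on this is a preflight check that runs before the agent starts, so a missing dependency surfaces as an explicit error rather than as quietly worse output. The sketch below is an assumption about what such a check could look like, not something Cursor describes; the `preflight` function and the example tool, module, and host names are hypothetical.

```python
import importlib.util
import shutil
import socket


def preflight(tools, modules, hosts, timeout=2.0):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"missing tool: {tool}")
    for module in modules:
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found = False  # e.g. a dotted name whose parent package is absent
        if not found:
            problems.append(f"missing module: {module}")
    for host, port in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection succeeded; close it immediately
        except OSError as exc:
            problems.append(f"unreachable: {host}:{port} ({exc})")
    return problems
```

A caller might run `preflight(["git"], ["json"], [("example.com", 443)])` at session start and refuse to launch the agent until the returned list is empty.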

ANALYSIS

A Noema essay maps nine distinct narratives about AI, arguing the phenomenon is too large for any single perspective to capture. The framework borrows from philosopher Timothy Morton's "hyperobjects" concept and organises the narratives in pairs: Builders vs Displaced, Geopolitical Hawks vs Power Critics, Disruptors vs Truth Defenders. Three more stand alone because they identify losses with no corresponding gains.

Oren Etzioni catalogues a dozen AI startups, from Project Prometheus ($38B) to AMI Labs ($3.5B), that have raised $29 billion combined without shipping a commercial product. He calls them "Virgin Unicorns" and asks why sophisticated investors keep writing growth-stage cheques to pre-companies. The historical comparison isn't kind. Seven of the ten most-funded startups from the original dot-com era went bankrupt.

Benedict Evans back-tests AI job exposure predictions against a century of computing automation and finds the methodology broken. Accounting should have been destroyed by spreadsheets, databases, and ERPs, but CPAs kept multiplying. The Jevons paradox explains part of it: making analysis cheap means you do more analysis, not less. "Exposure to automation" might mean more work, not fewer workers.

Stack Overflow's blog argues that coding agents have shifted the bottleneck from writing code to reviewing it. One engineer was producing 7x the code of anyone on her team, all high quality, but the other six spent most of their time reviewing her output instead of writing their own. Smartsheet data shows enterprise automation intensity up 55% year-over-year. The workday hasn't grown. It's just denser.

Armin Ronacher (creator of Flask, now working on Pi at Earendil) describes a new failure mode in open source. Users run their bug observations through an AI, which expands the scope, adds confident but wrong root-cause analysis, and pastes in plausible-looking code references. The result poisons the triage pipeline because the agent picking up the issue treats the fabricated diagnosis as evidence and follows it down the wrong path.

George Hotz spent six months writing parts of tinygrad and reversing a USB chip with agents. Each time he suspected he could have done it better and faster manually. His argument: agents are statistical models that mimic the distribution of programming, producing output that breaks in ways that are getting harder to detect. High performers error-correct around them. Large organisations won't, and their average output will collapse.

TOOLS

A community-driven open-source project that catalogues AI model specifications, pricing, and capabilities across providers. Data is stored as TOML files organised by provider and model, validated by GitHub Actions, and served through a public API. The opencode project uses it internally, and it accepts contributions for new models and providers.

Microsoft released an open-source toolkit for governing autonomous AI agents, covering policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering. It maps directly to all ten risks in the OWASP Agentic Top 10. The repo is Python-based with 1,800+ stars. Worth a look if you're shipping agents into production and need a security baseline that isn't hand-rolled.

Reasonix is a terminal-first coding agent that talks directly to the DeepSeek API. The append-only loop is designed around DeepSeek's byte-stable prefix cache so long sessions hold 90%+ cache hit rates and input costs drop to roughly a fifth of normal. It defaults to V4-Flash for cheap iteration with a /pro command to lift individual turns to V4-Pro. MCP support, skill files, and session replay are built in.
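The "roughly a fifth" figure follows from simple arithmetic once you fix a price for cached tokens. The sketch below assumes cached input tokens cost one tenth of the normal input price; that ratio is an illustrative assumption, not a number from the Reasonix project or from DeepSeek's price list, and `relative_input_cost` is a hypothetical helper.

```python
def relative_input_cost(hit_rate: float, cached_price_ratio: float) -> float:
    """Input cost relative to paying full price for every input token.

    hit_rate: fraction of input tokens served from the prefix cache.
    cached_price_ratio: price of a cached token divided by the price of an
        uncached one (an assumed value, see the note above).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


# 90% cache hits with cached tokens at an assumed 10% of the normal price.
cost = relative_input_cost(hit_rate=0.9, cached_price_ratio=0.1)
print(f"{cost:.2f}")  # 0.19, i.e. roughly a fifth of the uncached cost
```

This is also why the append-only loop matters: a prefix cache only hits while the start of the prompt stays byte-identical, so rewriting earlier turns would push the hit rate, and the saving, back toward zero.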