The Department of Defense signed agreements with Nvidia, Microsoft, AWS, and Reflection AI to deploy their models on classified networks for operational use. The deals follow the Pentagon's dispute with Anthropic over military usage terms and reflect a deliberate strategy to diversify AI vendors. All four companies will operate at IL6 and IL7, the highest security impact levels for classified workloads.
Musk: xAI Trained Grok on OpenAI. White House Blocks Mythos.
Sarah Friar vs Altman on IPO timing. Alibaba cuts redundant tool calls from 98% to 2%. Emotion AI at work.
During his federal trial against OpenAI, Elon Musk acknowledged that xAI used distillation techniques on OpenAI models to train Grok; asked whether that amounted to a yes, he said "partly." The admission is notable because distillation undermines the compute advantage that frontier labs spend billions building. Until now, the practice was widely assumed but never publicly confirmed by a major lab founder.
Anthropic opened Claude Security to all Enterprise customers, putting Opus 4.7 to work scanning codebases for vulnerabilities and generating proposed fixes. The product sits below the Mythos-powered Project Glasswing tier but brings frontier-grade security scanning to a much wider set of organisations. Partners can integrate it through the Claude Platform or build on top of it directly.
Anthropic proposed expanding Mythos access from roughly 50 to 120 companies. The White House said no, citing security risks and concerns that Anthropic lacks enough compute to serve more entities without hampering government use. The model's ability to find and exploit software vulnerabilities has rattled agencies in recent weeks. Anthropic says conversations with the administration remain productive.
Meta acquired Assured Robot Intelligence, a startup building AI models that help robots understand and adapt to human behaviour in dynamic environments. The team, including co-founders Lerrel Pinto and Xiaolong Wang, joins Meta Superintelligence Labs. Meta's goal is to become the Android of humanoid robots, providing foundational hardware, sensors, and software that other companies build on top of.
ByteDance's drug discovery unit Anew Labs unveiled its first AI-designed therapeutic candidate, a small molecule targeting IL-17 for autoimmune diseases like psoriasis. The underlying framework, AnewOmni, was trained on over five million biomolecular complexes and achieves lab validation rates between 23% and 75%. The 36-person team across Shanghai, Singapore, and San Jose has four pipeline candidates so far.
A co-creator of EDEN shows their 2021 quantisation algorithm beats the newer TurboQuant on mean squared error and unbiased estimation. EDEN achieves this by analytically deriving an optimal scale factor after a random rotation step, reaching comparable or better accuracy with fewer bits per coordinate. The result matters for anyone compressing models, embeddings, or KV caches: sometimes the older, mathematically cleaner approach wins.
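The core idea is compact enough to sketch: rotate the vector with a random orthogonal matrix, quantise each coordinate to its sign, and derive the scale that minimises mean squared error in closed form rather than by search. A minimal 1-bit sketch under those assumptions (not the paper's implementation; EDEN uses fast structured rotations and supports multiple bit widths):

```python
import numpy as np

def quantize_1bit(x, rng):
    """EDEN-style sketch: random rotation + sign quantisation with an
    analytically optimal scale. Illustrative only, not the paper's code."""
    d = x.shape[0]
    # Dense random orthogonal rotation via QR; EDEN uses structured
    # rotations for O(d log d) cost, dense here for clarity.
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))
    z = R @ x
    # For the reconstruction q = s * sign(z), minimising ||z - s*sign(z)||^2
    # over s gives s = mean(|z|): the analytically derived scale factor.
    s = np.abs(z).mean()
    return R, s, np.sign(z)

def dequantize(R, s, signs):
    # R is orthogonal, so its transpose undoes the rotation.
    return R.T @ (s * signs)
```

The rotation is what makes the single scale work: after it, coordinates are approximately Gaussian regardless of the input's shape, so one closed-form scale fits them all.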
Meta Research introduces Agentic Self-Instruct, a method that trains AI agents to iteratively generate and refine training datasets as a data scientist would. The approach significantly outperforms traditional synthetic data generation, especially when the agent itself is meta-optimised. The system converts inference compute directly into higher-quality training data for complex reasoning tasks, offering a new path beyond simply scaling parameters.
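The generate-critique-refine loop at the heart of the method can be sketched in a few lines. The function names here are hypothetical placeholders, not the paper's API; in the real system each step is an LLM call, which is how inference compute becomes data quality:

```python
def agentic_self_instruct(generate, critique, refine, rounds=3):
    """Hypothetical sketch of the iterate-and-refine loop: draft a
    dataset, score it, rewrite the weak examples, repeat. Each callable
    stands in for an LLM-backed step in the actual system."""
    dataset = generate()
    for _ in range(rounds):
        scores = critique(dataset)          # per-example quality signal
        dataset = refine(dataset, scores)   # rewrite low-scoring items
    return dataset
```

More rounds spend more inference compute and, if the critic is sound, yield a cleaner dataset, which is the trade the paper is making explicit.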
Cursor's engineering team details how they approach the agent harness as an ambitious software product rather than a thin wrapper. The process combines vision-driven development with quantitative evals and real usage signals. When they get early access to new models, they spend weeks customising the harness to each model's strengths. Most improvements come from obsessively stacking small optimisations rather than waiting for step-change breakthroughs.
The UK AI Safety Institute reports that an early checkpoint of GPT-5.5 matches Anthropic's Mythos on its cyber evaluation suite, completing a multi-step corporate network attack that would take a human roughly 20 hours. This makes OpenAI the second developer, after Anthropic, to field a model at this level. The result suggests frontier cyber capability is a broad trend, not a single-model anomaly.
Researchers at Alibaba introduced Hierarchical Decoupled Policy Optimization, a reinforcement learning framework that trains agents to decide when to use tools versus relying on internal knowledge. Their Metis model reduced redundant tool invocations from 98% to just 2% while setting new accuracy benchmarks. The core problem they address: current agentic models have a metacognitive deficit, blindly calling APIs even when the answer is already in their parameters.
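The metacognitive decision can be caricatured as a confidence gate: answer from parameters when the model trusts itself, invoke the tool only when it doesn't. A hypothetical sketch; HDPO learns this policy with reinforcement learning rather than using the fixed threshold shown here:

```python
def answer_with_gate(query, model, tool, threshold=0.8):
    """Hypothetical sketch of the tool-vs-parameters decision.
    `model` returns (draft_answer, self-assessed confidence);
    the hard threshold stands in for the policy HDPO trains with RL."""
    draft, confidence = model(query)
    if confidence >= threshold:
        return draft, "internal"   # answer was already in the weights
    return tool(query), "tool"     # fall back to an external call
```

The 98%-to-2% reduction corresponds to this gate firing "internal" almost every time a competent model previously made a redundant API call anyway.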
GitGuardian's security researchers discovered that LLMs leave detectable statistical patterns in the passwords they generate. Using a model built on century-old frequency analysis techniques, they identified 28,000 likely LLM-generated passwords in the wild. The patterns persist across different models and prompts. For security teams, this means LLM-generated credentials can be fingerprinted and attributed, adding a new dimension to secret detection.
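"Century-old frequency analysis" here means nothing more exotic than character statistics: fit per-character log-probabilities on a reference corpus, then score candidate passwords by average log-likelihood; model-generated passwords cluster at the predictable end. A toy sketch under that assumption (GitGuardian's actual features and thresholds are not described in this summary):

```python
from collections import Counter
import math

def fit_char_model(corpus):
    """Per-character log-probabilities estimated from a reference corpus."""
    counts = Counter("".join(corpus))
    total = sum(counts.values())
    return {c: math.log(n / total) for c, n in counts.items()}

def avg_logprob(password, model, unseen=-12.0):
    """Average log-likelihood per character; unseen characters get a floor."""
    return sum(model.get(c, unseen) for c in password) / len(password)
```

A password built from the model's favourite characters scores far above one full of rare symbols, and a threshold on that score gives a crude fingerprint that, per the report, persists across models and prompts.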
Dave Rupert tested Chrome and Edge's new browser APIs for running small language models locally, including Prompt, Summariser, and Rewriter powered by Phi-4. The benefits are real: offline use, zero API costs, and full privacy. But SLMs introduce a non-deterministic layer into web standards that have always been deterministic. Mozilla flags model lock-in and monopolistic control as risks. Hardware requirements also create an equity gap for users without capable GPUs.
Shalom Yiblet predicted in 2023 that prompt engineering was temporary. Three years later, he reverses the call. The memorise-specific-phrasings era is over, but the discipline has evolved into context engineering, tool design, and evaluation frameworks. Better models enable more ambitious applications, and an agent that can mutate a database or message a customer has consequences a chatbot never did. The work is permanent because the stakes keep rising.
After the latest Big Tech earnings, Google's stock jumped 10% while Meta, Microsoft, and Amazon stalled. The thesis: Google has the most complete set of AI building blocks assembled over decades, from energy supply to custom TPUs to a global fibre network. Cloud revenue surged 68% in one quarter. If top models converge in quality, the advantage shifts from algorithms to delivery, and Google delivers at a scale nobody else matches.
A VC thesis built on Anthropic's Project Deal experiment, where 69 employees let Claude agents negotiate 186 real transactions. Users paired with the stronger model got better outcomes but never noticed the gap. The piece argues B2B commerce will go dark first, with agents handling discovery, negotiation, and execution end to end. Amazon, Meta, Microsoft, Salesforce, and Stripe have already joined the Universal Commerce Protocol council.
Emotion AI is moving from call centres and trucking into white-collar work. MorphCast analyses Zoom meetings in real time, Burger King is piloting an AI chatbot that scores employee friendliness, and Slack integrations now monitor message sentiment continuously. The EU banned workplace emotion AI last year, but the global market is expected to triple to $9 billion by 2030. Facial expressions predict emotions only about 35% of the time.
US power consumption has grown more in the past two years than in the preceding fifteen combined, driven by AI data centres, manufacturing, and electrification. Solar hit a record 28% year-on-year increase in 2025 and battery capacity is set to double to 90 GW, but the EIA still projects only 4.6% generation growth over two years. Residential electricity rates are up over 40% since 2020. Texas alone is capturing 55% of new battery capacity.
A WSJ profile reveals the tension at the centre of OpenAI's IPO planning. CFO Sarah Friar privately walked back Altman's $1.4 trillion compute figure to $600 billion and has suggested waiting until 2027 to go public. OpenAI missed multiple revenue and user targets after Claude Code took developer share. Banks have told both Anthropic and OpenAI that whoever IPOs first gets to define the industry.
An experiment pitting three agent architectures against each other produced a counterintuitive result. A hub model that decomposed tasks and delegated to specialist models cost four times more and performed worse than a simple market where models bid on their own competence. The Coasean prediction that declining transaction costs would favour delegation turns out to need a Hayekian correction: local knowledge, distributed across many models, beats centralised planning.
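The winning architecture is almost embarrassingly simple to sketch: no planner, just a sealed-bid round in which each model reports its own competence and the top bid takes the task. The agent structure and bid functions below are illustrative, not the experiment's code:

```python
def run_market(task, agents):
    """Hypothetical sealed-bid task market: each agent bids its own
    competence estimate; the highest bidder executes. No hub model
    decomposes or delegates anything."""
    bids = {name: agent["bid"](task) for name, agent in agents.items()}
    winner = max(bids, key=bids.get)
    return winner, agents[winner]["solve"](task)
```

The Hayekian point is in the bid function: each model holds local knowledge of its own strengths that a central hub would have to infer at four times the cost, and worse.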
A Claude Code plugin that analyses your project with a multi-agent pipeline and builds a knowledge graph of every file, function, class, and dependency. The output is an interactive dashboard you can pan, zoom, search, and explore. It supports structural views, business logic mapping, guided architecture tours, and fuzzy search. Works with Claude Code, Codex, Cursor, Copilot, and Gemini CLI.
An open-source harness that lets coding agents like Codex and Claude Code generate 3D CAD models from text descriptions. It exports to STEP, STL, 3MF, DXF, and GLB, includes a local CAD Explorer for inspecting geometry, and supports stable references so agents can make precise follow-up edits. Bundled skills cover CAD, URDF robot descriptions, motion planning, and manufacturing preflight checks.
Resemble AI released Chatterbox Turbo, a 350M-parameter text-to-speech model that runs 6x faster than real time on a single GPU with 75ms latency. It clones voices from five seconds of reference audio and ships with built-in PerTh watermarking for authentication. MIT-licensed and open source, it outperforms ElevenLabs in head-to-head comparisons on the same prompts and reference audio with zero prompt engineering.
The creator of Acai makes the case that AI coding agents go off the rails when specs live only in conversation context. YAML-based acceptance criteria survive context window resets, machine handoffs, and session kills. The workflow is specify, ship, review, iterate, with the spec file as the persistent contract between human intent and agent execution. The post includes comparisons with SpecKit, OpenSpec, Kiro, and Traycer.
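The shape of such a spec file might look like the following. The field names are hypothetical; the post does not reproduce Acai's actual schema, only the idea that acceptance criteria live on disk as the persistent contract:

```yaml
# Hypothetical acceptance-criteria spec. It survives context resets,
# machine handoffs, and killed sessions because it lives in the repo,
# not in the conversation.
feature: password-reset-email
status: in-review
acceptance:
  - sends a reset link that expires after 30 minutes
  - rejects tokens that have already been used
  - logs the request without storing the raw token
out_of_scope:
  - SMS-based reset
```

The specify, ship, review, iterate loop then amounts to editing this file, letting the agent work against it, and checking each criterion off in review.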
A local-first, web-deployable design tool that auto-detects 11 coding agent CLIs on your PATH and turns them into a design engine. It ships with 31 composable skills and 72 brand-grade design systems. Unlike Claude Design, it runs locally, deploys to Vercel, and stays bring-your-own-key at every layer. The agent follows a structured workflow: question form, visual direction, live plan, filesystem scaffolding, five-dimensional self-critique, and sandboxed artifact render.