xAI released standalone Grok Speech-to-Text and Text-to-Speech APIs, built on the stack powering Grok Voice and Tesla vehicles. STT supports 25+ languages with word-level timestamps and speaker diarisation at $0.10 per hour in batch mode. TTS provides natural voices with prosody control at $4.20 per million characters. Both are available immediately through the xAI API.
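The published batch rates make cost estimates straightforward: $0.10 per audio hour for STT and $4.20 per million characters for TTS. A quick back-of-envelope calculator (the rates are from the announcement; the function names are mine, not part of the xAI API):

```python
# Batch pricing from the announcement: STT $0.10/audio-hour, TTS $4.20/M chars.
STT_PER_HOUR = 0.10
TTS_PER_MILLION_CHARS = 4.20

def stt_cost(audio_seconds: float) -> float:
    """Batch STT cost in dollars for a given audio duration."""
    return audio_seconds / 3600 * STT_PER_HOUR

def tts_cost(num_chars: int) -> float:
    """TTS cost in dollars for a given character count."""
    return num_chars / 1_000_000 * TTS_PER_MILLION_CHARS

# Transcribing a 90-minute podcast costs about 15 cents:
print(f"${stt_cost(90 * 60):.2f}")   # $0.15
# Synthesising 300,000 characters of text:
print(f"${tts_cost(300_000):.2f}")   # $1.26
```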
Tim Cook Steps Down + Google's Anthropic Strike Team
Amazon commits $25B to Anthropic. Kimi K2.6 open-sources 1T params. Deezer drowns in AI music.
Meta plans to lay off 8,000 employees by May 20 while redirecting billions toward AI infrastructure, with $115 billion to $135 billion in spending projected for 2026. Teams are reorganising into AI-focused pods under a new Chief AI Officer. The restructuring signals that Meta views its AI pivot as urgent enough to cut staff while simultaneously scaling compute investment to unprecedented levels.
After 15 years leading Apple from post-Jobs uncertainty to a $4 trillion valuation, Tim Cook will step down as CEO on September 1. John Ternus, the senior VP of hardware engineering who oversaw the M-series chip transition, takes over. Cook becomes executive chairman. The move ends one of the longest and most consequential CEO tenures in tech, with revenue quadrupling under his watch.
Amazon will invest $5 billion in Anthropic now, with up to $20 billion more tied to commercial milestones. In return, Anthropic committed to spending over $100 billion on AWS technologies over the next decade, including current and future Trainium generations. Anthropic secures up to 5 gigawatts of compute capacity for training Claude, with significant Trainium3 capacity coming online this year.
Google is in talks with Marvell Technology to develop two new custom AI chips, a memory processing unit and an inference-optimised TPU, adding a third design partner alongside Broadcom and MediaTek. The discussions reflect Google's shift toward inference as the dominant compute cost. No contract has been signed, but the talks come just days after Broadcom locked in a TPU agreement running through 2031.
AI-generated tracks now represent 44% of all new music uploaded to Deezer, roughly 75,000 tracks per day and over two million per month. Consumption remains low at 1 to 3% of total streams, with 85% of those streams detected as fraudulent and demonetised. Deezer has stopped storing hi-res versions of AI tracks and removes them from algorithmic recommendations and editorial playlists.
Moonshot AI released Kimi K2.6 as open weights, a 1T-parameter mixture-of-experts model with 32B activated parameters. It matches GPT-5.4 and Claude Opus 4.6 on coding benchmarks, scoring 58.6 on SWE-Bench Pro. The headline feature is Agent Swarm, running up to 300 sub-agents executing 4,000 coordinated steps across tasks like web research, document analysis, and autonomous code generation.
Google DeepMind has assembled a strike team led by Sebastian Borgeaud to improve Gemini's coding abilities, prompted by an internal assessment that Anthropic's tools are better. In a memo, Sergey Brin wrote that Google must "urgently bridge the gap in agentic execution." Coding agents write about 50% of Google's code, compared to Anthropic's claimed near-100%. The end goal is AI that can improve itself.
Cloudflare built an AI code review system using up to seven specialised agents, each focused on a domain like security or performance, coordinated by a master agent. The plugin-based architecture with model tiering has processed tens of thousands of merge requests. The multi-agent approach delivers more targeted feedback than monolithic models and maintains accuracy at lower cost per review.
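The coordination pattern here is simple to sketch: a master agent fans a merge request out to domain-specialised reviewers and merges their findings. A minimal illustration, where each "agent" is a plain function standing in for an LLM call with a domain-specific prompt (all names are mine, not Cloudflare's code):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    domain: str
    message: str

# Each specialised "agent" is just a heuristic here; in the real system each
# would be an LLM call with its own prompt and model tier.
def security_agent(diff: str) -> list[Finding]:
    return [Finding("security", "avoid eval on untrusted input")] if "eval(" in diff else []

def performance_agent(diff: str) -> list[Finding]:
    return [Finding("performance", "avoid SELECT * in hot paths")] if "SELECT *" in diff else []

def master_agent(diff: str, agents: list[Callable[[str], list[Finding]]]) -> list[Finding]:
    """Fan the diff out to every specialised reviewer and merge the results."""
    merged: list[Finding] = []
    for agent in agents:
        merged.extend(agent(diff))
    return merged

report = master_agent('cursor.execute("SELECT * FROM t"); eval(x)',
                      [security_agent, performance_agent])
```

Because each reviewer sees only its own domain, feedback stays targeted, and cheap models can serve narrow agents while the master handles synthesis.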
Cloudflare assembled its internal AI engineering stack entirely on its own platform, using AI Gateway for routing, Workers AI for inference, and MCP Server Portals for tool discovery. The system includes a proxy Worker for authentication and AGENTS.md files for knowledge management via Backstage. Engineering teams report over 50% velocity improvements since adopting the internal toolchain.
Intercom doubled merged PRs per R&D employee in nine months after going all-in on Claude Code. All engineers, plus designers, PMs, and TPMs, now ship code through it. The team built telemetry infrastructure to measure AI adoption quality, a skills repository with hooks enforcing engineering standards, and a permission framework that enabled rapid rollout across hundreds of engineers.
GitHub designed its agentic workflow security architecture around the assumption that the AI agent is already compromised. The system handles prompt injection risks by treating all agent inputs as untrusted, isolating secrets from agent-accessible environments, and layering permissions so a manipulated agent cannot escalate access. ByteByteGo breaks down why traditional CI/CD trust models break when agents make runtime decisions.
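The "assume the agent is already compromised" posture reduces to a policy layer every agent action must pass: secrets never reach the agent, and requests can never exceed the scopes granted at workflow setup. A minimal sketch of that gate (the allowlist and scope names are illustrative, not GitHub's implementation):

```python
# Layered-permission gate: even a prompt-injected agent can only request
# actions, never perform them directly, and cannot escalate its scopes.
ALLOWED_ACTIONS = {"read_file", "run_tests", "open_pr"}  # no deploy, no secret access

def authorize(action: str, requested_scopes: set[str], granted_scopes: set[str]) -> bool:
    """Deny anything outside the allowlist or beyond the scopes granted at setup."""
    if action not in ALLOWED_ACTIONS:
        return False
    # Subset check: a manipulated agent cannot escalate past its grant.
    return requested_scopes <= granted_scopes

granted = {"repo:read", "ci:run"}
assert authorize("run_tests", {"ci:run"}, granted)
assert not authorize("deploy", {"ci:run"}, granted)       # not allowlisted
assert not authorize("open_pr", {"repo:write"}, granted)  # scope escalation blocked
```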
AI coding agents can read every file in your repo but still produce generic output because they lack design context: how the product behaves, which interaction patterns it rejects, what makes a screen feel on-brand. The author built a structured Claude Code skill encoding brand language, component patterns, and visual rules. First-pass output stopped looking like every other B2B SaaS.
Token-based pricing for LLM outputs means users pay regardless of whether the result is useful. The author argues this resembles a slot machine: nondeterminism makes every response different, and the occasional impressive result keeps users pushing the button through failures. Even subscription models obscure usage limits, making actual cost per useful output nearly impossible to predict.
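The slot-machine framing can be put in numbers: with nondeterministic outputs, the effective cost of one useful result is the per-call price divided by the success rate. The figures below are illustrative, not from any provider:

```python
def cost_per_useful_output(price_per_call: float, p_useful: float) -> float:
    """Expected spend in dollars until one acceptable response: price / p."""
    if not 0 < p_useful <= 1:
        raise ValueError("p_useful must be in (0, 1]")
    return price_per_call / p_useful

# A $0.02 call that is useful one time in four effectively costs $0.08,
# and the user has no way to observe p_useful in advance.
print(cost_per_useful_output(0.02, 0.25))  # 0.08
```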
Craig McLuckie and Joe Beda, the Kubernetes creators, are applying their enterprise-infrastructure playbook to agentic AI through Stacklok. Their thesis: the biggest problem in enterprise AI is accountability, not model quality. An agent cannot be held accountable for its work, so the industry needs the same kind of boring, reliable abstraction layer that Kubernetes brought to container orchestration.
Wardgate sits between AI agents and external services, isolating credentials for API calls and gating command execution in secure enclaves called conclaves. Agents get access to APIs, SSH keys, and shell tools without ever seeing the actual credentials. The project addresses prompt injection risks where a manipulated agent could exfiltrate secrets or run destructive commands on the host system.
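The underlying pattern is credential brokering: the agent holds only an opaque placeholder, and a trusted intermediary swaps it for the real secret at call time, so there is nothing for a prompt-injected agent to exfiltrate. A toy sketch of the idea (all names are illustrative, not Wardgate's actual API):

```python
import secrets

class CredentialBroker:
    """Holds real secrets inside the trust boundary; agents see only handles."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}

    def register(self, real_secret: str) -> str:
        """Store the secret and hand the agent an opaque placeholder."""
        placeholder = f"cred_{secrets.token_hex(8)}"
        self._vault[placeholder] = real_secret
        return placeholder

    def call_api(self, placeholder: str, request: str) -> str:
        """Resolve the placeholder on the trusted side; the agent never sees it."""
        real = self._vault[placeholder]
        # A real broker would attach `real` to an outbound request here.
        return f"sent '{request}' with a {len(real)}-char credential"

broker = CredentialBroker()
handle = broker.register("sk-live-example-not-real")
result = broker.call_api(handle, "GET /v1/models")
assert "sk-live" not in handle  # the agent-visible handle leaks nothing
```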
Hand-written CUDA kernels push Qwen3.5-27B to 207 tokens per second on a single RTX 3090, a 5.46x speedup over autoregressive inference. The project combines DFlash speculative decoding with a DDTree budget of 22 and Q4_K_M quantisation. A companion megakernel runs the smaller Qwen3.5-0.8B in a single CUDA dispatch, matching Apple silicon efficiency at double the throughput.
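The speedup comes from the speculative-decoding idea: a cheap draft model proposes a block of tokens, the large model verifies them in one batched pass, and only the longest agreeing prefix is kept. A toy greedy-acceptance loop showing the mechanism (generic sketch with stand-in models, not the DFlash or megakernel implementation):

```python
def draft_model(context: list[int], k: int) -> list[int]:
    # Stand-in draft: guesses the sequence continues in +1 steps.
    return [context[-1] + i + 1 for i in range(k)]

def target_model_next(context: list[int]) -> int:
    # Stand-in target: mostly +1 steps, but breaks the pattern at token 5.
    nxt = context[-1] + 1
    return nxt if nxt != 5 else 7

def speculative_step(context: list[int], k: int = 4) -> list[int]:
    """Accept draft tokens while the target agrees; on the first disagreement,
    keep the target's token instead (standard greedy acceptance scheme)."""
    proposal = draft_model(context, k)
    accepted: list[int] = []
    for tok in proposal:
        expected = target_model_next(context + accepted)
        if tok == expected:
            accepted.append(tok)       # verified "for free" in the batched pass
        else:
            accepted.append(expected)  # correction token, then stop
            break
    return context + accepted

print(speculative_step([1]))  # [1, 2, 3, 4, 7]
```

Three draft tokens are accepted and one corrected, so one large-model pass yields four tokens instead of one, which is where multi-x throughput gains come from when the draft model agrees often.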
Trail of Bits released a sandboxed devcontainer that makes it safe to run Claude Code with bypassPermissions enabled. The container provides filesystem isolation so agents can modify code freely without risking the host system. Designed for security audit workflows, untrusted repositories, and experimental work, it supports multi-repo engagements and ships with optimised Docker and Apple Silicon configurations.
Vercel Labs released Portless, which replaces port numbers with stable, named local URLs for both humans and coding agents. Instead of remembering localhost:3000, developers and agents get consistent, readable addresses that persist across sessions. The project gathered 7,000 GitHub stars in its first week. Built in TypeScript, it works with any local development server.
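The core idea is a stable name-to-port indirection: the address stays fixed while the underlying port can change between sessions. A toy registry showing the mapping (Portless itself is a TypeScript proxy; this Python sketch and its names are illustrative, not Portless's API):

```python
# Map stable, human-readable names to whatever port the dev server actually
# bound, so humans and agents share one persistent address.
registry: dict[str, int] = {}

def assign(name: str, port: int) -> str:
    """Bind a service name to a port and return its stable local URL."""
    registry[name] = port
    return f"http://{name}.localhost"

def resolve(url: str) -> int:
    """Look up the current port behind a stable URL (a real tool would proxy)."""
    name = url.removeprefix("http://").removesuffix(".localhost")
    return registry[name]

addr = assign("web", 3000)
assert resolve(addr) == 3000
assign("web", 5173)            # server restarted on a different port...
assert resolve(addr) == 5173   # ...but the stable address still resolves
```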