Perplexity Hits $450M ARR + Claude Managed Agents Ship
Microsoft open-sources Harrier; Vercel says 30% of deploys now come from coding agents

Microsoft's Bing team released Harrier under an MIT licence: three embedding models (27B, 0.6B, 270M) that rank first on multilingual MTEB v2. All share a 32k-token context window and support 100+ languages. Trained on over two billion examples, Harrier beats proprietary alternatives from OpenAI and Amazon. It is the first open-source model family to claim the top position on the benchmark.
Anthropic shipped a suite of composable APIs that handle sandboxed execution, checkpointing, credential management, and scoped permissions for cloud-hosted agents. Long-running sessions persist through disconnections, and a multi-agent coordination layer lets agents spin up and direct other agents in parallel. The goal is to collapse months of infrastructure work into days. Multi-agent coordination is available in research preview.
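The checkpointing pattern described above can be sketched in a few lines: persist session state after every step so that an agent survives a disconnect and resumes where it left off. This is an illustrative sketch only; the class and method names are assumptions, not Anthropic's actual API.

```python
import json
import pathlib
import tempfile

class CheckpointedSession:
    """Hypothetical sketch: a session whose state outlives the process."""

    def __init__(self, path):
        self.path = pathlib.Path(path)
        self.state = {"steps": []}
        if self.path.exists():                        # resume after a disconnect
            self.state = json.loads(self.path.read_text())

    def step(self, action: str) -> None:
        self.state["steps"].append(action)
        self.path.write_text(json.dumps(self.state))  # checkpoint every step

# Simulate a session that disconnects mid-run and then resumes:
ckpt = pathlib.Path(tempfile.mkdtemp()) / "session.json"
s1 = CheckpointedSession(ckpt)
s1.step("search docs")
del s1                                                # the "disconnect"
s2 = CheckpointedSession(ckpt)                        # a new process resumes
s2.step("write summary")
```

Because the durable state is the whole session record, a resumed process picks up the full history rather than starting from scratch.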
Amazon CEO Andy Jassy argues AI adoption is moving ten times faster than electricity did and rejects bubble comparisons. AWS's AI revenue run rate crossed $15 billion in Q1 2026, 260 times larger than AWS's own revenue at the same stage of its life. Jassy positions Amazon's breadth, from Trainium custom silicon to the Strands agent framework, as the differentiator and predicts every customer experience will be reinvented by AI.
Perplexity's estimated annual recurring revenue rose to over $450 million in March, driven by its new Computer agent tool and a shift to usage-based pricing. The startup now has more than 100 million monthly active users across search and agent products. The pivot marks a strategic retreat from its chatbot-style search engine toward AI agents that act on users' behalf, with subscription tiers ranging from $20 to $200 monthly.
Anthropic's technical deep-dive explains the OS-inspired design behind Claude Managed Agents. The architecture separates three components: the brain (Claude model), the hands (sandboxes and tools), and the session (event log). Each can be independently scaled, swapped, or upgraded without touching the others. The decoupling delivers lower time-to-first-token latency and lets security boundaries sit between components rather than around the whole system.
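The brain/hands/session split is easiest to see as code. The sketch below is a conceptual illustration of that decoupling (all names are invented for the example, not Anthropic's implementation): because the session is just an event log, the model or the sandbox can be swapped mid-run without losing history.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Session:
    """The session: a durable event log, independent of model and tools."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

class Sandbox:
    """The 'hands': executes tool calls in isolation."""
    def run(self, command: str) -> str:
        return f"ran: {command}"

def make_agent(model: Callable[[str], str], sandbox: Sandbox, session: Session):
    """Wire a 'brain' (model) to 'hands' (sandbox) around a shared session."""
    def step(user_input: str) -> str:
        session.append({"role": "user", "content": user_input})
        action = model(user_input)        # brain decides
        result = sandbox.run(action)      # hands act
        session.append({"role": "tool", "content": result})
        return result
    return step

# Swapping the brain leaves the event log untouched, so the run resumes
# with a different model while keeping its full history.
session = Session()
agent = make_agent(lambda p: f"echo {p}", Sandbox(), session)
agent("hello")
upgraded = make_agent(lambda p: f"summarize {p}", Sandbox(), session)
upgraded("world")
```

The security benefit follows from the same seams: a permission check can sit between brain and hands, rather than wrapping the whole system.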
Vercel reports that coding agents now drive over 30% of its weekly deployments, a 1,000% increase from six months ago. Claude Code accounts for 75% of agent-initiated deploys. Projects deployed by agents are 20 times more likely to call AI inference providers than human-deployed ones. Vercel argues that immutable deployments, preview URLs, and instant rollbacks are no longer developer experience upgrades but prerequisites for machine-driven development.
Meta spent years running a diverged WebRTC fork that powered Messenger, Instagram video, cloud gaming, and VR casting. As upstream evolved, merging became prohibitively expensive. The engineering team built a dual-stack architecture that runs the legacy and upstream versions simultaneously inside a single library, enabling A/B testing across 50+ use cases before rolling out changes to billions of users.
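The dual-stack pattern amounts to deterministic per-user, per-use-case bucketing between two implementations living in one binary. A minimal sketch, with rollout percentages and function names assumed for illustration:

```python
import hashlib

def legacy_connect(peer: str) -> str:
    return f"legacy:{peer}"      # stand-in for the diverged fork

def upstream_connect(peer: str) -> str:
    return f"upstream:{peer}"    # stand-in for the upstream build

# Assumed rollout config: each use case ramps independently.
ROLLOUT_PERCENT = {"messenger_call": 50, "vr_casting": 0}

def in_experiment(user_id: str, use_case: str) -> bool:
    """Hash-based bucketing keeps a user in one arm of the A/B test."""
    digest = hashlib.sha256(f"{user_id}:{use_case}".encode()).hexdigest()
    return int(digest, 16) % 100 < ROLLOUT_PERCENT.get(use_case, 0)

def connect(user_id: str, use_case: str, peer: str) -> str:
    impl = upstream_connect if in_experiment(user_id, use_case) else legacy_connect
    return impl(peer)
```

The deterministic hash is what makes the comparison clean: a given user always hits the same stack for a given use case, so quality metrics from the two arms are directly comparable before any broad rollout.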
A new Apple paper proposes LaCy, a pretraining method that teaches small language models which tokens to predict themselves and which to delegate to a larger model via a special CALL token. Using a spaCy grammar parser to distinguish factual errors from acceptable alternative continuations, LaCy-trained models achieve higher FactScores when paired with a bigger model in a cascade, outperforming loss-based baselines while being simpler and cheaper.
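The cascade mechanics can be illustrated with stub models: the small model generates token by token, and whenever it emits the special CALL token, that one prediction is delegated to the large model. Everything below (token name, stub logic) is an assumption for illustration, not the paper's implementation.

```python
CALL = "<CALL>"

def small_model(prefix: list[str]) -> str:
    # Stub: delegates the token it is "unsure" about (here: after 'capital').
    return CALL if prefix and prefix[-1] == "capital" else "the"

def large_model(prefix: list[str]) -> str:
    # Stub for the expensive, more factual model.
    return "Paris"

def cascade_generate(prompt: list[str], max_tokens: int = 3) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = small_model(tokens)
        if nxt == CALL:               # delegate exactly this token upward
            nxt = large_model(tokens)
        tokens.append(nxt)
    return tokens

out = cascade_generate(["capital"])   # → ["capital", "Paris", "the", "the"]
```

The cost saving comes from the granularity: the large model is consulted per token, only when the small model elects to delegate, rather than for whole generations.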
HeyGen published the architecture behind Avatar V, its latest avatar video generation system. The model conditions on the full token sequence of a reference video at every transformer layer, capturing both static identity (facial geometry, skin texture) and dynamic patterns (talking rhythm, micro-expressions). Sparse Reference Attention scales with reference length without the fidelity loss of low-dimensional bottlenecks. No identity-specific fine-tuning is needed at inference time.
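The contrast with a low-dimensional bottleneck is that keys and values from the reference stay full-width; sparsity comes from each query attending to only a few reference positions. Below is a toy numpy sketch of that idea, not HeyGen's actual kernel; the top-k selection rule is an assumption.

```python
import numpy as np

def sparse_reference_attention(q, ref_k, ref_v, k=4):
    """q: (d,), ref_k/ref_v: (L, d). Attend to only the top-k reference keys,
    keeping full-dimensional keys/values rather than a compressed summary."""
    scores = ref_k @ q / np.sqrt(q.shape[0])   # (L,) similarity to every key
    top = np.argpartition(scores, -k)[-k:]     # indices of the k best keys
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                               # softmax over the sparse set
    return w @ ref_v[top]                      # (d,) context from k values

rng = np.random.default_rng(0)
L, d = 256, 16                                 # long reference, full width
ctx = sparse_reference_attention(rng.normal(size=d),
                                 rng.normal(size=(L, d)),
                                 rng.normal(size=(L, d)))
```

Per query, the weighted sum touches only k values regardless of how long the reference video is, which is what lets the reference grow without a fidelity-destroying compression step.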
VC Tomasz Tunguz classifies work along two axes: whether demand is infinite or finite, and whether the feedback loop is open or closed. Software engineering sits in the high-value quadrant (infinite demand, closed loop) where AI automates and scales. Marketing and content creation land in the open-loop quadrant where humans still judge correctness. GitHub's numbers underline the scale: commits are on pace for 14 billion this year, up from 1 billion in 2025.
A 15-year engineering manager documents a paradox: developers with AI tools finish tasks faster but report higher exhaustion. The culprit is decision fatigue from continuous micro-evaluations: accept or reject each autocomplete suggestion, rewrite the prompt or regenerate the output, with no reflection pauses between them. The author calls it "AI brain fry" and recommends tracking cognitive load alongside velocity as a team health metric.
The Jazzband Python collective shut down this year, citing unsustainable volumes of AI-generated spam. Curl creator Daniel Stenberg cancelled bug bounty programmes for the same reason. The core problem is throughput asymmetry: agents made code generation dramatically cheaper, but review and validation have not sped up at all. The article argues enterprise teams running internal coding agents will face the same bottleneck within months.
The bottleneck in AI-assisted work has shifted from creation to validation. Coding agents, document generators, and AI co-workers have made output cheap, but checking whether that output matches intent remains slow and expensive. The deeper problem is that most people lack the upfront clarity to validate agent work even in principle. When a designer spent days thinking through edge cases, that effort was the specification.
AWS engineer and Kiro contributor Marc Brooker argues that spec-driven development iterates on the specification, not the implementation. Specifications are living, versioned artefacts that flow into code, not static documents produced upfront and thrown over the wall. AI accelerates the cycle because generating code from a revised spec costs almost nothing. The distinction matters as tools like Kiro bring specification-first workflows to more teams.
Deep Agents Deploy bundles agent orchestration, sandboxes, memory, and 30+ API endpoints into a single deployment command. It is model-agnostic, working with OpenAI, Google, Anthropic, and local models via Ollama. The pitch is ownership: by choosing an open harness, builders own their agent memory rather than having it locked into a proprietary system. Sandbox integrations include Daytona, Runloop, and Modal out of the box.
An open-source project used spectral analysis to reverse-engineer the invisible watermark Google embeds in every Gemini-generated image. The team built a detector that identifies SynthID watermarks with 90% accuracy and a multi-resolution bypass that drops 75% of carrier energy while preserving image quality at 43+ dB PSNR. The key finding: SynthID's carrier frequencies shift with image resolution, so a single fixed codebook does not work.
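The core detection idea, energy at a carrier frequency in the image spectrum, can be shown with a toy example. SynthID's real scheme is far more sophisticated; the sinusoidal carrier, frequencies, and amplitude below are arbitrary assumptions chosen to make the effect visible.

```python
import numpy as np

def embed_carrier(img, fx=30, fy=45, amp=0.5):
    """Add a weak 2-D sinusoidal carrier to the image."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    carrier = amp * np.cos(2 * np.pi * (fx * xx / w + fy * yy / h))
    return img + carrier

def carrier_energy(img, fx=30, fy=45):
    """Measure spectral magnitude at the carrier's frequency bin."""
    spec = np.abs(np.fft.fft2(img - img.mean()))
    return spec[fy, fx]

rng = np.random.default_rng(1)
clean = rng.normal(size=(128, 128))
marked = embed_carrier(clean)
# The marked image shows a sharp spike at the carrier bin that the
# clean image lacks, which is what a spectral detector keys on.
```

It also suggests why the reported bypass works: removing most of the energy at the carrier frequencies blinds the detector while barely changing pixel values.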
Clicky is a macOS app that puts an AI tutor beside your cursor. It can see your screen, talk to you, and point at elements on the display. The open-source version includes a Cloudflare Worker proxy for API keys and supports Anthropic, AssemblyAI, and ElevenLabs backends. The quickest setup path is pasting a single Claude Code prompt that handles cloning, configuration, and Xcode build automatically.