RL gets practical with GRPO++, Eurostar chatbot fails badly, and AI VC money concentrates.
One analysis yesterday framed Claude's code base as MS-DOS, posing the question of who builds the 'Windows' for the LLM era. For those tackling reinforcement learning, GRPO++ brings practical tricks to make RL actually work. Plus, a Eurostar chatbot demonstrated how quickly LLMs can go off-script, a critical lesson for anyone shipping agentic features.
OpenAI reports that 40 million people globally use ChatGPT daily for health information, accounting for around 5% of all messages. Users apply the AI for symptom triage, billing decoding, and insurance appeals.
At CES 2026, Ludens AI unveiled Cocomo and INU, new AI companions. CES 2026 demos emphasize presence, personality, and memory as the product surface, not productivity. Cocomo is an evolving robot, while INU is a smaller, expressive 'desktop alien dog'.
Group Relative Policy Optimization (GRPO) for reasoning LLMs faces issues like entropy collapse and reward noise. The article details techniques such as Decoupled Advantage Policy Optimization (DAPO) and Truncated Importance Sampling (TIS) to improve training stability and efficiency, exemplified by Olmo 3. TIS used in DAPO shows performance gains.
Eurostar's AI chatbot had four issues: guardrail bypass, prompt injection for system prompt disclosure, HTML injection, and manipulated chat history leading to server-side guardrail bypass. The chatbot was not connected to customer information databases, reducing direct data exposure risk.
A developer built a setup to run multiple Claude agents in parallel from an iPhone. This involves kicking off Claude Code on a Vultr cloud VM via Tailscale, getting push pings when it needs input, and responding directly from their phone, enabling asynchronous development.
Claude Code, while useful for developers with its file system access and autonomous agency, is too low-level for mass adoption, akin to MS-DOS. Its value comes from accessing full project context, but its command-line interface limits reach. The missing layer is a GUI that exposes repo-scoped permissions, safe tool invocation, and reversible actions for non-CLI users.
Draft AI companion regulations in China propose separate consent for user data training and adherence to 'core socialist values'. Industry feedback historically changes drafts; the author expects these stringent provisions will be softened (speculation).
SVB's H2 2025 State of the Markets report reveals AI companies are less efficient, showing lower revenue per employee and worse profit margins despite significant capital. VC funding is increasingly concentrated in mega-deals, and startup graduation rates have plummeted.
A Principal Engineer argues agentic AI tools create an unprecedented opportunity for senior engineers to reach staff/principal roles. AI can accelerate learning, boost visibility through rapid content generation, and refine strategic planning.
An analysis explores the tension between AI's ability to create personalized content at scale and the enduring value of human-produced content for shared experiences and community. It argues that new jobs and human desires will ensure human labor persists, countering predictions of extreme wealth concentration.
Zapier CEO Wade Foster detailed his executive AI workflows, sharing how he uses tools like Grok and Zapier agents. He applies AI to analyze company culture, and uses an agent that scores interview transcripts against job rubrics and company values, then flags discrepancies.
NVIDIA published a step-by-step guide to building a multimodal AI agent using their open models, Nemotron, Isaac GR00T, and Cosmos, alongside the Reachy Mini robot. The instructions detail how to replicate a voice plus vision loop, reasoning, tool orchestration, and robot interaction primitives using the NeMo Agent Toolkit for orchestration and Pipecat for real-time voice and vision.