5.6.26
- ProgramBench Moves the Goalposts for Coding Agents 🧪🏗️📈: ProgramBench tests whether agents can build whole programs from a target executable and usage docs, with 200 tasks, behavioral tests, no internet access, and anti-cheating constraints. (The official best score is still 0%, which is exactly the point.) — J. Yang on X
- Vercel Ships a Security Harness for Coding Agents 🧪🔐⌨️: Vercel introduced deepsec, an open-source coding-security harness that is CLI-first, sandbox-based, and pluggable across coding agents, built for large-scale repos — Vercel Developers on X
- Anthropic Packages Finance Agents With Microsoft and Moody’s 💼📊🧠: Anthropic released ten ready-to-run agent templates for financial services, covering pitchbooks, KYC screening, valuation review, month-end close, and market research. Each bundles skills, connectors, and subagents, with Microsoft 365 add-ins and a Moody’s MCP app for interactive credit-data workflows — Anthropic