Shared components of AI lab commitments to evaluate and mitigate severe risks.
External review from METR of Anthropic's Summer 2025 Sabotage Risk Report.
Details on external recommendations from METR for gpt-oss Preparedness experiments and follow-up from OpenAI.
How we think about tradeoffs when communicating surprising or nuanced findings.
Current views on information relevant to visibility into frontier AI risk.
Suggested priorities for the Office of Science and Technology Policy as it develops an AI Action Plan.
Why legible and faithful reasoning is valuable for safely developing powerful AI.
List of frontier safety policies published by AI companies, including Amazon, Anthropic, Google DeepMind, G42, Meta, Microsoft, OpenAI, and xAI.
Why pre-deployment testing is not an adequate framework for AI risk management.
Red-teaming and security suggestions regarding the Bureau of Industry and Security's proposed rule, “Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters.”
Funding for Canary will enable research and implementation at scale.
Suggestions for expanded guidance on capability elicitation and robust model safeguards in the U.S. AI Safety Institute’s draft document “Managing Misuse Risk for Dual-Use Foundation Models” (NIST AI 800-1).
Comments on NIST’s draft document “AI Risk Management Framework: Generative AI Profile.”
METR is hiring ML engineers and researchers.
Emma moves from President to Executive Director, and Beth moves to Head of Research.
A summary of what METR accomplished in 2023, our first full year of operation.
METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations of tasks to measure the performance of autonomous LLM agents.
ARC Evals is wrapping up our incubation period at ARC and spinning off into our own standalone nonprofit.
We describe the basic components of Responsible Scaling Policies (RSPs) as well as why we find them promising for reducing catastrophic risks from AI.
ARC Evals plans to spin out from the Alignment Research Center (ARC) in the coming months, and become its own standalone organization.
Input to NTIA’s AI Accountability Policy Request for Comment.