Risk Assessment

Frontier Risk Report
(February to March 2026)

A pilot assessment of rogue deployment risk at frontier AI companies.

Non-public information and model access provided by:


Reviews of AI developers' risk assessments

Review of the "Risks from automated R&D" section in the Anthropic Risk Report (February 2026)

May 8, 2026

External review from METR of the "Risks from automated R&D" section in Anthropic's February 2026 Risk Report

Red-Teaming Anthropic's Internal Agent Monitoring Systems

March 26, 2026

A METR staff member spent three weeks red-teaming a subset of Anthropic's internal agent monitoring and security systems, discovering several novel vulnerabilities.

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6

March 12, 2026

External review from METR of Anthropic's Sabotage Risk Report for Claude Opus 4.6

Review of the Anthropic Summer 2025 Pilot Sabotage Risk Report

October 28, 2025

External review from METR of Anthropic's Summer 2025 Sabotage Risk Report

Summary of our gpt-oss methodology review

October 23, 2025

Details on external recommendations from METR for gpt-oss Preparedness experiments and follow-up from OpenAI.

Evaluation reports

We evaluate the autonomous capabilities of frontier AI models, sometimes in partnership with AI developers such as Anthropic and OpenAI. We do this both to understand the models' capabilities and to pilot third-party evaluator arrangements.

GPT-5.6 Sol

June 26, 2026 • Partnership

GPT-5.1-Codex-Max

November 19, 2025 • Partnership

GPT-5

August 7, 2025 • Partnership

DeepSeek and Qwen

June 27, 2025 • No company involvement

OpenAI o3 and o4-mini

April 16, 2025 • Partnership

Claude 3.7

April 4, 2025 • Partnership

DeepSeek-R1

March 5, 2025 • No company involvement

GPT-4.5

February 27, 2025 • Partnership

DeepSeek-V3

February 12, 2025 • No company involvement

Claude 3.5 Sonnet and o1

January 31, 2025 • Partnership

Claude 3.5 Sonnet (original)

October 30, 2024 • Partnership

o1-preview

September 12, 2024 • Partnership

GPT-4o

August 7, 2024 • Partnership

GPT-4 and Claude

March 17, 2023 • Partnership

METR does not accept compensation for this work.

Companies such as OpenAI, Anthropic, and xAI have provided model access and compute credits to support evaluation research. We also occasionally evaluate models independently after release, without involvement from the model's developers. Recent public reports from this work are listed above, with additional discussion in the respective system cards.