Reviews of AI developers' risk assessments
Evaluation reports
We evaluate the autonomous capabilities of frontier AI models, in some cases in partnership with AI developers such as Anthropic and OpenAI. We do this both to understand the models' capabilities and to pilot third-party evaluator arrangements.
GPT-5.1-Codex-Max • November 19, 2025 • Partnership
GPT-5 • August 7, 2025 • Partnership
DeepSeek and Qwen • June 27, 2025 • No company involvement
OpenAI o3 and o4-mini • April 16, 2025 • Partnership
Claude 3.7 • April 4, 2025 • Partnership
DeepSeek-R1 • March 5, 2025 • No company involvement
GPT-4.5 • February 27, 2025 • Partnership
DeepSeek-V3 • February 12, 2025 • No company involvement
Claude 3.5 Sonnet and o1 • January 31, 2025 • Partnership
Claude 3.5 Sonnet (original) • October 30, 2024 • Partnership
o1-preview • September 12, 2024 • Partnership
GPT-4o • August 7, 2024 • Partnership
GPT-4 and Claude • March 17, 2023 • Partnership
METR does not accept compensation for this work.