Evaluation reports
We evaluate the autonomous capabilities of frontier AI models, sometimes in partnership with AI developers such as Anthropic and OpenAI. We do this both to understand the models' capabilities and to pilot third-party evaluator arrangements.
GPT-5.1-Codex-Max
19 November 2025 • Partnership
GPT-5
7 August 2025 • Partnership
DeepSeek and Qwen
27 June 2025 • No company involvement
OpenAI o3 and o4-mini
16 April 2025 • Partnership
Claude 3.7
4 April 2025 • Partnership
DeepSeek-R1
5 March 2025 • No company involvement
GPT-4.5
27 February 2025 • Partnership
DeepSeek-V3
12 February 2025 • No company involvement
Claude 3.5 Sonnet and o1
31 January 2025 • Partnership
Claude 3.5 Sonnet (original)
30 October 2024 • Partnership
o1-preview
12 September 2024 • Partnership
GPT-4o
7 August 2024 • Partnership
GPT-4 and Claude
17 March 2023 • Partnership
METR does not accept compensation for this work.