METR

Model Evaluation & Threat Research

AI companies and wider society want to understand the capabilities of frontier AI systems and the risks they pose.

METR researches, develops, and runs cutting-edge tests of AI capabilities, including broad autonomous capabilities and the ability of AI systems to conduct AI R&D.

Top resources

RE-Bench — benchmark and paper tracking automation of AI R&D

Measuring the performance of humans and AI agents on day-long ML research engineering tasks

Measuring autonomous AI capabilities — resource collection

An index of our research and guidance on how to measure AI systems' ability to autonomously complete a wide range of multi-hour tasks

Vivaria — our evaluation platform

An open-source platform for developing and running agent evaluations

Frontier AI safety policies — summary of common elements

How frontier AI companies are using evaluations to understand and manage emerging risks from their systems


Evaluation reports

We have worked with Anthropic, OpenAI, and other companies to conduct preliminary evaluations of the autonomous capabilities of several frontier AI models. We do this both to understand the capabilities of frontier models and to pilot third-party evaluator arrangements. Recent public reports resulting from this work are below, with additional discussion in the respective system cards.

OpenAI o1-preview and o1-mini
Claude 3.5 Sonnet (June 2024 version)
GPT-4o

Partnerships

In addition to the evaluations of new models discussed above, these companies have also provided model access and compute credits to support evaluation research.

We think it’s important for there to be third-party evaluators with formal arrangements and access commitments — both for evaluating new frontier models before they are scaled up or deployed, and for conducting research to improve evaluations. We do not yet have such arrangements, but we are excited about taking more steps in this direction.

We are also partnering with the UK AI Safety Institute and are part of the NIST AI Safety Institute Consortium.
