Evaluation reports

We evaluate the autonomous capabilities of frontier AI models, some in partnership with AI developers such as Anthropic and OpenAI. We do this both to understand the models' capabilities and to pilot third-party evaluator arrangements.

METR does not accept compensation for this work.

Companies such as OpenAI and Anthropic have provided model access and compute credits to support evaluation research. We also occasionally evaluate models independently after they are released, without involvement from the model's developers. Recent public reports resulting from this work are listed above, with additional discussion in the respective system cards.