AI companies and wider society want to understand the capabilities of frontier AI systems and the risks they pose.
METR researches, develops, and runs cutting-edge tests of AI capabilities, including broad autonomous capabilities and the ability of AI systems to conduct AI R&D.
Top resources
Evaluation reports
We have worked with Anthropic, OpenAI, and other companies to conduct preliminary evaluations of the autonomous capabilities of several frontier AI models. We do this both to understand the capabilities of frontier models and to pilot third-party evaluator arrangements. Recent public reports resulting from this work are below, with additional discussion in the respective system cards.
Partnerships
In addition to the evaluations of new models discussed above, these companies have provided model access and compute credits to support our evaluation research.
We think it’s important for there to be third-party evaluators with formal arrangements and access commitments — both for evaluating new frontier models before they are scaled up or deployed, and for conducting research to improve evaluations. We do not yet have such arrangements, but we are excited about taking more steps in this direction.
We are also partnering with the UK AI Safety Institute and are part of the NIST AI Safety Institute Consortium.