METR

METR is a research nonprofit that works on assessing whether cutting-edge AI systems could pose catastrophic risks to society.

We build the science of accurately assessing risks, so that humanity is informed before developing transformative AI systems.

AI’s Transformative Potential

We believe that AI could change the world quickly and drastically, with potential for both enormous good and enormous harm. We also believe it’s hard to predict exactly when and how this might happen.

At some point, AIs will probably be able to do most of what humans can do, including developing new technologies; starting businesses and making money; finding new cybersecurity exploits and fixes; and more.

We think it’s very plausible that AI systems could end up pursuing goals that are at odds with a thriving civilization. This could be due to someone’s deliberate effort to cause chaos¹, or happen despite the intention to only develop AI systems that are safe².

Given how quickly things could play out, we don’t think it’s good enough to “wait and see” whether there are dangers.

We believe in vigilantly, continually assessing risks. If an AI brings significant risk of a global catastrophe, the decision to develop and/or release it can’t lie only with the company that creates it.

Partnerships

We currently partner with Anthropic and OpenAI to develop evaluations using their AI systems, and are exploring partnerships with other AI companies as well.

We are also partnering with the UK AI Safety Institute, and are part of the NIST AI Safety Institute Consortium.

Our Work

Our research is focused on how to evaluate AI systems for dangerous capabilities:

Autonomy Evaluation Resources

These are resources to evaluate AI systems for risks from autonomous capabilities. They include a suite of tasks, software tooling, and an example overall evaluation protocol.

Task Standard

A standard way to define tasks for evaluating the capabilities of AI agents.

Responsible Scaling Policies (RSPs)

A proposal for evals-based catastrophic risk reduction that AI developers can adopt.

Blog

07 February 2024

2023 Year In Review

A summary of what METR accomplished in 2023 – our first full year of operation.

16 December 2023

Bounty: Diverse hard tasks for LLM agents

METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations for tasks to measure performance of autonomous LLM agents.

04 December 2023

ARC Evals is now METR

ARC Evals is wrapping up our incubation period at ARC, and spinning off into our own standalone nonprofit.

19 September 2023

ARC Evals is spinning out from ARC

ARC Evals plans to spin out from the Alignment Research Center (ARC) in the coming months, and become its own standalone organization.

31 July 2023

New report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

We have just released our first public report. It introduces methodology for assessing the capacity of LLM agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild.

17 March 2023

Update on ARC's recent eval efforts

More information about ARC's evaluations of GPT-4 and Claude.

¹ For instance, an anonymous user set up ChatGPT with the ability to run code and access the internet, prompted it with the goal of “destroy humanity,” and set it running.
² There are reasons to expect goal-directed behavior to emerge in AI systems, and to expect that superficial attempts to align ML systems will result in sycophantic or deceptive behavior - “playing the training game” - rather than successful alignment.

METR

Model Evaluation and Threat Research
