METR works with AI developers, governments, and other research organizations that sometimes provide nonpublic model access and proprietary information. Over time, we’ve developed confidentiality and security measures to protect such access and information. This post describes our approach at a high level.

Confidentiality measures

Our confidentiality policy, setup, and norms primarily address the risk of leaks in conversation and in infrastructure, though they also reduce insider threat risk by limiting who knows what.

Policy

Our confidentiality policy assigns information—including (but not limited to) nonpublic access, lab relationships, policy work, and funding—to our six confidentiality levels, ranging from public to internally siloed, based on sensitivity. At the most restricted end, information about nonpublic models (including capabilities, evaluation timelines, and which developer we’re working with) is limited to researchers directly involved and discussed only by codename. Our own methodology, tasks, and infrastructure are available more broadly within METR, and much of this work is eventually published.
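
As a rough illustration of how ordered levels can gate sharing decisions, here is a minimal sketch. Only the “public” and “internally siloed” endpoints are level names from this post; the middle tiers are invented placeholders, not our actual level names:

```python
from enum import IntEnum

class Level(IntEnum):
    # Only PUBLIC and SILOED correspond to levels named in this post;
    # the middle tiers are invented placeholders.
    PUBLIC = 0
    EXTERNAL_OK = 1
    INTERNAL = 2
    TEAM = 3
    PROJECT = 4
    SILOED = 5

def can_share(material: Level, audience: Level) -> bool:
    """Material may only flow to audiences cleared at or above its level."""
    return audience >= material

assert can_share(Level.PUBLIC, Level.INTERNAL)
assert not can_share(Level.SILOED, Level.INTERNAL)
```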

Our policy also provides standard responses for sensitive questions, guidance on edge cases, quick rules of thumb with examples and FAQs, and possible slip-ups to watch out for.

  • Don’t comment on labs based on non-public info. Any comments […] should be rigorously substantiated by public information only and caveated as such.
    • OK to talk about specific points […], but please do not make overall / blanket statements. e.g.:
      • OK to say “[Lab A]’s FSP doesn’t contain X component which we recommend”, […] or “METR found [Lab C’s] model had higher performance on general autonomy tasks than [Lab D’s] model”
      • Please avoid saying “[Lab A] has nothing that comes close to this level of capability”, or “No lab currently has anything close to being able to do this task”, except in official communication.
Excerpt from section 2.d.i of our confidentiality policy, on commenting on AI developers.

Flagging easy places to slip up:

  • Sensitive information in calendar event names (visible on room booking displays)
  • Confirming we don't have access to something (which reveals information by exclusion)
  • Inadequate soundproofing for sensitive discussions
  • Forecasting on public prediction markets about topics we may have inside knowledge of
  • [...]
Paraphrased excerpt from our confidentiality policy’s section on easy places to slip up.

As part of our onboarding process, we conduct one-on-one confidentiality training that includes live mock questioning, so staff can practice responding to sensitive questions in realistic settings. We also conduct background and reference checks during hiring.

Setup

These six confidentiality levels are used as prefixes across Slack, documents, and other platforms, so confidentiality expectations are visible without relying on memory. Technical controls prevent accidental sharing; for example, documents can’t be shared externally without explicit marking, and channel membership is centrally managed. Even within METR, we silo sensitive technical and policy projects. For example, to preserve confidentiality around nonpublic model access:

  • Only researchers involved in a model’s evaluation can generate and see the model’s completions.
  • When a lab gives us access to a nonpublic model, we generate an animal codename (e.g. “playful-panda”), and all discussion of and references to the model within METR use this codename (a sketch of this step follows the list).
  • Although our agents are public and our analysis pipeline code is accessible to core team members, for each evaluation we create a secret fork of these repos that is accessible only to the people working on that evaluation.
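
A minimal sketch of the codename step, assuming a simple adjective + animal word-list scheme; the word lists and uniqueness check here are illustrative, not our actual tooling:

```python
import secrets

# Illustrative word lists; our real lists and tooling aren't public.
ADJECTIVES = ["playful", "quiet", "brisk", "mellow", "sturdy"]
ANIMALS = ["panda", "otter", "heron", "lynx", "tapir"]

def new_codename(taken: set[str]) -> str:
    """Draw adjective-animal pairs until we hit one not already in use."""
    while True:
        name = f"{secrets.choice(ADJECTIVES)}-{secrets.choice(ANIMALS)}"
        if name not in taken:
            return name

print(new_codename(taken={"playful-panda"}))  # e.g. "brisk-otter"
```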

These steps help prevent inadvertent leaks and limit exposure if our infrastructure were compromised.

Norms

Our confidentiality setup enforces some constraints, but we maintain additional norms to reduce the risk of slips in conversation. For example:

  • We use codenames (like “playful-panda”) for all nonpublic models, even in conversations where both parties know the model’s identity, and we default to saying “[playful-panda] lab” rather than naming the developer.
  • We actively recognize people for handling confidentiality carefully and encourage staff to flag potential lapses in a dedicated Slack channel.
  • We maintain a log of slips and near misses and run retrospectives on them.

Security measures

The table below summarizes the main security controls that protect against breaches and help limit damage from insider threats. These measures, alongside others, contributed to our SOC 2 Type I attestation.

Identity & Access Management (IAM)
  • A central identity provider permits only FIDO2 authentication.
  • Access to most services is further restricted to logins from preapproved devices.
  • Model access is protected by layered access controls: our VPN requires SSO authentication to reach internal infrastructure, API endpoints are scoped within private VPCs, and our internal proxy enforces per-request token-based authorization (see the sketch after this list).
  • Within our evaluation platform, access controls ensure researchers can only view transcripts from models they are authorized to access.
  • Dedicated admin accounts are enrolled in Google’s Advanced Protection Program.
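
As a hedged sketch of a per-request check at such a proxy, with tokens scoped to specific model codenames: the handler, token store, and scoping scheme below are illustrative assumptions, not our production code.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical token store mapping bearer tokens to the model codenames
# they may query; illustrative only, not METR's actual scheme.
TOKEN_SCOPES = {"token-for-alice": {"playful-panda"}}

def authorized(auth_header: str | None, model: str) -> bool:
    """Check a Bearer token and whether it is scoped to this model."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header.removeprefix("Bearer ")
    for known, models in TOKEN_SCOPES.items():
        # Constant-time comparison to avoid timing side channels.
        if hmac.compare_digest(token, known) and model in models:
            return True
    return False

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        model = self.path.strip("/")  # e.g. POST /playful-panda
        if not authorized(self.headers.get("Authorization"), model):
            self.send_error(403, "Not authorized for this model")
            return
        # A real proxy would forward the request upstream here.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```
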
Endpoints
  • Mobile device management (MDM) enforces security measures and automatic updates on METR computers.
  • An application allowlist restricts what can run on METR computers (a conceptual sketch follows this list).
  • Access to METR systems requires using a dedicated Chrome profile, so we can enforce an extension allowlist and additional hardening.
  • For certain sensitive materials, Google Workspace's client-side encryption ensures content is decrypted only on authorized endpoints.
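
A conceptual sketch of hash-based allowlisting follows; real MDM and allowlisting products enforce this at the OS level (often via code signatures rather than file hashes), so this Python form is purely illustrative:

```python
import hashlib
from pathlib import Path

# Hypothetical allowlist of SHA-256 digests for approved binaries
# (the digest below is a placeholder: sha256 of empty input).
ALLOWED_SHA256 = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def may_execute(binary: Path) -> bool:
    """Permit a binary only if its digest appears on the allowlist."""
    digest = hashlib.sha256(binary.read_bytes()).hexdigest()
    return digest in ALLOWED_SHA256
```
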
Monitoring & Response
  • Logs from most platforms are streamed to a SIEM, where default and custom detection rules monitor for indicators of account compromise, unauthorized data access, and anomalous endpoint behavior.
  • Automated workflows help handle alerting and response, like requesting confirmation of new logins, scanning binaries, and escalating alerts that remain unresolved for ≥12 hours.
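
As an illustration, the ≥12-hour rule reduces to a simple age check over open alerts; the alert fields below are assumptions, and in practice this logic lives inside the SIEM’s automation rather than standalone code:

```python
from datetime import datetime, timedelta, timezone

ESCALATION_AGE = timedelta(hours=12)

def alerts_to_escalate(alerts: list[dict], now: datetime | None = None) -> list[dict]:
    """Return open alerts whose age meets or exceeds the escalation threshold."""
    now = now or datetime.now(timezone.utc)
    return [
        alert for alert in alerts
        if alert["status"] == "open"
        and now - alert["opened_at"] >= ESCALATION_AGE
    ]

alerts = [
    {"id": 1, "status": "open",
     "opened_at": datetime.now(timezone.utc) - timedelta(hours=13)},
    {"id": 2, "status": "resolved",
     "opened_at": datetime.now(timezone.utc) - timedelta(hours=20)},
]
print([a["id"] for a in alerts_to_escalate(alerts)])  # -> [1]
```
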
Governance & Assurance
  • External cybersecurity experts provide ongoing feedback on our security posture, and we conduct security testing and audits at least annually.
  • Periodic access reviews backstop our standard provisioning and deprovisioning processes, though those processes handle the vast majority of access changes.

For questions about our measures, contact security@metr.org.

The measures described above are accurate as of February 17, 2026 and are subject to change.