Updates - METR

How independent researchers could investigate AI propensities after misalignment incidents

July 28, 2026

AI agents sometimes take sophisticated actions in violation of human intent. We outline the questions that thorough external investigations of these behaviors should answer, the access this might require, and how the resulting findings should be shared.

Summary of METR's predeployment evaluation of GPT-5.6 Sol

June 26, 2026

A summary of METR's independent, predeployment evaluation of GPT-5.6 Sol

Review of the "Risks from automated R&D" section in the Anthropic Risk Report (February 2026)

May 8, 2026

External review from METR of the "Risks from automated R&D" section in Anthropic's February 2026 Risk Report

Red-Teaming Anthropic's Internal Agent Monitoring Systems

March 26, 2026

A METR staff member spent three weeks red-teaming a subset of Anthropic's internal agent monitoring and security systems, discovering several novel vulnerabilities.

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6

March 12, 2026

External review from METR of Anthropic's Sabotage Risk Report for Claude Opus 4.6

How We Protect Confidential Information

February 17, 2026

Our high-level approach to protecting confidential access and information

Common Elements of Frontier AI Safety Policies (December 2025 Update)

December 9, 2025

Shared components of AI lab commitments to evaluate and mitigate severe risks.

Review of the Anthropic Summer 2025 Pilot Sabotage Risk Report

October 28, 2025

External review from METR of Anthropic's Summer 2025 Sabotage Risk Report

Summary of our gpt-oss methodology review

October 23, 2025

Details on external recommendations from METR for gpt-oss Preparedness experiments and follow-up from OpenAI.

Notes on Scientific Communication at METR

August 12, 2025

How we think about tradeoffs when communicating surprising or nuanced findings.

What should companies share about risks from frontier AI models?

June 27, 2025

Current views on information relevant for visibility into frontier AI risk.

Response to OSTP on AI Action Plan

March 15, 2025

Suggested priorities for the Office of Science and Technology Policy as it develops an AI Action Plan.

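Why AI reasoning should be legible and faithfully reflect the model's actual decision-making

March 11, 2025

Why legible and faithful reasoning helps with the safe development of powerful AI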

Frontier AI Safety Policies

February 8, 2025

List of frontier safety policies published by AI companies, including Amazon, Anthropic, Google DeepMind, G42, Meta, Microsoft, OpenAI, and xAI.

AI models can be dangerous before public deployment

January 17, 2025

Why pre-deployment testing is not an adequate framework for AI risk management

Response to Bureau of Industry and Security’s proposed AI reporting requirements

October 11, 2024

Red-teaming and security suggestions regarding proposed rule by the Bureau of Industry and Security, “Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters.”

New Support Through The Audacious Project

October 9, 2024

Funding for Canary will enable research and implementation at scale

Response to U.S. AISI Draft “Managing Misuse Risk for Dual-Use Foundation Models”

September 8, 2024

Suggestions for expanded guidance on capability elicitation and robust model safeguards in the U.S. AI Safety Institute’s draft document “Managing Misuse Risk for Dual-Use Foundation Models” (NIST AI 800-1).

Response to NIST Draft Generative AI Profile

June 2, 2024

Comments on NIST’s draft document “AI Risk Management Framework: Generative AI Profile.”

ML Engineers Needed for New AI R&D Evals Project

May 16, 2024

METR is hiring ML engineers and researchers.

Emma Abele is METR’s new Executive Director

April 26, 2024

Emma moves from President to Executive Director; Beth moves to Head of Research.

2023 Year In Review

February 7, 2024

A summary of what METR accomplished in 2023 – our first full year of operation.

Bounty: Diverse hard tasks for LLM agents

December 16, 2023

METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations for tasks to measure performance of autonomous LLM agents.

ARC Evals is now METR

December 4, 2023

ARC Evals is wrapping up our incubation period at ARC, and spinning off into our own standalone nonprofit.

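Responsible Scaling Policies (RSPs)

September 26, 2023

This post introduces the basics of responsible scaling policies (RSPs) and explains why we think they are a promising way to reduce catastrophic risks from AI.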

ARC Evals is spinning out from ARC

September 19, 2023

ARC Evals plans to spin out from the Alignment Research Center (ARC) in the coming months, and become its own standalone organization.

Response to RfC on AI Accountability Policy

June 11, 2023

Input to NTIA’s AI Accountability Policy Request for Comment.