Updates
Review of the "Risks from automated R&D" section in the Anthropic Risk Report (February 2026)
May 8, 2026

External review from METR of the "Risks from automated R&D" section in Anthropic's February 2026 Risk Report

Red-Teaming Anthropic's Internal Agent Monitoring Systems
March 26, 2026

A METR staff member spent three weeks red-teaming a subset of Anthropic's internal agent monitoring and security systems, discovering several novel vulnerabilities.

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6
March 12, 2026

External review from METR of Anthropic's Sabotage Risk Report for Claude Opus 4.6

How We Protect Confidential Information
February 17, 2026

Our high-level approach to protecting confidential access and information

Common Elements of Frontier AI Safety Policies (December 2025 Update)
December 9, 2025

Shared components of AI lab commitments to evaluate and mitigate severe risks.

Review of the Anthropic Summer 2025 Pilot Sabotage Risk Report
October 28, 2025

External review from METR of Anthropic's Summer 2025 Sabotage Risk Report

Summary of our gpt-oss methodology review
October 23, 2025

Details on external recommendations from METR for gpt-oss Preparedness experiments and follow-up from OpenAI.

Notes on Scientific Communication at METR
August 12, 2025

How we think about tradeoffs when communicating surprising or nuanced findings.

What should companies share about risks from frontier AI models?
June 27, 2025

Current views on information relevant for visibility into frontier AI risk.

Response to OSTP on AI Action Plan
March 15, 2025

Suggested priorities for the Office of Science and Technology Policy as it develops an AI Action Plan.

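Why AI reasoning should be legible and faithfully reflect the model's actual decision process
March 11, 2025

Why legible and faithful reasoning helps in developing powerful AI safely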

Frontier AI Safety Policies
February 8, 2025

List of frontier safety policies published by AI companies, including Amazon, Anthropic, Google DeepMind, G42, Meta, Microsoft, OpenAI, and xAI.

AI models can be dangerous before public deployment
January 17, 2025

Why pre-deployment testing is not an adequate framework for AI risk management

Response to Bureau of Industry and Security’s proposed AI reporting requirements
October 11, 2024

Red-teaming and security suggestions regarding proposed rule by the Bureau of Industry and Security, “Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters.”

New Support Through The Audacious Project
October 9, 2024

Funding for Canary will enable research and implementation at scale

Response to U.S. AISI Draft “Managing Misuse Risk for Dual-Use Foundation Models”
September 8, 2024

Suggestions for expanded guidance on capability elicitation and robust model safeguards in the U.S. AI Safety Institute’s draft document “Managing Misuse Risk for Dual-Use Foundation Models” (NIST AI 800-1).

Response to NIST Draft Generative AI Profile
June 2, 2024

Comments on NIST’s draft document “AI Risk Management Framework: Generative AI Profile.”

ML Engineers Needed for New AI R&D Evals Project
May 16, 2024

METR is hiring ML engineers and researchers.

Emma Abele is METR’s new Executive Director
April 26, 2024

Emma moves from President to Executive Director, Beth moves to Head of Research.

2023 Year In Review
February 7, 2024

A summary of what METR accomplished in 2023 – our first full year of operation.

Bounty: Diverse hard tasks for LLM agents
December 16, 2023

METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations for tasks to measure performance of autonomous LLM agents.

ARC Evals is now METR
December 4, 2023

ARC Evals is wrapping up our incubation period at ARC, and spinning off into our own standalone nonprofit.

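Responsible Scaling Policies (RSPs)
September 26, 2023

This post describes the basics of Responsible Scaling Policies (RSPs) and why we think RSPs are a promising way to reduce catastrophic risks from AI.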

ARC Evals is spinning out from ARC
September 19, 2023

ARC Evals plans to spin out from the Alignment Research Center (ARC) in the coming months, and become its own standalone organization.

Response to RfC on AI Accountability Policy
June 11, 2023

Input to NTIA’s AI Accountability Policy Request for Comment.
