Research Notes
Early research progress, ideas, and speculation
Evidence on AI R&D Progress from NanoGPT
April 21, 2026

Classifying human and agent contributions to the NanoGPT speedrun, and what publicly tracked challenges can tell us about AI R&D acceleration.

Fine-tuning experiments on CoT controllability
April 1, 2026

We find that a small amount of fine-tuning on instruction following in the CoT generalizes to meaningful increases in CoT controllability on an out-of-distribution (OOD) set of tasks. We fine-tune four reasoning models on small datasets of instruction-following reasoning data, and their average OOD controllability rises from 2.9% to 8.8%.

Impact of modelling assumptions on time horizon results
March 20, 2026

Alexander Barry examines how different modelling choices affect METR's time horizon estimates.

We spent 2 hours working in the future
March 19, 2026

Thomas Kwa describes a tabletop exercise where METR researchers simulated having access to ~200-hour time horizon AIs.

Many SWE-bench-Passing PRs Would Not Be Merged into Main
March 10, 2026

We find that roughly half of test-passing SWE-bench Verified PRs written by recent AI agents would not be merged into main by repo maintainers. A naive interpretation of benchmark scores may lead one to overestimate how useful agents are without more elicitation or human feedback.

Observations from two CLI game reimplementation runs with Opus 4.6
March 3, 2026

Nikola Jurkovic describes observations from tasking Opus 4.6 with reimplementing Slay the Spire and Balatro in the CLI.

Five lessons from having helped run an AI-Biology RCT
February 19, 2026

Luca Righetti shares takeaways on the role of randomized controlled trials in AI safety testing.

Analyzing coding agent transcripts to upper bound productivity gains from AI agents
February 17, 2026

Amy Deng investigates whether coding agent transcripts could serve as an alternative for estimating AI productivity uplift, using 5305 Claude Code transcripts from METR technical staff.

Measuring Time Horizon using Claude Code and Codex
February 13, 2026

Nikola Jurkovic describes our measurements of time horizon using Claude Code and Codex scaffolds.

A simpler AI timelines model predicts 99% AI R&D automation in ~2032
February 10, 2026

Thomas Kwa describes a simple model for forecasting when AI will automate AI development, based on the AI Futures model but with only 8 parameters.

Frontier AI Safety Regulations: A Reference Guide for AI Company Employees
January 29, 2026

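Miles Kodama and Michael Chen summarize the key provisions of California's SB 53, the EU Code of Practice, and New York's RAISE Act, and explain which requirements frontier AI developers need to pay attention to.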

Clarifying limitations of time horizon
January 22, 2026

Thomas Kwa responds to some misinterpretations of our time horizon work, and explains limitations and the core finding.

Early Results on Monitorability in QA Settings
October 6, 2025

Vincent Cheng, Thomas Kwa, and Neev Parikh share research on how AI agents can hide secondary task-solving from monitors, finding that harder tasks are more detectable and small models can learn to evade larger monitors.

Claude, GPT, and Gemini All Struggle to Evade Monitors
August 22, 2025

Vincent Cheng and Thomas Kwa replicate a Google DeepMind paper on chain-of-thought monitoring, showing evidence that monitoring works on other companies' models.
