ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. While the goal of our experiments is to develop methods to detect dangerous capabilities, the experiments themselves could cause harm if they were not supervised appropriately, and publishing the full details of our methods could make it easier to develop capabilities that cause harm. ARC Evals has primary responsibility for ensuring that its activities do not impose unreasonable risk on society, and that any risks they do impose are small compared to the risks imposed by AI developers themselves and provide commensurate benefits.
To help us fulfill this responsibility and hold us accountable if we fail to do so, we have appointed a Catastrophic Risk Oversight Committee. While the Oversight Committee has no power to legally bind or represent ARC Evals, it will advise ARC Evals on the decisions and activities with the highest potential for catastrophic damage, and it will be empowered to publicly criticize ARC Evals’ decisions when it believes ARC Evals has failed to live up to this responsibility.
The initial membership of the Oversight Committee is: