Research on how AI agents can hide secondary task-solving from monitors, finding that harder tasks are more detectable and small models can learn to evade larger monitors.
A replication of a Google DeepMind paper on chain-of-thought monitoring, showing evidence that monitoring works on other companies' models.