Imperfect Recall and AI Delegation

Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)

GPI Working Paper No. 30-2024

A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.

Other working papers

High risk, low reward: A challenge to the astronomical value of existential risk mitigation – David Thorstad (Global Priorities Institute, University of Oxford)

Many philosophers defend two claims: the astronomical value thesis that it is astronomically important to mitigate existential risks to humanity, and existential risk pessimism, the claim that humanity faces high levels of existential risk. It is natural to think that existential risk pessimism supports the astronomical value thesis. In this paper, I argue that precisely the opposite is true. Across a range of assumptions, existential risk pessimism significantly reduces the value of existential risk mitigation…

Doomsday rings twice – Andreas Mogensen (Global Priorities Institute, Oxford University)

This paper considers the argument according to which, because we should regard it as a priori very unlikely that we are among the most important people who will ever exist, we should increase our confidence that the human species will not persist beyond the current historical era, which seems to represent…

Economic growth under transformative AI – Philip Trammell (Global Priorities Institute, Oxford University) and Anton Korinek (University of Virginia)

Industrialized countries have long seen relatively stable growth in output per capita and a stable labor share. AI may be transformative, in the sense that it may break one or both of these stylized facts. This review outlines the ways this may happen by placing several strands of the literature on AI and growth within a common framework. We first evaluate models in which AI increases output production, for example via increases in capital’s substitutability for labor…