Imperfect Recall and AI Delegation

Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)

GPI Working Paper No. 30-2024

A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.

Other working papers

Future Suffering and the Non-Identity Problem – Theron Pummer (University of St Andrews)

I present and explore a new version of the Person-Affecting View, according to which reasons to do an act depend wholly on what would be said for or against this act from the points of view of particular individuals. According to my view, (i) there is a morally requiring reason not to bring about lives insofar as they contain suffering (negative welfare), (ii) there is no morally requiring reason to bring about lives insofar as they contain happiness (positive welfare), but (iii) there is a permitting reason to bring about lives insofar as they…

When should an effective altruist donate? – William MacAskill (Global Priorities Institute, Oxford University)

Effective altruism is the use of evidence and careful reasoning to work out how to maximize positive impact on others with a given unit of resources, and the taking of action on that basis. It’s a philosophy and a social movement that is gaining considerable steam in the philanthropic world. For example,…

The evidentialist’s wager – William MacAskill, Aron Vallinder (Global Priorities Institute, Oxford University) Caspar Österheld (Duke University), Carl Shulman (Future of Humanity Institute, Oxford University), Johannes Treutlein (TU Berlin)

Suppose that an altruistic and morally motivated agent who is uncertain between evidential decision theory (EDT) and causal decision theory (CDT) finds herself in a situation in which the two theories give conflicting verdicts. We argue that even if she has significantly higher credence in CDT, she should nevertheless act …