Imperfect Recall and AI Delegation
Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)
GPI Working Paper No. 30-2024
A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.