Imperfect Recall and AI Delegation
Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)
GPI Working Paper No. 30-2024
A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.
Other working papers
Future Suffering and the Non-Identity Problem – Theron Pummer (University of St Andrews)
I present and explore a new version of the Person-Affecting View, according to which reasons to do an act depend wholly on what would be said for or against this act from the points of view of particular individuals. According to my view, (i) there is a morally requiring reason not to bring about lives insofar as they contain suffering (negative welfare), (ii) there is no morally requiring reason to bring about lives insofar as they contain happiness (positive welfare), but (iii) there is a permitting reason to bring about lives insofar as they…
Misjudgment Exacerbates Collective Action Problems – Joshua Lewis (New York University) et al.
In collective action problems, suboptimal collective outcomes arise from each individual optimizing their own wellbeing. Past work assumes individuals do this because they care more about themselves than others. Yet, other factors could also contribute. We examine the role of empirical beliefs. Our results suggest people underestimate individual impact on collective problems. When collective action seems worthwhile, individual action often does not, even if the expected ratio of costs to benefits is the same. …
The cross-sectional implications of the social discount rate – Maya Eden (Brandeis University)
How should policy discount future returns? The standard approach to this normative question is to ask how much society should care about future generations relative to people alive today. This paper establishes an alternative approach, based on the social desirability of redistributing from the current old to the current young. …