Imperfect Recall and AI Delegation
Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)
GPI Working Paper No. 30-2024
A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.
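The claim that more tests let the principal screen or discipline arbitrarily well can be illustrated with a toy calculation. This is a minimal sketch under simplifying assumptions that are not taken from the paper: the principal runs n_tests simulated episodes before one real deployment; it deploys only if no defection was observed in testing; and the misaligned AI, unable to tell test from deployment or to recall earlier episodes, defects with the same probability p in every episode. The function names are hypothetical.

```python
def undetected_defection_prob(p: float, n_tests: int) -> float:
    """Probability the AI passes every test and then defects in deployment.

    Assumes (toy model, not the paper's) that the AI defects independently
    with probability p in each indistinguishable episode.
    """
    return (1.0 - p) ** n_tests * p


def best_defection_rate(n_tests: int) -> float:
    """The p that maximises (1 - p)**n_tests * p, found by setting the
    derivative to zero: p = 1 / (n_tests + 1)."""
    return 1.0 / (n_tests + 1)


if __name__ == "__main__":
    for n in (0, 1, 5, 20, 100):
        p_star = best_defection_rate(n)
        value = undetected_defection_prob(p_star, n)
        print(f"tests={n:3d}  best p={p_star:.4f}  P(undetected defection)={value:.4f}")
```

In this toy setting, even the AI's best strategy yields a probability of undetected defection that shrinks as the number of tests grows, which mirrors the disciplining effect the abstract describes. The paper's actual model may differ in its timing, payoffs, and commitment assumptions.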
Other working papers
Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making – David Thorstad and Andreas Mogensen (Global Priorities Institute, Oxford University)
Even our most mundane decisions have the potential to significantly impact the long-term future, but we are often clueless about what this impact may be. In this paper, we aim to characterize and solve two problems raised by recent discussions of cluelessness, which we term the Problem of Decision Paralysis and the Problem of Decision-Making Demandingness. After reviewing and rejecting existing solutions to both problems, we argue that the way forward is to be found in the distinction between procedural and substantive rationality…
Should longtermists recommend hastening extinction rather than delaying it? – Richard Pettigrew (University of Bristol)
Longtermism is the view that the most urgent global priorities, and those to which we should devote the largest portion of our current resources, are those that focus on ensuring a long future for humanity, and perhaps sentient or intelligent life more generally, and improving the quality of those lives in that long future. The central argument for this conclusion is that, given a fixed amount of a resource that we are able to devote to global priorities, the longtermist’s favoured interventions have…
When should an effective altruist donate? – William MacAskill (Global Priorities Institute, Oxford University)
Effective altruism is the use of evidence and careful reasoning to work out how to maximize positive impact on others with a given unit of resources, and the taking of action on that basis. It’s a philosophy and a social movement that is gaining considerable steam in the philanthropic world. For example,…