Imperfect Recall and AI Delegation

Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)

GPI Working Paper No. 30-2024

A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.
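To give a rough intuition for why more tests help, consider a toy calculation that is not the paper's model. Assume the principal runs `n_tests` simulated tasks before one real task and deploys only if every test is passed. Assume also that the misaligned AI, unable to tell tests from deployment, must defect with the same probability `p` in every episode. The function names and this pass-all-tests deployment rule are illustrative assumptions. Under them, the chance of an undetected defection in deployment is p(1 - p)^n, which is maximised at p = 1/(n + 1) and shrinks towards zero as the number of tests grows.

```python
def undetected_defection_prob(p: float, n_tests: int) -> float:
    """Probability the AI passes all tests and then defects in deployment.

    Toy assumption: imperfect recall forces the same defection
    probability p in each of the n_tests tests and in the real task.
    """
    return (1 - p) ** n_tests * p


def best_defection_rate(n_tests: int) -> float:
    """Defection probability that maximises p * (1 - p) ** n_tests.

    Setting the derivative to zero gives p = 1 / (n_tests + 1).
    """
    return 1 / (n_tests + 1)


def max_undetected_defection_prob(n_tests: int) -> float:
    """Best the misaligned AI can do in this toy setting."""
    return undetected_defection_prob(best_defection_rate(n_tests), n_tests)


if __name__ == "__main__":
    # The AI's best-case success probability falls as tests are added.
    for n in (0, 1, 5, 20, 100):
        print(n, round(max_undetected_defection_prob(n), 4))
```

In this sketch the misaligned AI faces a trade-off: defecting often gets it caught in testing, while defecting rarely means it mostly behaves well in deployment. This loosely mirrors the screening and disciplining effects described in the abstract.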

Other working papers

Moral demands and the far future – Andreas Mogensen (Global Priorities Institute, Oxford University)

I argue that moral philosophers have either misunderstood the problem of moral demandingness or at least failed to recognize important dimensions of the problem that undermine many standard assumptions. It has been assumed that utilitarianism concretely directs us to maximize welfare within a generation by transferring resources to people currently living in extreme poverty. In fact, utilitarianism seems to imply that any obligation to help people who are currently badly off is trumped by obligations to undertake actions targeted at improving the value…

How effective is (more) money? Randomizing unconditional cash transfer amounts in the US – Ania Jaroszewicz (University of California San Diego), Oliver P. Hauser (University of Exeter), Jon M. Jachimowicz (Harvard Business School) and Julian Jamison (University of Oxford and University of Exeter)

We randomized 5,243 Americans in poverty to receive a one-time unconditional cash transfer (UCT) of $2,000 (two months’ worth of total household income for the median participant), $500 (half a month’s income), or nothing. We measured the effects of the UCTs on participants’ financial well-being, psychological well-being, cognitive capacity, and physical health through surveys administered one week, six weeks, and 15 weeks later. While bank data show that both UCTs increased expenditures, we find no evidence that…

Desire-Fulfilment and Consciousness – Andreas Mogensen (Global Priorities Institute, University of Oxford)

I show that there are good reasons to think that some individuals without any capacity for consciousness should be counted as welfare subjects, assuming that desire-fulfilment is a welfare good and that any individuals who can accrue welfare goods are welfare subjects. While other philosophers have argued for similar conclusions, I show that they have done so by relying on a simplistic understanding of the desire-fulfilment theory. My argument is intended to be sensitive to the complexities and nuances of contemporary…