The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
The evidentialist’s wager – William MacAskill, Aron Vallinder (Global Priorities Institute, Oxford University) Caspar Österheld (Duke University), Carl Shulman (Future of Humanity Institute, Oxford University), Johannes Treutlein (TU Berlin)
Suppose that an altruistic and morally motivated agent who is uncertain between evidential decision theory (EDT) and causal decision theory (CDT) finds herself in a situation in which the two theories give conflicting verdicts. We argue that even if she has significantly higher credence in CDT, she should nevertheless act …
Moral demands and the far future – Andreas Mogensen (Global Priorities Institute, Oxford University)
I argue that moral philosophers have either misunderstood the problem of moral demandingness or at least failed to recognize important dimensions of the problem that undermine many standard assumptions. It has been assumed that utilitarianism concretely directs us to maximize welfare within a generation by transferring resources to people currently living in extreme poverty. In fact, utilitarianism seems to imply that any obligation to help people who are currently badly off is trumped by obligations to undertake actions targeted at improving the value…
On the desire to make a difference – Hilary Greaves, William MacAskill, Andreas Mogensen and Teruji Thomas (Global Priorities Institute, University of Oxford)
True benevolence is, most fundamentally, a desire that the world be better. It is natural and common, however, to frame thinking about benevolence indirectly, in terms of a desire to make a difference to how good the world is. This would be an innocuous shift if desires to make a difference were extensionally equivalent to desires that the world be better. This paper shows that at least on some common ways of making a “desire to make a difference” precise, this extensional equivalence fails.