The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Doomsday rings twice – Andreas Mogensen (Global Priorities Institute, Oxford University)

This paper considers the argument according to which, because we should regard it as a priori very unlikely that we are among the most important people who will ever exist, we should increase our confidence that the human species will not persist beyond the current historical era, which seems to represent…

The scope of longtermism – David Thorstad (Global Priorities Institute, University of Oxford)

Longtermism holds roughly that in many decision situations, the best thing we can do is what is best for the long-term future. The scope question for longtermism asks: how large is the class of decision situations for which longtermism holds? Although longtermism was initially developed to describe the situation of…

The unexpected value of the future – Hayden Wilkinson (Global Priorities Institute, University of Oxford)

Various philosophers accept moral views that are impartial, additive, and risk-neutral with respect to betterness. But, if that risk neutrality is spelt out according to expected value theory alone, such views face a dire reductio ad absurdum. If the expected sum of value in humanity’s future is undefined—if, e.g., the probability distribution over possible values of the future resembles the Pasadena game, or a Cauchy distribution—then those views say that no real-world option is ever better than any other. And, as I argue…