The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
