The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Respect for others’ risk attitudes and the long-run future – Andreas Mogensen (Global Priorities Institute, University of Oxford)

When our choice affects some other person and the outcome is unknown, it has been argued that we should defer to their risk attitude, if known, or else default to use of a risk avoidant risk function. This, in turn, has been claimed to require the use of a risk avoidant risk function when making decisions that primarily affect future people, and to decrease the desirability of efforts to prevent human extinction, owing to the significant risks associated with continued human survival. …

On the desire to make a difference – Hilary Greaves, William MacAskill, Andreas Mogensen and Teruji Thomas (Global Priorities Institute, University of Oxford)

True benevolence is, most fundamentally, a desire that the world be better. It is natural and common, however, to frame thinking about benevolence indirectly, in terms of a desire to make a difference to how good the world is. This would be an innocuous shift if desires to make a difference were extensionally equivalent to desires that the world be better. This paper shows that at least on some common ways of making a “desire to make a difference” precise, this extensional equivalence fails.

Consequentialism, Cluelessness, Clumsiness, and Counterfactuals – Alan Hájek (Australian National University)

According to a standard statement of objective consequentialism, a morally right action is one that has the best consequences. More generally, given a choice between two actions, one is morally better than the other just in case the consequences of the former action are better than those of the latter. (These are not just the immediate consequences of the actions, but the long-term consequences, perhaps until the end of history.) This account glides easily off the tongue—so easily that…