The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

High risk, low reward: A challenge to the astronomical value of existential risk mitigation – David Thorstad (Global Priorities Institute, University of Oxford)

Many philosophers defend two claims: the astronomical value thesis that it is astronomically important to mitigate existential risks to humanity, and existential risk pessimism, the claim that humanity faces high levels of existential risk. It is natural to think that existential risk pessimism supports the astronomical value thesis. In this paper, I argue that precisely the opposite is true. Across a range of assumptions, existential risk pessimism significantly reduces the value of existential risk mitigation…

Concepts of existential catastrophe – Hilary Greaves (University of Oxford)

The notion of existential catastrophe is increasingly appealed to in discussion of risk management around emerging technologies, but it is not completely clear what this notion amounts to. Here, I provide an opinionated survey of the space of plausibly useful definitions of existential catastrophe. Inter alia, I discuss: whether to define existential catastrophe in ex post or ex ante terms, whether an ex ante definition should be in terms of loss of expected value or loss of potential…

The Conservation Multiplier – Bård Harstad (University of Oslo)

Every government that controls an exhaustible resource must decide whether to exploit it or to conserve and thereby let the subsequent government decide whether to exploit or conserve. This paper develops a positive theory of this situation and shows when a small change in parameter values has a multiplier effect on exploitation. The multiplier strengthens the influence of a lobby paying for exploitation, and of a donor compensating for conservation. …