The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Population ethics with thresholds – Walter Bossert (University of Montreal), Susumu Cato (University of Tokyo) and Kohei Kamaga (Sophia University)

We propose a new class of social quasi-orderings in a variable-population setting. In order to declare one utility distribution at least as good as another, the critical-level utilitarian value of the former must reach or surpass the value of the latter. For each possible absolute value of the difference between the population sizes of two distributions to be compared, we specify a non-negative threshold level and a threshold inequality. This inequality indicates whether the corresponding threshold level must be reached or surpassed in…

Concepts of existential catastrophe – Hilary Greaves (University of Oxford)

The notion of existential catastrophe is increasingly appealed to in discussion of risk management around emerging technologies, but it is not completely clear what this notion amounts to. Here, I provide an opinionated survey of the space of plausibly useful definitions of existential catastrophe. Inter alia, I discuss: whether to define existential catastrophe in ex post or ex ante terms, whether an ex ante definition should be in terms of loss of expected value or loss of potential…

Economic inequality and the long-term future – Andreas T. Schmidt (University of Groningen) and Daan Juijn (CE Delft)

Why, if at all, should we object to economic inequality? Some central arguments – the argument from decreasing marginal utility for example – invoke instrumental reasons and object to inequality because of its effects…