The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
How important is the end of humanity? Lay people prioritize extinction prevention but not above all other societal issues. – Matthew Coleman (Northeastern University), Lucius Caviola (Global Priorities Institute, University of Oxford) et al.
Human extinction would mean the deaths of eight billion people and the end of humanity’s achievements, culture, and future potential. On several ethical views, extinction would be a terrible outcome. How do people think about human extinction? And how much do they prioritize preventing extinction over other societal issues? Across six empirical studies (N = 2,541; U.S. and China) we find that people consider extinction prevention a global priority and deserving of greatly increased societal resources. …
Concepts of existential catastrophe – Hilary Greaves (University of Oxford)
The notion of existential catastrophe is increasingly appealed to in discussion of risk management around emerging technologies, but it is not completely clear what this notion amounts to. Here, I provide an opinionated survey of the space of plausibly useful definitions of existential catastrophe. Inter alia, I discuss: whether to define existential catastrophe in ex post or ex ante terms, whether an ex ante definition should be in terms of loss of expected value or loss of potential…
The structure of critical sets – Walter Bossert (University of Montreal), Susumu Cato (University of Tokyo) and Kohei Kamaga (Sophia University)
The purpose of this paper is to address some ambiguities and misunderstandings that appear in previous studies of population ethics. In particular, we examine the structure of intervals that are employed in assessing the value of adding people to an existing population. Our focus is on critical-band utilitarianism and critical-range utilitarianism, which are commonly used population theories that employ intervals, and we show that some previously assumed equivalences are not true in general. The possible discrepancies can be…