The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Are we living at the hinge of history? – William MacAskill (Global Priorities Institute, University of Oxford)

In the final pages of On What Matters, Volume II, Derek Parfit comments: ‘We live during the hinge of history… If we act wisely in the next few centuries, humanity will survive its most dangerous and decisive period… What now matters most is that we avoid ending human history.’ This passage echoes Parfit’s comment, in Reasons and Persons, that ‘the next few centuries will be the most important in human history’. …

Misjudgment Exacerbates Collective Action Problems – Joshua Lewis (New York University) et al.

In collective action problems, suboptimal collective outcomes arise from each individual optimizing their own wellbeing. Past work assumes individuals do this because they care more about themselves than others. Yet, other factors could also contribute. We examine the role of empirical beliefs. Our results suggest people underestimate individual impact on collective problems. When collective action seems worthwhile, individual action often does not, even if the expected ratio of costs to benefits is the same. …

Social Beneficence – Jacob Barrett (Global Priorities Institute, University of Oxford)

A background assumption in much contemporary political philosophy is that justice is the first virtue of social institutions, taking priority over other values such as beneficence. This assumption is typically treated as a methodological starting point, rather than as following from any particular moral or political theory. In this paper, I challenge this assumption.