The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Crying wolf: Warning about societal risks can be reputationally risky – Lucius Caviola (Global Priorities Institute, University of Oxford) et al.

Society relies on expert warnings about large-scale risks like pandemics and natural disasters. Across ten studies (N = 5,342), we demonstrate people’s reluctance to warn about unlikely but large-scale risks because they are concerned about being blamed for being wrong. In particular, warners anticipate that if the risk doesn’t occur, they will be perceived as overly alarmist and responsible for wasting societal resources. This phenomenon appears in the context of natural, technological, and financial risks…

Longtermist political philosophy: An agenda for future research – Jacob Barrett (Global Priorities Institute, University of Oxford) and Andreas T. Schmidt (University of Groningen)

We set out longtermist political philosophy as a research field. First, we argue that the standard case for longtermism is more robust when applied to institutions than to individual action. This motivates “institutional longtermism”: when building or shaping institutions, positively affecting the value of the long-term future is a key moral priority. Second, we briefly distinguish approaches to pursuing longtermist institutional reform along two dimensions: such approaches may be more targeted or more broad, and more urgent or more patient.

A bargaining-theoretic approach to moral uncertainty – Owen Cotton-Barratt (Future of Humanity Institute, Oxford University), Hilary Greaves (Global Priorities Institute, Oxford University)

This paper explores a new approach to the problem of decision under relevant moral uncertainty. We treat the case of an agent making decisions in the face of moral uncertainty on the model of bargaining theory, as if the decision-making process were one of bargaining among different internal parts of the agent, with different parts committed to different moral theories. The resulting approach contrasts interestingly with the extant “maximise expected choiceworthiness”…