The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Critical-set views, biographical identity, and the long term – Elliott Thornley (Global Priorities Institute, University of Oxford)

Critical-set views avoid the Repugnant Conclusion by subtracting some constant from the welfare score of each life in a population. These views are thus sensitive to facts about biographical identity: identity between lives. In this paper, I argue that questions of biographical identity give us reason to reject critical-set views and embrace the total view. I end with a practical implication. If we shift our credences towards the total view, we should also shift our efforts towards ensuring that humanity survives for the long term.

On two arguments for Fanaticism – Jeffrey Sanford Russell (University of Southern California)

Should we make significant sacrifices to ever-so-slightly lower the chance of extremely bad outcomes, or to ever-so-slightly raise the chance of extremely good outcomes? Fanaticism says yes: for every bad outcome, there is a tiny chance of of extreme disaster that is even worse, and for every good outcome, there is a tiny chance of an enormous good that is even better.

Longtermist political philosophy: An agenda for future research – Jacob Barrett (Global Priorities Institute, University of Oxford) and Andreas T. Schmidt (University of Groningen)

We set out longtermist political philosophy as a research field. First, we argue that the standard case for longtermism is more robust when applied to institutions than to individual action. This motivates “institutional longtermism”: when building or shaping institutions, positively affecting the value of the long-term future is a key moral priority. Second, we briefly distinguish approaches to pursuing longtermist institutional reform along two dimensions: such approaches may be more targeted or more broad, and more urgent or more patient.