The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Is Existential Risk Mitigation Uniquely Cost-Effective? Not in Standard Population Models – Gustav Alexandrie (Global Priorities Institute, University of Oxford) and Maya Eden (Brandeis University)

What socially beneficial causes should philanthropists prioritize if they give equal ethical weight to the welfare of current and future generations? Many have argued that, because human extinction would result in a permanent loss of all future generations, extinction risk mitigation should be the top priority given this impartial stance. Using standard models of population dynamics, we challenge this conclusion. We first introduce a theoretical framework for quantifying undiscounted cost-effectiveness over…

Can an evidentialist be risk-averse? – Hayden Wilkinson (Global Priorities Institute, University of Oxford)

Two key questions of normative decision theory are: 1) whether the probabilities relevant to decision theory are evidential or causal; and 2) whether agents should be risk-neutral, and so maximise the expected value of the outcome, or instead risk-averse (or otherwise sensitive to risk). These questions are typically thought to be independent – that our answer to one bears little on our answer to the other. …

Meaning, medicine and merit – Andreas Mogensen (Global Priorities Institute, Oxford University)

Given the inevitability of scarcity, should public institutions ration healthcare resources so as to prioritize those who contribute more to society? Intuitively, we may feel that this would be somehow inegalitarian. I argue that the egalitarian objection to prioritizing treatment on the basis of patients’ usefulness to others is best thought…