The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
The asymmetry, uncertainty, and the long term – Teruji Thomas (Global Priorities Institute, Oxford University)
The Asymmetry is the view in population ethics that, while we ought to avoid creating additional bad lives, there is no requirement to create additional good ones. The question is how to embed this view in a complete normative theory, and in particular one that treats uncertainty in a plausible way. After reviewing…
Longtermism in an Infinite World – Christian J. Tarsney (Population Wellbeing Initiative, University of Texas at Austin) and Hayden Wilkinson (Global Priorities Institute, University of Oxford)
The case for longtermism depends on the vast potential scale of the future. But that same vastness also threatens to undermine the case for longtermism: If the future contains infinite value, then many theories of value that support longtermism (e.g., risk-neutral total utilitarianism) seem to imply that no available action is better than any other. And some strategies for avoiding this conclusion (e.g., exponential time discounting) yield views that…
The structure of critical sets – Walter Bossert (University of Montreal), Susumu Cato (University of Tokyo) and Kohei Kamaga (Sophia University)
The purpose of this paper is to address some ambiguities and misunderstandings that appear in previous studies of population ethics. In particular, we examine the structure of intervals that are employed in assessing the value of adding people to an existing population. Our focus is on critical-band utilitarianism and critical-range utilitarianism, which are commonly-used population theories that employ intervals, and we show that some previously assumed equivalences are not true in general. The possible discrepancies can be…