The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.