The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
