The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

Other working papers

Calibration dilemmas in the ethics of distribution – Jacob M. Nebel (University of Southern California) and H. Orri Stefánsson (Stockholm University and Swedish Collegium for Advanced Study)

This paper presents a new kind of problem in the ethics of distribution. The problem takes the form of several “calibration dilemmas,” in which intuitively reasonable aversion to small-stakes inequalities requires leading theories of distribution to recommend intuitively unreasonable aversion to large-stakes inequalities—e.g., inequalities in which half the population would gain an arbitrarily large quantity of well-being or resources…

Against Anti-Fanaticism – Christian Tarsney (Population Wellbeing Initiative, University of Texas at Austin)

Should you be willing to forego any sure good for a tiny probability of a vastly greater good? Fanatics say you should, anti-fanatics say you should not. Anti-fanaticism has great intuitive appeal. But, I argue, these intuitions are untenable, because satisfying them in their full generality is incompatible with three very plausible principles: acyclicity, a minimal dominance principle, and the principle that any outcome can be made better or worse. This argument against anti-fanaticism can be…

Intergenerational equity under catastrophic climate change – Aurélie Méjean (CNRS, Paris), Antonin Pottier (EHESS, CIRED, Paris), Stéphane Zuber (CNRS, Paris) and Marc Fleurbaey (CNRS, Paris School of Economics)

Climate change raises the issue of intergenerational equity. As climate change threatens irreversible and dangerous impacts, possibly leading to extinction, the most relevant trade-off may not be between present and future consumption, but between present consumption and the mere existence of future generations. To investigate this trade-off, we build an integrated assessment model that explicitly accounts for the risk of extinction of future generations…