The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
Meaning, medicine and merit – Andreas Mogensen (Global Priorities Institute, Oxford University)
Given the inevitability of scarcity, should public institutions ration healthcare resources so as to prioritize those who contribute more to society? Intuitively, we may feel that this would be somehow inegalitarian. I argue that the egalitarian objection to prioritizing treatment on the basis of patients’ usefulness to others is best thought…
Existential risks from a Thomist Christian perspective – Stefan Riedener (University of Zurich)
Let’s say with Nick Bostrom that an ‘existential risk’ (or ‘x-risk’) is a risk that ‘threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development’ (2013, 15). There are a number of such risks: nuclear wars, developments in biotechnology or artificial intelligence, climate change, pandemics, supervolcanos, asteroids, and so on (see e.g. Bostrom and Ćirković 2008). …
Estimating long-term treatment effects without long-term outcome data – David Rhys Bernard (Rethink Priorities), Jojo Lee and Victor Yaneng Wang (Global Priorities Institute, University of Oxford)
The surrogate index method allows policymakers to estimate long-run treatment effects before long-run outcomes are observable. We meta-analyse this approach over nine long-run RCTs in development economics, comparing surrogate estimates to estimates from actual long-run RCT outcomes. We introduce the M-lasso algorithm for constructing the surrogate approach’s first-stage predictive model and compare its performance with other surrogate estimation methods. …