The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
Existential Risk and Growth – Philip Trammell (Global Priorities Institute and Department of Economics, University of Oxford) and Leopold Aschenbrenner
Technologies may pose existential risks to civilization. Though accelerating technological development may increase the risk of anthropogenic existential catastrophe per period in the short run, two considerations suggest that a sector-neutral acceleration decreases the risk that such a catastrophe ever occurs. First, acceleration decreases the time spent at each technology level. Second, since a richer society is willing to sacrifice more for safety, optimal policy can yield an “existential risk Kuznets curve”; acceleration…
Longtermist political philosophy: An agenda for future research – Jacob Barrett (Global Priorities Institute, University of Oxford) and Andreas T. Schmidt (University of Groningen)
We set out longtermist political philosophy as a research field. First, we argue that the standard case for longtermism is more robust when applied to institutions than to individual action. This motivates “institutional longtermism”: when building or shaping institutions, positively affecting the value of the long-term future is a key moral priority. Second, we briefly distinguish approaches to pursuing longtermist institutional reform along two dimensions: such approaches may be more targeted or more broad, and more urgent or more patient.
Against Willing Servitude: Autonomy in the Ethics of Advanced Artificial Intelligence – Adam Bales (Global Priorities Institute, University of Oxford)
Some people believe that advanced artificial intelligence systems (AIs) might, in the future, come to have moral status. Further, humans might be tempted to design such AIs so that they serve us, carrying out tasks that make our lives better. This raises the question of whether designing AIs with moral status to be willing servants would problematically violate their autonomy. In this paper, I argue that it would in fact do so.