The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
Measuring AI-Driven Risk with Stock Prices – Susana Campos-Martins (Global Priorities Institute, University of Oxford)
We propose an empirical approach to identify and measure AI-driven shocks based on the co-movements of relevant financial asset prices. For that purpose, we first calculate the common volatility of the share prices of major US AI-relevant companies. Then we isolate the events that shake this industry only from those that shake all sectors of economic activity at the same time. For the sample analysed, AI shocks are identified when there are announcements about (mergers and) acquisitions in the AI industry, launching of…
Doomsday rings twice – Andreas Mogensen (Global Priorities Institute, Oxford University)
This paper considers the argument according to which, because we should regard it as a priori very unlikely that we are among the most important people who will ever exist, we should increase our confidence that the human species will not persist beyond the current historical era, which seems to represent…
Minimal and Expansive Longtermism – Hilary Greaves (University of Oxford) and Christian Tarsney (Population Wellbeing Initiative, University of Texas at Austin)
The standard case for longtermism focuses on a small set of risks to the far future, and argues that in a small set of choice situations, the present marginal value of mitigating those risks is very great. But many longtermists are attracted to, and many critics of longtermism worried by, a farther-reaching form of longtermism. According to this farther-reaching form, there are many ways of improving the far future, which determine the value of our options in all or nearly all choice situations…