The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 10-2024, forthcoming in Philosophical Studies
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Other working papers
Longtermism in an Infinite World – Christian J. Tarsney (Population Wellbeing Initiative, University of Texas at Austin) and Hayden Wilkinson (Global Priorities Institute, University of Oxford)
The case for longtermism depends on the vast potential scale of the future. But that same vastness also threatens to undermine the case for longtermism: If the future contains infinite value, then many theories of value that support longtermism (e.g., risk-neutral total utilitarianism) seem to imply that no available action is better than any other. And some strategies for avoiding this conclusion (e.g., exponential time discounting) yield views that…
Measuring AI-Driven Risk with Stock Prices – Susana Campos-Martins (Global Priorities Institute, University of Oxford)
We propose an empirical approach to identify and measure AI-driven shocks based on the co-movements of relevant financial asset prices. For that purpose, we first calculate the common volatility of the share prices of major US AI-relevant companies. Then we isolate the events that shake this industry only from those that shake all sectors of economic activity at the same time. For the sample analysed, AI shocks are identified when there are announcements about (mergers and) acquisitions in the AI industry, launching of…
Non-additive axiologies in large worlds – Christian Tarsney and Teruji Thomas (Global Priorities Institute, Oxford University)
Is the overall value of a world just the sum of values contributed by each value-bearing entity in that world? Additively separable axiologies (like total utilitarianism, prioritarianism, and critical level views) say ‘yes’, but non-additive axiologies (like average utilitarianism, rank-discounted utilitarianism, and variable value views) say ‘no’…