What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Respect for others’ risk attitudes and the long-run future – Andreas Mogensen (Global Priorities Institute, University of Oxford)

When our choice affects some other person and the outcome is unknown, it has been argued that we should defer to their risk attitude, if known, or else default to use of a risk avoidant risk function. This, in turn, has been claimed to require the use of a risk avoidant risk function when making decisions that primarily affect future people, and to decrease the desirability of efforts to prevent human extinction, owing to the significant risks associated with continued human survival. …

Tough enough? Robust satisficing as a decision norm for long-term policy analysis – Andreas Mogensen and David Thorstad (Global Priorities Institute, Oxford University)

This paper aims to open a dialogue between philosophers working in decision theory and operations researchers and engineers whose research addresses the topic of decision making under deep uncertainty. Specifically, we assess the recommendation to follow a norm of robust satisficing when making decisions under deep uncertainty in the context of decision analyses that rely on the tools of Robust Decision Making developed by Robert Lempert and colleagues at RAND …

Welfare and felt duration – Andreas Mogensen (Global Priorities Institute, University of Oxford)

How should we understand the duration of a pleasant or unpleasant sensation, insofar as its duration modulates how good or bad the experience is overall? Given that we seem able to distinguish between subjective and objective duration and that how well or badly someone’s life goes is naturally thought of as something to be assessed from her own perspective, it seems intuitive that it is subjective duration that modulates how good or bad an experience is from the perspective of an individual’s welfare. …