What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
