What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Towards shutdownable agents via stochastic choice – Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Independent), Leyton Ho (Brown University), and Louis Thomson (University of Oxford)

Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn’t happen. A key part of the IPP is using a novel ‘Discounted REward for Same-Length Trajectories (DREST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics…

On the desire to make a difference – Hilary Greaves, William MacAskill, Andreas Mogensen and Teruji Thomas (Global Priorities Institute, University of Oxford)

True benevolence is, most fundamentally, a desire that the world be better. It is natural and common, however, to frame thinking about benevolence indirectly, in terms of a desire to make a difference to how good the world is. This would be an innocuous shift if desires to make a difference were extensionally equivalent to desires that the world be better. This paper shows that at least on some common ways of making a “desire to make a difference” precise, this extensional equivalence fails.

Ethical Consumerism – Philip Trammell (Global Priorities Institute and Department of Economics, University of Oxford)

I study a static production economy in which consumers have not only preferences over their own consumption but also external, or “ethical”, preferences over the supply of each good. Though existing work on the implications of external preferences assumes price-taking, I show that ethical consumers generically prefer not to act even approximately as price-takers. I therefore introduce a near-Nash equilibrium concept that generalizes the near-Nash equilibria found in literature on strategic foundations of general equilibrium…