What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

A paradox for tiny probabilities and enormous values – Nick Beckstead (Open Philanthropy Project) and Teruji Thomas (Global Priorities Institute, Oxford University)

We show that every theory of the value of uncertain prospects must have one of three unpalatable properties. Reckless theories recommend risking arbitrarily great gains at arbitrarily long odds for the sake of enormous potential; timid theories recommend passing up arbitrarily great gains to prevent a tiny increase in risk; nontransitive theories deny the principle that, if A is better than B and B is better than C, then A must be better than C.

Are we living at the hinge of history? – William MacAskill (Global Priorities Institute, Oxford University)

In the final pages of On What Matters, Volume II, Derek Parfit comments: ‘We live during the hinge of history… If we act wisely in the next few centuries, humanity will survive its most dangerous and decisive period… What now matters most is that we avoid ending human history.’ This passage echoes Parfit’s comment, in Reasons and Persons, that ‘the next few centuries will be the most important in human history’. …

Numbers Tell, Words Sell – Michael Thaler (University College London), Mattie Toma (University of Warwick) and Victor Yaneng Wang (Massachusetts Institute of Technology)

When communicating numeric estimates with policymakers, journalists, or the general public, experts must choose between using numbers or natural language. We run two experiments to study whether experts strategically use language to communicate numeric estimates in order to persuade receivers. In Study 1, senders communicate probabilities of abstract events to receivers on Prolific, and in Study 2 academic researchers communicate the effect sizes in research papers to government policymakers. When…