What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Are we living at the hinge of history? – William MacAskill (Global Priorities Institute, Oxford University)

In the final pages of On What Matters, Volume II, Derek Parfit comments: ‘We live during the hinge of history… If we act wisely in the next few centuries, humanity will survive its most dangerous and decisive period… What now matters most is that we avoid ending human history.’ This passage echoes Parfit’s comment, in Reasons and Persons, that ‘the next few centuries will be the most important in human history’. …

Exceeding expectations: stochastic dominance as a general decision theory – Christian Tarsney (Global Priorities Institute, Oxford University)

The principle that rational agents should maximize expected utility or choiceworthiness is intuitively plausible in many ordinary cases of decision-making under uncertainty. But it is less plausible in cases of extreme, low-probability risk (like Pascal’s Mugging), and intolerably paradoxical in cases like the St. Petersburg and Pasadena games. In this paper I show that, under certain conditions, stochastic dominance reasoning can capture most of the plausible implications of expectational reasoning while avoiding most of its pitfalls…

Social Beneficence – Jacob Barrett (Global Priorities Institute, University of Oxford)

A background assumption in much contemporary political philosophy is that justice is the first virtue of social institutions, taking priority over other values such as beneficence. This assumption is typically treated as a methodological starting point, rather than as following from any particular moral or political theory. In this paper, I challenge this assumption.