What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

How much should governments pay to prevent catastrophes? Longtermism’s limited role – Carl Shulman (Advisor, Open Philanthropy) and Elliott Thornley (Global Priorities Institute, University of Oxford)

Longtermists have argued that humanity should significantly increase its efforts to prevent catastrophes like nuclear wars, pandemics, and AI disasters. But one prominent longtermist argument overshoots this conclusion: the argument also implies that humanity should reduce the risk of existential catastrophe even at extreme cost to the present generation. This overshoot means that democratic governments cannot use the longtermist argument to guide their catastrophe policy. …

Exceeding expectations: stochastic dominance as a general decision theory – Christian Tarsney (Global Priorities Institute, Oxford University)

The principle that rational agents should maximize expected utility or choiceworthiness is intuitively plausible in many ordinary cases of decision-making under uncertainty. But it is less plausible in cases of extreme, low-probability risk (like Pascal’s Mugging), and intolerably paradoxical in cases like the St. Petersburg and Pasadena games. In this paper I show that, under certain conditions, stochastic dominance reasoning can capture most of the plausible implications of expectational reasoning while avoiding most of its pitfalls…

The Conservation Multiplier – Bård Harstad (University of Oslo)

Every government that controls an exhaustible resource must decide whether to exploit it or to conserve and thereby let the subsequent government decide whether to exploit or conserve. This paper develops a positive theory of this situation and shows when a small change in parameter values has a multiplier effect on exploitation. The multiplier strengthens the influence of a lobby paying for exploitation, and of a donor compensating for conservation. …