What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Time Bias and Altruism – Leora Urim Sung (University College London)

We are typically near-future biased, being more concerned with our near future than our distant future. This near-future bias can be directed at others too, being more concerned with their near future than their distant future. In this paper, I argue that, because we discount the future in this way, beyond a certain point in time, we morally ought to be more concerned with the present well- being of others than with the well-being of our distant future selves. It follows that we morally ought to sacrifice…

Concepts of existential catastrophe – Hilary Greaves (University of Oxford)

The notion of existential catastrophe is increasingly appealed to in discussion of risk management around emerging technologies, but it is not completely clear what this notion amounts to. Here, I provide an opinionated survey of the space of plausibly useful definitions of existential catastrophe. Inter alia, I discuss: whether to define existential catastrophe in ex post or ex ante terms, whether an ex ante definition should be in terms of loss of expected value or loss of potential…

The weight of suffering – Andreas Mogensen (Global Priorities Institute, University of Oxford)

How should we weigh suffering against happiness? This paper highlights the existence of an argument from intuitively plausible axiological principles to the striking conclusion that in comparing different populations, there exists some depth of suffering that cannot be compensated for by any measure of well-being. In addition to a number of structural principles, the argument relies on two key premises. The first is the contrary of the so-called Reverse Repugnant Conclusion…