What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Longtermism, aggregation, and catastrophic risk – Emma J. Curran (University of Cambridge)

Advocates of longtermism point out that interventions which focus on improving the prospects of people in the very far future will, in expectation, bring about a significant amount of good. Indeed, in expectation, such long-term interventions bring about far more good than their short-term counterparts. As such, longtermists claim we have compelling moral reason to prefer long-term interventions. …

Tiny probabilities and the value of the far future – Petra Kosonen (Population Wellbeing Initiative, University of Texas at Austin)

Morally speaking, what matters the most is the far future – at least according to Longtermism. The reason why the far future is of utmost importance is that our acts’ expected influence on the value of the world is mainly determined by their consequences in the far future. The case for Longtermism is straightforward: Given the enormous number of people who might exist in the far future, even a tiny probability of affecting how the far future goes outweighs the importance of our acts’ consequences…

Consciousness makes things matter – Andrew Y. Lee (University of Toronto)

This paper argues that phenomenal consciousness is what makes an entity a welfare subject, or the kind of thing that can be better or worse off. I develop and motivate this view, and then defend it from objections concerning death, non-conscious entities that have interests (such as plants), and conscious subjects that necessarily have welfare level zero. I also explain how my theory of welfare subjects relates to experientialist and anti-experientialist theories of welfare goods.