What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

When should an effective altruist donate? – William MacAskill (Global Priorities Institute, Oxford University)

Effective altruism is the use of evidence and careful reasoning to work out how to maximize positive impact on others with a given unit of resources, and the taking of action on that basis. It’s a philosophy and a social movement that is gaining considerable steam in the philanthropic world. For example,…

Existential risk and growth – Leopold Aschenbrenner (Columbia University)

Human activity can create or mitigate risks of catastrophes, such as nuclear war, climate change, pandemics, or artificial intelligence run amok. These could even imperil the survival of human civilization. What is the relationship between economic growth and such existential risks? In a model of directed technical change, with moderate parameters, existential risk follows a Kuznets-style inverted U-shape. …

How effective is (more) money? Randomizing unconditional cash transfer amounts in the US – Ania Jaroszewicz (University of California San Diego), Oliver P. Hauser (University of Exeter), Jon M. Jachimowicz (Harvard Business School) and Julian Jamison (University of Oxford and University of Exeter)

We randomized 5,243 Americans in poverty to receive a one-time unconditional cash transfer (UCT) of $2,000 (two months’ worth of total household income for the median participant), $500 (half a month’s income), or nothing. We measured the effects of the UCTs on participants’ financial well-being, psychological well-being, cognitive capacity, and physical health through surveys administered one week, six weeks, and 15 weeks later. While bank data show that both UCTs increased expenditures, we find no evidence that…