What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Longtermism in an Infinite World – Christian J. Tarsney (Population Wellbeing Initiative, University of Texas at Austin) and Hayden Wilkinson (Global Priorities Institute, University of Oxford)

The case for longtermism depends on the vast potential scale of the future. But that same vastness also threatens to undermine the case for longtermism: If the future contains infinite value, then many theories of value that support longtermism (e.g., risk-neutral total utilitarianism) seem to imply that no available action is better than any other. And some strategies for avoiding this conclusion (e.g., exponential time discounting) yield views that…

Against the singularity hypothesis – David Thorstad (Global Priorities Institute, University of Oxford)

The singularity hypothesis is a radical hypothesis about the future of artificial intelligence on which self-improving artificial agents will quickly become orders of magnitude more intelligent than the average human. Despite the ambitiousness of its claims, the singularity hypothesis has been defended at length by leading philosophers and artificial intelligence researchers. In this paper, I argue that the singularity hypothesis rests on scientifically implausible growth assumptions. …

How effective is (more) money? Randomizing unconditional cash transfer amounts in the US – Ania Jaroszewicz (University of California San Diego), Oliver P. Hauser (University of Exeter), Jon M. Jachimowicz (Harvard Business School) and Julian Jamison (University of Oxford and University of Exeter)

We randomized 5,243 Americans in poverty to receive a one-time unconditional cash transfer (UCT) of $2,000 (two months’ worth of total household income for the median participant), $500 (half a month’s income), or nothing. We measured the effects of the UCTs on participants’ financial well-being, psychological well-being, cognitive capacity, and physical health through surveys administered one week, six weeks, and 15 weeks later. While bank data show that both UCTs increased expenditures, we find no evidence that…