What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Do not go gentle: why the Asymmetry does not support anti-natalism – Andreas Mogensen (Global Priorities Institute, Oxford University)

According to the Asymmetry, adding lives that are not worth living to the population makes the outcome pro tanto worse, but adding lives that are well worth living to the population does not make the outcome pro tanto better. It has been argued that the Asymmetry entails the desirability of human extinction. However, this argument rests on a misunderstanding of the kind of neutrality attributed to the addition of lives worth living by the Asymmetry. A similar misunderstanding is shown to underlie Benatar’s case for anti-natalism.

Longtermism in an Infinite World – Christian J. Tarsney (Population Wellbeing Initiative, University of Texas at Austin) and Hayden Wilkinson (Global Priorities Institute, University of Oxford)

The case for longtermism depends on the vast potential scale of the future. But that same vastness also threatens to undermine the case for longtermism: If the future contains infinite value, then many theories of value that support longtermism (e.g., risk-neutral total utilitarianism) seem to imply that no available action is better than any other. And some strategies for avoiding this conclusion (e.g., exponential time discounting) yield views that…

Funding public projects: A case for the Nash product rule – Florian Brandl (Stanford University), Felix Brandt (Technische Universität München), Dominik Peters (University of Oxford), Christian Stricker (Technische Universität München) and Warut Suksompong (National University of Singapore)

We study a mechanism design problem where a community of agents wishes to fund public projects via voluntary monetary contributions by the community members. This serves as a model for public expenditure without an exogenously available budget, such as participatory budgeting or voluntary tax programs, as well as donor coordination when interpreting charities as public projects and donations as contributions. Our aim is to identify a mutually beneficial distribution of the individual contributions. …