What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
The case for strong longtermism – Hilary Greaves and William MacAskill (Global Priorities Institute, University of Oxford)
A striking fact about the history of civilisation is just how early we are in it. There are 5000 years of recorded history behind us, but how many years are still to come? If we merely last as long as the typical mammalian species…
Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making – David Thorstad and Andreas Mogensen (Global Priorities Institute, Oxford University)
Even our most mundane decisions have the potential to significantly impact the long-term future, but we are often clueless about what this impact may be. In this paper, we aim to characterize and solve two problems raised by recent discussions of cluelessness, which we term the Problems of Decision Paralysis and the Problem of Decision-Making Demandingness. After reviewing and rejecting existing solutions to both problems, we argue that the way forward is to be found in the distinction between procedural and substantive rationality…
The asymmetry, uncertainty, and the long term – Teruji Thomas (Global Priorities Institute, Oxford University)
The Asymmetry is the view in population ethics that, while we ought to avoid creating additional bad lives, there is no requirement to create additional good ones. The question is how to embed this view in a complete normative theory, and in particular one that treats uncertainty in a plausible way. After reviewing…