What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
On two arguments for Fanaticism – Jeffrey Sanford Russell (University of Southern California)
Should we make significant sacrifices to ever-so-slightly lower the chance of extremely bad outcomes, or to ever-so-slightly raise the chance of extremely good outcomes? Fanaticism says yes: for every bad outcome, there is a tiny chance of of extreme disaster that is even worse, and for every good outcome, there is a tiny chance of an enormous good that is even better.
Longtermist institutional reform – Tyler M. John (Rutgers University) and William MacAskill (Global Priorities Institute, Oxford University)
There is a vast number of people who will live in the centuries and millennia to come. Even if homo sapiens survives merely as long as a typical species, we have hundreds of thousands of years ahead of us. And our future potential could be much greater than that again: it will be hundreds of millions of years until the Earth is sterilized by the expansion of the Sun, and many trillions of years before the last stars die out. …
Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making – David Thorstad and Andreas Mogensen (Global Priorities Institute, Oxford University)
Even our most mundane decisions have the potential to significantly impact the long-term future, but we are often clueless about what this impact may be. In this paper, we aim to characterize and solve two problems raised by recent discussions of cluelessness, which we term the Problems of Decision Paralysis and the Problem of Decision-Making Demandingness. After reviewing and rejecting existing solutions to both problems, we argue that the way forward is to be found in the distinction between procedural and substantive rationality…