What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

The Hinge of History Hypothesis: Reply to MacAskill – Andreas Mogensen (Global Priorities Institute, University of Oxford)

Some believe that the current era is uniquely important with respect to how well the rest of human history goes. Following Parfit, call this the Hinge of History Hypothesis. Recently, MacAskill has argued that our era is actually very unlikely to be especially influential in the way asserted by the Hinge of History Hypothesis. I respond to MacAskill, pointing to important unresolved ambiguities in his proposed definition of what it means for a time to be influential and criticizing the two arguments…

Against Anti-Fanaticism – Christian Tarsney (Population Wellbeing Initiative, University of Texas at Austin)

Should you be willing to forego any sure good for a tiny probability of a vastly greater good? Fanatics say you should, anti-fanatics say you should not. Anti-fanaticism has great intuitive appeal. But, I argue, these intuitions are untenable, because satisfying them in their full generality is incompatible with three very plausible principles: acyclicity, a minimal dominance principle, and the principle that any outcome can be made better or worse. This argument against anti-fanaticism can be…

On two arguments for Fanaticism – Jeffrey Sanford Russell (University of Southern California)

Should we make significant sacrifices to ever-so-slightly lower the chance of extremely bad outcomes, or to ever-so-slightly raise the chance of extremely good outcomes? Fanaticism says yes: for every bad outcome, there is a tiny chance of of extreme disaster that is even worse, and for every good outcome, there is a tiny chance of an enormous good that is even better.