What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
The Hinge of History Hypothesis: Reply to MacAskill – Andreas Mogensen (Global Priorities Institute, University of Oxford)
Some believe that the current era is uniquely important with respect to how well the rest of human history goes. Following Parfit, call this the Hinge of History Hypothesis. Recently, MacAskill has argued that our era is actually very unlikely to be especially influential in the way asserted by the Hinge of History Hypothesis. I respond to MacAskill, pointing to important unresolved ambiguities in his proposed definition of what it means for a time to be influential and criticizing the two arguments…
The evidentialist’s wager – William MacAskill, Aron Vallinder (Global Priorities Institute, Oxford University) Caspar Österheld (Duke University), Carl Shulman (Future of Humanity Institute, Oxford University), Johannes Treutlein (TU Berlin)
Suppose that an altruistic and morally motivated agent who is uncertain between evidential decision theory (EDT) and causal decision theory (CDT) finds herself in a situation in which the two theories give conflicting verdicts. We argue that even if she has significantly higher credence in CDT, she should nevertheless act …
In search of a biological crux for AI consciousness – Bradford Saad (Global Priorities Institute, University of Oxford)
Whether AI systems could be conscious is often thought to turn on whether consciousness is closely linked to biology. The rough thought is that if consciousness is closely linked to biology, then AI consciousness is impossible, and if consciousness is not closely linked to biology, then AI consciousness is possible—or, at any rate, it’s more likely to be possible. A clearer specification of the kind of link between consciousness and biology that is crucial for the possibility of AI consciousness would help organize inquiry into…