What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
How to neglect the long term – Hayden Wilkinson (Global Priorities Institute, University of Oxford)
Consider longtermism: the view that, at least in some of the most important decisions facing agents today, which options are morally best is determined by which are best for the long-term future. Various critics have argued that longtermism is false—indeed, that it is obviously false, and that we can reject it on normative grounds without close consideration of certain descriptive facts. In effect, it is argued, longtermism would be false even if real-world agents had promising means…
Critical-set views, biographical identity, and the long term – Elliott Thornley (Global Priorities Institute, University of Oxford)
Critical-set views avoid the Repugnant Conclusion by subtracting some constant from the welfare score of each life in a population. These views are thus sensitive to facts about biographical identity: identity between lives. In this paper, I argue that questions of biographical identity give us reason to reject critical-set views and embrace the total view. I end with a practical implication. If we shift our credences towards the total view, we should also shift our efforts towards ensuring that humanity survives for the long term.
A Fission Problem for Person-Affecting Views – Elliott Thornley (Global Priorities Institute, University of Oxford)
On person-affecting views in population ethics, the moral import of a person’s welfare depends on that person’s temporal or modal status. These views typically imply that – all else equal – we’re never required to create extra people, or to act in ways that increase the probability of extra people coming into existence. In this paper, I use Parfit-style fission cases to construct a dilemma for person-affecting views: either they forfeit their seeming-advantages and face fission analogues…