What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
Once More, Without Feeling – Andreas Mogensen (Global Priorities Institute, University of Oxford)
I argue for a pluralist theory of moral standing, on which both welfare subjectivity and autonomy can confer moral status. I argue that autonomy doesn’t entail welfare subjectivity, but can ground moral standing in its absence. Although I highlight the existence of plausible views on which autonomy entails phenomenal consciousness, I primarily emphasize the need for philosophical debates about the relationship between phenomenal consciousness and moral standing to engage with neglected questions about the nature…
Cassandra’s Curse: A second tragedy of the commons – Philippe Colo (ETH Zurich)
This paper studies why scientific forecasts regarding exceptional or rare events generally fail to trigger adequate public response. I consider a game of contribution to a public bad. Prior to the game, I assume contributors receive non-verifiable expert advice regarding uncertain damages. In addition, I assume that the expert cares only about social welfare. Under mild assumptions, I show that no information transmission can happen at equilibrium when the number of contributors…
Doomsday and objective chance – Teruji Thomas (Global Priorities Institute, Oxford University)
Lewis’s Principal Principle says that one should usually align one’s credences with the known chances. In this paper I develop a version of the Principal Principle that deals well with some exceptional cases related to the distinction between metaphysical and epistemic modality. I explain how this principle gives a unified account of the Sleeping Beauty problem and chance-based principles of anthropic reasoning…