What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.