What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
The epistemic challenge to longtermism – Christian Tarsney (Global Priorities Institute, Oxford University)
Longtermists claim that what we ought to do is mainly determined by how our actions might affect the very long-run future. A natural objection to longtermism is that these effects may be nearly impossible to predict— perhaps so close to impossible that, despite the astronomical importance of the far future, the expected value of our present actions is mainly determined by near-term considerations. This paper aims to precisify and evaluate one version of this epistemic objection to longtermism…
How should risk and ambiguity affect our charitable giving? – Lara Buchak (Princeton University)
Suppose we want to do the most good we can with a particular sum of money, but we cannot be certain of the consequences of different ways of making use of it. This paper explores how our attitudes towards risk and ambiguity bear on what we should do. It shows that risk-avoidance and ambiguity-aversion can each provide good reason to divide our money between various charitable organizations rather than to give it all to the most promising one…
Existential risk and growth – Leopold Aschenbrenner (Columbia University)
Human activity can create or mitigate risks of catastrophes, such as nuclear war, climate change, pandemics, or artificial intelligence run amok. These could even imperil the survival of human civilization. What is the relationship between economic growth and such existential risks? In a model of directed technical change, with moderate parameters, existential risk follows a Kuznets-style inverted U-shape. …