What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
Can an evidentialist be risk-averse? – Hayden Wilkinson (Global Priorities Institute, University of Oxford)
Two key questions of normative decision theory are: 1) whether the probabilities relevant to decision theory are evidential or causal; and 2) whether agents should be risk-neutral, and so maximise the expected value of the outcome, or instead risk-averse (or otherwise sensitive to risk). These questions are typically thought to be independent – that our answer to one bears little on our answer to the other. …
Population ethical intuitions – Lucius Caviola (Harvard University) et al.
Is humanity’s existence worthwhile? If so, where should the human species be headed in the future? In part, the answers to these questions require us to morally evaluate the (potential) human population in terms of its size and aggregate welfare. This assessment lies at the heart of population ethics. Our investigation across nine experiments (N = 5776) aimed to answer three questions about how people aggregate welfare across individuals: (1) Do they weigh happiness and suffering symmetrically…
Estimating long-term treatment effects without long-term outcome data – David Rhys Bernard (Paris School of Economics)
Estimating long-term impacts of actions is important in many areas but the key difficulty is that long-term outcomes are only observed with a long delay. One alternative approach is to measure the effect on an intermediate outcome or a statistical surrogate and then use this to estimate the long-term effect. …