What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
A bargaining-theoretic approach to moral uncertainty – Owen Cotton-Barratt (Future of Humanity Institute, Oxford University), Hilary Greaves (Global Priorities Institute, Oxford University)
This paper explores a new approach to the problem of decision under relevant moral uncertainty. We treat the case of an agent making decisions in the face of moral uncertainty on the model of bargaining theory, as if the decision-making process were one of bargaining among different internal parts of the agent, with different parts committed to different moral theories. The resulting approach contrasts interestingly with the extant “maximise expected choiceworthiness”…
Prediction: The long and the short of it – Antony Millner (University of California, Santa Barbara) and Daniel Heyen (ETH Zurich)
Commentators often lament forecasters’ inability to provide precise predictions of the long-run behaviour of complex economic and physical systems. Yet their concerns often conflate the presence of substantial long-run uncertainty with the need for long-run predictability; short-run predictions can partially substitute for long-run predictions if decision-makers can adjust their activities over time. …
Concepts of existential catastrophe – Hilary Greaves (University of Oxford)
The notion of existential catastrophe is increasingly appealed to in discussion of risk management around emerging technologies, but it is not completely clear what this notion amounts to. Here, I provide an opinionated survey of the space of plausibly useful definitions of existential catastrophe. Inter alia, I discuss: whether to define existential catastrophe in ex post or ex ante terms, whether an ex ante definition should be in terms of loss of expected value or loss of potential…