What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
Dynamic public good provision under time preference heterogeneity – Philip Trammell (Global Priorities Institute and Department of Economics, University of Oxford)
I explore the implications of time preference heterogeneity for the private funding of public goods. The assumption that players use a common discount rate is knife-edge: relaxing it yields substantially different equilibria, for two reasons. First, time preference heterogeneity motivates intertemporal polarization, analogous to the polarization seen in a static public good game. In the simplest settings, more patient players spend nothing early in time and less patient players spending nothing later. Second…
Estimating long-term treatment effects without long-term outcome data – David Rhys Bernard (Paris School of Economics)
Estimating long-term impacts of actions is important in many areas but the key difficulty is that long-term outcomes are only observed with a long delay. One alternative approach is to measure the effect on an intermediate outcome or a statistical surrogate and then use this to estimate the long-term effect. …
How to neglect the long term – Hayden Wilkinson (Global Priorities Institute, University of Oxford)
Consider longtermism: the view that, at least in some of the most important decisions facing agents today, which options are morally best is determined by which are best for the long-term future. Various critics have argued that longtermism is false—indeed, that it is obviously false, and that we can reject it on normative grounds without close consideration of certain descriptive facts. In effect, it is argued, longtermism would be false even if real-world agents had promising means…