What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

Intergenerational experimentation and catastrophic risk – Fikri Pitsuwan (Center of Economic Research, ETH Zurich)

I study an intergenerational game in which each generation experiments on a risky technology that provides private benefits, but may also cause a temporary catastrophe. I find a folk-theorem-type result on which there is a continuum of equilibria. Compared to the socially optimal level, some equilibria exhibit too much, while others too little, experimentation. The reason is that the payoff externality causes preemptive experimentation, while the informational externality leads to more caution…

Population ethics with thresholds – Walter Bossert (University of Montreal), Susumu Cato (University of Tokyo) and Kohei Kamaga (Sophia University)

We propose a new class of social quasi-orderings in a variable-population setting. In order to declare one utility distribution at least as good as another, the critical-level utilitarian value of the former must reach or surpass the value of the latter. For each possible absolute value of the difference between the population sizes of two distributions to be compared, we specify a non-negative threshold level and a threshold inequality. This inequality indicates whether the corresponding threshold level must be reached or surpassed in…