What power-seeking theorems do not show

David Thorstad (Vanderbilt University)

GPI Working Paper No. 27-2024

Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.

Other working papers

In defence of fanaticism – Hayden Wilkinson (Australian National University)

Consider a decision between: 1) a certainty of a moderately good outcome, such as one additional life saved; 2) a lottery which probably gives a worse outcome, but has a tiny probability of a far better outcome (perhaps trillions of blissful lives created). Which is morally better? Expected value theory (with a plausible axiology) judges (2) as better, no matter how tiny its probability of success. But this seems fanatical. So we may be tempted to abandon expected value theory…

Simulation expectation – Teruji Thomas (Global Priorities Institute, University of Oxford)

I present a new argument for the claim that I’m much more likely to be a person living in a computer simulation than a person living in the ground-level of reality. I consider whether this argument can be blocked by an externalist view of what my evidence supports, and I urge caution against the easy assumption that actually finding lots of simulations would increase the odds that I myself am in one.

Population ethics with thresholds – Walter Bossert (University of Montreal), Susumu Cato (University of Tokyo) and Kohei Kamaga (Sophia University)

We propose a new class of social quasi-orderings in a variable-population setting. In order to declare one utility distribution at least as good as another, the critical-level utilitarian value of the former must reach or surpass the value of the latter. For each possible absolute value of the difference between the population sizes of two distributions to be compared, we specify a non-negative threshold level and a threshold inequality. This inequality indicates whether the corresponding threshold level must be reached or surpassed in…