What power-seeking theorems do not show
David Thorstad (Vanderbilt University)
GPI Working Paper No. 27-2024
Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
Other working papers
The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists – Elliott Thornley (Global Priorities Institute, University of Oxford)
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that…
Simulation expectation – Teruji Thomas (Global Priorities Institute, University of Oxford)
I present a new argument for the claim that I’m much more likely to be a person living in a computer simulation than a person living in the ground-level of reality. I consider whether this argument can be blocked by an externalist view of what my evidence supports, and I urge caution against the easy assumption that actually finding lots of simulations would increase the odds that I myself am in one.
A non-identity dilemma for person-affecting views – Elliott Thornley (Global Priorities Institute, University of Oxford)
Person-affecting views in population ethics state that (in cases where all else is equal) we’re permitted but not required to create people who would enjoy good lives. In this paper, I present an argument against every possible variety of person-affecting view. The argument takes the form of a dilemma. Narrow person-affecting views must embrace at least one of three implausible verdicts in a case that I call ‘Expanded Non-Identity.’ Wide person-affecting views run into trouble in a case that I call ‘Two-Shot Non-Identity.’ …