Shutdownable Agents through POST-Agency
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 5-2025
Many fear that future artificial agents will resist shutdown. I present an idea – the POST-Agents Proposal – for ensuring that doesn’t happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST – together with other conditions – implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.