Shutdownable Agents through POST-Agency
Elliott Thornley (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 5-2025
Many fear that future artificial agents will resist shutdown. I present an idea – the POST-Agents Proposal – for ensuring that doesn’t happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST – together with other conditions – implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.
Other working papers
Longtermism, aggregation, and catastrophic risk – Emma J. Curran (University of Cambridge)
Advocates of longtermism point out that interventions which focus on improving the prospects of people in the very far future will, in expectation, bring about a significant amount of good. Indeed, in expectation, such long-term interventions bring about far more good than their short-term counterparts. As such, longtermists claim we have compelling moral reason to prefer long-term interventions. …
In defence of fanaticism – Hayden Wilkinson (Australian National University)
Consider a decision between: 1) a certainty of a moderately good outcome, such as one additional life saved; 2) a lottery which probably gives a worse outcome, but has a tiny probability of a far better outcome (perhaps trillions of blissful lives created). Which is morally better? Expected value theory (with a plausible axiology) judges (2) as better, no matter how tiny its probability of success. But this seems fanatical. So we may be tempted to abandon expected value theory…
Welfare and felt duration – Andreas Mogensen (Global Priorities Institute, University of Oxford)
How should we understand the duration of a pleasant or unpleasant sensation, insofar as its duration modulates how good or bad the experience is overall? Given that we seem able to distinguish between subjective and objective duration and that how well or badly someone’s life goes is naturally thought of as something to be assessed from her own perspective, it seems intuitive that it is subjective duration that modulates how good or bad an experience is from the perspective of an individual’s welfare. …