Towards shutdownable agents via stochastic choice
Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Independent), Leyton Ho (Brown University) and Louis Thomson (University of Oxford)
GPI Working Paper No. 16-2024
Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that does not happen. A key part of the IPP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus suggest that DReST reward functions could also train advanced agents to be USEFUL and NEUTRAL, and thereby make these advanced agents useful and shutdownable.
Other working papers
Can an evidentialist be risk-averse? – Hayden Wilkinson (Global Priorities Institute, University of Oxford)
Two key questions of normative decision theory are: 1) whether the probabilities relevant to decision theory are evidential or causal; and 2) whether agents should be risk-neutral, and so maximise the expected value of the outcome, or instead risk-averse (or otherwise sensitive to risk). These questions are typically thought to be independent – that our answer to one bears little on our answer to the other. …
Do not go gentle: why the Asymmetry does not support anti-natalism – Andreas Mogensen (Global Priorities Institute, Oxford University)
According to the Asymmetry, adding lives that are not worth living to the population makes the outcome pro tanto worse, but adding lives that are well worth living to the population does not make the outcome pro tanto better. It has been argued that the Asymmetry entails the desirability of human extinction. However, this argument rests on a misunderstanding of the kind of neutrality attributed to the addition of lives worth living by the Asymmetry. A similar misunderstanding is shown to underlie Benatar’s case for anti-natalism.
The freedom of future people – Andreas T Schmidt (University of Groningen)
What happens to liberal political philosophy, if we consider not only the freedom of present but also future people? In this article, I explore the case for long-term liberalism: freedom should be a central goal, and we should often be particularly concerned with effects on long-term future distributions of freedom. I provide three arguments. First, liberals should be long-term liberals: liberal arguments to value freedom give us reason to be (particularly) concerned with future freedom…