Towards shutdownable agents via stochastic choice
Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Independent), Leyton Ho (Brown University) and Louis Thomson (University of Oxford)
GPI Working Paper No. 16-2024
Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that does not happen. A key part of the IPP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus suggest that DReST reward functions could also train advanced agents to be USEFUL and NEUTRAL, and thereby make these advanced agents useful and shutdownable.
Other working papers
Doomsday rings twice – Andreas Mogensen (Global Priorities Institute, Oxford University)
This paper considers the argument according to which, because we should regard it as a priori very unlikely that we are among the most important people who will ever exist, we should increase our confidence that the human species will not persist beyond the current historical era, which seems to represent…
How should risk and ambiguity affect our charitable giving? – Lara Buchak (Princeton University)
Suppose we want to do the most good we can with a particular sum of money, but we cannot be certain of the consequences of different ways of making use of it. This paper explores how our attitudes towards risk and ambiguity bear on what we should do. It shows that risk-avoidance and ambiguity-aversion can each provide good reason to divide our money between various charitable organizations rather than to give it all to the most promising one…
Consciousness makes things matter – Andrew Y. Lee (University of Toronto)
This paper argues that phenomenal consciousness is what makes an entity a welfare subject, or the kind of thing that can be better or worse off. I develop and motivate this view, and then defend it from objections concerning death, non-conscious entities that have interests (such as plants), and conscious subjects that necessarily have welfare level zero. I also explain how my theory of welfare subjects relates to experientialist and anti-experientialist theories of welfare goods.