Towards shutdownable agents via stochastic choice
Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Imperial College London), Leyton Ho (Brown University) and Louis Thomson (University of Oxford)
GPI Working Paper No. 16-2024
The Incomplete Preferences Proposal (IPP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the IPP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL. Our theoretical work suggests that these agents would be useful and shutdownable.
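The abstract does not state the reward function or the metrics, so the following is only a minimal sketch under stated assumptions, not the paper's definitions. It assumes that stochastic choice between trajectory-lengths can be scored by the Shannon entropy of the agent's empirical choice frequencies, and that a DReST-style reward can be illustrated by scaling a base reward down each time the same trajectory-length has been chosen before. The names `neutrality_entropy` and `discounted_reward` and the `discount` parameter are hypothetical.

```python
import math
from collections import Counter


def neutrality_entropy(chosen_lengths, possible_lengths):
    """Shannon entropy (in bits) of the empirical distribution over
    trajectory-lengths. ASSUMPTION: entropy is used here as one plausible
    score for stochastic choice; the abstract does not specify the metric."""
    if not chosen_lengths:
        raise ValueError("need at least one observed choice")
    counts = Counter(chosen_lengths)
    total = len(chosen_lengths)
    entropy = 0.0
    for length in possible_lengths:
        p = counts[length] / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def discounted_reward(base_reward, times_length_chosen_before, discount=0.9):
    """HYPOTHETICAL discounting rule: shrink the reward geometrically with
    the number of earlier episodes in which the same trajectory-length was
    chosen, so that repeating one length pays less than varying it."""
    return base_reward * discount ** times_length_chosen_before


# An agent that alternates between two lengths scores higher on the
# entropy measure than one that always picks the same length.
lengths = ["short", "long"]
mixed = neutrality_entropy(["short", "long"] * 5, lengths)
fixed = neutrality_entropy(["short"] * 10, lengths)
print(mixed, fixed)
```

Under this sketch, an agent that always picks the same trajectory-length sees its reward shrink with every repetition, which is one way a reward signal could favour varying the choice while leaving performance within each trajectory-length to be rewarded separately.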