Towards shutdownable agents via stochastic choice
Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Imperial College London), Leyton Ho (Brown University) and Louis Thomson (University of Oxford)
GPI Working Paper No. 16-2024
The POST-Agents Proposal (PAP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the PAP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL. Our theoretical work suggests that these agents would be useful and shutdownable.
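To make the mechanism concrete, below is a minimal sketch of a DReST-style reward wrapper. The abstract leaves the exact formulation open, so this is an illustrative assumption: each episode's reward is discounted by a factor `lam` raised to the number of previous episodes (within a meta-episode) whose trajectory had the same length. Under that assumption, maximising discounted return requires both high reward conditional on each trajectory-length (USEFULNESS) and near-equal choice frequencies across trajectory-lengths (NEUTRALITY).

```python
from collections import defaultdict

# A minimal sketch of a DReST-style reward wrapper, based only on the idea
# described in the abstract. The discount factor `lam` and the per-length
# episode counter are illustrative assumptions, not the paper's exact
# formulation.

class DReSTReward:
    """Discounts each episode's reward by lam ** k, where k is the number
    of previous episodes (within a meta-episode) whose trajectory had the
    same length. Repeatedly choosing the same trajectory-length therefore
    earns steeply diminishing rewards, incentivising stochastic choice
    between lengths while preserving the incentive to do well conditional
    on each length."""

    def __init__(self, lam: float = 0.9):
        assert 0.0 < lam < 1.0, "discount must lie strictly between 0 and 1"
        self.lam = lam
        self.counts = defaultdict(int)  # trajectory-length -> episodes seen

    def reward(self, trajectory_length: int, preliminary_reward: float) -> float:
        k = self.counts[trajectory_length]
        self.counts[trajectory_length] += 1
        return (self.lam ** k) * preliminary_reward


# Toy usage: an agent that always picks the same trajectory-length earns
# steadily discounted rewards, while switching lengths keeps rewards high.
drest = DReSTReward(lam=0.9)
for length in [3, 3, 3, 5]:
    print(length, round(drest.reward(length, preliminary_reward=1.0), 3))
# Output: 3 1.0 / 3 0.9 / 3 0.81 / 5 1.0
```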