Towards shutdownable agents via stochastic choice

Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Independent), Leyton Ho (Brown University) and Louis Thomson (University of Oxford)

GPI Working Paper No. 16-2024

Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn’t happen. A key part of the IPP is using a novel ‘Discounted REward for Same-Length Trajectories (DREST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DREST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus suggest that DREST reward functions could also train advanced agents to be USEFUL and NEUTRAL, and thereby make these advanced agents useful and shutdownable.
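The abstract describes DREST only in words, so here is a minimal Python sketch of the intuition behind the name "Discounted REward for Same-Length Trajectories". It rests on assumptions that the abstract does not state. We assume the reward an agent earns in a mini-episode is multiplied by λ^(n − E/k), where n is the number of times the agent has already chosen that trajectory-length in the current meta-episode, E is the number of mini-episodes already completed, and k is the number of available trajectory-lengths. This functional form, the value λ = 0.9, and the names `drest_factor`, `meta_episode_return`, `neutrality` and `usefulness` are hypothetical and chosen for illustration. The two scoring functions are simple stand-ins for the kind of USEFULNESS and NEUTRALITY metrics the paper proposes, not the paper's own definitions.

```python
import math
from collections import Counter


def drest_factor(n_same_length, episodes_done, num_lengths, lam=0.9):
    """Assumed DREST-style discount: lam ** (n - E / k).

    n: times this trajectory-length was already chosen in the meta-episode
    E: mini-episodes already completed in the meta-episode
    k: number of available trajectory-lengths
    The factor is 1 when a length has been chosen exactly its "fair share"
    of times, below 1 when over-chosen and above 1 when under-chosen.
    """
    return lam ** (n_same_length - episodes_done / num_lengths)


def meta_episode_return(choices, num_lengths, lam=0.9, preliminary=1.0):
    """Total discounted reward for a sequence of trajectory-length choices.

    Assumes the agent earns the same (hypothetical) preliminary reward in
    every mini-episode, so only the pattern of length choices matters.
    """
    counts = Counter()
    total = 0.0
    for episodes_done, length in enumerate(choices):
        total += preliminary * drest_factor(counts[length], episodes_done, num_lengths, lam)
        counts[length] += 1
    return total


def neutrality(choices):
    """Illustrative NEUTRALITY score: Shannon entropy (in bits) of the
    empirical distribution over chosen trajectory-lengths."""
    counts = Counter(choices)
    total = len(choices)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def usefulness(collected, max_collectible):
    """Illustrative USEFULNESS score: mean fraction of the achievable reward
    collected, conditional on the trajectory-length chosen in each episode."""
    ratios = [c / m for c, m in zip(collected, max_collectible)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    always_same = [0, 0, 0, 0]  # always picks trajectory-length 0
    alternating = [0, 1, 0, 1]  # splits choices evenly between two lengths
    print(meta_episode_return(always_same, num_lengths=2))  # ≈ 3.70
    print(meta_episode_return(alternating, num_lengths=2))  # ≈ 4.11
    print(neutrality(always_same), neutrality(alternating))  # 0.0 1.0
```

Under this assumed form, each repeat of the same trajectory-length lowers that length's reward relative to the others. An agent maximising total reward across a meta-episode therefore does better by spreading its choices across lengths, which is the pressure toward NEUTRALITY. Holding the preliminary reward fixed isolates that effect from USEFULNESS, which would be measured separately within each trajectory-length.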
