On the desire to make a difference
Hilary Greaves, William MacAskill, Andreas Mogensen and Teruji Thomas (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 16-2022, forthcoming in Philosophical Studies
True benevolence is, most fundamentally, a desire that the world be better. It is natural and common, however, to frame thinking about benevolence indirectly, in terms of a desire to make a difference to how good the world is. This would be an innocuous shift if desires to make a difference were extensionally equivalent to desires that the world be better. This paper shows that at least on some common ways of making a “desire to make a difference” precise, this extensional equivalence fails. Where it fails, “difference-making preferences” run counter to the ideals of benevolence. In particular, in the context of decision making under uncertainty, coupling a “difference-making” framing in a natural way with risk aversion leads to preferences that violate stochastic dominance, and that lead to a strong form of collective defeat, from the point of view of betterness. Difference-making framings and true benevolence are not strictly mutually inconsistent, but agents seeking to implement true benevolence must take care to avoid the various pitfalls that we outline.
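The stochastic-dominance claim can be made vivid with a small numerical sketch. The setup below is not from the paper: the two-state example, the payoff numbers, and the square-root utility standing in for risk aversion are all illustrative assumptions. An agent who is risk averse over the *difference she makes* (outcome minus status quo) can prefer option A even though option B first-order stochastically dominates A in terms of how good the world is.

```python
import math

# Two equiprobable states; all numbers are illustrative assumptions.
# Background ("status quo") world value in each state:
background = [0.0, 100.0]

# Each option's effect on world value, per state:
effect_A = [50.0, 50.0]   # makes a *certain* difference of 50
effect_B = [151.0, 1.0]   # risky difference, but better world outcomes

def outcomes(effect):
    """World value in each state if this option is chosen."""
    return [b + e for b, e in zip(background, effect)]

out_A = outcomes(effect_A)  # [50.0, 150.0]
out_B = outcomes(effect_B)  # [151.0, 101.0]

# B first-order stochastically dominates A in world value:
# every quantile of B's outcome distribution exceeds A's.
assert all(b > a for a, b in zip(sorted(out_A), sorted(out_B)))

def dm_score(effect):
    """Risk-averse evaluation of the difference made:
    expected concave utility (here sqrt) of the difference per state."""
    return sum(math.sqrt(d) for d in effect) / len(effect)

print(round(dm_score(effect_A), 3))  # 7.071
print(round(dm_score(effect_B), 3))  # 6.644 -> the agent picks A
```

Because A's difference is certain while B's is risky, the concave utility over differences ranks A above B, even though B is better for the world in every quantile of the outcome distribution. This is the shape of the conflict between difference-making risk aversion and stochastic dominance that the abstract describes.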
Other working papers
Towards shutdownable agents via stochastic choice – Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Independent), Leyton Ho (Brown University), and Louis Thomson (University of Oxford)
Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn’t happen. A key part of the IPP is using a novel ‘Discounted REward for Same-Length Trajectories (DREST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics…
A paradox for tiny probabilities and enormous values – Nick Beckstead (Open Philanthropy Project) and Teruji Thomas (Global Priorities Institute, University of Oxford)
We show that every theory of the value of uncertain prospects must have one of three unpalatable properties. Reckless theories recommend risking arbitrarily great gains at arbitrarily long odds for the sake of enormous potential; timid theories recommend passing up arbitrarily great gains to prevent a tiny increase in risk; nontransitive theories deny the principle that, if A is better than B and B is better than C, then A must be better than C.
Evolutionary debunking and value alignment – Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments…