Evolutionary debunking and value alignment

Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 11-2024

This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.

Other working papers

Moral demands and the far future – Andreas Mogensen (Global Priorities Institute, Oxford University)

I argue that moral philosophers have either misunderstood the problem of moral demandingness or at least failed to recognize important dimensions of the problem that undermine many standard assumptions. It has been assumed that utilitarianism concretely directs us to maximize welfare within a generation by transferring resources to people currently living in extreme poverty. In fact, utilitarianism seems to imply that any obligation to help people who are currently badly off is trumped by obligations to undertake actions targeted at improving the value…

Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making – David Thorstad and Andreas Mogensen (Global Priorities Institute, Oxford University)

Even our most mundane decisions have the potential to significantly impact the long-term future, but we are often clueless about what this impact may be. In this paper, we aim to characterize and solve two problems raised by recent discussions of cluelessness, which we term the Problems of Decision Paralysis and the Problem of Decision-Making Demandingness. After reviewing and rejecting existing solutions to both problems, we argue that the way forward is to be found in the distinction between procedural and substantive rationality…

Do not go gentle: why the Asymmetry does not support anti-natalism – Andreas Mogensen (Global Priorities Institute, Oxford University)

According to the Asymmetry, adding lives that are not worth living to the population makes the outcome pro tanto worse, but adding lives that are well worth living to the population does not make the outcome pro tanto better. It has been argued that the Asymmetry entails the desirability of human extinction. However, this argument rests on a misunderstanding of the kind of neutrality attributed to the addition of lives worth living by the Asymmetry. A similar misunderstanding is shown to underlie Benatar’s case for anti-natalism.