Evolutionary debunking and value alignment

Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 11-2024

This paper examines the bearing of evolutionary debunking arguments, which use the evolutionary origins of values to challenge their epistemic credentials, on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.

Other working papers

High risk, low reward: A challenge to the astronomical value of existential risk mitigation – David Thorstad (Global Priorities Institute, University of Oxford)

Many philosophers defend two claims: the astronomical value thesis that it is astronomically important to mitigate existential risks to humanity, and existential risk pessimism, the claim that humanity faces high levels of existential risk. It is natural to think that existential risk pessimism supports the astronomical value thesis. In this paper, I argue that precisely the opposite is true. Across a range of assumptions, existential risk pessimism significantly reduces the value of existential risk mitigation…

The long-run relationship between per capita incomes and population size – Maya Eden (University of Zurich) and Kevin Kuruc (Population Wellbeing Initiative, University of Texas at Austin)

The relationship between the human population size and per capita incomes has long been debated. Two competing forces feature prominently in these discussions. On the one hand, a larger population means that limited natural resources must be shared among more people. On the other hand, more people means more innovation and faster technological progress, other things equal. We study a model that features both of these channels. A calibration suggests that, in the long run, (marginal) increases in population would…

How to resist the Fading Qualia Argument – Andreas Mogensen (Global Priorities Institute, University of Oxford)

The Fading Qualia Argument is perhaps the strongest argument supporting the view that in order for a system to be conscious, it does not need to be made of anything in particular, so long as its internal parts have the right causal relations to each other and to the system’s inputs and outputs. I show how the argument can be resisted given two key assumptions: that consciousness is associated with vagueness at its boundaries and that conscious neural activity has a particular kind of holistic structure. …