Evolutionary debunking and value alignment

Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 11-2024

This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.

Other working papers

Future Suffering and the Non-Identity Problem – Theron Pummer (University of St Andrews)

I present and explore a new version of the Person-Affecting View, according to which reasons to do an act depend wholly on what would be said for or against this act from the points of view of particular individuals. According to my view, (i) there is a morally requiring reason not to bring about lives insofar as they contain suffering (negative welfare), (ii) there is no morally requiring reason to bring about lives insofar as they contain happiness (positive welfare), but (iii) there is a permitting reason to bring about lives insofar as they…

How important is the end of humanity? Lay people prioritize extinction prevention but not above all other societal issues. – Matthew Coleman (Northeastern University), Lucius Caviola (Global Priorities Institute, University of Oxford) et al.

Human extinction would mean the deaths of eight billion people and the end of humanity’s achievements, culture, and future potential. On several ethical views, extinction would be a terrible outcome. How do people think about human extinction? And how much do they prioritize preventing extinction over other societal issues? Across six empirical studies (N = 2,541; U.S. and China) we find that people consider extinction prevention a global priority and deserving of greatly increased societal resources. …

Intergenerational equity under catastrophic climate change – Aurélie Méjean (CNRS, Paris), Antonin Pottier (EHESS, CIRED, Paris), Stéphane Zuber (CNRS, Paris) and Marc Fleurbaey (CNRS, Paris School of Economics)

Climate change raises the issue of intergenerational equity. As climate change threatens irreversible and dangerous impacts, possibly leading to extinction, the most relevant trade-off may not be between present and future consumption, but between present consumption and the mere existence of future generations. To investigate this trade-off, we build an integrated assessment model that explicitly accounts for the risk of extinction of future generations…