Evolutionary debunking and value alignment
Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 11-2024
This paper examines the bearing of evolutionary debunking arguments, which use the evolutionary origins of values to challenge their epistemic credentials, on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.
Other working papers
Altruism in governance: Insights from randomized training – Sultan Mehmood (New Economic School), Shaheen Naseer (Lahore School of Economics) and Daniel L. Chen (Toulouse School of Economics)
An experiment randomizing different schools of thought in altruism training finds that training junior deputy ministers in the utility of empathy yields at least a 0.4 standard deviation increase in altruism. Treated ministers increased their perspective-taking: blood donations doubled, but only when blood banks requested their exact blood type. Perspective-taking in strategic dilemmas improved. Field measures such as orphanage visits and volunteering in impoverished schools also increased, as did their test scores in teamwork assessments…
How effective is (more) money? Randomizing unconditional cash transfer amounts in the US – Ania Jaroszewicz (University of California San Diego), Oliver P. Hauser (University of Exeter), Jon M. Jachimowicz (Harvard Business School) and Julian Jamison (University of Oxford and University of Exeter)
We randomized 5,243 Americans in poverty to receive a one-time unconditional cash transfer (UCT) of $2,000 (two months’ worth of total household income for the median participant), $500 (half a month’s income), or nothing. We measured the effects of the UCTs on participants’ financial well-being, psychological well-being, cognitive capacity, and physical health through surveys administered one week, six weeks, and 15 weeks later. While bank data show that both UCTs increased expenditures, we find no evidence that…
Is Existential Risk Mitigation Uniquely Cost-Effective? Not in Standard Population Models – Gustav Alexandrie (Global Priorities Institute, University of Oxford) and Maya Eden (Brandeis University)
What socially beneficial causes should philanthropists prioritize if they give equal ethical weight to the welfare of current and future generations? Many have argued that, because human extinction would result in a permanent loss of all future generations, extinction risk mitigation should be the top priority given this impartial stance. Using standard models of population dynamics, we challenge this conclusion. We first introduce a theoretical framework for quantifying undiscounted cost-effectiveness over…