Evolutionary debunking and value alignment
Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 11-2024
This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.
Other working papers
Exceeding expectations: stochastic dominance as a general decision theory – Christian Tarsney (Global Priorities Institute, Oxford University)
The principle that rational agents should maximize expected utility or choiceworthiness is intuitively plausible in many ordinary cases of decision-making under uncertainty. But it is less plausible in cases of extreme, low-probability risk (like Pascal’s Mugging), and intolerably paradoxical in cases like the St. Petersburg and Pasadena games. In this paper I show that, under certain conditions, stochastic dominance reasoning can capture most of the plausible implications of expectational reasoning while avoiding most of its pitfalls…
Time discounting, consistency and special obligations: a defence of Robust Temporalism – Harry R. Lloyd (Yale University)
This paper defends the claim that mere temporal proximity always and without exception strengthens certain moral duties, including the duty to save – call this view Robust Temporalism. Although almost all other moral philosophers dismiss Robust Temporalism out of hand, I argue that it is prima facie intuitively plausible, and that it is analogous to a view about special obligations that many philosophers already accept…
When should an effective altruist donate? – William MacAskill (Global Priorities Institute, Oxford University)
Effective altruism is the use of evidence and careful reasoning to work out how to maximize positive impact on others with a given unit of resources, and the taking of action on that basis. It’s a philosophy and a social movement that is gaining considerable steam in the philanthropic world. For example,…