Evolutionary debunking and value alignment
Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 11-2024
This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.
Other working papers
Should longtermists recommend hastening extinction rather than delaying it? – Richard Pettigrew (University of Bristol)
Longtermism is the view that the most urgent global priorities, and those to which we should devote the largest portion of our current resources, are those that focus on ensuring a long future for humanity, and perhaps sentient or intelligent life more generally, and improving the quality of those lives in that long future. The central argument for this conclusion is that, given a fixed amount of a resource that we are able to devote to global priorities, the longtermist’s favoured interventions have…
Longtermism, aggregation, and catastrophic risk – Emma J. Curran (University of Cambridge)
Advocates of longtermism point out that interventions which focus on improving the prospects of people in the very far future will, in expectation, bring about a significant amount of good. Indeed, in expectation, such long-term interventions bring about far more good than their short-term counterparts. As such, longtermists claim we have compelling moral reason to prefer long-term interventions. …
Ethical Consumerism – Philip Trammell (Global Priorities Institute and Department of Economics, University of Oxford)
I study a static production economy in which consumers have not only preferences over their own consumption but also external, or “ethical”, preferences over the supply of each good. Though existing work on the implications of external preferences assumes price-taking, I show that ethical consumers generically prefer not to act even approximately as price-takers. I therefore introduce a near-Nash equilibrium concept that generalizes the near-Nash equilibria found in literature on strategic foundations of general equilibrium…