Will AI Avoid Exploitation?

Adam Bales (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 16-2023, published in Philosophical Studies

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expected advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.

Other working papers

Maximal cluelessness – Andreas Mogensen (Global Priorities Institute, Oxford University)

I argue that many of the priority rankings that have been proposed by effective altruists seem to be in tension with apparently reasonable assumptions about the rational pursuit of our aims in the face of uncertainty. The particular issue on which I focus arises from recognition of the overwhelming importance…

The case for strong longtermism – Hilary Greaves and William MacAskill (Global Priorities Institute, University of Oxford)

A striking fact about the history of civilisation is just how early we are in it. There are 5000 years of recorded history behind us, but how many years are still to come? If we merely last as long as the typical mammalian species…

Altruism in governance: Insights from randomized training – Sultan Mehmood, (New Economic School), Shaheen Naseer (Lahore School of Economics) and Daniel L. Chen (Toulouse School of Economics)

Randomizing different schools of thought in training altruism finds that training junior deputy ministers in the utility of empathy renders at least a 0.4 standard deviation increase in altruism. Treated ministers increased their perspective-taking: blood donations doubled, but only when blood banks requested their exact blood type. Perspective-taking in strategic dilemmas improved. Field measures such as orphanage visits and volunteering in impoverished schools also increased, as did their test scores in teamwork assessments…