Will AI Avoid Exploitation?
Adam Bales (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 16-2023, published in Philosophical Studies
A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.
Other working papers
On the desire to make a difference – Hilary Greaves, William MacAskill, Andreas Mogensen and Teruji Thomas (Global Priorities Institute, University of Oxford)
True benevolence is, most fundamentally, a desire that the world be better. It is natural and common, however, to frame thinking about benevolence indirectly, in terms of a desire to make a difference to how good the world is. This would be an innocuous shift if desires to make a difference were extensionally equivalent to desires that the world be better. This paper shows that at least on some common ways of making a “desire to make a difference” precise, this extensional equivalence fails.
Estimating long-term treatment effects without long-term outcome data – David Rhys Bernard (Paris School of Economics)
Estimating long-term impacts of actions is important in many areas, but the key difficulty is that long-term outcomes are only observed with a long delay. One alternative approach is to measure the effect on an intermediate outcome or a statistical surrogate and then use this to estimate the long-term effect. …
Prediction: The long and the short of it – Antony Millner (University of California, Santa Barbara) and Daniel Heyen (ETH Zurich)
Commentators often lament forecasters’ inability to provide precise predictions of the long-run behaviour of complex economic and physical systems. Yet their concerns often conflate the presence of substantial long-run uncertainty with the need for long-run predictability; short-run predictions can partially substitute for long-run predictions if decision-makers can adjust their activities over time. …