Will AI Avoid Exploitation?
Adam Bales (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 16-2023, published in Philosophical Studies
A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expected advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.
Other working papers
Population ethics with thresholds – Walter Bossert (University of Montreal), Susumu Cato (University of Tokyo) and Kohei Kamaga (Sophia University)
We propose a new class of social quasi-orderings in a variable-population setting. In order to declare one utility distribution at least as good as another, the critical-level utilitarian value of the former must reach or surpass the value of the latter. For each possible absolute value of the difference between the population sizes of two distributions to be compared, we specify a non-negative threshold level and a threshold inequality. This inequality indicates whether the corresponding threshold level must be reached or surpassed in…
Intergenerational experimentation and catastrophic risk – Fikri Pitsuwan (Center of Economic Research, ETH Zurich)
I study an intergenerational game in which each generation experiments on a risky technology that provides private benefits, but may also cause a temporary catastrophe. I find a folk-theorem-type result on which there is a continuum of equilibria. Compared to the socially optimal level, some equilibria exhibit too much, while others too little, experimentation. The reason is that the payoff externality causes preemptive experimentation, while the informational externality leads to more caution…
‘The only ethical argument for positive 𝛿’? – Andreas Mogensen (Global Priorities Institute, Oxford University)
I consider whether a positive rate of pure intergenerational time preference is justifiable in terms of agent-relative moral reasons relating to partiality between generations, an idea I call discounting for kinship. I respond to Parfit’s objections to discounting for kinship, but then highlight a number of apparent limitations of this…