Will AI Avoid Exploitation?
Adam Bales (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 16-2023, published in Philosophical Studies
A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.
Other working papers
Staking our future: deontic long-termism and the non-identity problem – Andreas Mogensen (Global Priorities Institute, Oxford University)
Greaves and MacAskill argue for axiological longtermism, according to which, in a wide class of decision contexts, the option that is ex ante best is the option that corresponds to the best lottery over histories from t onwards, where t is some date far in the future. They suggest that a stakes-sensitivity argument…
Population ethical intuitions – Lucius Caviola (Harvard University) et al.
Is humanity’s existence worthwhile? If so, where should the human species be headed in the future? In part, the answers to these questions require us to morally evaluate the (potential) human population in terms of its size and aggregate welfare. This assessment lies at the heart of population ethics. Our investigation across nine experiments (N = 5776) aimed to answer three questions about how people aggregate welfare across individuals: (1) Do they weigh happiness and suffering symmetrically…
Welfare and felt duration – Andreas Mogensen (Global Priorities Institute, University of Oxford)
How should we understand the duration of a pleasant or unpleasant sensation, insofar as its duration modulates how good or bad the experience is overall? Given that we seem able to distinguish between subjective and objective duration and that how well or badly someone’s life goes is naturally thought of as something to be assessed from her own perspective, it seems intuitive that it is subjective duration that modulates how good or bad an experience is from the perspective of an individual’s welfare. …