Input to UN Interim Report on Governing AI for Humanity

This document was written by Bradford Saad, with assistance from Andreas Mogensen and Jeff Sebo. Jakob Lohmar provided valuable research assistance. The document benefited from discussion with or feedback from Frankie Andersen-Wood, Adam Bales, Ondrej Bajgar, Thomas Houlden, Jojo Lee, Toby Ord, Teruji Thomas, Elliott Thornley and Eva Vivalt.

Other papers

Estimating long-term treatment effects without long-term outcome data – David Rhys Bernard (Paris School of Economics)

Estimating long-term impacts of actions is important in many areas but the key difficulty is that long-term outcomes are only observed with a long delay. One alternative approach is to measure the effect on an intermediate outcome or a statistical surrogate and then use this to estimate the long-term effect. …

Shutdownable Agents through POST-Agency – Elliott Thornley (Global Priorities Institute, University of Oxford)

Many fear that future artificial agents will resist shutdown. I present an idea – the POST-Agents Proposal – for ensuring that doesn’t happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST – together with other conditions – implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.

Tough enough? Robust satisficing as a decision norm for long-term policy analysis – Andreas Mogensen and David Thorstad (Global Priorities Institute, Oxford University)

This paper aims to open a dialogue between philosophers working in decision theory and operations researchers and engineers whose research addresses the topic of decision making under deep uncertainty. Specifically, we assess the recommendation to follow a norm of robust satisficing when making decisions under deep uncertainty in the context of decision analyses that rely on the tools of Robust Decision Making developed by Robert Lempert and colleagues at RAND …