Imperfect Recall and AI Delegation
Eric Olav Chen (Global Priorities Institute, University of Oxford), Alexis Ghersengorin (Global Priorities Institute, University of Oxford) and Sami Petersen (Department of Economics, University of Oxford)
GPI Working Paper No. 30-2024
A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.
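The claim that more tests discipline the AI arbitrarily well can be illustrated with a toy calculation. This is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the misaligned AI faces `n_tests` simulated tasks followed by one real task, cannot tell them apart, and so defects with the same probability `p` in each; the principal deploys only if no defection was seen in testing. The function names and the deployment rule are hypothetical.

```python
def undetected_defection_prob(p: float, n_tests: int) -> float:
    """Chance the AI passes every test and then defects on the real task.

    Toy assumption: with imperfect recall the AI uses one defection
    probability p in every episode, test or real.
    """
    return (1 - p) ** n_tests * p


def best_defection_rate(n_tests: int) -> float:
    """Defection rate p that maximises (1 - p)**n_tests * p.

    Setting the derivative to zero gives p = 1 / (n_tests + 1).
    """
    return 1 / (n_tests + 1)


if __name__ == "__main__":
    # Even when the AI picks its best p, the chance of a successful
    # defection shrinks as the number of tests grows.
    for n in (0, 1, 10, 100):
        p = best_defection_rate(n)
        print(n, round(p, 4), round(undetected_defection_prob(p, n), 4))
```

In this toy setting the AI's best achievable chance of defecting in deployment falls toward zero as `n_tests` grows, which mirrors the disciplining effect described in the abstract without reproducing the paper's actual model.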
Other working papers
Social Beneficence – Jacob Barrett (Global Priorities Institute, University of Oxford)
A background assumption in much contemporary political philosophy is that justice is the first virtue of social institutions, taking priority over other values such as beneficence. This assumption is typically treated as a methodological starting point, rather than as following from any particular moral or political theory. In this paper, I challenge this assumption.
Longtermism, aggregation, and catastrophic risk – Emma J. Curran (University of Cambridge)
Advocates of longtermism point out that interventions which focus on improving the prospects of people in the very far future will, in expectation, bring about a significant amount of good. Indeed, in expectation, such long-term interventions bring about far more good than their short-term counterparts. As such, longtermists claim we have compelling moral reason to prefer long-term interventions. …
Numbers Tell, Words Sell – Michael Thaler (University College London), Mattie Toma (University of Warwick) and Victor Yaneng Wang (Massachusetts Institute of Technology)
When communicating numeric estimates with policymakers, journalists, or the general public, experts must choose between using numbers or natural language. We run two experiments to study whether experts strategically use language to communicate numeric estimates in order to persuade receivers. In Study 1, senders communicate probabilities of abstract events to receivers on Prolific, and in Study 2 academic researchers communicate the effect sizes in research papers to government policymakers. When…