Johanna Thoma | Risk imposition by artificial agents: the moral proxy problem
Global Priorities Seminar
JOHANNA THOMA: (00:09) Thanks so much for the introduction and for the invitation. I'm really happy to speak to this group. My normal research is mostly about decision theory and I've been particularly interested in the problem of risk aversion. And what this work in progress is, is an attempt to say something interesting about debates about the design of artificial agents. But I'm relatively new to those debates. I'm trying to contribute something from the perspective of decision theory. And as I said, it's very much a work in progress. I'm really interested in hearing your questions and comments. At the end of this, perhaps I should say, I'm genuinely puzzled about some of the problems that I'm going to look at, so I'm interested in constructive debate about it.
(01:03) So I've called the paper The Moral Proxy Problem and mainly what it's about is the question of whether artificial agents should be programmed to be risk neutral, or rather, whether we should allow them to be risk averse in certain circumstances. And it turns out that this has to do to some extent with the framing question which arises because it's not entirely clear whether we should think of artificial agents as moral proxies for low-level agents like individual users of artificial agents, so making decisions on behalf of them, or as moral proxies for high-level agents, such as the designers of the artificial agents or regulators representing society at large. And it turns out that the question of whether we should allow for risk aversion in the design of artificial agents depends to some extent on our answer to this question. So it has practical relevance, this framing question, in the context of risk.
(02:04) Okay. So to start… Just a few words about what I mean by artificial agents and to frame the problem. So by artificial agents, I mean entities that don't just process information, but make decisions. So in that sense, they are agents. They don't just form cognitive attitudes, but make choices as well. They are artificial in that they're artifacts built by humans. And I have a weak notion of autonomy in mind here. So they don't have continuous human input and moreover, the information on the basis of which they make choices isn't just what the programmer put into the agent, but rather they go on and collect information themselves.
(02:50) So one example that has gotten disproportionate attention from philosophers is the example of self-driving cars. So they use advanced sensor technology to perceive things about their surroundings and make predictions about how objects in their vicinity are going to move, and then make decisions about how to move themselves. But there are lots of other interesting artificial agents. One type that's actually used quite a bit more already is artificial trading agents that perform trading operations in markets. There are nursebots that are being developed, or even robot teachers. So we can think of all of these as making decisions and collecting information in their surroundings, responding to them.
(03:41) And the ambition with these artificial agents is to make decisions roughly as good as or better than those that humans would make. So we would only use them if it is cheaper or saves us time to have an artificial agent perform something that normally a human agent would perform. And for that to be the case, they would have to make decisions roughly as good as humans, or in some cases, the ambition is that they make better decisions. So the ambition with self-driving cars is, to some extent, that they drive more safely than human agents would.
(04:14) Now a challenge that has received a lot of attention from moral philosophers is how to program these agents to make moral choices, because many artificial agents will have to make morally difficult and morally significant choices. There's a fairly large literature already looking at scenarios that look a bit like traditional trolley problems for self-driving cars, but it's been pointed out that, of course, artificial agents face morally significant choices in much more mundane situations as well. Self-driving cars have to make trade-offs between safety and mobility, for instance, a nursebot has to make decisions that impact a patient's health, and robot teachers are going to have children given into their care. So a lot of the more mundane choices these artificial agents make are going to be morally significant. So how should we program them to make moral choices?
(05:10) And the observation that is the most specific starting point for my talk is that many of these morally significant choices are actually going to be made under conditions of uncertainty as well, where we don't know for sure what the consequences of a particular choice made by an artificial agent are going to be, and the artificial agent itself can only determine probabilistically what's going to happen if it chooses in a particular way. So I'm going to look at cases where we, or the artificial agent, can assign probabilities, but don't know for certain what's going to happen if a particular choice is made.
(05:51) And the specific question I'm interested in is, how should they be designed to make choices under this kind of uncertainty, and in particular, should they be programmed to be risk neutral or not. I’ll be saying a little bit more precisely in a moment, what I mean by risk neutrality.
(06:10) So what I want to argue in the rest of this talk is mainly two things. I want to start by making a point about standard approaches to the design of artificial agents under conditions of uncertainty. And that is that the standard approach to the design of artificial agents, as I understand it, implies risk neutrality regarding value or goal satisfaction. But in fact, risk aversion is neither intuitively irrational nor intuitively always immoral, and in fact, it's common in human agents. And so what follows from this is that if we design artificial agents in this risk neutral way, that is actually a substantial commitment and it will mean that artificial agents will act substantially differently from even what considered human judgment would say. So this won't be an instance of an artificial agent acting differently from a human because it makes a better decision; it will be a substantively different decision where the human moral judgment would have been a permissible one. So that's the first point.
(07:27) And the second one is the moral proxy problem more specifically. So we can understand artificial agents as moral proxies for human agents. So they choose on behalf of human agents, pursuing their interests. But we need to decide whether they are moral proxies for low-level agents such as individual users, or high-level agents, such as designers and distributors, for instance.
(07:54) And in the context of uncertainty, I'm going to argue that this has special practical relevance and that's because moral proxies for low-level agents might need to be risk averse in the individual choices that they face, whereas moral proxies for high-level agents should be risk neutral in individual choices because this has almost certainly better outcomes in the aggregate. So there's a practical relevance to the moral proxy problem in this context and I'm going to just present some preliminary thoughts on addressing this problem but will suggest that each of them comes with sacrifices. So in that sense at least, it's important to realize that there is this problem and it's a hard one.
(08:47) Okay. So I'll start with the observation about the standard approach to the programming of artificial agents in contexts of uncertainty. So the orthodox theory of choice under uncertainty that will be familiar to economists as well is Expected Utility Theory and it's also the theory that designers of artificial agents eventually aim to implement as a theory of choice for artificial agents under uncertainty.
(09:19) And moral philosophers also often defer to this theory when it comes to conditions of uncertainty.
(09:26) So here's just a simple statement of expected utility theory. So it says that agents ought to choose an act with maximum expected utility and the following formula gives you the expected utility of an act ɑ. So we think of an act as leading to various potential outcomes. So the outcomes here are represented by x and we can assign a probability to each of the outcomes if you take a particular action and you can also assign a utility to each of those outcomes. And the expected utility is just the probability weighted sum of the utilities of the various different outcomes that your act might lead to. And expected utility theory asks you to maximize that probability weighted sum, choose the act or one of the acts with maximum expected utility.
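In symbols, the probability-weighted sum being described can be written as:

```latex
EU(a) = \sum_{x} p_a(x)\, u(x)
```

where p_a(x) is the probability that act a leads to outcome x and u(x) is the utility of that outcome.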
(10:23) There are some worries about the application of expected utility theory in the design of autonomous artificial agents. The first one that's discussed quite a bit already in this literature has to do with the insensitivity of expected utility theory to how ex ante risks are distributed between people when we're imposing risks on other people.
(10:49) So for instance, suppose that you have to decide which of two people has to perform some unpleasant task. They are each equally undeserving of having to do it. Expected utility theory would have you be indifferent between choosing one definite person to do the task and flipping a coin, thus giving each of them an equal chance of getting out of this unpleasant task. So there seems to be something attractive about equalizing ex ante risks of harm between people. But expected utility theory is insensitive to that, and this is something that contractualists in particular have been worried about.
(11:29) But I'm interested in a different problem here. So I'll try, as much as possible, to abstract away from the first kind of problem, although the interactions between them are something we can potentially discuss in the Q&A. The potential problem that I'm more interested in is the following: Expected utility theory has been controversial, in particular in recent years, at least under one common interpretation – and I'll explain why I'm saying this in a moment. Controversial because it implies risk neutrality in the pursuit of goals and values and rules out what we'll call ‘pure’ risk aversion.
(12:12) Okay. So let me explain this point. So what is risk aversion? Roughly, risk aversion in the attainment of some good, manifests in choosing an option with a lower expectation of that good because the potential outcomes are less spread out. So because you have a better sense of what's going to happen, if you pick a safer option, you're willing to give up some expected value.
(12:46) So one simple example of this is choosing a sure £100 over a 50% chance of receiving £300. The expected monetary value of the 50% chance of receiving £300 is £150. So that's higher, but you might, nevertheless, choose the certain £100 because you get it for sure. So this is a simple example of risk aversion.
(13:12) So I mentioned that there's a problem with risk aversion and expected utility theory but the economists in particular might straightaway think that, well, there is a way expected utility theory can capture risk aversion, namely through decreasing marginal utility in the good. So to have decreasing marginal utility in money, for instance, means that the more money you already have, the less additional utility you get from the next unit. And if you allow for this, then you can allow for risk aversion.
(13:46) So this is just a simple example: Suppose that u(m) = √m. So you can see this in the graph. From the sure £100 you get utility 10. Now with this function, the additional utility you would get from moving from £100 to £300 is just 7.32, whereas the utility you would lose from going from £100 to £0 is 10. So that utility difference is actually bigger even though the monetary difference is smaller.
(14:23) And what this means is that the expected utility of the gamble can actually be lower than the expected utility of the safe option even though the expected monetary value is higher. So in this case, the expected utility of the gamble is 8.66, whereas the expected utility of the safe choice is 10. So expected utility theory, in this sense, can capture risk aversion.
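A minimal sketch of the arithmetic just described, assuming the square-root utility function and the gamble from the example:

```python
import math

def u(m):
    """Utility of money under the assumed square-root utility function."""
    return math.sqrt(m)

# Sure option: £100 for certain.
eu_safe = u(100)                       # = 10.0

# Gamble: 50% chance of £300, 50% chance of £0.
eu_gamble = 0.5 * u(300) + 0.5 * u(0)  # ≈ 8.66

# Utility gained by moving from £100 to £300 vs. utility lost by moving from £100 to £0.
gain = u(300) - u(100)  # ≈ 7.32
loss = u(100) - u(0)    # = 10.0

print(eu_safe, eu_gamble, gain, loss)
# With this concave utility function, expected utility theory prefers the sure £100,
# even though the gamble's expected monetary value (£150) is higher.
```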
(14:49) But whether this actually adequately captures ordinary risk aversion depends in part on what we understand utility to be.
(14:58) And there are two common understandings of utility in the literature. One is the substantive understanding, which takes utility to be a cardinal measure of degrees of goal satisfaction or a cardinal measure of value.
(15:14) So if we look back at our earlier example, in the case of money, having decreasing marginal utility would mean that the more money you already have, the less valuable an additional pound is to you, or the less an additional pound serves your goals. So you'd have to interpret it in that way. And when you actually value money in this decreasing way, then you can explain risk aversion.
(15:46) But what you can't explain is what we'll call ‘pure risk aversion’, which goes beyond that. So what expected utility theory under this interpretation rules out is risk aversion with regard to value or goal satisfaction itself. If utility just is a cardinal measure of value or goal satisfaction, then you can't have decreasing marginal utility in value or goal satisfaction. You have to maximize the expectation of value. And so you can't be risk averse with regard to value or degrees of goal satisfaction itself. What this means in the monetary gamble example, for instance, is that if you actually value each pound the same – so you don't have decreasing marginal value from money, but in fact value money linearly – expected utility theory wouldn't let you be risk averse. So it rules out pure risk aversion, which is risk aversion with regard to value or goal satisfaction itself.
(16:53) There's another understanding of utility which does allow for pure risk aversion, namely the formal understanding. So if we think utility is just whatever represents an agent's preferences, provided these are consistent with the standard expected utility axioms, there's nothing that rules out saying that you've got decreasing marginal utility in value or decreasing marginal utility in degrees of goal satisfaction, because all that would mean is that you've got risk averse preferences that can only be represented in terms of decreasing marginal utility in value. So this understanding does allow for pure risk aversion.
(17:32) Now, the point I want to make about standard approaches to the design of artificial agents is that the substantive understanding of utility seems to be common both in the AI literature but also amongst moral philosophers who often defer to expected utility theory. And so if the substantive understanding of utility is adopted, that means that we're ruling out pure risk aversion. We are imposing risk neutrality with regard to value itself.
(18:05) So here's just some evidence that this is standardly presumed in the literature on the design of artificial agents. The standard approach is to first specify the goals of the system in an objective function or what is sometimes called a reward function or an evaluation function, which measures the degree to which the goals of the system are satisfied.
(18:30) And so just to take a very simple example, you can think of an artificial nutritional assistant programmed for high caloric intake. So the objective function there might just be a linear function of calories.
(18:43) But standardly there would be multiple goals and an objective function that specifies their relative importance. So in the case of a self-driving car that might be: Reach your destination fast, use little fuel, avoid accidents, minimize harm in case of unavoidable accident and so on.
(19:00) And then the standard approach is to programme directly or train the agent to maximize the expectation of that function. So what this essentially means is that we're setting the objective function equal to the utility function and then applying expected utility theory. So this amounts to a substantive understanding of utility.
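A minimal sketch of what this standard approach amounts to; the actions, objective values and probabilities here are placeholders for illustration, not taken from any real system:

```python
def expected_objective(action, outcomes):
    """Probability-weighted expectation of the objective (reward) function.

    `outcomes[action]` is a list of (probability, objective_value) pairs describing
    what might happen if `action` is taken."""
    return sum(p * v for p, v in outcomes[action])

def choose(actions, outcomes):
    """The standard approach described above: treat the objective function as the
    utility function and pick an action maximizing its expectation, which builds in
    risk neutrality with respect to the objective."""
    return max(actions, key=lambda a: expected_objective(a, outcomes))

# Hypothetical example with the same structure as the crash cases discussed later:
outcomes = {
    "safe":  [(1.0, 1)],            # objective value 1 for sure
    "risky": [(0.5, 0), (0.5, 3)],  # 50% objective value 0, 50% objective value 3
}
print(choose(["safe", "risky"], outcomes))  # -> "risky", the option with higher expectation
```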
(19:25) So here are just some quotes from one of the most popular textbooks on the design of artificial agents.
“For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.”
“An agent's utility function is essentially an internalization of the performance measure.”
(19:52) And so in the literature on the design of artificial agents, as I understand it, it's usually assumed that the goal is to maximize the expectation of the objective function. That amounts to a substantive understanding of utility within expected utility theory, which rules out pure risk aversion.
(20:17) And in moral philosophy we often find the same assumption. So expected utility theory is now also often taken to imply expected value maximization. There are lots of examples of this, too, in the debate on the design of artificial agents specifically, that make this expected value maximization assumption.
(20:39) But the crucial point here is just that the substantive reading of utility commits us to risk neutrality regarding value or degree of goal satisfaction. And so by adopting the substantive reading of utility, the literature on the design of artificial agents is committing itself to risk neutrality.
(21:03) Yet intuitively, such risk neutrality is neither rationally nor morally required. So, I'll present some examples that are designed to bring out this intuition. They're going to be highly stylized examples because the best way to bring out this intuition is to have an example where the objective function measuring degrees of goal satisfaction or value is intuitively linear. And then if it's intuitively nevertheless permissible to be risk averse, then that would be an example of pure risk aversion.
(21:39) So I'll present a few examples like that of situations that might, in a more complex form, be faced by artificial agents. But maybe it would be good to just point out at this point that it has also been accepted by various moral philosophers that risk neutrality is not rationally or morally required. Just think about Rawls’ position on how somebody should choose behind the veil of ignorance, which is substantively risk averse to quite an extreme extent.
(22:17) In the more applied moral philosophy literature in recent years, people have talked a lot about a precautionary principle. So anyone who's in favour of that will think that substantive risk aversion is permissible rationally and morally. And of course here I'm, to some extent, motivated by the recent decision theory literature that questions whether expected utility maximization is really rationally required and looks at the potential normative plausibility of generalisations of expected utility theory.
(22:58) Okay. So here is the first example. It's slightly along the lines of a trolley problem, so I hope you forgive me for that. I'll look at two different examples in a moment. So it's an example of crash optimization. Suppose that a self-driving bus is heading towards a group of 10 people and a lethal collision is unavoidable. But the bus could swerve left or right to reduce the number of fatalities.
(23:26) So in this case, as long as we hold other potentially relevant factors equal, it's plausible to suppose that an objective function measuring value or the goals of the agent should be linear in lives saved. So the idea here is that each life saved is equally valuable, at least if you hold various other factors equal.
(23:53) Now the specific choice. Suppose the specific choice faced by the self-driving bus is to either swerve left to save one person for sure or to swerve right and have a 50% chance of saving nobody and a 50% chance of saving three.
(24:11) Now, a risk averse agent might prefer swerving left and saving one person for sure. And I submit that's not obviously irrational or morally wrong. So the thinking would be, you'd rather be safe than sorry and save the one person for sure. You wouldn't want to risk saving nobody at all on the 50% chance of saving more people.
(24:35) Now, if you don't find this example persuasive, we can also change the numbers so that the expected number of lives saved is closer. So in the example as I gave it here, the expected number of lives saved by going right is 1.5, so a risk neutral agent would swerve right. Now suppose the chance of saving three is actually only 34% and the chance of saving nobody is 66%. In that case, the expected values are quite close. A risk neutral agent would still go right. But here, you don't have to be particularly risk averse to prefer swerving left instead.
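To make the arithmetic explicit, the expected number of lives saved by swerving right in the original and in the modified version of the case is:

```latex
\mathbb{E}[\text{right}] = 0.5 \times 3 + 0.5 \times 0 = 1.5
\qquad\text{vs.}\qquad
\mathbb{E}[\text{right}'] = 0.34 \times 3 + 0.66 \times 0 = 1.02,
```

compared with a certain 1 life saved by swerving left in both versions.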
(25:16) So just a quick side remark. I assumed here that no randomization is possible. And this is to assume away the potential concern for equalizing chances of survival in this case where you could randomize between the two options in order to give everybody 1/3 chance to be saved. So we'll assume randomization is not possible.
(25:42) So here's another example that is structurally exactly the same, just to show that these kinds of cases might not only occur for self-driving cars, which is the standard example a lot of moral philosophers have used in recent years. Suppose an artificial rescue coordination centre has to decide between sending a rescue team to one of two fatal accidents, each involving several victims, some of whom might be saved if the rescue team gets to them. But there's only one rescue team.
(26:12) Again, it seems plausible that the objective function should be linear in lives saved if we hold other factors equal.
(26:21) And the choice is between saving one for sure and a 50% chance of saving nobody and a 50% chance of saving three.
(26:30) And again, a risk averse human agent might prefer sending the team to Accident 1, and this doesn't seem obviously irrational or morally wrong. Or if you think it is, you can play around with the numbers to potentially find a case that you find not obviously irrational or morally wrong, but that is nevertheless an instance of risk aversion.
(26:55) For both of the examples we just looked at, you might think, “Well, these are rare events. They won't be decision situations that artificial agents are going to face very often.” So here is an example that artificial agents might face more frequently.
(27:14) So it's again, an example of a self-driving car. Suppose the self-driving car is driving on the left lane of a dual carriageway and it's approaching one person on the side of the road next to a broken-down car. And at the same time, there's a car with two passengers approaching from behind on the right lane and the car estimates that there is a small chance that the car behind is approaching fast enough to fatally crash into it should it change lanes. But there's also a small, but three times as big a chance that the person by the side of the road is going to stumble into the road at the wrong moment and be fatally hit.
(27:51) So in this case, too, I think it's quite plausible that the objective function is linear in accidental killings. The agent should show equal concern for any person they might accidentally kill, other things being equal.
(28:09) Suppose the two options are: If you don't change lanes, there's a 0.3% chance of killing the one person who might stumble into the road. The expected number of fatalities here is 0.003. Or you might change lanes, in which case there's a 0.1% chance of killing two people. The expected number of fatalities here is 0.002.
(28:36) And here again, a risk averse agent might prefer not to change lanes, even though that has a higher expected number of fatalities, namely 0.003, because of considerations of risk aversion. So the thought would be, you wouldn't want to risk actually killing two people, and the decreased chance of that happening doesn't somehow make it worthwhile to take the risk of killing two people. One very simple choice rule that would give you that result is maximin: make the worst possible outcome as good as possible. The worst possible outcome if you don't change lanes is killing one person, whereas it would be worse, namely killing two, if you did change lanes. But you don't have to be so extremely risk averse. It might just be that you give greater weight in your decision making to the worst possible outcomes, and then the result might be that you shouldn't change lanes, even though that has a higher expected number of fatalities.
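One very simple way to model ‘giving greater weight to the worst possible outcomes’, purely as an illustration and not anything proposed in the talk, is to score each option by a weighted mix of its expected fatalities and its worst-case fatalities; the particular weight used here is an arbitrary assumption:

```python
def score(lottery, worst_case_weight=0.01):
    """Lower is better. `lottery` is a list of (probability, fatalities) pairs.
    The score mixes expected fatalities with worst-case fatalities; the 0.01 weight
    on the worst case is an assumed parameter chosen purely for illustration."""
    expected = sum(p * f for p, f in lottery)
    worst = max(f for _, f in lottery)
    return (1 - worst_case_weight) * expected + worst_case_weight * worst

stay   = [(0.003, 1), (0.997, 0)]   # don't change lanes: 0.3% chance of killing one person
change = [(0.001, 2), (0.999, 0)]   # change lanes: 0.1% chance of killing two people

# Expected fatalities alone favour changing lanes (0.002 < 0.003), but with even a
# small weight on the worst case, staying in the lane comes out ahead:
print(score(stay), score(change))   # ≈ 0.0130 vs ≈ 0.0220
```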
(29:39) Okay, so what was the takeaway from these cases? What were they meant to illustrate? The stylized cases were chosen to illustrate the intuitive permissibility of pure risk aversion because they feature an uncontroversially linear objective or value function; if agents there are nevertheless intuitively not irrational in being risk averse, that illustrates the plausibility of pure risk aversion.
(30:12) But of course, most applications will feature more complex kinds of value function, trading off different kinds of concerns. This is the case with self-driving cars, which have to trade off safety concerns with other kinds of concerns, such as getting to your destination fast. Or think of a nursebot, which might have to trade off the inconvenience of an intrusive kind of procedure against an increased chance of a better health outcome, or a robot teacher that has to trade off average exam performance against the number of people who are going to fail the exam. So there are multiple goals at stake in most applications. But if pure risk aversion is plausible in the simple cases, there's no reason to think it shouldn't be a factor there as well: nursebots, like human nurses, can plausibly be risk averse when they make trade-offs between health outcomes and inconvenience, for instance.
(31:23) And in the case of human agents, we tend to be permissive of a range of different levels of risk aversion. Under the formal interpretation, expected utility theory can capture this through decreasing marginal utility in value and alternatives to expected utility theory such as, for instance, Buchak’s risk-weighted expected utility theory that's been quite popular in recent years, can also capture such pure risk aversion.
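A small sketch, in the style of Buchak's risk-weighted expected utility, applied to the rescue case with utility linear in lives saved; the particular convex risk function r(p) = p² is an assumed example of a risk-averse attitude, not anything specified in the talk:

```python
def reu(lottery, r=lambda p: p ** 2):
    """Risk-weighted expected utility in the style of Buchak (2013).

    `lottery` is a list of (probability, utility) pairs. Outcomes are ordered from
    worst to best; the agent starts from the worst utility and adds each further
    utility increment weighted by r(probability of doing at least that well)."""
    outcomes = sorted(lottery, key=lambda pu: pu[1])         # worst to best
    value = outcomes[0][1]                                    # guaranteed (worst) utility
    for i in range(1, len(outcomes)):
        p_at_least = sum(p for p, _ in outcomes[i:])          # prob. of doing at least this well
        value += r(p_at_least) * (outcomes[i][1] - outcomes[i - 1][1])
    return value

# Rescue case with utility linear in lives saved (a "pure" risk attitude to value itself):
safe  = [(1.0, 1)]                 # send team to Accident 1: save one for sure
risky = [(0.5, 0), (0.5, 3)]       # send team to Accident 2: 50% save three, 50% save nobody

print(reu(safe), reu(risky))       # 1.0 vs 0.75: the risk-averse agent prefers the safe option,
                                   # even though expected lives saved is 1 vs 1.5
```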
(31:54) And so if artificial agents were programmed to be always risk neutral, this would mean that they act substantively differently from how many human agents would permissively act in the same kinds of circumstances.
(32:09) And design that allows for risk aversion has the potential to more closely emulate considered human judgment about individual cases. So by adopting the substantive understanding of utility and implementing expected utility theory, we're making a choice to design artificial agents to act in a particular way that might make them act substantively differently from considered human moral judgment.
(32:37) So this is the first, if more technical, point that I wanted to make in this talk. To move on to, or to use this to motivate, what I call the moral proxy problem, note that it's not entirely clear whether framing things in terms of human judgment about individual cases is actually the right way to frame decisions about how to design artificial agents.
(33:09) So scenarios like the ones that we looked at, or at least more complex versions of them, are currently faced repeatedly by different human agents. And in the future, they're going to be faced repeatedly by artificial agents.
(33:27) So consider now the following compound gamble, which looks at many of those kinds of decisions all at once. So the first one is to decide at once on 100 independent instances of Case 1: the Unavoidable Crash, and Case 2: Artificial Rescue Coordination Centre.
(33:49) And suppose that in all of those instances, the same choice is going to be made. So if in all of those instances the safe option is chosen, you will save 100 people for sure. And if in all of those instances the risky option is chosen, the expected number of lives saved is 150. So this is not a surprise: the expected number of lives saved was already higher in the individual case. But what should give us pause is that now the chance of saving fewer people than under the safe option is less than 0.5%. So it's extremely unlikely that you will save fewer people by always going with the risky option than you would by always going with the safe option.
(34:41) And now consider the following compound gamble for the third case, The Changing Lanes case. Suppose you have to decide at once on 100,000 independent instances of Case 3. So if you always go with the safe option the expected number of fatalities is 300 and if you go with the risky option, the expected number of fatalities is 200. So again, this isn't a surprise. We knew that the expected number of fatalities is higher for the safe option even in the individual case. But note that the chance, under the safe option, of fewer than 250 fatalities is 0.1%. And for the risky option, the chance of it being more than 250 is about 0.7%. So again, you're virtually certain that there are going to be fewer fatalities if you go with the risky option in every instance of this choice.
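A small sketch that checks these tail probabilities under the independence assumptions stated in the talk, treating each compound policy as a binomial process (scipy is used here just for the binomial tail computations):

```python
from scipy.stats import binom

# Compound version of Cases 1/2: 100 independent instances.
# Always safe: 100 people saved for sure. Always risky: each instance saves 3 with
# probability 0.5, else 0, so total saved = 3 * Binomial(100, 0.5).
# Chance the risky policy saves fewer than the 100 the safe policy guarantees:
p_risky_worse = binom.cdf(33, 100, 0.5)        # P(3X < 100) = P(X <= 33), well under 0.5%

# Compound version of Case 3: 100,000 independent instances.
# Always safe (don't change lanes): fatalities ~ Binomial(100000, 0.003), mean 300.
# Always risky (change lanes): fatalities = 2 * Binomial(100000, 0.001), mean 200.
p_safe_below_250  = binom.cdf(249, 100000, 0.003)   # chance the safe policy yields < 250 fatalities
p_risky_above_250 = binom.sf(125, 100000, 0.001)    # chance the risky policy yields > 250 fatalities

print(p_risky_worse, p_safe_below_250, p_risky_above_250)
# All three tail probabilities come out under one percent, in line with the rough figures
# in the talk: the risky policy is almost certain to do better in aggregate.
```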
(35:39) And so in both of these types of cases, it seems unreasonable in this compound decision to choose the safe option.
(35:51) So what does this show us? Well, human agents don't normally face such a compound decision directly. Hopefully, most people will face the first kind of choice at most once in their life; most people will never face it. The second kind of choice more people might face, but not 100,000 times. So human agents normally don't face this kind of compound gamble. So it's not clear what the significance of considering this compound gamble is for individual agents in particular, because your individual choice is independent, both in terms of value and in terms of probabilities, from the choices of anybody else.
(36:34) But here's where the significance of artificial agent design comes in. We can think of the designers of artificial agents as facing decisions of this type. When they make design decisions, they make decisions about many agents who might find themselves in those kinds of choice scenarios. So the fact is, the consequences of their design decisions will be the consequences of this compound choice.
(37:00) So now we face a kind of framing problem. What is the right way to look at these kinds of choices, the right way to frame them in order to decide how this agent should be designed?
(37:13) And it's a framing decision that has to do with the choice of agential perspective. So this is an important but actually often neglected question in the framing of decision problems – so the question of agency, which matters for the scope of the decision problem.
(37:30) When an individual human driver or an individual emergency rescue coordinator faces the choices that we've just discussed, the question of agency is clear. So given the consequences of her choices are independent (in terms of probabilities and the values at stake) from those of other drivers or coordinators, it seems plausible that she can simply consider them in isolation. So how you choose in this case is not going to affect how any other cases pan out and it's not going to change the probabilities in any other similar kinds of situations.
(38:09) And then, as we've seen earlier, for these individual agents, risk aversion seems permissible.
(38:16) However, the question of agency is complicated when artificial agents replace human drivers or coordinators. And as I want to argue, the context of risk makes this issue, in this case, practically important.
(38:30) So we're ready now to state The Moral Proxy Problem. So the thought here is that artificial agents are designed to serve human interest and make decisions on behalf of humans and in that sense, we can think of them as moral proxies for human agents. And I'm taking this terminology from a paper by Millar.
(38:51) And in any given context, now we can ask, who are they actually moral proxies for? And there might be several potential plausible answers to this question.
(39:01) What's important for us are two different types of answers to this question. One would say that they are moral proxies for what I call low-level agents: the users of individual artificial agents, or the individual human agents whose choices are being replaced by an artificial agent. So, for instance, owners of self-driving cars, a local authority using an artificial rescue coordinator, and so on.
(39:26) The other would say that they are moral proxies for high-level agents: those who are in a position to control the choice behaviour of many artificial agents, such as designers of artificial agents or regulators representing society at large. So we have to make a choice about who artificial agents are moral proxies for.
(39:46) And perhaps just as a side note here: In many potential areas of introduction of artificial agents, prior to the introduction, there is actually limited higher-level control over individual human choices – for instance, through legislation. This is certainly the case for self-driving cars replacing human drivers and the problem of crash optimization: there's only a limited extent to which a legislator can influence how humans make these kinds of choices. And in our case, there's also no obvious collective action problem, in the sense of a conflict of interest between people who find themselves in a strategic situation of interaction, because there's this independence between the individual instances of risky choices and because we can think of all agents as sharing the same interest in trying to minimize fatalities, for instance. And in those cases, it's really only the technology that makes high-level agents potentially relevant, so that this high-level framing now becomes one salient way of looking at the decision problem.
(40:59) So to come to the practical importance of the moral proxy problem in the context of risk, the thought is that moral proxies for low-level agents may need to be risk averse in our cases. And this is because human agents are often risk averse. So in order for artificial agents to be moral proxies for individual agents, there are basically two approaches that we might take here. One is to just allow personalisation of the values and levels of risk aversion that you might input into an artificial agent. Or we might look at what typical human agents' considered moral judgment is in a particular kind of case. And here again, we might get a risk averse kind of answer. So moral proxies for low-level agents may need to be risk averse in our cases.
(41:55) High-level agents, however, face the compound perspective. And so their moral proxies should not display risk aversion in individual choices, because for them, the almost certainly worse aggregate outcomes of allowing low-level risk aversion become important.
(42:15) Okay. So in the following, I just want to point to a few considerations that might bear on the decision about who artificial agents should be moral proxies for. But this is very preliminary. So I just want to go through some of the considerations that are found in the literature.
(42:37) So first, I want to point out that when it comes to thinking of artificial agents as moral proxies for low-level agents, this is actually something that is implicitly assumed by a variety of positions we find in related debates on the design of artificial agents.
(42:56) So in much of the literature on moral dilemmas for artificial agents in moral philosophy, there’s just an assumption that we can move fairly directly from our moral judgments about individual dilemma situations, such as the ones that we just looked at, to how artificial agents should handle them. And I think this often just stems from an unthinking acceptance of artificial agents as low-level moral proxies. And so for those people, what I think my argument shows is that there are substantive implications for the handling of risk to this unthinking acceptance of the low-level perspective.
(43:38) Those who argue for personalisable ethics settings also standardly assume a low-level framing and assume artificial agents are low-level moral proxies. So we might look to arguments in favour of personalisable ethics settings to see whether they work as arguments in favour of low-level framing.
(43:59) Just one side note here: the examples we discussed should move these proponents to include adjustable risk aversion settings, which is actually something that's not usually done. We often find the substantive interpretation of expected utility theory proposed as well by those who are proposing personalisable ethics settings. So what my argument would imply here is that this might be something to consider for those who are arguing for personalisable ethics settings.
(44:34) And lastly, there's a low-level solution to what's sometimes called the ‘responsibility gap’, which seems to assume low-level moral proxies. So the responsibility gap is the worry that when artificial agents cause harm, that there might be nobody to hold responsible, especially when we find ourselves in a situation where artificial agents aren't advanced enough for us to consider them as moral agents in their own right. Who else should we hold responsible? And one response to this worry is that it is individual users that are responsible or legally liable for harms caused by their artificial agents.
(45:15) Now, I think this is only plausible if they have a similar degree of forward-looking responsibility. So that literature is mostly about responsibility for harms already committed. But I think it's only plausible if there's also forward-looking responsibility for choices to be made and decisions about programming are framed from their perspective. It seems like it would be unfair to hold individual agents responsible for choices that are justifiable only from a higher level agential perspective.
(45:47) Okay. So those were some positions that are defended in the literature that seem to assume a low-level framing. What can we say in terms of positive considerations in favour of a low-level framing? Well, when we look at arguments in favour of personalisable ethics settings, one claim that's often made is that it would be paternalistic for moral programming to be imposed by designers or other high-level agents, that that would be paternalistic towards the users of artificial agents.
(46:23) I think one fairly immediate worry here is that a lot of the kinds of cases we're interested in here, these are moral choices about risk imposition on other agents and usually when we impose a choice on somebody in order to, say, prevent harm to somebody else, that's not an instance of paternalism. But I think that might be a little bit too quick here because if, as I've tried to show with the examples earlier, it is the case that there is some moral and rational leeway when it comes to levels of risk aversion, there might be room here for a kind of paternalism that comes from imposing a specific attitude to risk that is different from the one that an individual human agent would have had and that is being imposed on them.
(47:14) So there might be some room for paternalism here, but we can't use this as an argument in favour of low-level agential framing because in fact, that would be question begging.
(47:25) So this is only compelling if the moral choices are actually in the legitimate sphere of agency of the users and the artificial agents act as their moral proxies.
(47:35) And so to appeal to paternalism to argue for low-level agential framing would be question begging.
(47:42) But perhaps, an argument in the vicinity here that is potentially more successful, appeals to the notion of liberal neutrality. So difficult moral trade-offs are often ones that there's reasonable disagreement about and the context of risk introduces additional intuitive rational and moral leeway. And in those contexts, one might think that high-level agents should maintain liberal neutrality, shouldn't impose a particular decision on individual agents. So it shouldn't be Google that's deciding these things, would be the thought.
(48:19) And one way of maintaining such neutrality is to partition the moral space so that individuals get to make certain decisions themselves, especially on matters where we don't face a traditional collective action problem, as I argued we don't in the kinds of cases that I presented.
(48:36) And personalisable ethics settings are one straightforward way of partitioning the moral space in this way.
(48:43) A few more hand-wavy remarks that might point in the direction of low-level agential framing appeal to some sense of moral closeness. So users are, in various senses, morally closer to the potentially harmful effects of the actions of artificial agents.
(48:59) They make the final decision of whether to deploy the agent. Their lives might also be at stake; they weren't in the cases I looked at, but they might be in others. They more closely observe the potentially harmful events and will have to live with the memory of these events. And we might want them to generally maintain responsible oversight of the operations of the artificial agents.
(49:25) And I think all this will point to at least individual users feeling more responsible for the choices that are made by artificial agents that they own, though it's unclear whether this points to actual responsibility on the part of individual users. So these were some considerations in favour of low-level agential framing.
(49:49) What might we say for high-level agential framing? I think the intuitive case here is that, as a matter of fact, the decisions that programmers and regulators make determine many lower-level choices, and in that sense, they’re facing the compound choice. And in that compound choice, as we've seen, the almost certainly worse aggregate outcome of allowing lower-level risk aversion appears decisive.
(50:31) And so given that, as a matter of fact, those really are the consequences of certain design decisions, in order for artificial agents to be designed as moral proxies for individual agents, who might need to be risk averse, designers would have to abstract away from the fact that these design decisions have almost certainly worse aggregate outcomes. And I think that might just be a very hard thing to do.
(51:02) There are also some independent reasons for conducting what is sometimes called a ‘systems-level analysis’ – looking at things from the higher-level perspective – which is that consideration has to be given to how artificial agents interact with each other. I think this is, again, particularly obvious in the case of self-driving cars: when there are many different self-driving cars on the road, we have to somehow manage how they interact with each other. And so for those decisions at least, we have to adopt a kind of high-level perspective. And it then seems apt to also adopt it for other consequences of the behaviour of artificial agents.
(51:41) And moreover, there's a potential for genuine collective action problems when ethics settings are personalisable. So here, the concern is mostly about ethics settings where you can give the lives of the passengers in your car additional weight or give priority to yourself. And we might be in a situation where everybody has an interest to do that, but that has worse consequences on the aggregate. And again, this might push us to a more high-level perspective.
(52:15) There's also a high-level solution to the ‘responsibility gap’, which says that we find backward-looking responsibility for harms at the level of high-level agents. And if this view is attractive, it makes it plausible to find forward-looking responsibility at that level of agency as well. So again, the thought would be that it's unfair to hold high-level agents responsible for choices that are only justifiable from the lower-level perspective.
(52:44) And one potential argument for high-level responsibility for harms caused that has been made in this literature appeals to moral luck. So the thought here is that whether individual artificial agents ever find themselves in situations where they have to cause harm is in part down to luck. So it may seem unfair to hold their users responsible, but not others who employ their artificial agents in exactly the same way. And this might speak in favour of collectivizing responsibility when we can. So this high-level solution to the responsibility gap, again, might point to a high-level agential framing for choices under risk.
(53:25) So I think neither case is entirely conclusive. And in these kinds of situations where it's unclear how to attribute responsibility or agency, there's often a kind of move made that points to distributed agency. So we're just saying, well, to some extent, there's agency at the lower level and there's also agency at the higher level. And we find this appeal quite often in discussions about the responsibility of artificial agents.
(53:53) So here are some examples, Taddeo and Floridi:
“The effects of decisions or actions based on AI are often the result of countless interactions between many actors, including designers, developers, users, software and hardware…” And
“With distributed agency comes distributed responsibility.”
So they are saying there is distributed agency across the higher and the lower level, including designers and users.
(54:18) And here is an example from the legal literature. There's this idea of a legal ‘electronic person’ that might be responsible that's composed of designers, producers, users and so on. So again, a composition of higher-level and lower-level agents.
(54:35) The problem, though, is that for the problem under consideration here, the appeal to distributed agency actually doesn't help. And that's because the normal thought about how decisions are made in cases of distributed agency is that we negotiate a way forward between the different sub-agents involved. But that requires us to have already picked a framing of the decision problem. Here, however, adopting one or the other agential perspective results in different ways of framing the decision problem; we need to settle on one before we can actually solve it. So this doesn't help with this problem.
(55:14) And in fact, the moral proxy problem complicates distributed agency and distributed responsibility more generally, I think, because if substantively different programmings are plausible from the high-level and the low-level perspective, it might be unfair to hold high-level agents even partially responsible for choices justified from the low-level perspective, and vice versa.
(55:39) So appeal to distributed agency doesn't help. It seems like we have to actually choose between the two perspectives.
(55:46) And I think the takeaway here is that both options involve sacrifices. And the main takeaway, I guess, is going to be that those who are proposing these framings should be aware of the potential pitfalls.
(56:01) So for the high-level framing, the worry is that this involves imposing specific courses of action in matters where there's intuitively rational and moral leeway when human agents are involved.
(56:11) And it risks absolving users of artificial agents of felt or actual responsibility for the artificial agents they employ, or having them live with the consequences of choices they would not have made.
(56:22) And the pitfalls of going with the low-level perspective are that this involves designers making design decisions that have almost certainly worse aggregate outcomes than other available design decisions.
(56:33) And this limits our resources to solve various other problems that can only be solved from a high-level perspective, such as problems to do with the interaction of different artificial agents.
(56:46) So just to give a brief overview again of what I've argued: The first part of the talk was meant to show that the standard approach to artificial agent design assumes substantive risk neutrality, and that this makes artificial agents act differently from considered human judgment, which might display risk aversion and is not obviously irrational or immoral in doing so.
(57:12) And the second part looked at what I call The Moral Proxy Problem, which asks whether we should think of artificial agents as moral proxies for higher-level or lower-level agents, and points out that in contexts of uncertainty this has special practical relevance, because moral proxies for low-level agents may need to be risk averse in the individual choices they face, whereas moral proxies for high-level agents should be risk neutral in individual choices because this has almost certainly better outcomes in the aggregate.
(57:44) And in the end, I’ve considered some ways of potentially addressing the problem but I think each comes with sacrifices. And so this is a hard choice that people in this debate should be aware of, I think.
(58:00) Okay, and this is it. Thank you very much for listening until the end and here are some references.
[End of Transcription]