Andreas Mogensen, David Thorstad | Tough enough? Robust satisficing as a decision norm for long-term policy analysis
ANDREAS MOGENSEN: (00:09) Okay. Alright. Okay. Good. As David said, yes, this is joint work between the two of us, and I should also say David is the technical person, so any technical questions I'll pass off to him in the Q&A.
(00:25) Anyway, here goes. So we as a species currently confront a range of profound challenges to our long-term survival and long-term flourishing, including nuclear weapons, climate change, and risks from artificial intelligence and synthetic biology. And the sad fact is that severe uncertainty, cluelessness, if you will, clouds our efforts to forecast the impact of policy decisions that we might make in response to these threats, policies whose impacts will play out over very, very long-run timescales. And one of the things we'd really, really like to know is how we're supposed to make decisions in the face of such deep uncertainty or cluelessness.
(01:05) Now, in recent years, the term ‘deep uncertainty’ has been adopted as something of a technical term in operations research and in engineering to denote decision problems in which our evidence is profoundly impoverished in the way it so often is when we confront these problems in long-term policy. So the third edition of The Encyclopedia of Operations Research and Management Science defines deep uncertainty disjunctively as arising either in situations where one is able to enumerate multiple plausible alternatives without being able to rank them in terms of perceived likelihood, or, even worse, where all that's known is that we don't know.
(01:45) Now, given the paucity of evidence that could constrain probability assignments under deep uncertainty so understood, analysts associated with the study of decision making under deep uncertainty, typically abbreviated as DMDU, generally claim that orthodox approaches to decision analysis based on expected value maximization are just not helpful in this kind of context. So, to quote Ben Haim, an engineer: despite the power of classical decision theories, in many areas such as engineering, economics, management, medicine and public policy, a need has arisen for a different format for decisions based on severely uncertain evidence.
(02:25) And this community of researchers working under this umbrella of DMDU has developed a range of decision support tools that are supposed to address this need. So they have names like Robust Decision Making or RDM, Dynamic Adaptive Policy Pathways (I can't remember what that one does), Info-Gap Decision Theory, and things of that kind.
(02:46) These primarily, I think, can be described as comprising procedures for framing and exploring decision problems. And actually, sometimes their proponents seem a bit reluctant to suggest normative criteria for solving decision problems once you've set them up right. So Robert Lempert and colleagues, they write about Robust Decision Making, that this technique does not determine the best strategy. Rather, it uses information in computer simulations to reduce complicated, multi-dimensional, deeply uncertain problems to a small number of key trade-offs for decision makers to ponder. And this is the aspect of DMDU research that's been highlighted in a recent paper by Casey Helgeson, which I think is one of the relatively few papers by philosophers to engage with this area of research in depth.
(03:31) Helgeson argues that DMDU research focuses principally on how to frame the decision problem in the first place. And thereby he thinks it represents something of a counterbalancing influence to philosophical decision theory’s comparative focus on the choice task.
(03:50) Nonetheless, the researchers associated with this area do sometimes provide suggestions for candidate normative criteria that they think of as appropriate to solving the choice task under conditions of deep uncertainty. And in particular, they tend somewhat strikingly to advocate normative criteria that emphasize the importance of achieving robustly satisfactory performance. So Robert Lempert writes that DMDU decision support tools help to facilitate the assessment of alternative strategies with criteria such as robustness and satisficing rather than optimality, by which I think he means roughly expected utility maximization. He says the former are particularly appropriate for situations of deep uncertainty.
(04:33) Similarly, Schwartz, Ben Haim and Dasco write that there's a quite reasonable alternative to utility maximization. It's maximizing the robustness to uncertainty of a satisfactory outcome or robust satisficing. Robust satisficing they say, is particularly apt when probabilities are not known or are known imprecisely.
(04:53) So, David's and my interest when we were working on this paper was in this norm of robust satisficing considered as a decision norm for deep uncertainty, specifically considered in relation to the kind of decision support tools that are characteristic of what's known as robust decision making. We're going to set aside the other approaches like info-gap decision theory, if you remember that I mentioned that name. And I'll explain what robust decision making involves in just a second. But the key questions that we want to focus on are: how should we characterize this idea of robust satisficing as a decision norm within the context of the decision support tools from RDM? What's its relation to more familiar decision criteria that are known to philosophers and economists? And is this actually a sensible norm for decision making under very deep uncertainty?
(05:43) So I want to quite briefly outline the procedures and decision support tools used to facilitate decision framing in the context of this RDM technique. And I also want to highlight and provide some preliminary interpretation of some key claims about robustness and satisficing as decision criteria that have been made by people involved in its development.
(06:07) So RDM was developed at the RAND Corporation around the turn of the millennium, and its key innovation was that it used computer-based exploratory modeling to augment some earlier techniques that were well known in management and operations research, with names like scenario-based planning and assumption-based planning.
(06:28) And the key thing here is you can use these computer simulations to run many, many, many possible scenarios, building on a very wide range of different plausible assumptions which you couldn't do before the widespread availability of lots of computing power. And you can use that to build up a very large database of possible futures and to evaluate the performance of different candidate strategies across that very large ensemble of possible futures.
(06:52) And once you've used these computer simulations to create a wide range of possible futures, RDM then typically uses scenario discovery algorithms to identify the key factors that determine whether a given strategy is able to satisfactorily meet the organization's goals.
(07:08) And then usually some data visualization techniques are used to help to represent to decision makers how the performance of different strategies across this ensemble, how it depends on different factors, and what are the key trade-offs to be made.
(07:23) So that's a very brief whistle-stop tour of how these kinds of decision support tools work.
(07:29) Now, researchers who have been associated with the development of these tools typically emphasize the extreme fallibility of prediction in the face of deep uncertainty, and they recommend identifying, or trying to identify, strategies that, in some sense, perform reasonably well across the entire ensemble of possible futures, as opposed to trying to fix a unique probability distribution over scenarios relative to which you can then try to maximize expected utility. They grant that expected utility maximization is the normative standard when uncertainties are shallow and probabilities can be known, but under deep uncertainty, they claim, we need some alternative approach.
(08:07) It's common to use a regret measure in these studies to assess the performance of different strategies within a scenario or across scenarios, so to speak. So Lempert, Popper and Bankes in this 2003 paper, say that a robust strategy should be defined as one that has small regret over a wide range of plausible futures.
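The regret measure just mentioned can be sketched in a few lines. This is a minimal illustration, not RDM's actual tooling: the strategy names and payoff numbers are invented, and regret in a scenario is simply the shortfall from the best performance achievable in that scenario.

```python
def regret(payoff, s, w):
    """Regret of strategy s in scenario w: shortfall from the best
    performance any strategy achieves in that scenario."""
    best = max(payoff[t][w] for t in payoff)
    return best - payoff[s][w]

def max_regret(payoff, s, scenarios):
    """Worst-case regret of s across the whole ensemble of futures."""
    return max(regret(payoff, s, w) for w in scenarios)

# Invented payoff table: payoff[strategy][scenario].
payoff = {
    "A": {"w1": 10, "w2": 2, "w3": 6},
    "B": {"w1": 7,  "w2": 6, "w3": 7},
}
scenarios = ["w1", "w2", "w3"]
```

On these numbers, A is best in one future but does badly in another, while B is never far from the best; B therefore has the smaller worst-case regret and counts as the more robust strategy in the sense quoted from Lempert, Popper and Bankes.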
(08:29) At the end of the talk, I'll suggest some reasons why you might think this regret measure is not the best way to assess satisfactory performance.
(08:38) The standard approach for RDM, as I just said, is to consider how well a given policy performs across the range of possible futures. So you're looking at its ability to yield a satisfactory outcome given different possible states of the world. And so understood, you might think that maybe this robust satisficing norm is going to be something you could consider as a competitor to other kinds of non-probabilistic norms for decision making under what Luce and Raiffa called complete ignorance. So these would be norms with which you might be familiar, like maximin or minimax regret or the Hurwicz criterion.
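For readers who don't have these complete-ignorance norms to hand, here is a minimal sketch of all three, assuming a simple act-by-state payoff table with invented numbers.

```python
def maximin(payoff):
    """Maximin: choose the act whose worst-case payoff is largest."""
    return max(payoff, key=lambda a: min(payoff[a].values()))

def hurwicz(payoff, alpha):
    """Hurwicz criterion: weight worst case by alpha (the pessimism
    index) and best case by 1 - alpha."""
    def score(a):
        vals = payoff[a].values()
        return alpha * min(vals) + (1 - alpha) * max(vals)
    return max(payoff, key=score)

def minimax_regret(payoff):
    """Minimax regret: choose the act whose maximum regret across
    states is smallest."""
    states = next(iter(payoff.values()))
    def worst_regret(a):
        return max(max(payoff[b][s] for b in payoff) - payoff[a][s]
                   for s in states)
    return min(payoff, key=worst_regret)

# Invented payoff table: payoff[act][state].
payoff = {
    "safe":  {"s1": 5, "s2": 5},
    "risky": {"s1": 1, "s2": 10},
}
```

Even on this tiny table the three norms come apart: maximin picks "safe", while minimax regret and a balanced Hurwicz weighting both pick "risky".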
(09:14) But it's also important to note that the assessment of candidate policies against this norm of robust satisficing in this literature needn't entirely forego the use of probabilities. So Lempert, Popper and Bankes say that this notion of robust satisficing can be generalized by considering the ensemble of plausible probability distributions that can be explored using the same computational techniques described earlier for dealing with ensembles of scenarios. So in other words, you can try to assess the robustness of a given policy by considering the extent to which its expected performance is acceptable across the range of probability distributions that you take to be consistent with your evidence. And so understood, you can think of the robust satisficing norm as perhaps being some kind of competitor to norms for decision making with imprecise probabilities.
(10:04) And I'll discuss some of them in just a second.
(10:08) So, having just given an initial impression of what's going on in this literature, I now want to explore the core considerations that are put forward there for preferring decision norms that exhibit some kind of robustness in the sense that I've just described or discussed, however vaguely and imprecisely. And that takes the form of an intuitive objection to the application of subjective expected utility theory to problems involving deep uncertainty.
(10:35) So consider the approach suggested by John Broome for dealing with climate change, which Lara actually also mentioned in her talk. So at least the way I read him, Broome concedes that our evidence about climate impacts just isn't good enough to uniquely constrain the probabilities that we assign to all relevant contingencies. His recommendation is that we should nonetheless assign these contingencies precise probabilities as best we can on the basis of something like a subjective best guess estimate and then we ought to maximize expected utility or expected value relative to that assignment. So Broome says stick with expected value theory since it's very well founded and do your best with probabilities and values.
(11:18) And the core worry that we perceive as animating the emphasis on robustness in the RDM literature is this thought that a policy that is perhaps optimal relative to some particular subjective best guess probability assignment could in principle be suboptimal or even catastrophic in expectation relative to other probability assignments that weren't excluded by your evidence.
(11:39) And intuitively, if that is the case, then it seems unwise to optimize relative to one's subjective best guess probability estimate and decision making can be improved, we might think, by incorporating alternative less committal ways of representing uncertainty.
(11:56) But on its face, this intuitive objection that I've just rehearsed against subjective expected utility maximization looks to be an objection to many standard norms for decision making with imprecise probabilities. That is, decision norms that reject precise probabilism and assume instead that a decision maker's doxastic state might be modeled by a non-singleton set of probability functions R, often called a representor following [inaudible 12:23]. The representor is sometimes imagined metaphorically as a credal committee, an assembly of individuals, each of which is a probability function in R.
(12:34) So to see what I mean here, consider what we might call the liberal norm. This says that an act is rationally permissible if and only if it maximizes expected utility relative to some probability function in R. So echoing the points noted earlier, it feels natural to want to object to the liberal rule on the basis that it seems unwise to choose an act that happens to be optimal relative to some arbitrary probability distribution without regard for whether it might be evaluated differently across the other probability assignments that you take to be consistent with your evidence.
(13:06) And then the same concern would extend to the so-called maximal rule, which says that an act is rationally permissible if no other act has greater expected utility relative to every probability function in the representor, just because every act that's permissible according to the liberal rule is also permissible according to this one.
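The liberal and maximal rules can be made concrete with a toy representor. This is an illustrative sketch only: R is treated as a finite set of probability vectors over two states, and the acts and utilities are invented.

```python
def eu(p, utilities):
    """Expected utility of an act (utility per state) under p."""
    return sum(pi * u for pi, u in zip(p, utilities))

# Invented acts (state-by-state utilities) and a two-member representor.
acts = {"x": [1, 0], "y": [0, 1], "z": [0.4, 0.4]}
R = [(0.9, 0.1), (0.1, 0.9)]

def liberal_permissible(acts, R):
    """Liberal rule: permissible iff the act maximizes EU relative to
    SOME probability function in R."""
    return {a for a in acts
            if any(eu(p, acts[a]) == max(eu(p, acts[b]) for b in acts)
                   for p in R)}

def maximal_permissible(acts, R):
    """Maximal rule: permissible iff NO rival act has strictly greater
    EU relative to EVERY probability function in R."""
    return {a for a in acts
            if not any(all(eu(p, acts[b]) > eu(p, acts[a]) for p in R)
                       for b in acts if b != a)}
```

Here "x" and "y" are each optimal for one extremal committee member, so both are liberal-permissible; the hedged act "z" is optimal for nobody, yet no rival dominates it across R, so the maximal rule permits it too. This also exhibits the point that everything liberal-permissible is maximal-permissible.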
(13:23) So these decision rules, I want to say, they give individual probability functions too much power by allowing them to veto any recommendation against choice of a given policy that could otherwise be made by one's credal committee.
(13:37) And we think that a similar concern arguably extends to other decision rules commonly discussed in this literature, such as MaxMinEU and HurwiczEU. So MaxMinEU ranks acts in terms of their minimum expected utility relative to the probability functions in R, and HurwiczEU, which is also the one Lara discussed (I can't remember exactly what name she used for it), is a generalization of MaxMinEU that ranks the available acts in terms of a convex combination of their minimum and maximum expected utilities.
(14:12) So intuitively, we want to say both rules give extremal probability functions too much power in settling the recommendation of the agent’s credal committee. Intuitively, we'd like to object that policies should be evaluated by their performance on more moderate probabilistic hypotheses.
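A similar toy sketch for MaxMinEU and HurwiczEU, again with an invented finite representor and invented utilities, makes the point about extremal probability functions concrete.

```python
def eu(p, utilities):
    """Expected utility of an act (utility per state) under p."""
    return sum(pi * u for pi, u in zip(p, utilities))

# Invented acts and a two-member representor of extremal distributions.
acts = {"x": [1, 0], "y": [0, 0.8], "z": [0.4, 0.4]}
R = [(0.9, 0.1), (0.1, 0.9)]

def maxmin_eu(acts, R):
    """MaxMinEU: rank acts by their minimum EU over R."""
    return max(acts, key=lambda a: min(eu(p, acts[a]) for p in R))

def hurwicz_eu(acts, R, alpha):
    """HurwiczEU: rank acts by a convex combination (weight alpha on
    the minimum) of minimum and maximum EU over R."""
    def score(a):
        eus = [eu(p, acts[a]) for p in R]
        return alpha * min(eus) + (1 - alpha) * max(eus)
    return max(acts, key=score)
```

Only the two extremal committee members matter here: MaxMinEU (alpha = 1) picks the hedged act "z" purely on worst-case EU, while a fully optimistic weighting (alpha = 0) picks "x" purely on best-case EU, with the intermediate shape of the representor playing no role.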
(14:30) Okay. So all of this seems, I think, to have some kind of purchase on our intuitions. But here's the worry. So in formulating the line of objection, certainly as it was applied to subjective expected utility theory and to the liberal rule for imprecise credences, we emphasize this idea that a policy that might be optimal relative to this particular probability assignment could be suboptimal or catastrophic in expectation relative to some other probability assignment that wasn't excluded by your evidence.
(14:59) But intuitively, the objection here is not just that there might be some other probability function relative to which this policy fails to maximize expected utility. For example, the intuitions that I've been trying to pump plausibly are not of the kind that would lead us to accept the so-called Conservative rule for decision making with imprecise probabilities, which says that an act is rationally permissible if and only if it maximizes expected utility on every probability function in your representor. Intuitively, this conservative rule is going to evoke the very same concern about giving veto power to extremal probability functions in the agent’s credal committee, the same kind of concern as with the liberal rule, albeit in the opposite direction or something like that.
(15:41) So plausibly, the intuitive objection here is one that gains purchase only insofar as it makes sense to say things like: there are more admissible probability functions such that the policy fails to maximize expected utility than there are such that it succeeds in satisfying that criterion. So the objection to the rules that we've discussed is that they heed the advice of two or even fewer admissible probability functions, even when many more elements of the agent's credal committee might disagree.
(16:10) I think it's very natural to worry that all this talk of there being more and fewer admissible probability functions can't be made suitably formal or precise without helping ourselves to what ultimately turns out to be a second-order probability measure on the set of admissible probability assignments by which we quantify the size of the collection of first-order probability distributions.
(16:33) But there's actually a sense in which it would be unsurprising for second-order probabilities to show up in RDM and robust satisficing, because the robust satisficing norm has been presented at times as a descendant of a little known decision norm for decision making in complete ignorance, the domain criterion due to Starr.
(16:54) So very roughly, the domain criterion addresses agents who are in conditions of complete ignorance, who don't know anything about which states of the world might be actual. They're asked to consider the set of all possible probability assignments to the states, adopt a uniform second-order probability measure over first-order probability functions, and then rank each strategy in terms of the second-order probability that it maximizes expected utility relative to a randomly drawn first-order probability assignment. So within the ancestry of this stuff, there are second-order probabilistic decision criteria.
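Starr's domain criterion, as just described, can be approximated by Monte Carlo: draw first-order probability assignments uniformly from the simplex and score each strategy by how often it maximizes expected utility. A rough sketch with invented acts, not Starr's own formulation:

```python
import random

def random_simplex(n, rng):
    """Uniform draw from the n-state probability simplex, via
    normalized exponential variates."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]

def eu(p, utilities):
    """Expected utility of an act (utility per state) under p."""
    return sum(pi * u for pi, u in zip(p, utilities))

def domain_score(acts, n_states, n_draws=10_000, seed=0):
    """Estimate, for each act, the second-order probability that it
    maximizes EU under a uniformly drawn first-order assignment."""
    rng = random.Random(seed)
    wins = {a: 0 for a in acts}
    for _ in range(n_draws):
        p = random_simplex(n_states, rng)
        best = max(acts, key=lambda a: eu(p, acts[a]))
        wins[best] += 1
    return {a: wins[a] / n_draws for a in acts}
```

With two states and invented acts {"x": [1, 0], "y": [0, 1], "z": [0.6, 0.6]}, the hedged act "z" wins exactly when neither state is assigned more than 0.6 probability, so its domain score is about 0.2 and the two one-sided acts score about 0.4 each.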
(17:32) But in a different sense, I think it's obviously surprising if a reliance on a unique second-order probability function is presupposed here because it seems to go against what ultimately concerns or motivates a lot of the research in this space. So prominent here is this thought that it's just inappropriate to require decision makers who are facing severely uncertain long-term policy decisions to nail down their uncertainty in the form of a unique probability distribution.
(18:02) And Robert Lempert, for example, writes that RDM is designed to help users form and then examine hypotheses about the best action to take in those situations poorly described by well known probability distributions and models.
(18:16) So in light of all this, we think that proponents of robust satisficing have some explaining to do. Does a concern for robustness, in fact, implicitly invoke a uniform second-order probability measure?
(18:28) Perhaps not. So in particular, we're interested in, if not wholly persuaded by, the following kind of rejoinder to the objection that a decision criterion that uses some kind of robustness measure 𝜇 on R must presuppose second-order probabilities. And the kind of rejoinder that we discussed is the claim that 𝜇 can't be a probability measure on R because R is not a sample space. So roughly speaking, a sample space is going to be a set of different ways that things can be, exactly one of which is realized or is actual or true. But R needn't be viewed as a set of probability functions exactly one of which is true or correct. You could obviously think that there is always some unique credence function that's warranted by our evidence and maybe you and I just lack the cognitive wherewithal to be able to fix our credences with that required degree of precision. But at least the way we see it, that sort of view is relatively atypical in the recent philosophical literature on imprecise credences.
(19:29) So proponents of imprecise probabilism like Joyce, typically instead take the view that even ideally rational agents will sometimes face sufficiently ambiguous or incomplete evidence that they will have a doxastic state that is best represented by a set, a non-singleton set of probability functions. So it's not the case on this sort of view that one of the elements of R represents the epistemically right response to the agent’s evidence. Rather that role is to be played by the mushy credal state that's represented by R itself.
(20:02) And yeah, so if you take a view of this kind, then R can't be interpreted as a sample space because its elements aren't different candidates for the uniquely correct probability function. And as a result, a measure on R can't be a probability measure, whatever other properties it might satisfy.
(20:18) And so in that way, you could try to argue that a demand for robustness is not presupposing second-order probabilities.
(20:24) Now, we're not totally sure how persuasive this rejoinder is. In particular, I think there's a good worry to have, which is that for all we've said, the intuitive appeal of using this measure on R derives from implicitly treating the representor as if it were a sample space. Maybe we can discuss that in the Q&A.
(20:43) What I want to do now is move on to discuss the role of satisficing in all this. So our discussion just now was about the desirability of choosing options that are robustly supported in the sense of being approved by a larger proportion of the probability functions in one's representor. What we now want to do is explore the relationship between robustness and satisficing, because in expositions of RDM, robustness and optimizing are frequently contrasted, as we saw earlier, and a desire for robustness is often linked to satisficing choice. But we think there's no really straightforward relationship between those concepts.
(21:19) And you can bring that out by thinking again about this domain criterion that we mentioned earlier, that often gets noted in this literature. So the domain criterion was recommended by Starr in the 60s, in part on the basis of concerns about robustness. So Starr makes this argument that a decision rule that relies on a unique probability assignment is just going to be too sensitive in its recommendations to the particular probability vector that we choose and in which we can have little confidence because we're in conditions of severe uncertainty.
(21:50) Nonetheless, the domain criterion is one that ranks strategies in terms of the measure of the set of probability assignments on which they maximize expected utility. And so it's naturally thought of, perhaps, as a norm for robust optimizing.
(22:04) So given that there could be this norm of robust optimizing what explains the emphasis on satisficing choice that we see in the design and application of RDM?
(22:15) Recommendations to adopt the satisficing norm in this literature are sometimes likened to the notion of satisficing choice that was discussed by Herbert Simon in his pioneering work from the 50s. We actually think this is misleading. So we think it's important to distinguish between satisficing choice in a static setting and satisficing in a dynamic setting.
(22:38) In Simon's treatment, satisficing is a search heuristic for a class of dynamic decision problems that involve serially presented options. So basically, it says you should fix an aspiration level and as you search through the options, you should halt your search when you find an option that meets the aspiration level because you thereby evade the costs that are associated with continued search or with iteratively computing the expected utility of further search.
(23:04) And the application of a satisficing stopping rule in search problems needs to be contrasted with satisficing in a static context where all your options are available to view all at once. And it was really the rationality and/or moral permissibility of satisficing choice in static contexts of that kind that was at issue in the debate between Michael Slote and his various critics during the 80s, running through to the 90s and 2000s, I think. So Slote argued that people are sometimes rationally or morally permitted to choose a satisfactory outcome even when a better outcome was known to be available and could be costlessly chosen.
(23:41) Now, RDM is often applied in contexts that don't have a fixed menu of candidate strategies. So Lempert, Popper and Bankes, they note that the human participants in the analysis are often encouraged to hypothesize about strategic options that might prove more robust than the current alternatives. And these new candidates can then be added to the scenario generator and their implications dispassionately explored by the computer. So there is a kind of search component that's involved in many practical applications of RDM.
(24:10) Nonetheless, at least the way David and I read it, the robust satisficing decision rule seems to function as a criterion for the synchronous comparison of different candidate strategies. So if we're asked to assess two candidate strategies side by side in an imprecise probabilistic setting, the thought is that we're to prefer that which does better in terms of performing satisfactorily across a wider range of admissible probability distributions. And that recommendation can then be contrasted with Starr’s recommendation to prefer the act that does better in terms of performing optimally across a wider range of admissible probability distributions.
(24:48) So we think that a better way in which you might try to justify this emphasis on satisficing is by looking to the study of voting methods. Now in the literature on decision making and imprecise probabilities, comparisons with social choice are just everywhere. So here's one example from a paper by Brian Weatherson. Weatherson says we can regard each of the probability functions in the representor as a voter which voices an opinion about which choice is best. And Weatherson takes this as an invitation to draw various lessons from social choice theory, especially from Arrow’s impossibility theorem. Now viewed in this perspective, it does seem natural to consider different voting rules as ways of aggregating preferences across the members of the agent’s so-called credal committee. One major concern about doing this is probably that you would have doubts about using a measure that would allow you to compare the size of different subsections of the electorate. But we've already discussed that a little bit.
(25:42) Now, once you take this perspective, you can think of the domain criterion as basically being an instance of plurality voting, the most commonly used voting method in democracies today. So each voter casts a ballot for her most preferred option and the option with the most votes is the winner. So you can interpret the domain rule as stating that when the agent’s beliefs are modeled by a representor R and 𝜇 represents an indifferent weighting function over R, then any option o is to be valued at the measure of the set Mo on which o maximizes expected utility. And then choice of o is permissible just in case it has highest value or is tied for highest value.
(26:22) So under this interpretation of the domain criterion, each possible probability assignment can be conceived as a voter who casts a ballot for their most preferred option understood as the option that maximizes expected utility relative to the probability function whose vote has been cast.
(26:38) An alternative to plurality voting is so-called approval voting. So here ballots no longer necessarily indicate each voter’s most preferred option, but rather all of the candidates that she finds acceptable. So that allows each voter to vote for multiple candidates. Every ballot on which a candidate appears garners that candidate one vote and then the election goes to the candidate who receives the most votes. It's relatively natural, we think, to interpret this norm of robust satisficing as an instance of approval voting in the imprecise probabilistic context. We can interpret this rule as valuing options at the measure of the set So on which o has satisfactory expected utility, with choice of a given option permitted just in case its value is maximal. So in this case, each admissible probability assignment can again be conceived as a voter who casts a ballot for every option that she finds satisfactory given the probability assignment that she is. And then the winner is the one who gets the most approval votes.
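The contrast between the domain rule as plurality voting and robust satisficing as approval voting can be put in a few lines. A toy sketch, treating the representor as a finite set of probability vectors, with invented payoffs and an invented satisficing threshold; note that the two methods can crown different winners.

```python
def eu(p, utilities):
    """Expected utility of an act (utility per state) under p."""
    return sum(pi * u for pi, u in zip(p, utilities))

# Invented acts and a three-member representor.
acts = {"safe": [6, 6], "bold": [10, 1]}
R = [(0.7, 0.3), (0.6, 0.4), (0.3, 0.7)]

def plurality_domain(acts, R):
    """Domain rule as plurality voting: each p in R votes for the act
    that maximizes EU relative to p; count the ballots."""
    votes = {a: 0 for a in acts}
    for p in R:
        best = max(acts, key=lambda a: eu(p, acts[a]))
        votes[best] += 1
    return votes

def approval_satisficing(acts, R, threshold):
    """Robust satisficing as approval voting: each p approves every act
    whose EU relative to p meets a fixed threshold."""
    return {a: sum(1 for p in R if eu(p, acts[a]) >= threshold)
            for a in acts}
```

On these numbers, two of the three committee members rank "bold" first, so the plurality-style domain rule elects it; but "bold" falls below the satisficing threshold of 5 on the third member, while "safe" is satisfactory for all three, so approval-style robust satisficing elects "safe" instead.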
(27:40) So earlier I asked, why does the RDM literature link satisficing and robustness, as opposed to telling you to pick the option that is robustly optimal in the style of the domain rule? And one way in which you can now rephrase that question is as a choice between plurality voting and approval voting as ways of aggregating evaluations across the agent’s credal committee.
(28:00) And the literature on voting methods has a bunch of arguments for the relative superiority of approval voting. Some of them can't be transferred to this weirder context, but some can, and we'll discuss one of those.
(28:12) One has to do with avoiding vote splitting. So ideally, a voting method should be insensitive to perverse instances of vote splitting such as the following. Imagine we have a diverse electorate and they're asked to choose between a tax-only option and a combined tax-subsidy hybrid policy as a policy for climate abatement. Imagine that 60% of the electorate favors the hybrid and 40% favor the tax-only option. So obviously, under plurality voting, the hybrid model will be chosen. But now suppose that voters are asked to choose between a tax-only policy and a pair of hybrids which differ slightly: one of them has a more generous subsidy than the other. As before, 40% of voters like the tax-only option, but now the remaining 60% end up being split roughly evenly between the two hybrid options, as a result of which the tax-only option wins under plurality voting. And that seems like the wrong result. Now fortunately, approval voting allows you in principle to avoid that result. It doesn't require voters to choose between the two hybrid options. Most voters who approve of one mixed policy will presumably approve of the other, and so vote splitting will be avoided. And that seems like a strong reason to prefer approval voting over plurality voting.
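The vote-splitting story can be checked numerically. A toy sketch with invented ballot counts: the hybrid vote splits between two near-clones under plurality, while approval voting lets hybrid supporters approve both.

```python
from collections import Counter

def plurality_winner(ballots):
    """Each ballot names exactly one candidate; most votes wins."""
    counts = Counter(ballots)
    return max(counts, key=counts.get)

def approval_winner(ballots):
    """Each ballot is a set of approved candidates; most approvals wins."""
    counts = Counter(c for ballot in ballots for c in ballot)
    return max(counts, key=counts.get)

# 40 tax-only voters; 60 hybrid voters split between two near-clones.
plurality_ballots = ["tax"] * 40 + ["hybrid_a"] * 31 + ["hybrid_b"] * 29
# Under approval voting, hybrid voters approve both clones.
approval_ballots = [{"tax"}] * 40 + [{"hybrid_a", "hybrid_b"}] * 60
```

Plurality elects "tax" on 40 votes against the split 31/29 hybrid vote, even though 60% of the electorate prefers a hybrid; under approval voting each hybrid clone collects all 60 hybrid approvals and "tax" loses.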
(29:28) In some sense, approval voting is less vulnerable to perverse instances of vote splitting, though actually formalizing this is a little bit challenging.
(29:37) So one way in which you can formalize immunity to vote splitting is by Sen’s α condition. So if S is the set of electoral candidates, P is the set of preference orderings, and f is the voting function, or choice function, that returns a subset of S as the winners, then the choice function satisfies α just in case, whenever x is chosen from S, x would be chosen from any subset of S given the same profile of preferences. So in other words, the winning option is not one that would have lost had we eliminated some other options from the ballot, and, equivalently, no losing option would have won had a different alternative been added to the ballot. And obviously, when f satisfies α, electoral outcomes end up being immune to this kind of spoiler effect.
(30:24) Now, approval voting will satisfy the α condition so long as what counts as an approvable choice is one that doesn't vary with the menu of options. So for example, if any voter votes for a policy option just in case it yields expected GDP per capita above some fixed absolute threshold. Within the context of choice under deep uncertainty, the same can be said of the robust satisficing rule, if it scores options in terms of the measure of So, provided that a probability function forms part of that set just in case it gives o’s expected utility, along some dimension or composite of dimensions, greater than or equal to some fixed absolute threshold.
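The contrast between a fixed absolute threshold and a regret-based, menu-relative threshold can be illustrated directly. In this invented example, adding a strong new candidate strips an incumbent option of its approvals under the regret threshold but not under the fixed one; that menu-dependence is exactly what violates Sen's α.

```python
def eu(p, utilities):
    """Expected utility of an act (utility per state) under p."""
    return sum(pi * u for pi, u in zip(p, utilities))

def approvals_fixed(acts, R, threshold):
    """p_i approves o iff EU_i(o) meets a fixed absolute threshold:
    menu-independent, so consistent with Sen's alpha."""
    return {a: {i for i, p in enumerate(R) if eu(p, acts[a]) >= threshold}
            for a in acts}

def approvals_regret(acts, R, eps):
    """p_i approves o iff o's EU is within eps of the best EU on the
    CURRENT menu: menu-dependent, so adding a candidate can strip
    approvals from an incumbent."""
    return {a: {i for i, p in enumerate(R)
                if max(eu(p, acts[b]) for b in acts) - eu(p, acts[a]) <= eps}
            for a in acts}

# Invented single-member representor and two menus differing by one act.
R = [(0.5, 0.5)]
menu = {"o": [6, 6]}
bigger_menu = {"o": [6, 6], "new": [10, 10]}
```

With the fixed threshold of 5, "o" keeps its approval whether or not "new" joins the ballot; with the regret threshold, "o" is approved on the small menu but loses its approval once "new" appears, even though nothing about "o" itself has changed.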
(31:02) Now, in our view, vote splitting is no less of a concern in the decision context we're thinking about, where probability functions are conceived as voters. If anything, we think it's a much greater concern, because voters and candidates in this context can't make the kind of strategic choices that people make in real-world political settings to avoid spoiler effects. And also, the individuation of the candidates here is to a large extent arbitrary, because a given policy can, in principle, be subdivided, at least mentally, into indefinitely many possible sub-implementations, replacing one candidate with a whole army of clones. So insensitivity to spoiler effects in this context can be conceived as another kind of robustness, namely as robustness with respect to arbitrary choices about how to individuate the decision maker's options. So that seems like a strong reason to favor fixed-threshold robust satisficing.
(31:57) But as we've noted, the RDM literature doesn't typically define satisficing in terms of satisfaction of an absolute threshold; instead, typically a regret measure is used. Now, that means that what counts as a satisfactory choice can in principle vary with the menu of options. And so the addition or subtraction of candidates from that menu can respectively change losers to winners or winners to losers. So we think this provides a good reason to reject the standard conception of the satisficing threshold that's appealed to in the RDM literature.
(32:28) Okay, so I realize I'm running over time. So I think I'll just say: we hope this paper is going to open a dialogue between philosophers and the DMDU community and that we each have something to learn from one another. So thanks everyone. And if you're interested, you can find a full draft of the paper and all references online as a GPI working paper.