Philosophers' Imprint, volume 17, no. 8, April 2017

A Causal Understanding of When and When Not to Jeffrey Conditionalize

Ben Schwan1, University of Wisconsin–Madison

Reuben Stern, Munich Center for Mathematical Philosophy, LMU Munich

© 2017 Ben Schwan and Reuben Stern. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

1. We each accept full and equal responsibility for what follows.



Abstract: There are cases of ineffable learning — i. e., cases where an agent learns something, but becomes certain of nothing that she can express — where it is rational to update by Jeffrey conditionalization. But there are likewise cases of ineffable learning where updating by Jeffrey conditionalization is irrational. In this paper, we first characterize a novel class of cases where it is irrational to update by Jeffrey conditionalization. Then we use the d-separation criterion (from the graphical approach to causal modeling) to develop a causal understanding of when and when not to Jeffrey conditionalize that (unlike other norms on offer) bars updating by Jeffrey conditionalization in these cases. Finally, we reflect on how the possibility of so-called “unfaithful” causal systems bears on the normative force of the causal updating norm that we advocate.

I.  Introduction

When you learn something, but become certain of nothing, update by Jeffrey conditionalization (JC). Or so the saying goes. But as Jeffrey himself notices (1970, 1983), there are times when updating by JC leads one astray; so we need some account of when it is appropriate to Jeffrey conditionalize. In this paper, we use the graphical approach to causal modeling to develop an account of when and when not to Jeffrey conditionalize. First, we identify an important class of cases where it is inappropriate to Jeffrey conditionalize that has gone unnoticed in the literature. Second, we use the d-separation criterion (from the graphical approach to causal modeling) to develop a causal updating norm that specifies when and when not to Jeffrey conditionalize given knowledge of a directed acyclic graph (DAG) (where DAGs encode information about causal relevance). Third, we consider how the possibility of so-called “unfaithful” causal systems affects the normative force of our causal understanding of when not to Jeffrey conditionalize. Finally, we briefly comment on some loose ends, including how our account relates to standard conditionalization, what agents should do when our causal updating norm says that updating by JC is inappropriate, how our understanding of JC relates to Skyrms’s (1980) influential understanding in terms of higher-order degrees of belief, and a handful of other related issues.

II.  The Need for Jeffrey Conditionalization

Suppose that Kramer is considering whether or not to bet on his favorite horse, Mother Was A Mudder, in an upcoming race. Since Mother Was A Mudder’s mother was a mudder, Kramer believes that Mother is significantly more likely to win if the track is muddy than if it is not. Suppose that, come race day, Kramer learns that the track is muddy. Clearly, he should become more confident that Mother will win.2 Some learning experiences can be described in this way — i. e., an agent can express what she learns with certainty. In such cases, it is plausible that agents should update their beliefs by conditionalization3 — i. e., that an agent’s credences after learning should match her pre-learning credences conditional on those propositions in which she becomes certain. Formally, when prob(*) represents an agent’s prior credences, PROB(*) represents her posterior credences, and e represents the proposition learned, her posterior credence in any arbitrary proposition, a, should satisfy:

Standard Conditionalization

PROB(a) = prob(a|e)

So in Kramer’s case, if, before learning the track is muddy, his credence that Mother wins conditional on the track being muddy is 0.8, and his credence that Mother wins conditional on it not being muddy is 0.1, then Kramer should have an unconditional credence of 0.8 that Mother will win after learning of the track’s muddiness. This much seems clear. But while standard conditionalization is applicable in many cases, not all learning is certitude-gaining. Indeed, certitudes are often hard to come by. As Jeffrey (2004, p. 53) observes:

Certainty is quite demanding. It rules out not only the far-fetched uncertainties associated with philosophical skepticism, but also the familiar uncertainties that affect real empirical inquiry in science and everyday life.

At least sometimes (and perhaps usually), we learn something but become certain of nothing that we can express. Suppose, for example, that because Kramer is unable to make it to the fairgrounds on race day, he never observes that the track is muddy. Instead, while running errands, he looks westward and catches a glimpse of the sky. Though he can’t say precisely what he saw, he becomes more confident that the track is muddy — specifically, his credence shifts from 0.3 to 0.6.4 Kramer learns something — his glance at the western sky provides information that alters his beliefs — but he does not become certain of any proposition that he can express. Call an instance of ineffable learning any case in which an agent gains no certitudes that she can express but, as a result of some learning experience, shifts her probability distribution over some initial partition of proposition(s) b1, b2,… bn.5 Were the agent able to recharacterize what she learns in terms of something in which she becomes certain, then she could update by standard conditionalization. But in cases of

2. This case mirrors one of Jeffrey’s own (1983, pp. 169–171).

3. For the sake of readability, we sometimes refer to credences and subjective probabilities as beliefs.

4. In New York City (where Kramer lives), weather typically comes from the west.

5. It may be controversial whether every case that might intuitively be characterized as ineffable learning is correctly modeled in terms of moving from one particular probability distribution to another. Though we believe this merits attention, we have nothing to say about it here. Instead, we treat ‘ineffable learning’ as a term of art that picks out all and only those cases in which some agent does not become certain of any proposition that she can express, but does (as a result of some learning experience) shift her credences over some initial partition of propositions, b1, b2,… bn, from one probability distribution to another.
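For readers who want the arithmetic spelled out, the following is a minimal illustrative sketch (in Python, with invented variable names; the numbers are just those of Kramer’s case) of the prior credence that Mother wins and the posterior that standard conditionalization prescribes upon learning m:

    # Kramer's priors, as given in the text.
    prob_w_given_m = 0.8      # prob(Mother wins | track is muddy)
    prob_w_given_not_m = 0.1  # prob(Mother wins | track is not muddy)
    prob_m = 0.3              # prior prob(track is muddy)

    # Prior unconditional credence that Mother wins (law of total probability).
    prob_w = prob_w_given_m * prob_m + prob_w_given_not_m * (1 - prob_m)
    print(round(prob_w, 2))   # 0.31

    # Standard conditionalization: after learning m with certainty,
    # PROB(w) = prob(w | m).
    print(prob_w_given_m)     # 0.8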


ineffable learning — i. e., cases where the agent by hypothesis cannot do this, and instead finds her confidence shifting in some initial partition of propositions — updating by standard conditionalization is of no use. Clearly, though, agents should update their beliefs in light of ineffable learning. Since Kramer believes Mother is more likely to win if the track is muddy and has become more confident that the track is in fact muddy, he should seemingly be more confident that Mother will win. But how much more? To answer this, we need some updating rule that doesn’t make reference to any proposition that Kramer learns with certainty. Jeffrey (1983) provides one. He advises that, in at least some cases of ineffable learning, agents should Jeffrey conditionalize. JC dictates that when a learning experience shifts an agent’s credences over some initial partition of propositions b1, b2,… bn, her posterior credence in any arbitrary proposition a should be calculated using the following formula:

Jeffrey Conditionalization

PROB(a) = prob(a|b1)PROB(b1) + prob(a|b2)PROB(b2) + … + prob(a|bn)PROB(bn)

So after Kramer’s glimpse at the sky shifts his credence that the track is muddy (m) from 0.3 to 0.6, JC prescribes that his posterior credence that Mother wins (w) be 0.52.

Kramer’s Credences: prob(m) = 0.3, PROB(m) = 0.6; prob(w|m) = 0.8, prob(w|¬m) = 0.1; prob(w) = 0.31, PROB(w) = 0.52.

Again, this seems reasonable enough. Applying JC yields a coherent (probabilistic) credence distribution that incorporates Kramer’s increased confidence that the track is muddy, according to which he is more confident that Mother wins (specifically, updating his credence from 0.31 to 0.52). He is not as optimistic as he would be were he certain that the track was muddy. Nor is he as pessimistic as he would be were he certain that it wasn’t muddy. It thus seems intuitive for Kramer to update by JC.

III.  Jeffrey Conditionalization and Rigidity

But just because JC works for Kramer doesn’t mean that it will work for you, too. Not always, anyway. It is broadly acknowledged that there are cases where updating by JC leads one astray, so there is need for an account of when it is and is not appropriate to Jeffrey conditionalize. Jeffrey himself appreciated that JC shouldn’t be applied willy-nilly. Rather, he claimed that JC is a valid updating rule if and only if an agent’s prior and posterior credences, conditional on each of the ineffable-learning-affected propositions, should match. This condition has become known as Rigidity:

Rigidity

PROB(a|bi) = prob(a|bi)

At first blush, this seems to fit the bill — it’s kosher to update via JC just in case Rigidity holds for the propositions in your belief network.6 But while it is, strictly speaking, true that JC yields the correct verdict when and only when Rigidity is satisfied, this proposal is uninformative. Given the axioms of probability theory, JC and Rigidity are interderivable. So, given some reasonable shift in an agent’s initial partition, if it is rational for her conditional credences to remain invariant across the learning experience (i. e., if Rigidity is reasonably assumed), then updating via JC is rational (because it makes the rest of her unconditional beliefs cohere with the shift in her initial partition and her static conditional credences). But, as Jeffrey admits (1970, pp. 178–179) in response to

6. Because Jeffrey (1983) is doing decision theory, he speaks in terms of “preference rankings”. Because we’re not doing decision theory, we speak in terms of “belief networks”.
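The 0.52 figure above is simply the JC formula applied to Kramer’s shifted partition; a minimal illustrative sketch of that computation, using only the numbers already given:

    # Kramer's ineffable glance shifts his credence that the track is muddy.
    prob_m, PROB_m = 0.3, 0.6
    prob_w_given_m, prob_w_given_not_m = 0.8, 0.1

    # Jeffrey conditionalization over the partition {m, not-m}:
    # PROB(w) = prob(w|m)*PROB(m) + prob(w|not-m)*PROB(not-m)
    PROB_w = prob_w_given_m * PROB_m + prob_w_given_not_m * (1 - PROB_m)
    print(round(PROB_w, 2))  # 0.52 (up from the prior 0.31)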


Levi (1967), neither the shift in the initial partition nor the satisfaction of Rigidity is trivially rational.

To judge the soundness of a shift from p to p’ […] we must not only look at the two belief functions and their differences; we must also inquire into the forces which prompted the change — the dynamics of the situation. This is another way of saying that while coherence is the only condition I can formulate that seems necessary for reasonableness of belief functions in all cases, each particular case must be examined with an eye to the agent’s particular situation. […] Of course one can, in a rather question-begging way, give necessary and sufficient conditions for the appropriateness of [JC]: it is necessary and sufficient that it be sensible for the partitioning conditions to hold of the agent’s original belief function, and that the Rigidity conditions hold between his old and new belief functions — in view of whatever the agent’s situation happens to be at the time of the change. […] But none of this is much real help. […] [T]he genus of cases in which [JC] is appropriate is one I do not know how to characterize clearly and non-circularly.

Here we take up the question of when “the agent’s particular situation” licenses the assumption of Rigidity across the prior and posterior probability distributions. In so doing, we set aside the important question — forcefully raised by Levi (1967) — about whether the shift that constitutes an agent’s ineffable learning experience is itself justified. Though we agree with Levi that sometimes it is not,7 we often find ourselves in situations wherein our confidence in some proposition does shift from one non-extreme value to another in light of some ineffable experience (regardless of whether it does so rationally). Our hope is to provide agents with advice about how to update given this initial shift. And this motivates our focus on Rigidity — given a shift on some fixed initial partition, it is rational to update by JC if and only if the agent’s prior and posterior credences, conditional on each element of the initial partition, should remain invariant across the update.

IV.  Rigidity and Origination

Though Jeffrey says in the passage cited above that he has no independent characterization of when Rigidity is satisfied (and therefore no characterization of exactly when it is appropriate to update by JC), he gestures towards some characterization in various places in his canon. For example, when stating the need for JC, Jeffrey (1983, p. 168; emphasis added) writes:

[…] if a is a proposition in the agent’s preference ranking but is not one of the n propositions whose probabilities were directly affected by the passage of experience, how shall PROB a be determined?

And, when explaining JC and Rigidity, where bi represents the agent’s initial partition, Jeffrey (1983, p. 169; emphasis added) claims:

[JC] is applicable in exactly the case where the change from prob to PROB originates in bi in the sense that [Rigidity] holds for any a in the preference ranking.

7. For example, it would be irrational for Kramer to shift his confidence that some cigars are Cuban after glancing at the sky (provided, of course, that Kramer doesn’t initially take the appearance of the sky to provide any evidence about the nationality of the cigars).


What could Jeffrey mean here? Since he suggests above that we must think about “the forces which prompted” the initial shift in confidence in order to determine whether updating by JC is rational, it would


be helpful if we had some means of representing whatever prompts the agent’s shift in confidence in the initial partition. Luckily, we do. Though we cannot specify the content of what is learned during an ineffable learning experience (because it is ineffable), we can follow Jeffrey (1983) and Pearl (1988) in representing whatever is learned with a dummy proposition — i. e., a proposition that expresses what the agent would have learned with certainty were she capable of expressing it.8 So, for example, in the case of Kramer, the dummy proposition represents whatever Kramer saw in the sky that led him to become more confident that there would be mud on the tracks. Likewise, if a poker player became more confident that she would win the pot because of something that she saw (but couldn’t express) on her opponent’s face, then the dummy proposition would represent whatever she saw on her opponent’s face. The dummy proposition allows us to represent ineffable learning

as updating via standard conditionalization on a dummy proposition.9 Thus when Kramer glances at the sky and his credence that the track is muddy shifts from 0.3 to 0.6, we represent Kramer as if he becomes certain of some ineffable-weather-proposition, d. Given standard conditionalization, then, all of Kramer’s posterior credences should be equal to his prior credences conditional on the dummy proposition:

PROB(*) = prob(*|d)

Though the dummy proposition yields tools for understanding ineffable learning in terms of standard conditionalization, it does not (on its own) provide tools for determining precisely how an agent should update her beliefs in light of some ineffable learning experience. This is because the ineffability of the dummy proposition prevents it from entering into explicit prior conditional probability judgments. For example, though it is true (in some sense) that the poker player should update her probability distribution in correspondence with her prior conditional probability judgments given d (i. e., given that her opponent’s face would look as it wound up looking), we lack access to these conditional probability judgments (if they can even be said to exist),10 since we lack the ability to describe what exactly the poker player sees.11

8. It may seem to some as though nothing is learned with certainty in contexts of ineffable learning (regardless of whether it can be expressed), but this view does not hold up for at least three reasons. First, as we discuss in detail in Section VII, nearly every case that appears in the literature seems to be describable in terms of learning something ineffable with certainty. In Jeffrey’s classic candlelight cases, for example, the agent’s credence that some cloth is a particular color changes because of how the cloth appears in candlelight. The agent plausibly learns that the cloth appears that way with certainty (even though she cannot describe what she sees). Second, as Skyrms (1980) recognizes, even if the agent sometimes does not learn about anything other than her psychology with certainty, she plausibly does learn with certainty that her confidence in the initial partition shifts. (In cases where this is all that the agent learns with certainty, as we discuss in Section VII, the dummy proposition can be construed as representing the agent’s new degree of confidence.) Third, as Diaconis and Zabell (1982, Theorem 2.1) show and Kyburg (1987, pp. 281–282) makes clear, there exists some proposition that the agent can be modeled as learning with certainty and conditionalizing on (even though the relevant proposition is not in the agent’s actual belief network) in every case where it is appropriate to update by JC. Thus, even though their proposition need not represent our dummy proposition (since their proposition matches our dummy proposition only when updating by JC is rational), their result does show that there is something the agent can be modeled as learning with certainty when the agent updates by JC. So Diaconis, Zabell, and Kyburg effectively open the door for speaking in terms of what the agent learns with certainty in the context of JC.


9. Our predecessors who utilize the dummy proposition (Jeffrey 1983 and Pearl 1988) likewise adopt this strategy. 10. Whether the poker player has such a conditional probability judgment depends on what subjective credences are — e. g., whether they are behavioral dispositions, explicit judgments, etc. Though this question merits attention — and perhaps even special attention in the context of determining whether to update by JC — it lies beyond the purview of this paper. 11. When we speak in terms of our lack of access, we intend to include both the poker player and the poker player’s audience in the class of people who lack access. Thus we (the audience) lack access to the poker player’s prob(*|d) and thereby cannot use just this information to evaluate the poker player’s rationality. But the poker player, too, lacks such access, and thereby cannot determine what her beliefs should be by consulting prob(*|d) .


But this does not render the dummy proposition entirely useless when determining how to update in light of ineffable learning. Though d cannot enter into the agent’s explicit conditional probability judgments, there are qualitative judgments that the poker player can make about the relevance of the dummy proposition to her beliefs in other propositions. For example, she can be sure that d is irrelevant to what song the casino will play next, and plausibly relevant (in some sense) to what cards her opponent is holding (since her opponent’s apparent psychology is plausibly informed by the cards in her hand).12 Thus the dummy proposition allows us to say something about the agent’s learning situation, and thereby at least opens the door for consideration of whether “the forces which prompted” an agent’s shift in confidence license updating by JC. With the dummy proposition in hand, we can naturally characterize what it is for a shift to originate in some initial partition as follows:

Origination: For any case in which ineffable learning shifts an agent’s probability distribution over some initial partition of proposition(s) b1, b2,… bn, and where d is a dummy proposition that represents the content of that learning experience, the shift originates in the initial partition {bi} if and only if there are no propositions in the agent’s belief network outside of {bi} that are directly evidentially relevant to d (where a is “directly evidentially relevant” to d if and only if a is both unconditionally correlated with d and correlated with d conditional on bi).13

This definition makes sense of Jeffrey’s speaking in terms of the updated probabilities being “directly affected by the passage of experience” and likewise squares with others’ understanding of when to update by JC.14 Moreover, since it is plausible that we can gauge whether d is relevant to other propositions in an agent’s belief network (without knowing the content of d), it is plausible that Origination gives us an independently graspable sense of when Rigidity should be satisfied, and that it can thereby tell us when and when not to Jeffrey conditionalize. Specifically, one should update by JC if and only if the shift that results from one’s ineffable learning experience originates in {bi}.

But does Origination deliver the desired results? At least sometimes, it does. Imagine that Elaine is considering joining Kramer at the fairgrounds, but that she doesn’t want to go unless her acquaintance, Puddy, is there. For some reason, Puddy loves to go to the track on rainy days; he keeps his eye on the sky and goes when it looks like rain. Since the track tends to get muddy when it looks like rain (and thereby tends to be muddy when Puddy is there), Elaine’s credence that Puddy is at the track (p) conditional on the track being muddy (m) is 0.8. Her credence that Puddy is at the track conditional on the track not being muddy is 0.1. Elaine glances

12. There are multiple kinds of relevance that one can make qualitative judgments about. For example, one can judge that the dummy proposition is causally (ir)relevant to some other proposition, and can likewise judge that the dummy proposition is evidentially (ir)relevant to some other proposition. One of the key points of our paper is that qualitative judgments of causal (ir)relevance bear many fruits in contexts of ineffable learning.

13. Why think that this captures direct evidential relevance? Because A is evidentially relevant to D if and only if A is correlated with D, but directly evidentially relevant to D only if they remain correlated conditional on any value of B.

14. This characterization is not Jeffrey’s own. But we take his language to suggest something like this understanding. It is also suggested by both Pearl (1988) and Skyrms (1980). Pearl uses undirected graphs to represent this exact constraint (complete with the dummy proposition and all), and suggests that he is explicating Jeffrey when doing so. Skyrms (1980, p. 125) suggests that it is appropriate to update by JC when there are no propositions in the agent’s belief network that are directly evidentially relevant (in the sense specified above) to the proposition that the agent’s degree of belief in the initial partition takes on its new value (after the ineffable learning experience). Thus Skyrms espouses Origination, but the “dummy proposition” that he uses is different from ours insofar as its content is not ineffable, but rather represents the fact that one’s credence shifts to its new value. We discuss the relationship between our causal understanding of when to update by JC and Skyrms’s “higher-order degree of belief” understanding in Section VII.
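One way to picture the Origination test is as a pair of correlation checks: A is directly evidentially relevant to D just in case A and D are correlated both unconditionally and conditional on each cell of the initial partition. The sketch below runs that test on a made-up joint distribution with an Elaine-like structure; every number and name is invented purely for illustration:

    from itertools import product

    # Toy joint over (d, b, a): the sky (d) drives both the mud (b) and
    # Puddy's presence (a), so a is directly evidentially relevant to d.
    def joint(d, b, a):
        p_d = 0.4
        p_b_given_d = 0.9 if d else 0.2
        p_a_given_d = 0.8 if d else 0.1
        return ((p_d if d else 1 - p_d)
                * (p_b_given_d if b else 1 - p_b_given_d)
                * (p_a_given_d if a else 1 - p_a_given_d))

    def prob(event):
        return sum(joint(d, b, a) for d, b, a in product([True, False], repeat=3)
                   if event(d, b, a))

    def correlated(x, y, given=lambda d, b, a: True):
        pg = prob(given)
        pxy = prob(lambda d, b, a: x(d, b, a) and y(d, b, a) and given(d, b, a)) / pg
        px = prob(lambda d, b, a: x(d, b, a) and given(d, b, a)) / pg
        py = prob(lambda d, b, a: y(d, b, a) and given(d, b, a)) / pg
        return abs(pxy - px * py) > 1e-12

    A = lambda d, b, a: a
    D = lambda d, b, a: d
    print(correlated(A, D))                               # True: unconditional correlation
    print(correlated(A, D, given=lambda d, b, a: b))      # True: still correlated given b
    print(correlated(A, D, given=lambda d, b, a: not b))  # True: and given not-b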


westward. Upon seeing the sky, her credence that the track is muddy shifts from 0.3 to 0.6. Though Elaine cannot say exactly what she saw, she knows that Puddy bases his decision whether to go to the track on what she saw (since Puddy bases his decision on the very same view of the sky). Should Elaine update by JC?

Elaine’s Credences: prob(m) = 0.3, PROB(m) = 0.6; prob(p|m) = 0.8, prob(p|¬m) = 0.1; prob(p) = 0.31.

Here we have a case of ineffable learning that resembles Kramer’s in some respects but where applying JC yields a counterintuitive result. If Elaine uses her initial shift in confidence that the track is muddy to update her belief that Puddy will be at the fairgrounds, then she will become only somewhat more confident that he will be there — specifically, she will update her credence from 0.31 to 0.52. But that can’t be right. Elaine knows that Puddy bases his decision whether to go to the track on his view of the sky. So when Elaine glances west, not only should her credence that the track is muddy shift some amount, but she should likewise become very confident that Puddy will be at the track (since she knows that he nearly always goes when the sky suggests rain). Thus, even though updating by JC guarantees a coherent posterior credence distribution, it is inappropriate for Elaine to do so for reasons that have nothing to do with (synchronic) coherence.

What exactly goes wrong when Elaine updates by JC? The problem is that she updates her confidence that Puddy will be there in light of the shift in her credence that the track is muddy. But this is problematic because this update does not take into consideration the way in which what she saw in the sky is directly relevant to the probability that Puddy will be at the track. Since Puddy bases his decision to go to the track on the appearance of the sky (no matter whether there is mud on the tracks), it seems like Elaine’s posterior credence that Puddy is at the fairgrounds should not be entirely determined by her shift in credence about the state of the track, and should instead shift some independent amount that corresponds to the effect that the weather has on Puddy’s presence. Elaine’s mistake, then, is using JC to update her credence in something — that Puddy would be at the fairgrounds — that is (relative to the propositions at hand) directly evidentially relevant to the content of her learning experience (since the correlation between Puddy’s presence and whatever Elaine learns about the sky persists even if we condition on any particular state of the track). This is precisely the kind of JC update that Origination bars. But just because Origination is useful to Elaine doesn’t mean it will be useful for you, too. In the next section, we find that Jerry is not so lucky.

V.  The Limits of Origination

Suppose that Jerry is considering joining Kramer at the race, but that Jerry hates going to the fairgrounds when Newman is there. Suppose further that Newman has the (somewhat weird) habit of going to the fairgrounds on his days off from work and the (somewhat annoying) habit of hosing down the racetrack when he goes.15 Consequently, Jerry’s credence that Newman is at the fairgrounds (n) conditional on the track being muddy (m) is 0.8, and his credence that Newman is at the fairgrounds conditional on the track not being muddy is 0.1.16

15. It is important that whether Newman has work on a particular day has nothing to do with the weather. We can imagine that Newman is a noble postman, and that he accordingly works come rain or shine.

16. As we discuss in Section VII, it is important to remember that these conditional probability judgments do not entail that Jerry should be 0.8 confident that Newman is at the fairgrounds whenever he learns with certainty that the track is muddy and 0.1 confident whenever he learns with certainty that the track is not muddy. If Jerry learns with certainty that the track is muddy as well as something else (e. g., that it is raining), then Jerry’s conditional probability judgment does not entail that Jerry should be 0.8 confident that Newman is at the fairgrounds.


Imagine that while pondering whether to go, Jerry glances westward, and his credence that the track is muddy shifts from 0.3 to 0.6.

Jerry’s Credences: prob(m) = 0.3, PROB(m) = 0.6; prob(n|m) = 0.8, prob(n|¬m) = 0.1; prob(n) = 0.31.

Here again we have a case of ineffable learning that is very similar to Kramer’s. But if we blindly apply JC to Jerry’s belief that Newman is at the fairgrounds, we get a deeply counterintuitive result. JC recommends that, as a result of his shift in credence that the track is muddy, Jerry update his credence that Newman is at the track from 0.31 to 0.52. But that’s crazy. The sky evidence is nothing that Jerry can state, but whatever it is, it is not relevant to whether Newman is at the track (since Newman does not base his behavior on the weather). Nevertheless, JC demands that Jerry become more confident that his nemesis will be there. Critically, unlike Elaine’s mistake, Jerry’s mistake was not one of updating his credence in something that was directly evidentially relevant to the weather. Indeed, intuitively, Jerry’s ineffable evidence — i. e., what he learned with certainty but cannot express — is not evidence at all for whether Newman is at the track (since Newman’s presence is evidentially independent of the weather). So Origination does not bar Jerry from updating by JC even though doing so is a mistake. And this is worrisome because Jerry’s case is perfectly ordinary. His mistake was updating by JC when there was a proposition in his belief network that he took to be correlated with whether the track is muddy, but evidentially independent of what he (ineffably) learned. (This is possible since probabilistic dependence is not transitive.) As we’ll see in the next section, Jerry’s case is just an example of a broad class of cases in which updating by JC is irrational but sanctioned by Origination. We thus need some understanding other than Origination of when and when not to Jeffrey conditionalize.

VI.  Causation and Rigidity

The lesson of the preceding section is that there are cases of ineffable learning in which it is irrational to assume Rigidity even though one’s shift in confidence originates in the initial partition. This suggests that we should search for some other characterization of the cases where it is rational to assume Rigidity. As a first step towards doing so, using the dummy proposition to characterize the shift from prob to PROB, we can restate Rigidity as follows:

New Rigidity

PROB(a|bi) = prob(a|bi & d)

Substantively, this is no different from Rigidity as originally stated.17 But the dummy proposition helps to elucidate a few interesting aspects of Rigidity. First, it reveals that Rigidity expresses a screening-off condition according to which bi screens off a from d. Or, put differently, Rigidity states that a is probabilistically independent of d conditional on bi. Second, it suggests a promising strategy for an independent account of when Rigidity is rationally satisfied. This is because facts about screening-off relations are intimately related to facts about causal relevance, and we often have some independent grasp of facts about what causes what in cases of ineffable learning. The most promising approach to causation that is rooted in screening-off facts is the graphical approach to causal modeling found in, e. g., Pearl (2009) and Spirtes, Glymour, and Scheines (2000). In order to see the precise relationship between these facts and facts about causal relevance, we must introduce some machinery in the causal modeling toolbox. In the graphical approach to causal modeling,

17. New Rigidity is equivalent to (old) Rigidity given standard conditionalization and our characterization of ineffable learning in terms of updating on a dummy proposition.


a hypothesis about what causes what can be represented as a directed acyclic graph (DAG), where a DAG graphically represents the causal relations that obtain among a set of variables V as a set of directed edges (or arrows), such that no directed path forms a cycle. Since a variable is just a partition (that takes on a particular value for each element), we can represent the propositions in an agent’s belief network as a set of variables that take on distinct values when various propositions obtain. For example, in the simplest case, a variable can represent whether a single proposition or its negation obtains. Likewise, if there exists a set of propositions in the belief network that are mutually exclusive and collectively exhaustive, a variable can represent the entire partition, taking on a particular value for the truth of each proposition. So if we represent propositions (values of variables) with lowercase letters, and the partitions of which they are members (variables) with uppercase letters, then everything in an agent’s belief network is representable in terms of causal graphs.18 To illustrate, recall Kramer’s case. Since the muddiness of the track affects whether Mother wins, one can represent Kramer’s beliefs about the causal relations that obtain among the things in his belief network with the following DAG, where M represents whether the track is muddy, and W represents whether Mother wins:

Figure 1: [DAG: M → W]

But this alone is not enough to tell us anything about whether it is appropriate for Kramer to update via JC given a shift in M. For this, we need some way to represent how Kramer’s beliefs about M and W relate to the content of the learning experience that prompts his shift in M. This is just what an artificial belief network provides, where an artificial belief network includes not only the variables in Kramer’s actual belief network, but also a variable that represents the truth-value of the (ineffable) proposition that Kramer learns but cannot express.19 Since it is natural to believe that the state of the track is caused by whatever is in the sky, and a cause of whether Mother wins, one can represent the causal relations implied by Kramer’s artificial belief network in the following DAG, with D representing whatever Kramer sees as he glances at the sky:

Figure 2: [DAG: D → M → W]

One might worry that by appealing to the causal relations that obtain among the variables in Kramer’s artificial belief network, we have ceased to talk about Kramer. But we don’t think that’s right. The ability to precisely say what you learn is not a prerequisite for having beliefs about the causal relations that the content of your learning experience enters into. For example, though Kramer cannot be sure of d’s content, he can be sure that D is not causally downstream from M since the question of whether d obtains (whatever d is) is settled before the question of whether there will be mud on the tracks.20 Though DAGs like these are nothing more than pictorial representations of causal relations, causal modelers make assumptions about directed edges that render these graphs full of valuable information. Chief among these assumptions is the Causal Markov Condition, which, according to Hausman and Woodward (1999), is “implicit in the view that causes can be used to manipulate their effects”.

18. We have adopted this way of representing propositions throughout the paper.

19. The values of the variable are the dummy proposition and its negation. The negation of the dummy proposition is likewise ineffable. (Were the negation not ineffable, then the dummy proposition itself could not be said to be genuinely ineffable since one could precisely characterize it as the negation of the negation.)

20. Though in this case D is causally upstream of the initial partition, B, there can likewise be cases where D is causally downstream of B — e. g., when the initial partition is about some past event that is a cause of D.
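New Rigidity can be made concrete with the Kramer DAG just pictured (D → M → W): in any prior that factors along that graph, M screens off W from D, so conditionalizing on the dummy proposition agrees with Jeffrey conditionalizing on the shifted partition. A small sketch with illustrative (non-textual) numbers:

    from itertools import product

    # A Kramer-style prior over (d, m, w) factoring along D -> M -> W.
    P_D = 0.5
    P_M_GIVEN_D = {True: 0.6, False: 0.2}
    P_W_GIVEN_M = {True: 0.8, False: 0.1}

    def prior(d, m, w):
        p = P_D if d else 1 - P_D
        p *= P_M_GIVEN_D[d] if m else 1 - P_M_GIVEN_D[d]
        p *= P_W_GIVEN_M[m] if w else 1 - P_W_GIVEN_M[m]
        return p

    def prob(event):
        return sum(prior(d, m, w)
                   for d, m, w in product([True, False], repeat=3) if event(d, m, w))

    def cond(target, given):
        return prob(lambda d, m, w: target(d, m, w) and given(d, m, w)) / prob(given)

    # New Rigidity: prob(w | m & d) equals prob(w | m).
    print(round(cond(lambda d, m, w: w, lambda d, m, w: m and d), 2))  # 0.8
    print(round(cond(lambda d, m, w: w, lambda d, m, w: m), 2))        # 0.8

    # So conditionalizing on d agrees with JC on the shift PROB(m) = prob(m | d).
    PROB_m = cond(lambda d, m, w: m, lambda d, m, w: d)                # 0.6
    print(round(cond(lambda d, m, w: w, lambda d, m, w: d), 2))        # 0.52
    print(round(0.8 * PROB_m + 0.1 * (1 - PROB_m), 2))                 # 0.52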


Causal Markov Condition (CMC): The CMC is satisfied by a given DAG and probability distribution if and only


if every variable X in V is probabilistically independent of its nondescendants conditional on its parents, where X’s parents are X’s most immediate causal predecessors, and X’s nondescendants are the variables in V that are not causally downstream from X.21

The CMC is a bit of a mouthful, but we can better understand some of its implications, first, by considering its relation to Reichenbach’s (1956) Principle of the Common Cause and, second, by considering its relation to the hugely influential d-separation criterion. The CMC entails a version of Reichenbach’s principle that says that if variables F and G are correlated, then either F (directly or indirectly) causes G, G (directly or indirectly) causes F, or F and G are (direct or indirect) joint effects of some common cause.22 This means that the CMC is plausibly assumed only of variable sets that are “causally sufficient” — i. e., of variable sets for which it is the case that every common cause of any two or more variables in V is in V.23 So when an agent represents her belief network (or artificial belief network) with some DAG, she must be sure to include every common cause of any two variables contained therein. The CMC likewise entails that some probability distributions are incompatible with certain DAGs. Pearl (2009, pp. 16–17) has neatly summarized these implications in the graphical terms of colliders and d-separation.

collider: A variable is a collider along a path if and only if it is the direct effect of two variables along the path. (So M is a collider along I→M←J but not along I←M→J or I→M→J.)

d-separation: A path between two variables, X and Y, is d-separated (or blocked) by a (possibly empty) set of variables, Z, if and only if

i. the path between X and Y contains a non-collider that is in Z, or

ii. the path contains a collider, and neither the collider nor any descendant of the collider is in Z.24
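A compact way to check these clauses is the standard equivalent test via moralized ancestral graphs (X and Y are d-separated by Z just in case Z separates them in the moralized subgraph induced on the ancestors of X, Y, and Z). The sketch below applies it to the paper’s three-variable DAGs; the encoding of graphs as parent lists is simply an illustrative choice:

    from collections import defaultdict

    def ancestors(nodes, parents):
        out, stack = set(), list(nodes)
        while stack:
            n = stack.pop()
            if n not in out:
                out.add(n)
                stack.extend(parents.get(n, []))
        return out

    def d_separated(parents, x, y, z):
        an = ancestors({x, y} | set(z), parents)
        adj = defaultdict(set)
        for child in an:
            ps = [p for p in parents.get(child, []) if p in an]
            for p in ps:                       # undirected child-parent edges
                adj[child].add(p); adj[p].add(child)
            for i in range(len(ps)):           # "marry" co-parents (moralization)
                for j in range(i + 1, len(ps)):
                    adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
        seen, stack = set(z), [x]              # separation check avoiding Z
        while stack:
            n = stack.pop()
            if n == y:
                return False
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n])
        return True

    kramer = {'M': ['D'], 'W': ['M']}   # D -> M -> W
    elaine = {'M': ['D'], 'P': ['D']}   # M <- D -> P
    jerry  = {'M': ['D', 'N']}          # D -> M <- N

    print(d_separated(kramer, 'D', 'W', {'M'}))  # True
    print(d_separated(elaine, 'D', 'P', {'M'}))  # False
    print(d_separated(jerry,  'D', 'N', {'M'}))  # False: M is a collider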

21. Sometimes (e. g., in Spirtes, Glymour, and Scheines 2000) the CMC is weakened such that it entails only that every variable in V is probabilistically independent of its nondescendants other than its parents conditional on its parents. But since it is standardly thought that X cannot be correlated with Y conditional on some value of X, we follow Hausman and Woodward (1999) in adopting this stronger (and easier to parse) version of the CMC.

Given the CMC, if every (undirected) path between a pair of variables in V is d-separated by Z (according to a given DAG), then the pair of variables must be probabilistically independent of each other conditional on any assignment of values over Z.25 (Geiger and

22. Why does the CMC entail this principle? In the event that some pair of variables is dependent and neither is causally upstream from the other, there must exist some parent(s) of both variables on which one can condition to render the relevant variables independent. So it is provable of every system of variables that satisfies the CMC, first, that if (i) F and G do not (directly or indirectly) influence one another and (ii) F and G are probabilistically dependent, then there exists a set C of variables not containing F or G but causing both, and, second, that F and G must be independent conditional on any assignment of value(s) over C.

23. Assuming the CMC of a variable set that is not causally sufficient often results in positing spurious causal relationships. It is also worth mentioning that it seems possible to weaken the requirement that one must attend to every common cause of any two or more variables in V by requiring the agent to attend only to some important subset of the common causes of any two or more variables in V (because, for example, the agent can leave out distal common causes of X and Y in the event that she has included some more proximate common cause of X and Y that screens off the distal cause). But since we are not currently sure exactly how causal sufficiency can be plausibly weakened, we leave this task for later.

24. It may be somewhat difficult to see how this applies when there are multiple colliders along a path. The presence of a single collider along a path that is excluded from Z and whose descendants are excluded from Z is sufficient for the path’s being d-separated by Z.


25. Why think that conditioning on a collider (or its descendant) may induce a dependence? Elwert and Winship (2014, p. 36) provide the following example to make it intuitive:


Consider the relationships between talent, A, beauty, B, and Hollywood success, C. Suppose, for argument’s sake, that beauty and talent are unassociated in the general population […]. Suppose further that beauty and talent are separately sufficient for becoming a successful Hollywood actor. Given these assumptions, success clearly is a collider variable. Now condition on the collider, for example, by looking at the relationship between beauty and talent only among successful Hollywood actors. Under our model of success, knowing that a talentless person is a successful actor implies that the person must be beautiful. Conversely, knowing that a less than beautiful person is a successful actor implies that the person must be talented. Either way, conditioning on the collider (success) has created a spurious association between beauty and talent among the successful.
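The footnote’s point can also be checked by simulation; the sketch below uses invented probabilities under which beauty and talent are independent, success requires at least one of them, and conditioning on success manufactures an association:

    import random

    random.seed(0)
    samples = [(random.random() < 0.2, random.random() < 0.2) for _ in range(100_000)]
    successful = [(b, t) for b, t in samples if b or t]   # success = beauty or talent

    p_t = sum(t for _, t in samples) / len(samples)
    p_t_success = sum(t for _, t in successful) / len(successful)
    p_t_success_plain = (sum(t for b, t in successful if not b)
                         / sum(1 for b, _ in successful if not b))
    print(round(p_t, 2))                # ~0.20: talent in the general population
    print(round(p_t_success, 2))        # ~0.56: success is evidence of talent
    print(round(p_t_success_plain, 2))  # 1.0: successful and not beautiful implies talented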


Pearl [1989] and Verma [1987] prove that the d-separation criterion characterizes all and only the conditional independence relations that follow from satisfying the Markov condition for a DAG.) So, from a given DAG, we can determine when certain variables must be probabilistically independent of each other. For example, given the DAG of Kramer’s artificial belief network, we can determine that D must be independent of W, conditional on any value of M. This relationship between DAGs and screening-off relations suggests that we can use causal information to infer whether Rigidity is satisfied and, consequently, whether it is appropriate to update via JC in certain cases. For example, since Kramer finds his credence shifting in the proposition that the track is muddy, it is permissible for him to update his credence that Mother will win because his knowledge of the causal structure at hand tells him that M screens off W from D. And this is the response we were hoping for. Generally, given the CMC, where A represents any arbitrary proposition in an agent’s actual belief network not among the initial partition, if every path between the dummy variable, D, and A is d-separated by B, then it is appropriate to update via JC. This is our Causal Updating Norm:

Causal Updating Norm (CUN): For any case in which ineffable learning shifts an agent’s probability distribution over some initial partition of propositions, B, it is appropriate to update by JC if and only if every path between A and D is d-separated by B for any arbitrary A in the agent’s artificial belief network.26

So how does CUN fare for Elaine and Jerry? The following two DAGs appear to give us a handle on the causal details of their respective scenarios. Allow N to represent whether Newman is present at the track and P to represent whether Puddy is present at the track.

Figure 3: [DAG: M ← D → P]

Figure 4: [DAG: D → M ← N]

We’ve seen that CUN licenses Kramer to update his belief that Mother will win since M d-separates W from D (Figure 2).27 And we can now see that CUN does not license Elaine to update her belief that Puddy will be at the track via JC since M does not d-separate P from D (Figure 3). So far, so good; like Origination, CUN delivers the intuitive results for Kramer and Elaine’s cases. But what about Jerry? Does CUN succeed where Origination fails? Looking at the DAG of Jerry’s scenario (Figure 4), we see that M is a collider on the only path between N and D. Thus N is not d-separated


26. Pearl (1988) endorses a norm that superficially resembles CUN. But Pearl’s norm is stated in terms of the separation (rather than d-separation) of variables in an undirected (rather than directed) graph. Since Pearl’s undirected graphs represent nothing more than probabilistic relationships, his norm can be understood in terms of Origination but not causation.

27. Following Pearl (2009, p. 17), we sometimes speak of two variables being d-separated by another. Specifically, Z is said to d-separate X from Y if and only if Z d-separates every path between X and Y.


from D by M, and CUN therefore (correctly) instructs Jerry not to update by JC. This highlights an important class of ineffable learning cases where it is a mistake to update by JC — namely, cases where the initial partition, B, is a collider (or a descendant of a collider) between D and (at least) one of the other propositions in the agent’s belief network. Absent considerations of causal relevance, these cases go unnoticed.28 But CUN is designed to handle cases like Jerry’s (since the initial partition is a collider in Jerry’s belief network). And though Jerry’s case is just one example, this sort of case is ubiquitous. Indeed, it will arise when an agent (i) takes the initial partition (B) to be causally downstream from her ineffable evidence (D), and (ii) has credences about states of affairs (A) that are causally upstream from the initial partition. In such cases, Origination (incorrectly) says that it is appropriate to update by JC, whereas CUN (correctly) says that it is inappropriate. So there appears to be a significant class of cases in which it is inappropriate to update by JC that neither Jeffrey nor his commentators anticipate.
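To see the collider problem numerically, one can build any joint distribution with Jerry’s structure D → M ← N and compare what JC recommends for n with what conditioning on the dummy proposition d would give. The numbers below are invented; only the structure matters:

    from itertools import product

    P_D, P_N = 0.5, 0.3
    P_M = {(True, True): 0.95, (True, False): 0.5,      # P(m | d, n)
           (False, True): 0.6,  (False, False): 0.05}

    def joint(d, n, m):
        p = (P_D if d else 1 - P_D) * (P_N if n else 1 - P_N)
        return p * (P_M[(d, n)] if m else 1 - P_M[(d, n)])

    def prob(event):
        return sum(joint(d, n, m)
                   for d, n, m in product([True, False], repeat=3) if event(d, n, m))

    def cond(target, given):
        return prob(lambda d, n, m: target(d, n, m) and given(d, n, m)) / prob(given)

    PROB_m = cond(lambda d, n, m: m, lambda d, n, m: d)   # the d-induced shift in m
    jc_n = (cond(lambda d, n, m: n, lambda d, n, m: m) * PROB_m
            + cond(lambda d, n, m: n, lambda d, n, m: not m) * (1 - PROB_m))
    print(round(jc_n, 2))                                 # ~0.39: what JC recommends
    print(round(cond(lambda d, n, m: n, lambda d, n, m: d), 2))  # 0.3: prob(n | d) is unchanged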

VI.  The Causal Faithfulness Condition and When Not to Jeffrey Conditionalize

Though CUN appears to deliver intuitive results when applied to the Seinfeld cases,29 there is still an important wrinkle to iron out. Specifically, we need to explain why (and in what sense) it is permissible to update by JC only when the relevant d-separation facts obtain, given that the CMC entails only that it is appropriate to Jeffrey conditionalize when the relevant d-separation facts obtain.

Consider Jerry’s case once more. CUN says that it is inappropriate for Jerry to update by JC, but the CMC alone does not license this verdict. Though we know that N is not d-separated from D by M (because M is a collider), the CMC does not tell us anything about the probability distributions over variables that are not d-separated (it tells us only about probability distributions over variables that are). In order to infer that it is inappropriate for Jerry to update by JC, we must help ourselves to some additional assumption(s) about the set of probability distributions compatible with causal paths that are not d-separated.30 CUN is exceptionless only given the most discussed assumption of this sort — the Causal Faithfulness Condition.

Causal Faithfulness Condition (CFC): The CFC is satisfied by a given DAG and probability distribution if and only if every conditional independence relation that obtains in the probability distribution is entailed by the CMC.

The CFC straightforwardly entails that if there exists a path between two variables that is not d-separated, then the relevant variables must be probabilistically dependent. If this principle were uncontroversial, then it would vindicate CUN. But it’s not. Though many philosophers believe that there are no counterexamples to the CMC (at least in the macroscopic world),31 it is nearly universally acknowledged that there are counterexamples to the CFC. For example, Hesslow (1976) famously described a case in which birth control causes thrombosis, but also reduces the risk of thrombosis by reducing the risk of pregnancy (which causes thrombosis). If the parameters of these causal relationships line up just right, then it is plausible that the probabilistic effect of taking the birth control that is mediated by pregnancy will cancel out the probabilistic effect that is due to the pill’s direct effect on the body. So even though birth control and thrombosis are not d-separated (along

30. Paths that are not d-separated are d-connected. To avoid utilizing more jargon than we already have, though, we speak solely in terms of d-separation.

29. Kramer, Jerry, Elaine, and (the soon to be discussed) George are characters from the popular 1990s American television show, Seinfeld.


31. Spirtes, Glymour, and Scheines (2000) make this point.


both the direct and indirect paths), it seems entirely possible that they are probabilistically independent because the paths BC→P→TH and BC→TH could very well cancel each other out. (See Figure 5.)

Figure 5: [DAG: BC → P → TH and BC → TH]

What do such counterexamples mean for the status of CUN? In order to answer this question, we must determine, first, exactly what sort of trouble they pose when they occur, and, second, whether they occur frequently enough to prompt genuine concern. In order to assess what sort of trouble is stirred by these counterexamples, it is helpful to consider a case in which the CFC fails. Imagine a revised version of Jerry’s case in which Newman, in addition to being disposed to hose down the track, is also weather-obsessed (like Puddy from Elaine’s case) and likes to go to the track when it rains. In this case, it is reasonable to think that Newman’s presence causes the track’s muddiness and is affected by the state of the weather. Thus we can represent this revised Jerry case with the following DAG:

Figure 6: [DAG: D → N, D → M, N → M]

Now suppose that there is a failure of faithfulness resulting from the fact that the dependency induced between N and D by conditioning on the collider, M, is cancelled out by the dependency along the direct path from D to N. CUN forbids updating by JC in this case because M does not d-separate N from D.32 Is CUN right? Since N is probabilistically independent of D conditional on any value of M, it appears that Rigidity is satisfied, and, therefore, that it is appropriate to update by JC (contrary to the advice of CUN).

It is easy to see how such cases bear on CUN’s status as an evaluative norm.33 When there are failures of faithfulness, CUN sometimes incorrectly says that it is inappropriate to update by JC. But it is harder to see how such cases bear on CUN’s status as a prescriptive norm.34 In order to assess CUN’s prescriptive status, it is helpful to consider how and whether adopting CUN would lead Jerry astray when dealing with the above unfaithful system.35 Whether CUN leads Jerry astray depends on what Jerry should do when CUN tells him not to update via JC. Thus far, we have not given any advice to agents who learn that it is inappropriate to update by JC in some instance of ineffable learning. But, of course, just because you shouldn’t update by JC doesn’t mean that you should do nothing at all. Indeed, were you to do nothing — i. e., were you to update just the initial partition in light of some ineffable learning experience — then you would risk incoherence (since we often have propositions in our belief networks other than bi whose probabilities should shift upon undergoing ineffable learning experiences). Our official story (for the purposes of this paper) is that we are silent about what to do when CUN says not to update by JC (though we do present some tentative suggestions in Section VII). Still, it is important to understand what our silence amounts to in the grand scheme of things. Though CUN relies upon the CFC insofar as it gives necessary conditions for the appropriateness of updating via JC, CUN does not rely upon the CFC in its specification of sufficient conditions. This is because the CMC

33. An evaluative norm is just a norm that allows us to evaluate whether or not some update is rational or irrational. 34. A prescriptive norm is a norm that gives agents advice about how they should update. 35. Until now, our discussion of when and when not to Jeffrey conditionalize could be given either an evaluative or a prescriptive gloss. It is only with regard to the points that immediately follow that we take this distinction to be important for our project.

32. Both because M is a collider on a path between D and N and because D is a direct cause of N.
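For concreteness, here is one invented parameterization of Hesslow’s case in which the two paths cancel exactly, so that birth control and thrombosis come out unconditionally independent even though they are d-connected (the numbers are made up solely to exhibit the cancellation):

    # BC -> Preg -> TH and BC -> TH, with parameters tuned so the paths cancel.
    P_PREG = {True: 0.1, False: 0.5}                  # P(pregnancy | birth control?)
    P_TH = {(True, True): 0.29, (True, False): 0.09,  # P(thrombosis | BC?, pregnancy?)
            (False, True): 0.20, (False, False): 0.02}

    def p_th_given(bc):
        preg = P_PREG[bc]
        return preg * P_TH[(bc, True)] + (1 - preg) * P_TH[(bc, False)]

    print(round(p_th_given(True), 2))   # 0.11
    print(round(p_th_given(False), 2))  # 0.11: independence despite d-connection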


by itself justifies updating by JC when every path between D and any arbitrary A is d-separated by B. This means that if you take our advice — i. e., update by JC when CUN's conditions are satisfied — then you will not be led astray. In the event that CUN bars updating by JC, we have no advice to give (even though we of course think that you should update your beliefs in some way). But this doesn't worry us much. It seems better to have some advice than no advice, and we would rather be silent than wrong. Of course, it is still useful to know definitively when it is inappropriate to update by JC (since this knowledge settles whether updating by JC is appropriate). So it is reasonable to wonder how frequently failures of faithfulness lead CUN to deliver false verdicts of inappropriateness (in order to establish CUN's heuristic value). Towards this end, Spirtes, Glymour, and Scheines (2000, pp. 68–69) have proved that unfaithful causal systems are Lebesgue measure 0 with respect to the space of possible causal systems, while faithful systems are measure 1.36 At first blush, this appears to demonstrate that failures of faithfulness are so rare that you should not expect to ever actually encounter one. So it may seem as though CUN is exceptionless in the sense that it will actually never lead you astray (even though it will possibly lead you astray). This would be good news for CUN, but we unfortunately don't think it is quite right. Spirtes, Glymour, and Scheines's proof makes use of a principle of indifference insofar as it assumes that all parameter values within the range of a continuous variable are equally probable (Zhang and Spirtes 2008). This strikes us as problematic not only because of the usual problems with principles of indifference, but also because there may occasionally be positive reason to expect a causal system to be unfaithful.37 But Spirtes, Glymour, and Scheines's proof does highlight the important fact that the CFC is violated only by a very special choice of parameter values. While this does not underwrite the conclusion that reasonable agents should have zero confidence in unfaithful systems, it does, we think, license agents to have (on average) very low confidence in unfaithful systems.38 We thus think that CUN is of strong heuristic value for determining when it is inappropriate to update by JC.

So, ultimately, where does this leave CUN? If someone is sure that every path between A and D is d-separated by B for any arbitrary A in the agent's belief network, then (relative to that someone's beliefs) it is rational to update by JC. If, relative to someone's causal beliefs, CUN says that updating by JC is inappropriate, then CUN's verdict is usually correct (relative to that someone's beliefs). By our lights, this means that CUN is an exceptionless rule for inferring appropriateness, and at least a valuable heuristic for inferring inappropriateness. It therefore seems reasonable to adopt CUN as an epistemic policy.

36. The proof is limited to the linear case (because some important assumptions about the parameter space are guaranteed to be satisfied), but Spirtes, Glymour, and Scheines argue that the same moral applies to other classes of functions. We agree.

37. Exactly when there is positive reason of this sort is a complicated question. Andersen (2013) and Cartwright (2001) argue that we have positive reason to expect failures of faithfulness in systems that must maintain a dynamic equilibrium. But Weinberger (2017) argues that the various proposed counterexamples to the CFC that involve coordinated parameters are not genuine counterexamples.

38. Even Andersen (2013), who argues that there are some contexts where we have positive reason to expect failures of faithfulness, thinks that there is no such reason for most causal systems of interest. Thus it is reasonable to have, on average, low confidence that the parameters line up in just the way required to generate a failure of the CFC.
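To make the parameter-sensitivity point vivid, here is a small self-contained simulation of an unfaithful linear system; the structure, coefficient values, and variable names are a hypothetical toy illustration, not drawn from the paper.

```python
# Toy illustration (hypothetical): X directly lowers Z and also raises Z via Y,
# and the two influences are tuned to cancel exactly, so X and Z come out
# probabilistically independent even though X is a cause of Z.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.normal(size=n)
y = x + rng.normal(size=n)                 # Y := X + noise
z = -1.0 * x + y + rng.normal(size=n)      # Z := -X + Y + noise; -1 cancels the X -> Y -> Z path

print(round(float(np.corrcoef(x, z)[0, 1]), 3))  # ~0.0: an "accidental" independence violating the CFC
# Change the direct coefficient from -1.0 to, say, -0.9 and the dependence
# between X and Z reappears -- cancellation requires an exact choice of values.
```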


VII.  Loose Ends

The previous section wraps up our primary defense of CUN, but there remain some loose ends to tie up. First, one might wonder how CUN applies to standard conditionalization. Second, one might wonder how our causal understanding of when and when not to Jeffrey conditionalize relates to Skyrms's influential understanding of the same issue in terms of "higher-order degrees of belief". Third, we have been silent about what agents should do when CUN does not license updating by JC, but it is reasonable to wonder how agents should update their credences in such cases. Fourth, one might worry that whether CUN is rationally satisfied is no more transparent than whether Rigidity is rationally satisfied, since the d-separation criterion may seem to provide nothing more than a logic of conditional independence. Finally, one might wonder how agents should update when their confidence initially shifts over multiple variables in some DAG. We'll examine each loose end in turn.

1. How does CUN deal with standard conditionalization (given that standard conditionalization is a special case of JC)? The short answer is that our commitment to representing ineffable learning as conditionalizing on a dummy proposition entails a commitment to standard conditionalization; thus CUN effectively assumes that it is rational to update by conditionalization. Still, one may wonder what CUN says when the agent learns bi with certainty (i. e., that some member of the initial partition obtains). Strictly speaking, CUN doesn't apply. We have defined ineffable learning such that this does not qualify (since the agent does learn something with certainty that she can express), and CUN speaks only to cases of ineffable learning. But one still might wonder how CUN would fare if its domain of applicability were extended to such cases — i. e., if we countenanced the possibility of ineffable learning when an agent learns something with certainty that she cannot express, but also learns something with certainty that she can express. Interestingly, when CUN is extended to such cases, it does not always license updating a in correspondence with the prior's conditional probability assignment of a given b. Suppose, for example, that Jerry's confidence that the track is muddy increases to certainty upon glancing at the sky. According to CUN, Jerry should not update by JC since M does not d-separate the only path between N and D. Is there a wrench in the works? Luckily, the answer is no. Defenders of standard conditionalization never claim that you should update in correspondence with your conditional probability of a given b alone whenever you learn b with certainty, because there are contexts where you may learn more than b with certainty and where conditionalization therefore mandates updating in correspondence with the conditional probability of a given b and whatever else you learned (which of course need not agree with the conditional probability of a given b).39 Jerry's error in this context is that he attempts to update his confidence that Newman is at the fairgrounds in correspondence with his conditional probability that Newman is there given that it's muddy, even though he learns more than that it is muddy. So, in cases like Jerry's, CUN says not to update by JC, but updating by JC (despite first appearances) does not correspond to updating by standard conditionalization.

39. For an extensive discussion of standard conditionalization's need for a language capable of representing everything that an agent learns with certainty, see Titelbaum (2013, chapter 8).


2. We are not the first to represent ineffable learning as conditionalization on a proposition that is not included within the initial partition. For example, as we mention above, Pearl (1988) also construes ineffable learning as conditionalizing on a proposition that represents whatever the agent learns with certainty but cannot express. But not everyone who utilizes some additional proposition agrees that the proposition in question should be our dummy proposition. Indeed, Skyrms (1980) famously parts ways with our treatment. He worries that any approach described in terms of giving certainty to an "observational proposition" requires an "unacceptable epistemology of the given" and instead represents the agent as conditionalizing on a "non-observational" proposition — specifically, the proposition that her subjective probability distribution over the initial partition takes the form that it does after the ineffable learning experience. Thus Skyrms would characterize Kramer's learning experience in terms of conditionalizing on the proposition that he (Kramer) now has credence 0.6 that the track is muddy (rather than as learning something ineffable about the sky with certainty).40 This allows Skyrms to formulate an alternative understanding of when to update by JC, according to which it is appropriate to do so when the initial partition screens off any arbitrary proposition in the agent's belief network (outside of the initial partition) from a variable that represents the possible subjective probability distributions that the agent could have had over the initial partition.

40. Conditionalizing on the proposition about your shift in confidence generates (what we call) the initial shift if your credence in p is n when you learn of yourself that you have credence n in p. (See Skyrms 1980 for discussion of whether and when this is rational.)


We have concerns about Skyrms's approach insofar as it is taken to provide general advice, since (as we mention when tying up the first loose end) it seems that, upon undergoing an ineffable learning experience, an agent often learns more than just that her probability distribution over the initial partition takes its posterior form. Indeed, in nearly every case that is discussed in the literature, it is clear that the agent learns something not represented in the initial partition. This is obviously true of our Seinfeld cases, but it is also true of Jeffrey's classic candlelight examples, where an agent's credence that some cloth is a particular color changes because of how the cloth appears in dim candlelight. Though the agent's credence initially shifts in the proposition that the cloth is green, she plausibly learns with certainty that the cloth appears however it does (even though she cannot describe how it appears). And since there is no reason to believe that conditionalizing on the "non-observational" proposition (that her credence that the cloth is green took its new value) will have the same effect as conditionalizing on the non-observational proposition and the dummy proposition (i. e., whatever ineffable stuff she learned about the way in which the cloth appears under candlelight), we worry that Skyrms's understanding cannot be extended to even the most standard cases of ineffable learning. Of course Skyrms may complain that our own approach leads to an "unacceptable epistemology of the given", but we are not sure why it is any more problematic to posit certainty of something ineffable than it is to posit certainty of one's own psychology. After all, it is clear that we can be wrong about our subjective degrees of belief, and that you can be wrong about ours. So if it's problematic to speak in terms of what we learn with certainty but cannot express, it seems equally problematic to speak in terms of what we learn with certainty about the agent's beliefs.

Still, though we're skeptical about the application of Skyrms's approach to many classic examples of ineffable learning, we think that Skyrms has homed in on an important set of cases where all that one learns with certainty is that her degree of belief in some proposition changes upon undergoing a learning experience. In such cases, joint application of our approach and Skyrms's approach may generate interesting results. In order to combine the approaches, we can simply trade in our dummy variable for a variable encoding the form that one's subjective probability distribution over the initial partition takes. Then, we simply ask whether the initial partition, B, d-separates every path between the new Skyrms variable and any arbitrary A in the agent's belief network.41

41. We suspect that there may be interesting generalizations that can be derived from within our framework about these contexts. For example, if, as a matter of course, one's degree of belief is always caused by what one's degree of belief is about — i. e., if the initial partition (about the external world) always causes the Skyrms variable (about the agent's credences) and not vice versa — then CUN effectively reduces to an understanding in terms of Origination because the initial partition cannot be a collider (or a descendant of a collider) between B and the Skyrms variable. Now, we aren't sure whether the antecedent of this conditional is true (especially since the initial partition may be about something in the future, and positing the causal arrow from such a partition to the Skyrms variable would require retrocausality), but we nevertheless think that it provides some evidence that there is meaningful work to be done at the intersection of our causal understanding of when to update by JC and Skyrms's "higher-order degree of belief" understanding.
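Whether the check is run for the dummy variable or for a Skyrms variable traded in for it, CUN's condition is a mechanical d-separation query on the agent's causal graph. The sketch below is a toy illustration (the helper function, the networkx usage, and the two example graphs are hypothetical and not taken from the paper's figures); it implements the query via the standard moralized ancestral graph.

```python
# A minimal sketch of the d-separation check behind CUN, via the standard
# "moralized ancestral graph" criterion. The helper and the example graphs
# are hypothetical illustrations, not code or figures from the paper.
import networkx as nx

def d_separated(dag, xs, ys, zs):
    """True iff zs d-separates every path between xs and ys in the DAG."""
    relevant = set(xs) | set(ys) | set(zs)
    # 1. Keep only the variables in play and their ancestors.
    ancestral = set(relevant)
    for v in relevant:
        ancestral |= nx.ancestors(dag, v)
    sub = dag.subgraph(ancestral)
    # 2. Moralize: marry every node's parents, then drop edge directions.
    moral = nx.Graph(sub.to_undirected())
    for node in sub.nodes:
        parents = list(sub.predecessors(node))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                moral.add_edge(parents[i], parents[j])
    # 3. Delete the conditioning set; d-separation holds iff no path is left.
    moral.remove_nodes_from(zs)
    return not any(
        nx.has_path(moral, x, y)
        for x in xs for y in ys
        if x in moral and y in moral
    )

# A fork at B (e.g., the state of the track causes both what the agent sees,
# D, and the outcome she cares about, A): B screens A off from D, so the
# d-separation condition is satisfied.
fork = nx.DiGraph([("B", "D"), ("B", "A")])
print(d_separated(fork, {"D"}, {"A"}, {"B"}))      # True

# A collider at B (both D and N are causes of B): conditioning on B opens
# the path between D and N, so B does not d-separate them.
collider = nx.DiGraph([("D", "B"), ("N", "B"), ("B", "W")])
print(d_separated(collider, {"D"}, {"N"}, {"B"}))  # False
```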


3. When CUN says it is inappropriate to update by JC, we are officially silent. But unofficially, we have some ideas. There are two ways in which CUN might be violated: 1) when the initial partition is a collider (or a descendant of a collider) between the dummy variable and some other variable in the agent's belief network (à la Jerry), and 2) when there is a proposition in the agent's belief network whose causal relevance to the dummy variable is not mediated by the initial partition (à la Elaine). When the first kind of failure occurs, at least one proposition in the agent's belief network is independent of D. As such, it may seem reasonable to simply leave the probability estimates for the independent variables intact, and use JC to update any other variables in her belief network that are screened off from D by B.


To illustrate, imagine that George cares about everything that Jerry and Kramer care about (because they are so dear to his heart) — he wants Mudder to win (W), and he hopes that Newman (N) won't be at the track.

Figure 7: George's belief network.

The thought is that George should update his belief that Mudder will win just as Kramer does (i. e., by JC), but leave his belief that Newman is at the track alone (because what he learned is independent of Newman's presence). So even though George shouldn't globally update by JC, it seems reasonable for him to selectively update the subset of propositions in his belief network that satisfy CUN. While this is right as far as it goes, this bit of instruction does not yield a unique coherent posterior probability distribution because (as the Law of Total Probability and a bit of algebra show) it is impossible to solve for the conditional probabilities of a given b when it is stipulated that the unconditional probability distribution over A must remain invariant across the update. Thus one cannot arrive at a unique posterior probability distribution simply by updating the probabilities of the variables that are screened off by B and leaving the others intact, because this leaves open multiple possible conditional probabilities. Still, the fact that we can be sure that every variable is either independent of D or screened off from D by B suggests that we may be able to arrive at the correct update by adopting the posterior probability distribution that (i) is closest to the prior (in the sense of divergence and/or distance metrics) and (ii) obeys the independence constraints implied by the CMC. Fleshing out the details of this proposal is beyond our purposes here, but it at least suggests that an agent has viable options when her initial partition is a collider between the dummy variable and some other proposition in her belief network.

But even if the above proposal works, it doesn't solve everything. When the second kind of failure occurs — i. e., a failure that can be characterized in terms of Origination — there are variables in the belief network that are neither screened off from D by B nor independent of D. Thus one cannot simply adopt some posterior that obeys the independence constraints implied by the CMC. Though we have no general advice for such cases, it seems as though one can sometimes recharacterize the initial partition to include everything that she does take to be directly relevant to the dummy variable. (This plausibly amounts to conjoining all such variables into one by taking the Cartesian product of the variables that are directly relevant in the original V.)42 Thus Elaine can use her causal knowledge (or her knowledge of direct evidential dependencies) to establish that she should collapse P and B into one variable. Of course Elaine cannot use her confidence in B to determine the probability distribution over the collapsed variable, but at least she will be using a variable set that satisfies CUN.

42. We do not think that this strategy will work in general because there are cases where such collapsing would not seem to yield desirable results. Suppose, for example, that the dummy variable mediates some variable, C, and B. Since C and B are not oriented towards D in the same direction — i. e., since one is a cause of D and the other is an effect of D — problems would seem to arise from the fact that we'd need to arbitrarily choose a direction of the edge between D and the new collapsed variable. When collapsing does not work, if the agent's confidence can be modelled as initially shifting in the multiple variables that are directly relevant (C and B), then it may be possible to specify the rational update through means discussed later, in Loose End 5.
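To see the selective update in action, here is a small numerical sketch of George's case; the assumed network structure and all probability values are hypothetical choices made only to match the verbal description above, not numbers from the paper.

```python
# Selective update for George (hypothetical numbers): Jeffrey conditionalize
# the variables screened off from D by B (here W), and leave the variables
# independent of D (here N) untouched.
p_n = 0.3                               # prior P(Newman at the track)
p_b = 0.5                               # prior P(track is muddy), the initial partition B
p_w_given_b = {True: 0.8, False: 0.4}   # P(Mudder wins | B)

p_b_new = 0.9                           # George's post-learning confidence over B

# Rigidity for W: P'(W) = sum over b of P'(b) * P(W | b)
p_w_new = p_w_given_b[True] * p_b_new + p_w_given_b[False] * (1 - p_b_new)
p_n_new = p_n                           # what George learned is independent of Newman's presence

print(round(p_w_new, 2), p_n_new)       # 0.76 0.3
# Note: fixing these marginals does not fix a unique joint posterior over
# (B, W, N); the suggestion in the text is to pick, among the joints that
# satisfy the CMC's independence constraints, the one closest to the prior.
```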


4. One might worry that CUN does not provide any advice over and above New Rigidity, since the d-separation criterion gives us nothing more than a logic of conditional independence, and New Rigidity already tells us what must be conditionally independent across the update. The answer to this objection is simple: the d-separation criterion gives us more than a logic of conditional independence. Causal dependence is asymmetrical, while probabilistic dependence is symmetrical; thus, causal information is not exhausted by probabilistic information.


In order to account for the asymmetry of causation, philosophers of causation must identify some aspect of the causal relation that is not found in the probability calculus. For example, it is often argued that causes must precede their effects.43 If this is right, then information about temporal order is useful for determining when it is appropriate to update by JC. Others disagree, and instead argue that the asymmetry of causation results from the asymmetry of counterfactual dependence between cause and effect.44 If this is right, then information about counterfactual dependence is useful for determining when it is appropriate to update by JC. But no matter who is right, the asymmetry of causation makes it clear that our grasp of causal relevance is somewhat independent from our grasp of probabilistic relevance. Of course, one may additionally worry that the ineffability of the dummy proposition makes it hard to form beliefs about the causal relations at play in a given ineffable learning context. While we agree that this may sometimes be difficult (and that CUN will not be helpful in such cases), we think that we are often in a good position to make judgments of causal (ir)relevance when undergoing ineffable learning, and this is the only kind of causal belief that is required for the use of CUN. We can be sure, for example, that what song comes on next is causally independent of our (Section IV) poker opponent's appearance, and we can likewise be sure that the appearance of the poker player's face is somehow caused by the cards that she is holding, even if we have no knowledge of how that causal dependency manifests — e. g., of whether that face is causally promoted by good cards, by bad cards, etc. This is the only kind of belief about causation that application of CUN requires.45

5. Finally, until now, we've dealt with cases in which ineffable learning prompts a shift in confidence over a single variable that is constructed to represent the partition in which the agent's confidence initially shifts. But how should an agent update when ineffable learning prompts a shift in confidence over multiple variables in a DAG (whose variables are not constructed to represent the ineffable learning experience)? Though we've so far avoided such cases for ease of explication, we believe that CUN can be slightly modified in order to handle them. Specifically, CUN must be altered to refer to a set of variables d-separating D from A (rather than a single variable, B). In such contexts, it will be appropriate to update by JC exactly when the agent initially shifts her confidence in a set of variables that forms what Pearl (1988) calls a "Markov blanket" around D.46, 47 But this idea deserves more attention than it can be given in a section called "Loose Ends". So we leave it for later.
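As a rough sketch of the Markov-blanket condition (again with a hypothetical graph and the networkx library, not one of the paper's figures), the blanket of D collects D's parents, its children, and its children's other parents; when the agent's confidence initially shifts over exactly such a set, the modified CUN would license updating by JC.

```python
# Markov blanket of D: parents, children, and the children's other parents.
# The graph below is a hypothetical toy example, not a figure from the paper.
import networkx as nx

def markov_blanket(dag, node):
    parents = set(dag.predecessors(node))
    children = set(dag.successors(node))
    co_parents = {p for c in children for p in dag.predecessors(c)} - {node}
    return parents | children | co_parents

g = nx.DiGraph([("C", "D"), ("D", "B"), ("N", "B"), ("B", "W")])
print(sorted(markov_blanket(g, "D")))   # ['B', 'C', 'N']
# Conditioning on this set d-separates D from every remaining variable (here W),
# as can be checked with the d-separation helper sketched earlier.
```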

43. Spohn (2001), who argues that "Bayesian networks are all there is to causal dependence", accounts for the asymmetry of causation in this way.

44. Woodward adopts this strategy. He likewise makes it clear that taking this route amounts to positing some aspect of causal relevance that does not reduce to probabilistic relevance:

Roughly speaking the role of the directed graphs or structural equations is to represent information about patterns of counterfactual dependence among variables; more specifically, it is to tell us what would happen to the values of some variables under changes of a special sort involving what I will call interventions. […] The probability distribution, P, by contrast, does not convey modal or counterfactual information of this sort. Instead, it conveys information about the actual distribution of values of variables. (Woodward 2001, p. 41)

45. For those who are familiar with structural equation models, the basic point here is that CUN utilizes the features of the causal graph, but not features of the underlying structural equations.

46. Of course all of the hedges pertaining to unfaithful causal systems would apply to this version of CUN as well.

47. We are indebted to Malcolm Forster for bringing this possibility to our attention in personal communication.

VIII. Conclusion


When to Jeffrey conditionalize? We have shown that there are cases where it is inappropriate to update by JC that have thus far gone unnoticed in the literature — specifically, cases where the initial partition, B, is a collider (or a descendant of a collider) between D and (at least) one of the other propositions in the agent's belief network. We then used the d-separation criterion to develop a norm (CUN) that says it is inappropriate to update by JC in these cases. Following CUN guarantees that you will never irrationally update via JC. But there is a sense in which CUN is too cautious. Specifically, CUN incorrectly says that it is inappropriate to Jeffrey conditionalize in some cases where the causal system at hand is unfaithful. For this reason, we then examined the relevance of the fact that CUN's verdicts of appropriateness stand on firmer ground than CUN's verdicts of inappropriateness (because the latter must rely on the controversial CFC, while the former rely only on the less controversial CMC). In so doing, we hope to have adequately developed a causal understanding of when and when not to Jeffrey conditionalize, and (in the process) demonstrated that our beliefs about causal relevance have a vital role to play in a probabilistic epistemology.

Acknowledgements

We are grateful to anonymous referees who read this manuscript, Malcolm Forster, Dan Hausman, Karolina Krzyżanowska, Isaac Levi, David O'Brien, Shanna Slank, Jan Sprenger, Rush Stewart, Mike Titelbaum, Aron Vallinder, Olav Vassend, and Naftali Weinberger for their helpful discussion and input, and to audiences at the 2016 Formal Epistemology Workshop in Groningen, Netherlands, and the 2014 Informal Formal Epistemology Meeting in Madison, Wisconsin, for useful feedback. Reuben is also indebted to the DFG for funding much of his work on this project through grant no. 623584 (project HA3000/9–1).


References

Andersen, H. (2013). When to expect violations of causal faithfulness and why it matters. Philosophy of Science, 80(5):672–683.

Cartwright, N. (2001). What is wrong with Bayes nets? The Monist, 84(2):242–264.

Elwert, F. and Winship, C. (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology, 40:31–53.

Geiger, D. and Pearl, J. (1989). Logical and Algorithmic Properties of Conditional Independence and Qualitative Independence. Report CSD 870056, R-97-IIL, Cognitive Systems Laboratory, University of California, Los Angeles.

Hausman, D. M. and Woodward, J. (1999). Independence, invariance and the causal Markov condition. The British Journal for the Philosophy of Science, 50:521–583.

Hesslow, G. (1976). Discussion: Two notes on the probabilistic approach to causality. Philosophy of Science, 43(2):290–292.

Jeffrey, R. (1970). Dracula meets Wolfman: Acceptance vs. partial belief. In Swain, M., editor, Induction, Acceptance and Rational Belief, pages 157–185. D. Reidel.

Jeffrey, R. (1983). The Logic of Decision (2nd ed.). University of Chicago Press.

Jeffrey, R. (2004). Subjective Probability: The Real Thing. Cambridge University Press.

Kyburg, H. E. (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence, 31(3):271–293.

Levi, I. (1967). Probability kinematics. The British Journal for the Philosophy of Science, 18(3):197–209.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.

Reichenbach, H. (1956). The Direction of Time. University of California Press.

Skyrms, B. (1980). Higher-order degrees of belief. In Mellor, D. H., editor, Prospects for Pragmatism, pages 109–137. Cambridge University Press.

Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search (2nd ed.). MIT Press.

Spohn, W. (2001). Bayesian nets are all there is to causal dependence. In Galavotti, M. C., Suppes, P., and Costantini, D., editors, Stochastic Causality, pages 157–172. CSLI Publications.

Titelbaum, M. G. (2013). Quitting Certainties: A Bayesian Framework Modeling Degrees of Belief. Oxford University Press.

Verma, T. (1987). Causal Networks: Semantics and Expressiveness. Technical Report R-65-I, Cognitive Systems Laboratory, University of California, Los Angeles.


Weinberger, N. (2017). Faithfulness, coordination and causal coincidences. Erkenntnis, doi:10.1007/s10670-017-9882-6.

Woodward, J. (2001). Probabilistic causality, direct causes, and counterfactual dependence. In Galavotti, M. C., Suppes, P., and Costantini, D., editors, Stochastic Causality, pages 39–63. CSLI Publications.

Zhang, J. and Spirtes, P. (2008). Detection of unfaithfulness and robust causal inference. Minds and Machines, 18(2):239–271.
