The Foundations of Probability with Black Swans

We extend the foundation of probability in samples with rare events that are potentially catastrophic, called black swans, such as natural hazards, market crashes, catastrophic climate change, and species extinction. Such events are generally treated as "outliers" and disregarded. We propose a new axiomatization of probability requiring equal treatment in the measurement of rare and frequent events (the Swan Axiom) and characterize the subjective probabilities that the axioms imply: these are neither finitely additive nor countably additive but a combination of both. They exclude countably additive probabilities as in De Groot 1970 and Arrow 1971 and are a strict subset of Savage 1954 probabilities that are finitely additive measures. Our subjective probabilities are standard distributions when the sample has no black swans. The finitely additive part, however, assigns more weight to rare events than do standard distributions and in that sense explains the persistent observation of "power laws" and "heavy tails" that eludes classic theory. The axioms extend earlier work by Chichilnisky 1996, 2000, 2002, 2009 to encompass the foundation of subjective probability and axiomatic treatments of subjective probability by Villegas 1964, De Groot 1963, Dubins and Savage 1965, Dubins 1975, Purves and Sudderth 1976, and of choice under uncertainty by Arrow 1971.


Introduction
Black swans are rare events with important consequences, such as market crashes, natural hazards, global warming, and major episodes of extinction. This article is about the foundations of probability when catastrophic events are at stake. It provides a new axiomatic foundation for probability requiring sensitivity both to rare and frequent events. The study culminates in Theorem 6.1, which proves existence and representation of a probability satisfying three axioms. The last of these axioms requires sensitivity to rare events, a property that is desirable but not respected by standard probabilities. The article shows the connection between those axioms and the Axiom of Choice at the foundation of mathematics. It defines a new type of probabilities that coincide with standard distributions when the sample is populated only by relatively frequent events. Generally, however, they are a mixture of countably and finitely additive measures, assigning more weight to black swans than do normal distributions, and predicting more realistically the incidence of "outliers," "power laws," and "heavy tails" [1, 2].
The article refines and extends the formulation of probability in an uncertain world. It provides an argument, and formalization, that probabilities must be additive functionals on $L_\infty(U)$, where $U$ is a σ-field of "events" represented by their indicator (bounded and real-valued) functions, that are neither countably additive nor purely finitely additive. The contribution is to provide an axiomatization showing that subjective probabilities must lie in the full space $L_\infty^*$ rather than $L_1$, as the usual formalization (Arrow [3]) forcing countable additivity implies. The new axioms refine both Savage's [4] axiomatization of finitely additive measures, and Villegas' [5] and Arrow's [3] that are based on countably additive measures, and extend both to deal more realistically with catastrophic events. Savage [4] axiomatized subjective probabilities as finitely additive measures representing the decision makers' beliefs, an approach that can ignore frequent events as shown in the appendix. To overcome this, Villegas [5] and Arrow [3] introduced an additional continuity axiom called "Monotone Continuity" that yields countable additivity of the measures. However, Monotone Continuity has unusual implications when the subject is confronted with rare events; for example, it predicts that in exchange for a couple of cents, one should be willing to accept a small risk of death (measured by a countably additive probability), a possibility that Arrow called "outrageous" [3, pages 48-49]. This article defines a realistic solution: for some very large payoffs and in certain situations, one may be willing to accept a small risk of death, but not in others. This means that Monotone Continuity holds in some cases but not in others, a possibility that leads to the axiomatization proposed in this article and is consistent with the experimental observations reported by Chanel and Chichilnisky [6, 7].
The results are as follows. We show that countably additive measures are insensitive to black swans: they assign negligible weight to rare events, no matter how important these may be, treating catastrophes as outliers. Finitely additive measures, on the other hand, may assign no weight to frequent events, which is equally troubling. Our new axiomatization balances the two approaches and extends both, requiring sensitivity in the measurement of rare as well as frequent events. We provide an existence theorem for probabilities that satisfy our axioms, and a characterization of all that do.
The results are based on an axiomatic approach to choice under uncertainty and sustainable development introduced by Chichilnisky [8-10] and illuminate the classic issue of continuity that has always been at the core of "subjective probability" axioms (Villegas [5], Arrow [3]). To define continuity, we use a topology that tallies with the experimental evidence of how people react to rare events that cause fear (Le Doux [11], Chichilnisky [12]), previously used by Debreu [13] to formalize a market's Invisible Hand, and by Chichilnisky [9, 12, 14] to axiomatize choice under uncertainty with rare events that inspire fear. The new results provided here show that the standard axiom of decision theory, Monotone Continuity, is equivalent to De Groot's Axiom SP 4 that lies at the foundation of classic likelihood theory (Proposition 2.1) and that both of these axioms underestimate rare events no matter how catastrophic they may be. We introduce here a new Swan Axiom (Section 3) that logically negates them both, show it is a combination of two axioms defined by Chichilnisky [9, 14], and prove that any subjective probability satisfying the Swan Axiom is neither countably additive nor finitely additive: it has elements of both (Theorem 4.1). Theorem 6.1 provides a complete characterization of all subjective probabilities that satisfy linearity and the Swan Axiom, thus extending earlier results of Chichilnisky [1, 2, 9, 12, 14].
There are other approaches to subjective probability such as the Choquet Expected Utility Model (CEU, Schmeidler [15]) and Prospect Theory (Kahneman and Tversky [16, 17]). They use a nonlinear treatment of probabilities or likelihoods (see, e.g., Dreze [18] or Bernstein [19]), while we retain linear probabilities. Both have a tendency to give higher weight to small probabilities, and are theoretical answers to experimental paradoxes found by Allais in 1953 and Ellsberg in 1961, among others refuting the Independence Axiom of the Subjective Expected Utility (SEU) model. Our work focuses instead directly on the foundations of probability by taking the logical negation of the Monotone Continuity Axiom. It is striking that weakening or rejecting this axiom (respectively, in decision theory and in probability theory) ends up in probability models that are more in tune with observed attitudes when facing catastrophic events. Presumably each approach has advantages and shortcomings. It seems that the approach offered here may be superior on four counts: (i) it retains linearity of probabilities, (ii) it identifies Monotone Continuity as the reason for underestimating the measurement of catastrophic events, an axiom that depends on a technical definition of continuity and has no other compelling feature, (iii) it seems easier to explain and to grasp, and therefore (iv) it may be easier to use in applications.

Uncertainty
Uncertainty is described by a set of distinctive and exhaustive possible events represented by a family of sets $\{U_\alpha\}$, $\alpha \in N$, whose union describes a universe $\mathcal{U} = \bigcup_\alpha U_\alpha$. An event $U \in \mathcal{U}$ is identified with its characteristic function $\phi_U : \mathcal{U} \to R$, where $\phi_U(x) = 1$ when $x \in U$ and $\phi_U(x) = 0$ when $x \notin U$. The subjective probability of an event $U$ is a real number $W(U)$ that measures how likely it is to occur according to the subject. Generally we assume that the probability of the universe is 1 and that of the empty set is zero, $W(\emptyset) = 0$. In this article we make no difference between subjective probabilities and likelihoods, using both terms interchangeably. Classic axioms for subjective probability (resp. likelihoods) are provided by Savage [4] and De Groot [20]. The likelihood of two disjoint events is the sum of their likelihoods: $W(U_1 \cup U_2) = W(U_1) + W(U_2)$ when $U_1 \cap U_2 = \emptyset$, a property called additivity. These properties correspond to the definition of a probability or likelihood as a finitely additive measure on a family (σ-algebra) of measurable sets of $\mathcal{U}$, which is Savage's [4] definition of subjective probability. $W$ is countably additive when $W(\bigcup_{i=1}^\infty U_i) = \sum_{i=1}^\infty W(U_i)$ whenever $U_i \cap U_j = \emptyset$ for $i \neq j$. A purely finitely additive probability is one that is additive but not countably additive. Savage's subjective probabilities can be purely finitely additive or countably additive. In that sense they include all the probabilities in this article. However, as seen below, this article excludes probabilities that are either purely finitely additive or countably additive, and therefore our characterization of a subjective probability is strictly finer than that of Savage [4], and different from the view of a measure as a countably additive set function (e.g., De Groot [21]).
The following axioms were introduced by Villegas [5] and others for the purpose of obtaining countable additivity.

Monotone Continuity Axiom (MC) (Arrow [3])
For every two events $f$ and $g$ with $W(f) > W(g)$, and every vanishing sequence of events $\{E_\alpha\}_{\alpha = 1, 2, \ldots}$ (defined as follows: for all $\alpha$, $E_{\alpha+1} \subset E_\alpha$ and $\bigcap_{\alpha=1}^\infty E_\alpha = \emptyset$), there exists $N$ such that altering arbitrarily the events $f$ and $g$ on the set $E_i$, where $i > N$, does not alter the subjective probability ranking of the events, namely, $W(f') > W(g')$, where $f'$ and $g'$ are the altered events.
This axiom is equivalent to requiring that the probability of the sets along a vanishing sequence goes to zero. Observe that the decreasing sequence could consist of infinite intervals of the form $(n, \infty)$ for $n = 1, 2, \ldots$. Monotone Continuity therefore implies that the likelihood of this sequence of events goes to zero, even though all its sets are unbounded. A similar example can be constructed with a decreasing sequence of bounded sets, $(-1/n, 1/n) \setminus \{0\}$ for $n = 1, 2, \ldots$, which is also a vanishing sequence: it is decreasing and, with the point 0 removed, its intersection is empty.

De Groot's Axiom SP 4 (De Groot [20], Chapter 6, page 71)
If $A_1 \supset A_2 \supset \cdots$ is a decreasing sequence of events and $B$ is some fixed event that is less likely than $A_i$ for all $i$, then the probability of the intersection $\bigcap_{i=1}^\infty A_i$ is at least as large as that of $B$.

The following proposition establishes that the two axioms presented above are one and the same; both imply countable additivity.

Proposition 2.1. A relative likelihood (subjective probability) satisfies the Monotone Continuity Axiom if and only if it satisfies Axiom SP 4. Each of the two axioms implies countable additivity.
Proof. Assume that De Groot's Axiom SP 4 is satisfied. When the intersection of a decreasing sequence of events is empty, $\bigcap_i A_i = \emptyset$, and the set $B$ is less likely to occur than every set $A_i$, then the set $B$ must be as likely as the empty set; namely, its probability must be zero. In other words, if $B$ is more likely than the empty set, then regardless of how small the set $B$ is, it is impossible for every set $A_i$ to be at least as likely as $B$. Equivalently, the probability of the sets that are far away in the vanishing sequence must go to zero. Therefore SP 4 implies Monotone Continuity. Conversely, assume that MC is satisfied. Consider a decreasing sequence of events $A_i$ and define a new sequence by subtracting from each set the intersection of the family, namely, $A_1 - \bigcap_{j=1}^\infty A_j$, $A_2 - \bigcap_{j=1}^\infty A_j$, $\ldots$. Let $B$ be a set that is more likely than the empty set but less likely than every $A_i$. Observe that the intersection of the new sequence is empty, $\bigcap_{i=1}^\infty (A_i - \bigcap_{j=1}^\infty A_j) = \emptyset$, and since $A_i \supset A_{i+1}$ the new sequence is, by definition, a vanishing sequence. Therefore by MC, $\lim_i W(A_i - \bigcap_{j=1}^\infty A_j) = 0$. Since $W(B) > 0$, $B$ must be more likely than $A_i - \bigcap_{j=1}^\infty A_j$ from some $i$ onwards. Furthermore, each $A_i$ is the disjoint union of $A_i - \bigcap_{j=1}^\infty A_j$ and $\bigcap_{j=1}^\infty A_j$, so by additivity $W(A_i) = W(A_i - \bigcap_{j=1}^\infty A_j) + W(\bigcap_{j=1}^\infty A_j)$; letting $i \to \infty$ and using $W(A_i) > W(B)$ for all $i$ gives $W(\bigcap_{j=1}^\infty A_j) \geq W(B)$.
The next section shows that the two axioms, Monotone Continuity and SP 4, are biased against rare events no matter how catastrophic these may be.
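The implication of Monotone Continuity noted in this section, that the likelihood of the sets along a vanishing sequence goes to zero even when every set is unbounded, can be illustrated numerically. The sketch below is only an illustration under an assumption not made in the text: it takes a standard normal distribution as the countably additive probability and evaluates the weight of the events $E_n = (n, \infty)$. The names `normal_tail` and `tail_weights` are hypothetical.

```python
import math


def normal_tail(n: float) -> float:
    """Weight of the event (n, infinity) under a standard normal distribution.

    The normal distribution is an illustrative assumption; any countably
    additive probability on the real line shows the same qualitative decay.
    """
    return 0.5 * math.erfc(n / math.sqrt(2.0))


# Weights along the vanishing sequence E_n = (n, infinity), n = 1, ..., 8.
tail_weights = [normal_tail(n) for n in range(1, 9)]

for n, w in enumerate(tail_weights, start=1):
    print(f"W(E_{n}) = {w:.3e}")
```

Every set $E_n$ is unbounded, yet the printed weights decrease monotonically toward zero, which is the behavior that Monotone Continuity imposes.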

The Value of Life
The best way to explain the role of Monotone Continuity is by means of an example provided by Arrow [3, pages 48-49]. He explains that if $a$ is an action that involves receiving one cent, $b$ is another that involves receiving zero cents, and $c$ is a third action involving receiving one cent and facing a small probability of death, then Monotone Continuity requires that the third action involving death and one cent should be preferred to the action with zero cents if the probability of death is small enough. Even Arrow says of his requirement "this may sound outrageous at first blush..." (Arrow [3, pages 48-49]). Outrageous or not, Monotone Continuity (MC) leads to the neglect of rare events with major consequences, like death. Death is a black swan.
To overcome the bias we introduce an axiom that is the logical negation of MC: this means that sometimes MC holds and sometimes it does not. We call this the Swan Axiom, and it is stated formally below. To illustrate this, consider an experiment where subjects are offered a certain amount of money to choose a pill at random from a pile, which is known to contain one pill that causes death. It was shown experimentally (Chanel and Chichilnisky [7]) that in some cases people accept a sum of money and choose a pill provided that the pile is large enough, namely, when the probability of death is small enough, thus satisfying the Monotone Continuity axiom and determining the statistical value of their lives. But there are also cases where the subjects will not agree to choose any pill, no matter how large the pile is. Some people refuse the payment of one cent if it involves a small probability of death, no matter how small the probability may be (Chanel and Chichilnisky [6, 7]). This conflicts with the Monotone Continuity axiom, as explicitly presented by Arrow [3].
Our Axiom provides a reasonable resolution to this dilemma that is realistic and consistent with the experimental evidence. It implies that there exist catastrophic outcomes such as the risk of death, so terrible that one is unwilling to face a small probability of death to obtain one cent versus nothing, no matter how small the probability may be. According to our Axiom, no probability of death may be acceptable when one cent is involved. Our Axiom also implies that in other cases there may be a small enough probability that the lottery involving death may be acceptable, for example if the payoff is large enough to justify the small risk. This is a possibility discussed by Arrow [3]. In other words: sometimes one is willing to take a risk with a small enough probability of a catastrophe, in other cases one is not. This is the content of our Axiom, which is formally stated as follows.

The Swan Axiom
This axiom is the logical negation of Monotone Continuity: There exist events $f$ and $g$ with $W(f) > W(g)$, and for every vanishing sequence of events $\{E_i\}_{i = 1, 2, \ldots}$ an $N > 0$ such that altering arbitrarily the events $f$ and $g$ on the set $E_i$, where $i > N$, does not alter the probability ranking of the events, namely, $W(f') > W(g')$, where $f'$ and $g'$ are the altered events. For other events $f$ and $g$ with $W(f) > W(g)$, there exists a vanishing sequence of events $\{E_i\}_{i = 1, 2, \ldots}$ where for every $N$, altering arbitrarily the events $f$ and $g$ on the set $E_i$, where $i > N$, does alter the probability ranking of the events, namely $W(f') < W(g')$, where $f'$ and $g'$ are the altered events.
Definition 3.1. A probability $W$ is said to be biased against rare events or insensitive to rare events when it neglects events that are small according to Villegas and Arrow; as stated in Arrow [3, page 48]: "An event that is far out on a vanishing sequence is 'small' by any reasonable standards." Formally, a probability is insensitive to rare events when given two events $f$ and $g$ and any vanishing sequence of events $\{E_j\}$, $\exists N = N(f, g) > 0$ such that $W(f) > W(g) \Leftrightarrow W(f') > W(g')$ for all $f'$, $g'$ satisfying $f' = f$ and $g' = g$ a.e. on $E_j^c \subset R$ when $j > N$, where $E^c$ denotes the complement of the set $E$.
Theorem 4.1 establishes that neither Savage's approach nor Villegas' and Arrow's satisfies the three axioms stated above. These three axioms require more than the additive subjective probabilities of Savage, since purely finitely additive probabilities are finitely additive and yet they are excluded here. At the same time the axioms require less than the countably additive subjective probabilities of Villegas and Arrow, since countably additive probabilities are biased against rare events. Theorem 4.1 above shows that a strict combination of both does the job.
Theorem 4.1 does not, however, prove the existence of likelihoods that satisfy all three axioms. What is missing is an appropriate definition of continuity that does not conflict with the Swan Axiom. The following section shows that this can be achieved by identifying an event with its characteristic function, so that events are contained in the space of bounded real-valued functions on the universe space $\mathcal{U}$, $L_\infty(\mathcal{U})$, and endowing this space with the sup norm.

Axioms for Probability with Black Swans, in $R$ or $(a, b)$
From here on events are the Borel sets of the real line $R$ or the interval $(a, b)$. This is a widely used case that makes the results concrete and allows us to compare the results with the earlier axioms on choice under uncertainty of Chichilnisky [9, 12, 14]. We use a concept of "continuity" based on a topology that was used earlier by Debreu [13] and by Chichilnisky [1, 2, 9, 10, 12, 14]: observable events are in the space of measurable and essentially bounded functions $L = L_\infty(R)$ with the sup norm $\|f\| = \operatorname{ess\,sup}_{x \in R} |f(x)|$. This is a sharper and more stringent definition of closeness than the one used by Villegas and Arrow, since two events can be close under the Villegas-Arrow definition but not under ours; see the appendix.
A subjective probability satisfying the classic axioms by De Groot [20] is called a standard probability, and is countably additive. A classic result is that for any event $f \in L_\infty$ a standard probability has the form $W(f) = \int_R f(x)\,\phi(x)\,dx$, where $\phi \in L_1(R)$ is an integrable function on $R$.
The next step is to introduce the new axioms, show existence, and characterize all the distributions that satisfy the axioms. We need more definitions. A subjective probability $W : L_\infty \to R$ is called biased against rare events, or insensitive to rare events, when it neglects events that are small according to a probability measure $\mu$ on $R$ that is absolutely continuous with respect to the Lebesgue measure. Formally, a probability is insensitive to rare events when given two events $f$ and $g$, $\exists \varepsilon = \varepsilon(f, g) > 0$ such that $W(f) > W(g) \Leftrightarrow W(f') > W(g')$ for all $f'$, $g'$ satisfying $f' = f$ and $g' = g$ a.e. on $A \subset R$ and $\mu(A^c) < \varepsilon$. Here $A^c$ denotes the complement of the set $A$. $W : L_\infty \to R$ is said to be insensitive to frequent events when given any two events $f$, $g$, $\exists \varepsilon = \varepsilon(f, g) > 0$ such that $W(f) > W(g) \Leftrightarrow W(f') > W(g')$ for all $f'$, $g'$ satisfying $f' = f$ and $g' = g$ a.e. on $A \subset R$ and $\mu(A^c) > 1 - \varepsilon$. $W$ is called sensitive to rare (respectively, frequent) events when it is not insensitive to rare (respectively, frequent) events.
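The definition of insensitivity to rare events can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the standard probability is given by a standard normal density and that events are intervals; the events $f = (0, \infty)$ and $g = (0.5, \infty)$, the cutoff `c`, and all function names are hypothetical choices, not taken from the text.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF; also valid at x = +/- infinity."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def standard_weight(a: float, b: float) -> float:
    """W(f) for the event f = indicator of (a, b) under a standard normal density."""
    return normal_cdf(b) - normal_cdf(a)


# Two events with W(f) > W(g).
w_f = standard_weight(0.0, math.inf)  # f = (0, infinity)
w_g = standard_weight(0.5, math.inf)  # g = (0.5, infinity)
gap = w_f - w_g

# A "rare" set A^c = (c, infinity) whose mass is far below the gap.
c = 3.0
rare_mass = standard_weight(c, math.inf)

# Least favorable alteration on A^c: f loses the whole tail, g keeps it.
w_f_altered = standard_weight(0.0, c)
w_g_altered = w_g

print(f"gap = {gap:.4f}, mass of rare set = {rare_mass:.5f}")
print(f"ranking preserved: {w_f_altered > w_g_altered}")
```

Altering an event on a set of mass $\varepsilon$ changes a weight of this integral form by at most $\varepsilon$, so the ranking survives whenever $2\varepsilon$ is below the gap $W(f) - W(g)$. That is the sense in which a standard probability neglects what happens on sufficiently small sets.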
The following three axioms are identical to the axioms in the last section, specialized to the case at hand.
The first and the second axiom agree with classic theory, and standard likelihoods satisfy them. The third axiom is new.
Lemma 5.1. A standard probability satisfies Axioms 1 and 2, but it is biased against rare events and therefore does not satisfy Axiom 3.
since $f$ and $g$ are characteristic functions and thus positive. Therefore $W$ is linear. $W$ is continuous with respect to the $L_1$ norm. Since the sup norm is finer than the $L_1$ norm, continuity in $L_1$ implies continuity with respect to the sup norm (Dunford and Schwartz [22]). Thus a standard subjective probability satisfies Axiom 1. It is obvious that for every two events $f$, $g$ with $W(f) > W(g)$, the inequality is reversed, namely $W(g') > W(f')$, when $f'$ and $g'$ are appropriate variations of $f$ and $g$ that differ from $f$ and $g$ on sets of sufficiently large Lebesgue measure. Therefore Axiom 2 is satisfied. A standard subjective probability is, however, not sensitive to rare events, as shown in Chichilnisky [1, 2, 9, 10, 12, 14, 23].

Existence and Representation
Theorem 6.1. There exists a subjective probability $W : L_\infty \to R$ satisfying Axioms 1, 2, and 3. A probability satisfies Axioms 1, 2, and 3 if and only if there exist two continuous linear functions on $L_\infty$, denoted $\phi_1$ and $\phi_2$, and a real number $\lambda$, $0 < \lambda < 1$, such that for any observable event $f \in L_\infty$, $W(f) = \lambda \int_R f(x)\,\phi_1(x)\,dx + (1 - \lambda)\,\phi_2(f)$ (6.1), where $\phi_1 \in L_1(R, \mu)$ defines a countably additive measure on $R$ and $\phi_2$ is a purely finitely additive measure.
Proof. This result follows from the representation theorem by Chichilnisky [9, 12].
Example 6.2 ("Heavy" Tails). The following illustrates the additional weight that the new axioms assign to rare events, in this example in a form suggesting "heavy tails." The finitely additive measure $\phi_2$ appearing in the second term in (6.1) can be illustrated as follows. On the subspace of events with limiting values at infinity, $L'_\infty = \{f \in L_\infty : \lim_{x \to \infty} f(x) < \infty\}$, define $\phi_2(f) = \lim_{x \to \infty} f(x)$ and extend this to a function on all of $L_\infty$ using the Hahn-Banach theorem. The difference between a standard probability and the likelihood defined in (6.1) is the second term $\phi_2$, which focuses all the weight at infinity. This can be interpreted as a "heavy tail," a part of the distribution that is not part of the standard density function $\phi_1$ and gives more weight to the sets that contain terminal events, namely sets of the form $(x, \infty)$.
Corollary 6.3. In samples without rare events, a subjective probability that satisfies Axioms 1, 2, and 3 is consistent with classic axioms and yields a countably additive measure.
Proof. Axiom 3 is an empty requirement when there are no rare events while, as shown above, Axioms 1 and 2 are consistent with standard relative likelihood.
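The mixture described in Theorem 6.1 and Example 6.2 can be sketched numerically on interval events, where the limit at infinity of the indicator function always exists and no extension argument is needed. The sketch rests on assumptions that are not in the text: the density $\phi_1$ is taken to be standard normal, the mixing weight is set to $\lambda = 0.5$, and the names `LAMBDA`, `standard_weight`, and `swan_weight` are hypothetical.

```python
import math

LAMBDA = 0.5  # hypothetical mixing weight, any value with 0 < LAMBDA < 1


def normal_cdf(x: float) -> float:
    """Standard normal CDF; also valid at x = +/- infinity."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def standard_weight(a: float, b: float) -> float:
    """Countably additive part: normal mass of the interval (a, b)."""
    return normal_cdf(b) - normal_cdf(a)


def swan_weight(a: float, b: float, lam: float = LAMBDA) -> float:
    """Weight of the event (a, b) as a mixture of the two parts.

    The second term is the limit at infinity of the indicator of (a, b):
    1 when the interval is unbounded above, 0 otherwise.
    """
    limit_at_infinity = 1.0 if b == math.inf else 0.0
    return lam * standard_weight(a, b) + (1.0 - lam) * limit_at_infinity


# Weights of the terminal events (n, infinity) under both constructions.
for n in (1.0, 5.0, 10.0, 50.0):
    print(n, standard_weight(n, math.inf), swan_weight(n, math.inf))
```

Under the standard weight the terminal events $(n, \infty)$ vanish as $n$ grows, while under the mixture they retain weight $1 - \lambda$. With these choices the event $(0, \infty)$ ranks above $(0.5, \infty)$, but truncating it to $(0, c)$ for any finite $c$ reverses the ranking, however far out $c$ lies. This is the sensitivity to rare events that the standard weight lacks.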

The Axiom of Choice
There is a connection between the new axioms presented here and the Axiom of Choice that is at the foundation of mathematics (Gödel [24]), which postulates that there exists a universal and consistent fashion to select an element from every set. The best way to describe the situation is by means of an example. Consider a set function $\rho$ on subsets of $R$ with $\rho(A) = 1$ if $A \supset \{x : x > a\}$ for some $a \in R$, and otherwise $\rho(A) = 0$. Such a measure would not be countably additive, because a family of countably many disjoint sets $\{V_i\}_{i = 0, 1, \ldots}$, each bounded and with a union that contains some set $\{x : x > a\}$, satisfies $\rho(\bigcup_{i=0}^\infty V_i) = 1$ while $\sum_{i=0}^\infty \rho(V_i) = 0$, which contradicts countable additivity. Since the contradiction arises from assuming that $\rho$ is countably additive, such a measure could only be purely finitely additive.
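The failure of countable additivity can be checked by hand on one concrete family. The partition below is a hypothetical choice made only for illustration and is not necessarily the family used in the original example; it assumes $\rho(A) = 1$ exactly when $A$ contains a set of the form $\{x : x > a\}$.

```latex
% Hypothetical partition of [0, \infty) into bounded, pairwise disjoint sets:
\[
  V_i = [i, i+1), \quad i = 0, 1, 2, \ldots, \qquad
  V_i \cap V_j = \emptyset \ (i \neq j), \qquad
  \bigcup_{i=0}^{\infty} V_i = [0, \infty).
\]
% Each V_i is bounded, so it contains no set \{x : x > a\}:
\[
  \rho(V_i) = 0 \ \text{for every } i
  \quad\Longrightarrow\quad
  \sum_{i=0}^{\infty} \rho(V_i) = 0 .
\]
% The union contains \{x : x > 0\}, so countable additivity fails:
\[
  \rho\Big(\bigcup_{i=0}^{\infty} V_i\Big) = \rho\big([0, \infty)\big) = 1 \neq 0 .
\]
% Finite additivity is not contradicted: for every finite n,
\[
  \rho\Big(\bigcup_{i=0}^{n} V_i\Big) = \rho\big([0, n+1)\big) = 0
  = \sum_{i=0}^{n} \rho(V_i).
\]
```

The last line shows why the example is compatible with finite additivity: any finite union of the $V_i$ is still bounded, and only the countable union reaches a terminal set.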
One can illustrate a function on $L_\infty$ that represents a purely finitely additive measure $\rho$ if we restrict our attention to the closed subspace $L'_\infty$ of $L_\infty$ consisting of those functions $f(x)$ in $L_\infty$ that have a limit when $x \to \infty$, by the formula $\rho(f) = \lim_{x \to \infty} f(x)$, as in Example 6.2 of the previous section. The function $\rho(\cdot)$ can be illustrated as a limit of a sequence of delta functions whose supports increase without bound. The problem, however, is to extend the function $\rho$ to another defined on the entire space $L_\infty$. This could be achieved in various ways but, as we will see, each of them requires the Axiom of Choice.
One can use the Hahn-Banach theorem to extend the function $\rho$ from the closed subspace $L'_\infty \subset L_\infty$ to the entire space $L_\infty$, preserving its norm. However, in its general form the Hahn-Banach theorem requires the Axiom of Choice (Dunford and Schwartz [22]). Alternatively, one can extend the notion of a limit to encompass all functions in $L_\infty$, including those with no standard limit. This can be achieved by using the notion of convergence along a free ultrafilter arising from compactifying the real line $R$, as by Chichilnisky and Heal [27]. However, the existence of a free ultrafilter also requires the Axiom of Choice.
This illustrates why any attempt to construct purely finitely additive measures requires using the Axiom of Choice. Since our criteria include purely finitely additive measures, this provides a connection between the Axiom of Choice and our axioms for relative likelihood. It is somewhat surprising that the consideration of rare events that are neglected in standard statistical theory conjures up the Axiom of Choice, which is independent from the rest of mathematics (Gödel [24]).
An anonymous referee provided insightful comments that improved this article.
The last inequality in the proof of Proposition 2.1 establishes De Groot's Axiom SP 4. Therefore Monotone Continuity is equivalent to De Groot's Axiom SP 4. A proof that each of the axioms implies countable additivity is in Villegas [5], Arrow [3], and De Groot [20].
Example 7.1 (illustration of a purely finitely additive measure). Consider a possible measure $\rho$ satisfying the following: for every