Advancing Shannon Entropy for Measuring Diversity in Systems

From economic inequality and species diversity to power laws and the analysis of multiple trends and trajectories, diversity within systems is a major issue for science. Part of the challenge is measuring it. Shannon entropy $H$ has been used to rethink diversity within probability distributions, based on the notion of information. However, there are two major limitations to Shannon’s approach. First, it cannot be used to compare diversity distributions that have different levels of scale. Second, it cannot be used to compare parts of diversity distributions to the whole. To address these limitations, we introduce a renormalization of probability distributions based on the notion of case-based entropy $C_c$ as a function of the cumulative probability $c$. Given a probability density $p(x)$, $C_c$ measures the diversity of the distribution up to a cumulative probability of $c$, by computing the length or support of an equivalent uniform distribution that has the same Shannon information as the conditional distribution $p_c(x)$ up to cumulative probability $c$. We illustrate the utility of our approach by renormalizing and comparing three well-known energy distributions in physics, namely, the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac distributions for energy of subatomic particles. The comparison shows that $C_c$ is a vast improvement over $H$ as it provides a scale-free comparison of these diversity distributions and also allows for a comparison between parts of these diversity distributions.


Diversity in Systems
Statistical distributions play an important role in any branch of science that studies systems comprised of many similar or identical particles, objects, or actors, whether material or immaterial, human or nonhuman. One of the key features that determines the characteristics and range of potential behaviors of such systems is the degree and distribution of diversity, that is, the extent to which the components of the system occupy states with similar or different features.
As Page outlined in a series of inquiries [1,2], including The Difference and Diversity and Complexity, diversity within systems is an important concern for science, be it making sense of economic inequality, expanding the trade portfolio of countries, measuring the collapse of species diversity in various ecosystems, or determining the optimal utility/robustness of a network. However, a major challenge in the literature on diversity and complexity, which Page also points out [1,2], remains: the issue of measurement.
Although statistical distributions that directly reflect the spread of key parameters (such as mass, age, wealth, or energy) provide descriptions of this diversity, it can be difficult to compare the diversity of different distributions or even the same distribution under different conditions, mostly because of differences in scales and parameters. Also, many of the measures currently available compress diversity into a single score or are not intuitive [1-4].
At the outset, motivated by examples of measuring diversity in ecology and evolutionary biology from [3,4], we sought to address these challenges. We begin with some definitions and a review of our previous research.
First, in terms of definitions, we follow the ecological literature, defining diversity as the interplay of "richness" and "evenness" in a probability distribution. Richness refers to the number of different diversity types in a system. Examples include (a) the different levels of household income in a city, (b) the number of different species in an ecosystem, (c) the diversity of a country's exports, (d) the distribution of different nodes in a complex network, (e) the various health trends for a particular disease across time/space, or (f) the cultural or ethnic diversity of an organization or company. In all such instances, the greater the number of diversity types (be these types discrete or continuous), the greater the degree of richness in a system. In the case of the current study, for example, richness was defined as the number of different energy states.
In turn, evenness refers to the uniformity or "equiprobability" of occurrence of such states. In terms of the above examples, evenness would be defined as (a) a city where household income was evenly distributed, (b) an ecosystem where the diversity of its species was equal in number, (c) a country with an even distribution of exports, (d) a complex network where all nodes had the same probability of occurrence, (e) a disease where all possible health trends were equiprobable, or (f) a company or organization where people of different cultural or ethnic backgrounds were evenly distributed. In the case of the current study, for example, evenness was defined as the uniformity or "equiprobability" of the occurrence of all possible energy states.
More specifically, as we will see later in the paper, we define the diversity $D$ of a probability distribution as the number of equivalent equiprobable types required to maintain the same amount of Shannon entropy $H$ (i.e., the number of Shannon-equivalent equiprobable states). Given such a definition, a system with a high degree of richness and evenness would have a high value of $D$, whereas a system with a low degree of richness and evenness would have a low value of $D$. In turn, a system with high richness but low evenness (as in the case of a skewed-right system with a long tail) would have a lower value of $D$ than a system with high richness and high evenness.

Purpose of the Current Study.
Recently, we have introduced a novel approach to representing diversity within statistical distributions [5,6], which overcomes such difficulties and allows the distribution of diversity in any given system (or cumulative portions thereof) to be directly compared to the distribution of diversity within any other system. In effect, it is a renormalization that can be applied to any probability distribution to produce a direct representation of the distribution of diversity within that distribution. Arising from our work in the area of complex systems, the approach is based on the notion of case-based entropy, $C_c$ [5]. This approach has two major advantages over the Shannon entropy $H$, which, as we alluded to above, is one of the most commonly used measures of diversity within probability distributions and which calculates the average amount of uncertainty (or information, depending on one's perspective) present in a given probability distribution. First, $C_c$ can be used to compare distributions that have different levels of scale; and, second, $C_c$ can be used to compare parts of distributions to their whole.
After developing the concept and formalism for case-based entropy for discrete distributions [5], we first applied it to compare complexity across a range of complex systems [6]. In that work, we investigated a series of systems described by a variety of skewed-right probability distributions, choosing examples that are often suggested to exhibit behaviors indicative of complexity such as emergent collectivity, phase changes, or tipping points. What we found was that such systems obeyed an apparent "limiting law of restricted diversity" [6], which constrains the majority of cases in these complex systems to simpler types. In fact, for these types of distribution, the distributions of diversity were found to follow a scale-free 60/40 rule, with 60% or more of cases belonging to the simplest 40% or less of equiprobable diversity types. This was found to be the case regardless of whether the original distribution fit a power law or was long-tailed, making it fundamentally distinct from the well-known (but often misunderstood) Pareto Principle [7].
In the following, we continue to explore the use of case-based entropy in comparing systems described by statistical distributions. However, we now go beyond our prior work in the following ways. First, we extend the formalism in order to compute case-based entropy for continuous as well as discrete distributions. Second, we broaden our focus from complexity/complex systems to diversity in any type of statistically distributed system. That is, we start to explore distributions of diversity for systems where richness is not a function of the degree of complexity types.
Third, the discrete indices we used had a degree of subjectivity to them; for example, how should household income be binned, and what influence does that have on the distribution of diversity? As such, we wanted to see how well $C_c$ worked for distributions where the unit of measurement was universally agreed upon.
Fourth, we had not emphasized how $C_c$ was a major advance on Shannon entropy $H$. As is well known, while $H$ has proven useful, it compresses its measurement of diversity into a single number; it is also nonintuitive; and, as we stated above, it is not scale-free and therefore cannot be used to compare the diversity of different systems; neither can it be used to compare parts of the diversity within a system to the entire system.
Hence, the purpose of the current study, as a demonstration of the utility of $C_c$, is to renormalize and compare three physically significant energy distributions in statistical physics: the energy probability density functions for systems governed by Boltzmann, Bose-Einstein, and Fermi-Dirac statistics.

Renormalizing Probability: Case-Based Entropy and the Distribution of Diversity
The quantity case-based entropy [5], $C_c$, renormalizes the diversity contribution of any probability distribution $p(x)$, by computing the true diversity $D$ of an equiprobable distribution (called the Shannon-equivalent uniform distribution) that has the same Shannon entropy $H$ as $p(x)$. $D$ is precisely the number of equiprobable types in the case of a discrete distribution, or the length, support, or extent of the variable in the case of continuous distributions, which is required to keep the value of the Shannon entropy the same across the whole or any part of the distribution up to a cumulative probability $c$. We choose the Shannon-equivalent uniform distribution for two reasons:
(i) First, it is well known that, on a finite measure space, the uniform distribution maximizes entropy: that is, the uniform distribution has the maximal entropy among all probability distributions on a set of finite Lebesgue measure [8].
(ii) Second, a Shannon-equivalent uniform distribution will, by definition, count the number of values (or range of values) of $x$ that are required to give the same information as the original distribution $p(x)$ if we assume that all the values (or range of values) are equally probable.
Hence, the uniform distribution renormalizes the effect of varying relative frequencies (or probabilities) of occurrence of the values of $x$ without losing information (or entropy). In other words, if all choices of the random variable are equally likely, the number of values (or the length, if it is a continuous random variable) needed for the random variable to keep the same amount of information as the given distribution is a measure of diversity. In a sense, each new value (or type) is counted as adding to the diversity, only if the new value has the same probability of occurrence as the existing values. Diversity necessarily requires the values of the random variable to be equiprobable since lower probability, for example, means that such values occur rarely in the random variable and hence cannot be treated as equally diverse as other values with higher probabilities. Hence, by choosing an equiprobable (or uniform) distribution for normalization, we are counting the true diversity, that is, the number of equiprobable types that are required to match the same amount of Shannon information $H$ as the given distribution.
This calculation (as we have shown elsewhere [5]) can be done for parts of the distribution up to a cumulative probability of $c$. This means that a comparison of $C_c$ for a variety of distributions is actually a comparison of the variation of the fraction of diversity $C_c$ contributed by values of the random variable up to $c$.
Since, regardless of the scale and units of the original distribution, $c$ and $C_c$ both vary from 0 to 1, one can plot a curve for $C_c$ versus $c$ for multiple distributions on the same axes. $C_c$ thus provides us with a scale-free measure to compare distributions without omitting any of the entropy information, but by renormalizing the variable to one that has equiprobable values. What is more, it also allows us to compare different parts of the same distribution, or parts to wholes. That is, we can generate a $C_c$ versus $c$ curve for any part of a distribution (normalizing the probabilities to add up to 1 in that part) and compare the $C_c$ curve of the part to the $C_c$ curve of the whole or another part to see if the functional dependence of $C_c$ on $c$ is the same or different. In essence, $C_c$ has the ability to compare distributions in a "fractal" or self-similar way.
In [5], we showed how to carry out the renormalization for discrete probability distributions, both mathematical and empirical. In this paper, as we stated in the Introduction, we make the case for how $C_c$ constitutes an advance over $H$, in terms of providing a scale-free comparison of probability distributions and also comparisons between parts of distributions. More importantly, we demonstrate how $C_c$ works for continuous distributions, by examining the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac distributions for energy of subatomic particles. We begin with a more detailed review of $C_c$.

Case-Based Entropy of a Continuous Random Variable
Our impetus for making an advance over the Shannon entropy $H$ comes from the study of diversity in evolutionary biology and ecology, where it is employed to measure the true diversity of species (types) in a given ecological system of study [3,4,9,10]. As we show here, it can also be used to measure the diversity of an arbitrary probability distribution of a continuous random variable.
Given the probability density function $p(x)$ of a random variable $x$ in a measure space $X$, the Shannon-Wiener entropy index $H$ is given by
$$H = -\int_X p(x)\ln p(x)\,dx. \tag{1}$$
The problem, however, with the Shannon entropy index $H$, as we identified in our abstract and Introduction, is that while being useful for studying the diversity of a single system, it cannot be used to compare the diversity across probability distributions. In other words, $H$ is not multiplicative: a doubling of value for $H$ does not mean that the actual diversity has doubled. To address this problem, we turned to the true diversity measure $D$ [3,11,12], which gives the range of equiprobable values of $x$ that gives the same value of $H$:
$$D = e^{H}. \tag{2}$$
The utility of $D$ for comparing the diversity across probability distributions is that, in $D$, a doubling of the value means that the number of equiprobable ranges of values of $x$ has doubled as well. $D$ calculates the range of such equiprobable values of $x$ that will give the same value of Shannon entropy $H$ as observed in the distribution of $x$. We say that two probability densities $p_1(x)$ and $p_2(x)$ are Shannon-equivalent if they have the same value of Shannon entropy. Case-based entropy is then the range of values of $x$ for the Shannon-equivalent uniform distribution for $p(x)$. We also note that Shannon entropy can be recomputed from $D$ by using $H = \ln(D)$.
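As a small numerical illustration of why the true diversity, rather than the entropy itself, behaves multiplicatively, the following sketch works with discrete distributions for simplicity. The function names and the three example probability vectors are hypothetical and are not taken from our data.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * ln p) in nats, skipping zero-probability types."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def true_diversity(probs):
    """True diversity D = exp(H): the number of Shannon-equivalent equiprobable types."""
    return math.exp(shannon_entropy(probs))

# Hypothetical examples: two even systems and one rich but uneven system
uniform4 = [1 / 4] * 4
uniform8 = [1 / 8] * 8
skewed8 = [0.65, 0.15, 0.08, 0.05, 0.03, 0.02, 0.01, 0.01]

for name, probs in [("uniform4", uniform4), ("uniform8", uniform8), ("skewed8", skewed8)]:
    print(name, round(shannon_entropy(probs), 4), round(true_diversity(probs), 4))
```

Doubling the number of equiprobable types from 4 to 8 doubles the true diversity, while the entropy only grows from ln 4 to ln 8. The skewed system has the same richness as the second even system but a lower true diversity, because its evenness is lower.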
In order to measure the distribution of diversity, we next need to determine the fractional contribution to overall diversity up to a cumulative probability $c$. In other words, we need to be able to compute the diversity contribution $D_c$ up to a certain cumulative probability $c$. To do so, we replace $H$ with $H_c$, the conditional entropy, given that only the portion of the distribution up to a cumulative probability $c$ (denoted by $X_c$) is observed with conditional probability of occurrence $p(x)/c$:
$$H_c = -\int_{X_c} \frac{p(x)}{c}\ln\left(\frac{p(x)}{c}\right)dx, \tag{3}$$
$$D_c = e^{H_c}. \tag{4}$$
The value of $D_c$ for a given value of cumulative probability $c$ is the number of Shannon-equivalent equiprobable energy states (or of values of the variable on the $x$-axis in general) that are required to explain the information up to a cumulative probability of $c$ within the distribution. If $c = 1$, then $D_c = D$ is the number of such Shannon-equivalent equiprobable energy states for the entire distribution itself.
We can then simply calculate the fractional diversity contribution or case-based entropy as
$$C_c = \frac{D_c}{D}. \tag{5}$$
It is at this point that the renormalization ($C_c$ as a function of $c$) becomes scale independent as both axes range between values of 0 and 1 with the graph of $C_c$ versus $c$ passing through (0, 0) and (1, 1). Hence, irrespective of the range and scale of the original distributions, all distributions can be plotted on the same graph and their diversity contributions can be compared in a scale-free manner.
To check the validity of our formalism, we calculate $C_c$ for the simple case of a uniform distribution given by $p(x) = \frac{1}{L}\chi_{[0,L]}(x)$ on the interval $X = [0, L]$. Intuitively, if we choose $X_c = [0, x]$, then, owing to the uniformity of the distribution, we expect $D_c = x$ itself. In other words, the diversity of the part $[0, x]$ is simply equal to $x$, that is, the length of the interval $[0, x]$, and hence the $C_c$ versus $c$ curve will simply be the straight line with slope equal to 1. This can be shown as follows:
$$c = \int_0^x \frac{1}{L}\,dt = \frac{x}{L}, \quad H_c = -\int_0^x \frac{1}{cL}\ln\left(\frac{1}{cL}\right)dt = \ln(cL) = \ln x, \quad D_c = e^{H_c} = x, \quad C_c = \frac{D_c}{D} = \frac{x}{L} = c. \tag{6}$$
With our formulation of $C_c$ complete, we turn to the energy distributions for particles governed by Boltzmann, Bose-Einstein, and Fermi-Dirac statistics.
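The uniform-distribution check can also be run numerically. The sketch below is not the code used for the results in this paper; it is a minimal midpoint-rule illustration of the definitions of $H_c$, $D_c$, and $C_c$, and the function name, the grid size, and the interval length $L = 5$ are all assumptions made for the example.

```python
import math

def case_based_entropy_curve(pdf, lower, upper, n=2000):
    """Return (c_values, C_values) for a density on [lower, upper].

    Midpoint-rule sketch of H_c = -int (p/c) ln(p/c) dx over [lower, x],
    rewritten as H_c = ln(c) - (1/c) * int p ln p dx, then D_c = exp(H_c)
    and C_c = D_c / D, with D taken as D_c at the upper end of the grid.
    """
    dx = (upper - lower) / n
    mass = 0.0      # running integral of p (the cumulative probability c)
    p_log_p = 0.0   # running integral of p * ln(p)
    cs, ds = [], []
    for i in range(n):
        p = pdf(lower + (i + 0.5) * dx)
        if p > 0.0:
            mass += p * dx
            p_log_p += p * math.log(p) * dx
        if mass > 0.0:
            cs.append(mass)
            ds.append(math.exp(math.log(mass) - p_log_p / mass))
    total, d_full = cs[-1], ds[-1]
    return [c / total for c in cs], [d / d_full for d in ds]

L = 5.0  # hypothetical interval length
c_vals, C_vals = case_based_entropy_curve(lambda x: 1.0 / L, 0.0, L)
max_gap = max(abs(C - c) for c, C in zip(c_vals, C_vals))
print(max_gap)  # largest deviation of the curve from the line C_c = c
```

For the uniform density the computed curve should coincide with the diagonal up to floating-point rounding. The same function accepts any non-negative density on a finite interval, since the conditional density $p(x)/c$ does not depend on the overall normalization.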

4.1. $C_c$ for the Boltzmann Distribution in One Dimension. We first illustrate our renormalization by applying it to a relatively simple case: that of an ideal gas at temperature $T$. The kinetic energies $E$ of particles in such a gas are described by the Boltzmann distribution [8]. In one dimension, this is
$$p_{B,1D}(E) = \beta e^{-\beta E}, \tag{7}$$
where $k_B$ is the Boltzmann constant and $\beta = 1/(k_B T)$.
The entropy of $p_{B,1D}(E)$ can be shown to be $H_B = 1 - \ln(\beta)$, and hence the true diversity of energy in the range $[0, \infty)$ is given by
$$D_B = e^{H_B} = \frac{e}{\beta}. \tag{8}$$
The cumulative probability $c$ from $E = 0$ to $E = x$ is then given by
$$c = \int_0^x \beta e^{-\beta E}\,dE = 1 - e^{-\beta x}. \tag{9}$$
Hence, $x$ can be computed in terms of $c$ as
$$x = -\frac{1}{\beta}\ln(1 - c). \tag{10}$$
Equation (9) is useful for the one-dimensional Boltzmann case to eliminate the parameter $x$ altogether in (11) to obtain an explicit relationship between $C_c$ and $c$. It is to be noted that, in most cases, both $C_c$ and $c$ can only be parametrically related through $x$. The other quantities introduced in Section 3 can then be calculated as follows:
$$H_c = -\int_0^x \frac{\beta e^{-\beta E}}{c}\ln\left(\frac{\beta e^{-\beta E}}{c}\right)dE = 1 + \ln\left(\frac{c}{\beta}\right) + \frac{1-c}{c}\ln(1-c), \tag{11}$$
$$D_c = e^{H_c} = \frac{e\,c}{\beta}\,(1-c)^{(1-c)/c}, \tag{12}$$
$$C_c = \frac{D_c}{D_B} = c\,(1-c)^{(1-c)/c}. \tag{13}$$
We note that, in (13), the temperature factor $\beta$ cancels out, indicating that the distribution of diversity for an ideal gas in one dimension is independent of temperature. The resulting graph of $C_c$ as a function of $c$ is shown in Figure 1. It is worth noting in passing that $C_c$ reaches 40% when $c \approx 69\%$, indicating that approximately 69% of the molecules in the gas are contained within the lower 40% of diversity of energy probability states at all temperatures (here, diversity is defined as the number of equivalent equiprobable energy states required to maintain the same amount of Shannon entropy $H$). Thus, the one-dimensional Boltzmann distribution obeys an interesting phenomenon that we have identified in a wide range of skewed-right complex systems, which (as we briefly discussed in the Introduction) we call restricted diversity and, more technically, the 60/40 rule [6]. The independence of temperature in the $C_c$ versus $c$ curve, for the Boltzmann distribution, shows that the effect of increasing $T$ is to shift the mean of the distribution to higher energies and to increase its standard deviation, but not to change its characteristic shape. Still, what is key to our results is that the temperature independence of the $C_c$ curve for the Boltzmann distribution in one dimension validates that our renormalization preserves the fundamental features of the original distribution.
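The one-dimensional result can be explored with a few lines of code. The sketch below assumes the closed form $C_c = c\,(1-c)^{(1-c)/c}$, which follows from the exponential density $\beta e^{-\beta E}$ and contains no temperature dependence; the function names and the bisection tolerance are hypothetical choices for illustration.

```python
import math

def boltzmann_1d_case_based_entropy(c):
    """Closed form C_c = c * (1 - c)**((1 - c) / c), valid for 0 < c < 1."""
    return c * (1.0 - c) ** ((1.0 - c) / c)

def cumulative_probability_at(target, tol=1e-12):
    """Bisection for the c at which C_c equals `target` (C_c increases with c)."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if boltzmann_1d_case_based_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_at_40 = cumulative_probability_at(0.40)
print(round(c_at_40, 3), round(boltzmann_1d_case_based_entropy(0.60), 3))
```

Under this closed form the 40% level of $C_c$ is reached a little below $c = 0.7$, in line with the roughly 69% read from Figure 1, and $C_c$ at $c = 0.6$ lies well under 0.4, which is the 60/40 pattern of restricted diversity.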

4.2. $C_c$ for the Boltzmann Distribution in Three Dimensions.
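We now turn to the calculation of $C_c$ for the physically more important case of the Boltzmann distribution in three dimensions [8]:
$$p_{B,3D}(E) = \sqrt{\frac{4E}{\pi}}\,\beta^{3/2} e^{-\beta E}, \tag{14}$$
where the additional factor of $\sqrt{4E/\pi}$ accounts for the density of states. The cumulative probability $c$ from $E = 0$ to $E = x$ can be computed as follows:
$$c = \int_0^x p_{B,3D}(E)\,dE = \operatorname{erf}\left(\sqrt{\beta x}\right) - \frac{2}{\sqrt{\pi}}\sqrt{\beta x}\,e^{-\beta x}. \tag{15}$$
As we would hope, (15) has the property that as $x \to \infty$, the cumulative probability $c \to 1$.
However, it is difficult to solve (15) for $x$ directly in terms of $c$. We therefore compute $C_c$ in parametric form with $x$ being the parameter. Also, analytical forms are not possible, so Matlab was used to compute $H_c$, $D_c$, and $C_c$, respectively:
$$H_c(x) = -\int_0^x \frac{p_{B,3D}(E)}{c(x)}\ln\left(\frac{p_{B,3D}(E)}{c(x)}\right)dE, \quad D_c(x) = e^{H_c(x)}, \quad C_c(x) = \frac{D_c(x)}{D}. \tag{16}$$
Thus, $C_c$ can also only be computed in parametric form with parameter $x$ that varies from 0 to $\infty$. Figure 2 shows the $C_c$ curve thus calculated for the Boltzmann distribution in three dimensions.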
Although the temperature independence of this distribution is not immediately evident from the parametric form, one would, following the same logic as for the one-dimensional case, expect the distribution of diversity to be the same for all $T$. That is, as in the one-dimensional case, because changes in $T$ do not affect the original distribution's characteristic shape, we expect the renormalized distribution to be independent of temperature. This does, indeed, turn out to be the case, as illustrated in Figure 2, which overlays the results of the calculations for $T$ = 50 K, 500 K, and 5000 K. It is also worth noting that, just like our one-dimensional case, the curve obeys the 60/40 rule of restricted diversity [6]: regardless of temperature, over 60 percent of molecules are in the lower 40 percent of diversity of energy probability states (here again, diversity is defined as the number of equivalent equiprobable energy states required to maintain the same amount of Shannon entropy $H$).
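The temperature independence in three dimensions can be probed with a short numerical sketch. This is not the Matlab code used for Figure 2; it is a pure-Python midpoint-rule approximation under stated assumptions: the density is taken to be $\sqrt{4E/\pi}\,\beta^{3/2}e^{-\beta E}$, energies are measured in eV, and the integration grid is truncated at 40 $k_B T$ (a hypothetical cutoff beyond which the remaining probability is negligible). The function names are illustrative only.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_3d_pdf(energy, beta):
    """Energy density sqrt(4E/pi) * beta**1.5 * exp(-beta * E)."""
    return math.sqrt(4.0 * energy / math.pi) * beta ** 1.5 * math.exp(-beta * energy)

def case_based_entropy_at(c_target, temperature, n=20000, cutoff=40.0):
    """C_c at cumulative probability c_target, by midpoint integration in E.

    The grid stops at `cutoff` times k_B*T (an assumed truncation of the tail).
    """
    beta = 1.0 / (K_B * temperature)
    dE = cutoff / beta / n
    mass = p_log_p = 0.0
    cs, ds = [], []
    for i in range(n):
        p = boltzmann_3d_pdf((i + 0.5) * dE, beta)
        if p <= 0.0:
            continue  # guard against underflow; nothing to accumulate
        mass += p * dE
        p_log_p += p * math.log(p) * dE
        cs.append(mass)
        # H_c = ln(c) - (1/c) * int p ln p, and D_c = exp(H_c)
        ds.append(math.exp(math.log(mass) - p_log_p / mass))
    # Linear interpolation of D_c at the requested c, then division by D
    for j in range(1, len(cs)):
        if cs[j] >= c_target * cs[-1]:
            w = (c_target * cs[-1] - cs[j - 1]) / (cs[j] - cs[j - 1])
            return (ds[j - 1] + w * (ds[j] - ds[j - 1])) / ds[-1]
    return 1.0

results = {T: case_based_entropy_at(0.60, T) for T in (50.0, 500.0, 5000.0)}
print(results)
```

Because temperature only rescales the energy axis, the three values should agree to within rounding error, and each should fall below 0.4, consistent with the 60/40 rule described above.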
In addition, it is worth noting that, as we might expect, adding more degrees of freedom increases the average energy by $(1/2)k_B T$ per degree of freedom while maintaining the same shape for the distribution of energy. Hence, the current result will still hold true for gas molecules with higher degrees of freedom; that is, the distribution of diversity is always exactly the same for an ideal gas, whether monoatomic or polyatomic.

4.3. The Bose-Einstein Distributions for Massive and Massless Bosons. We now move on to consider the second of our example distributions. The Bose-Einstein distribution gives the energy probability density function for massive bosons above the Bose temperature $T_B$; its expression, (17), involves a normalization constant and the Riemann zeta function $\zeta$. In the following calculations, we use the Bose temperature for helium, $T_B = 3.14$ K.
For massless bosons such as photons, the energy probability density function, (19), takes a different form [13]. It is important to note that the "density of states" factors in (17) and (19) result in different energy distributions, despite the two types of boson obeying the same statistics. The conditional probabilities, conditional entropies, true diversities, and case-based entropies for these distributions cannot be calculated analytically but can be calculated numerically. The results of such calculations, using the software Matlab, are shown in Figure 3.
As with the Boltzmann distributions, we find that the distributions of diversity for the two boson systems are independent of temperature. Although the curves for the two types of boson are very similar, it is evident that the distributions of diversity do differ to some extent. For helium-4 bosons, a slightly larger fraction of particles is contained in lower diversity energy states than is the case for photons, with 60% of atoms contained in approximately the lowest 37% of diversity states, as compared to approximately 42% for photons. In other words, using $C_c$, we are able to identify common patterns within and across these different energy systems, as well as their variations, even in instances where intuition might already suggest as much. With this point made, we move to our final energy distribution.

4.4. The Fermi-Dirac Distribution.
The final distribution we use to illustrate our approach is the Fermi-Dirac distribution:
$$p_{FD}(E) = \frac{A\sqrt{E}}{e^{\beta(E - E_F)} + 1}, \tag{20}$$
where $A$ is again a normalization constant and $E_F$ is the Fermi energy [13]. In the following, we calculate distributions for sodium electrons, for which $E_F = 3.4$ eV. Once again, the conditional probabilities, $H_c$, $D_c$, and $C_c$ cannot be calculated analytically and so we rely on numerical calculations using Matlab. The Fermi-Dirac distribution differs from the previous examples in that it is not simply scaled by changes in energy. Instead, its shape changes, transforming from a skewed-left distribution, with a sharp cut-off at the Fermi energy at low temperatures, to a smooth, skewed-right distribution at high temperatures. Thus, unlike the situation for Boltzmann and Bose-Einstein distributions, one would expect the distributions of diversity for fermions such as electrons to be dependent on temperature. Figure 4 compares the results of calculating $C_c$ as a function of $c$ for electrons in sodium at temperatures of 2.7 K (the temperature of space), 300 K (representing temperatures on earth), 6000 K (the temperature of the surface of the sun), and $15 \times 10^6$ K (the temperature of the core of the sun).
This figure shows that the degree of diversity is the highest for fermions at low temperatures; for example, at 2.7 K, fully 70% of the lowest equiprobable diversity states are needed to contain 60% of the particles, compared with only approximately 38% at $15 \times 10^6$ K. It also shows that, for sodium electrons, the diversity curve at normal temperatures on earth (300 K) is almost identical to that at very low temperatures. That is, a room temperature Fermi gas of sodium electrons has a distribution of diversity very similar to that of a "Fermi condensate."
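The temperature dependence of the fermion curves can be illustrated with the following sketch. It is not the Matlab code behind Figure 4. It assumes an unnormalized density of the form $\sqrt{E}/(e^{(E-E_F)/(k_B T)} + 1)$ with the Fermi energy held fixed at 3.4 eV at every temperature, and it truncates the energy grid at $E_F + 40\,k_B T$; both the functional form and the cutoff are assumptions made for this example, and the function names are hypothetical. The normalization constant is not needed, because the conditional density $p/c$ is unchanged when $p$ is multiplied by a constant.

```python
import math

K_B = 8.617333262e-5   # Boltzmann constant in eV/K
E_FERMI = 3.4          # Fermi energy in eV, the value quoted in the text for sodium

def fermi_occupation(energy, temperature):
    """1 / (exp((E - E_F)/(k_B T)) + 1), arranged so exp never overflows."""
    arg = (energy - E_FERMI) / (K_B * temperature)
    if arg > 0.0:
        e = math.exp(-arg)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(arg))

def diversity_fraction(c_target, temperature, n=20000, cutoff=40.0):
    """C_c at c_target for the assumed density sqrt(E) * occupation."""
    e_max = E_FERMI + cutoff * K_B * temperature  # assumed truncation of the tail
    dE = e_max / n
    mass = p_log_p = 0.0
    cs, ds = [], []
    for i in range(n):
        energy = (i + 0.5) * dE
        p = math.sqrt(energy) * fermi_occupation(energy, temperature)
        if p <= 0.0:
            continue  # occupation underflowed to zero; nothing to accumulate
        mass += p * dE
        p_log_p += p * math.log(p) * dE
        cs.append(mass)
        ds.append(math.exp(math.log(mass) - p_log_p / mass))
    # Linear interpolation of D_c at the requested c, then division by D
    for j in range(1, len(cs)):
        if cs[j] >= c_target * cs[-1]:
            w = (c_target * cs[-1] - cs[j - 1]) / (cs[j] - cs[j - 1])
            return (ds[j - 1] + w * (ds[j] - ds[j - 1])) / ds[-1]
    return 1.0

fractions = {T: diversity_fraction(0.60, T) for T in (2.7, 300.0, 15e6)}
print(fractions)
```

In the low-temperature limit this assumed density reduces to $\sqrt{E}$ on $[0, E_F]$, for which $C_c = c^{2/3}$, so about 71% of the diversity states are needed to hold 60% of the particles. At the highest temperature the value drops substantially, which is the qualitative behavior described for Figure 4.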

Using $C_c$ to Compare and Contrast Systems
With our renormalization complete for all three distributions, we sought next to demonstrate, albeit somewhat superficially, the utility of $C_c$ for comparing and contrasting systems, given how widely known the results are for these three classic energy distributions. To begin with, it is usual to assume that, in the limit of high $T$, both Bose-Einstein and Fermi-Dirac distributions reduce to Boltzmann distributions, and so the physical properties of both bosons and fermions in this limit should be those of an ideal gas.
In Figures 5 and 6, we show a comparison of all three energy distributions for temperatures of 6000 K and $15 \times 10^6$ K (the Bose-Einstein distribution for massless bosons is included for comparison). In these figures, it appears that, by 6000 K, the Bose-Einstein distribution for helium-4 is indistinguishable from the 3D Boltzmann distribution. Also, while the Fermi-Dirac distribution has clearly not reduced to the Boltzmann distribution even at $15 \times 10^6$ K, it appears to be trending towards it. However, comparison of the diversity distributions suggests that even when the energy probability density functions appear to coincide, significant physical differences remain between the systems. Figure 7 compares all the diversity curves calculated in the present work.
It is clear from Figure 7 that the distributions of diversity for a classical ideal gas and for both Bose-Einstein and Fermi-Dirac distributions are significantly different. Because these renormalized distributions are independent of temperature, this suggests that there is no limit in which the Bose-Einstein distribution for the photon becomes completely indistinguishable from the Boltzmann distribution. Even more strikingly, the distribution of diversity in a system obeying Fermi-Dirac statistics only approaches that of bosonic systems at extremely high temperatures, similar to those at the core of the sun. At lower temperatures, the Fermi gas has substantially higher degrees of diversity than all the other systems. This is because, at lower temperatures, most of the fermions are yet to surpass the barrier created by the Fermi energy and hence are all restricted to the lower end of the energy range. Thus, the transformation from the usual probability distribution to a distribution of case-based entropy ($C_c$ versus $c$) has allowed us to make direct scale-free comparisons of the ways in which the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac energy distributions are similar or differ both internally (as a function of temperature $T$) and across distributions. It appears that, except for extremely high temperatures, the Fermi-Dirac distribution has a larger value of $C_c$ than the others. This means that there are a larger number of Shannon-equivalent equiprobable states of energy for the Fermi-Dirac distribution as compared to the others. A speculative explanation could be that Pauli's exclusion principle does not allow for more than one fermion to occupy the same quantum state, thereby restricting the accumulation of fermions in the same state (i.e., more diversity).

Conclusion
As we have hopefully shown in this paper, while Shannon entropy $H$ has been used to rethink probability distributions in terms of diversity, it suffers from two major limitations. First, it cannot be used to compare distributions that have different levels of scale. Second, it cannot be used to compare parts of distributions to the whole.
To address these limitations, we introduced a renormalization of probability distributions based on the notion of case-based entropy $C_c$ (as a function of the cumulative probability $c$). We began with an explanation of why we rethink probability distributions in terms of diversity, based on a Shannon-equivalent uniform distribution, which comes from the work of Jost and others on the notion of true diversity in ecology and evolutionary biology [4,9,10]. With this approach established, we then reviewed our construction of case-based entropy $C_c$. Given a probability density $p(x)$, $C_c$ measures the diversity of the distribution up to a cumulative probability of $c$, by computing the length or support of an equivalent uniform distribution that has the same Shannon information as the conditional distribution $p_c(x)$ up to a cumulative probability $c$.
With our conceptualization of $C_c$ complete, we used it to renormalize and compare three physically significant energy distributions in physics, namely, the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac distributions for energy of subatomic particles. We chose these three distributions for three key reasons: (1) we wanted to see whether $C_c$ works for continuous distributions; (2) we wanted cases where the focus was on the diversity of types and not on their rank order in terms of complexity; and (3) we wanted cases where the unit of measurement was both objective and widely accepted. Based on our results, we concluded that $C_c$ is a vast improvement over $H$ as it provides an intuitively useful, scale-free comparison of probability distributions and also allows for a comparison between parts of distributions.
The renormalization obtained will have a different shape for different distributions. In fact, a bimodal, right-skewed, or other kind of distribution will lead to a different $C_c$ versus $c$ curve. There are two interesting points of inquiry for future papers, namely, (a) how the shape of the original distribution influences the $C_c$ versus $c$ curve and (b) whether we can reconstruct the original shape of the distribution given the $C_c$ versus $c$ curve. Because of the scale-free nature of $C_c$, all distributions can be compared in the same plot without reference to their original scales. In our future work, we will endeavor to connect the shape of the $C_c$ versus $c$ curve to the shape of the original distribution. This will allow us to locate portions of the original distribution (irrespective of their scale) where diversity is concentrated, and portions where it is sparse, even though the original distributions cannot be plotted on the same graph due to huge variation in their scales.

Figure 1: $C_c$ as a function of $c$ for the Boltzmann distribution in one dimension.

Figure 4: Diversity curves for sodium electrons at a range of temperatures with $C_c$ on the $y$-axis and $c$ on the $x$-axis.