Power Law and Entropy Analysis of Catastrophic Phenomena

1 Department of Electrical Engineering, Institute of Engineering, Polytechnic of Porto (ISEP), Rua Dr. António Bernardino de Almeida, 431, 4200-072 Porto, Portugal
2 Institute of Engineering, Polytechnic of Porto (ISEP) and Center of Mathematics of the University of Porto, Rua Dr. António Bernardino de Almeida, 431, 4200-072 Porto, Portugal
3 UISPA, IDMEC-Polo FEUP, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal


Introduction
Power laws (PLs) have been widely reported in the modeling of distinct phenomena and have been associated with long memory behavior, self-similarity, fractal structures, and fractional calculus. In [1], for example, PLs are interpreted as a manifestation of the long memory property of systems with fractional dynamics and, in [2], several complex systems exhibiting PL behavior are analysed from the perspective of fractional dynamics. Fractional calculus and PLs are used in [3,4] to model the dynamics of financial markets. In [5] the complexity of the human body is characterized through fractal measures and its dynamics is described by means of fractional calculus.
Catastrophic events are characterized by a huge severity, usually defined by a large number of casualties. By catastrophic events, we mean wars, terrorist attacks, tornadoes, earthquakes, floods, and landslides. The distribution of the number of casualties in these events has been reported to follow a PL [6-12].
PL distributions were first mentioned in 1896, when Pareto described the distribution of income [13]. Pareto showed that the relative number of individuals with an annual income larger than a certain value $x$ was proportional to a power of $x$. This became known as the Pareto distribution. After this work, Auerbach [14] demonstrated an analogous result for city size distributions. Ranking cities from 1 to $n$, with the most populous city ranked 1, Auerbach demonstrated that the product of the cities' populations by their ranks was approximately constant for a given territory. Estoup [15] and Zipf [16,17] applied PLs to word frequencies in texts. They found that some words are used more often than others and that the distribution of word frequencies follows a PL. Zipf [17] also described the distribution of city sizes by a Pareto distribution.
Often, to show that a certain data set follows a PL distribution, researchers plot the size versus the frequency of the event studied. In logarithmic scales, they obtain a straight line with negative slope. In the case of the Pareto distribution, the behavior is exactly linear and is given by $\ln(P[X \geq x]) = \ln C - \ln \alpha - \alpha \ln x$, where $X$ is a random variable following a PL distribution, $\alpha > 0$, and $\tilde{C} = C/\alpha > 0$. In these distributions, the tail decays asymptotically as a power of $x$ with exponent $\alpha$, which translates into heavy tails compared with other distributions. Zipf's law is a special case of Pareto's law, with coefficient $\alpha = 1$.
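The link between the two constants in the log-linear relation can be made explicit with a short derivation. As a sketch, assume (this is not stated in the text) that the density has the form $p(x) = C x^{-(\alpha+1)}$ for $x \geq x_{\min} > 0$; the symbol $x_{\min}$ is introduced here only for illustration.

```latex
\begin{align*}
P[X \geq x] &= \int_{x}^{\infty} C\, t^{-(\alpha+1)}\, dt
             = \frac{C}{\alpha}\, x^{-\alpha}
             = \tilde{C}\, x^{-\alpha}, \qquad x \geq x_{\min},\\
\ln P[X \geq x] &= \ln C - \ln \alpha - \alpha \ln x .
\end{align*}
```

Under this assumption, the slope of the straight line in the log-log plot is $-\alpha$ and its intercept is $\ln \tilde{C}$.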
In many cases a single PL holds for the entire range of the random variable that represents the system. In other cases, the statistical distribution is better described by multiple PLs [21]. In such cases, different PLs, characterized by distinct parameters, fit the real data more adequately. Double PL behavior has been pointed out by other authors in different phenomena. For example, in [22] many instances of two PLs are expressed by means of a generalized beta distribution function and, in [23], the double PL behavior is explicitly studied in the frequency of words in texts. Moreover, beyond the ranking problem, two PLs are manifested in other types of problems, such as turbulence, Earth's magnetic pole fluctuations, paleolake sedimentation density subject to volcanism, and avalanche distributions, to mention a few [24].
In this paper, we analyze the statistical distributions of the number of victims caused by catastrophic phenomena and find double PL behavior. Moreover, we plot the PL parameters, corresponding to several events, and observe an interesting pattern in the charts, where the lines that connect each pair of points defining the double PLs are roughly parallel to each other. Then, a complementary data analysis is performed by means of the computation of the Shannon entropy. The results reveal relationships hidden in the data that may trigger a future comprehensive explanation of this type of phenomena.
Bearing these ideas in mind, this paper is organized as follows. In Section 2, the results found in the literature concerning PLs and casualties in natural and human-made disasters are summarized. Section 3 analyses the PL behavior of catastrophic phenomena using data from real disasters. In Section 4, several entropy measures are used to characterize the data. Finally, in Section 5, the main results and conclusions of this paper are discussed.
Many explanations have been proposed in the literature. Nevertheless, a complete understanding of these patterns is a complex task. Important and intricate political, geographical, historical, and even cultural factors hinder a better understanding. Predicting the number of casualties in natural or human-made disasters is extremely important in developing predisaster strategies. Aspects like the rationalization of medical supplies and food, gathering emergency teams, and organizing shelter spaces, amongst others, have to be dealt with in order to minimize the damage. A PL behavior is indicative of a particular property of a system: the frequency of an event decreases as a power of its size. In this sense, large casualties are associated with low frequency phenomena, and more frequent events are less harmful in terms of human lives [7,25]. Examples of phenomena with low probability and huge casualties are the two world wars (WWs), high magnitude earthquakes, strong tornadoes, and huge tsunamis, among others.
In 1948, Richardson [7] analyzed domestic and international cases of violence in the period from 1820 to 1945. He distributed the cases, according to casualties measured in powers of 10, into five categories. The two WWs were classified in the highest category. In a later work [25], the same author showed that when the number of casualties increased by a factor of 10, the frequency of occurrence decreased by a factor close to three.
Guzzetti [26] considers landslide events in specific periods in different countries and regions, such as Italy, Canada, the Alps, Hong Kong, Japan, and China. He shows that the plot of the cumulative distribution function of the number of landslide events versus the number of casualties is well approximated by a straight line. This result suggests a PL distribution of the data.
Cederman [11] followed Richardson's work [7,25]. He used data from the Correlates of War (COW) Project [29], focusing on interstate wars. He computed the cumulative relative frequency of war size and showed that it obeyed a PL. The author proposed a self-organized critical dynamical system that replicated the PL behavior seen in real data. His model allowed conflict to spread and diffuse, potentially over long periods of time, due to the quasi-parallel execution.
In 2005, Jonkman [27] studied the distribution of fatalities in global events, focusing on the number of human deaths caused by three types of floods (river floods, flash floods, and drainage issues) between January 1975 and June 2002. The author plotted the global frequency of events with $N$ or more deaths versus $N$. He observed a PL behavior for earthquakes but not for flood data. Becerra et al. [30] use the same data set as Jonkman [27], but consider all disasters combined, both globally and disaggregated by continent. They obtained straight-line log-log plots for all disasters combined. The slopes of the casualty PL distributions were smaller than those for modern wars and terrorism. The explanation for this remained an open question. Another unsolved issue was the existence of PL behavior in combined disasters and not in individual disasters, such as floods. Here it is worth mentioning that casualties in earthquakes followed a PL distribution [6,27,30].
Johnson et al. [28] suggested a microscopic theory to explain the similarity in patterns of violence, such as war and global terrorism. The similarity was observed regardless of underlying ideologies, motivations, and the terrain in which events occurred. The authors introduced a model where the insurgent force behaved as a self-organizing system, which evolved dynamically through the continual coalescence and fragmentation of its constituent groups. They analyzed casualty patterns arising within a given war, unlike previous studies that focused on the total casualty figure for one particular war [7,11,25,31]. A PL fitted the data well, not only for Iraq, Colombia, and non-G7 terrorism, but also for the war in Afghanistan. The PL parameter for Iraq, Colombia, and Afghanistan was close to α = 2.5. This value matched the coefficient characterizing non-G7 terrorism. In the literature, the PL parameter value was α = 2.51 for non-G7 countries [32] and α = 1.713 for G7 countries. This result suggested that PL patterns would emerge within any modern asymmetric war fought by loosely organized insurgent groups.
In 2006, Bogen and Jones [33] treated the severity of terrorist attacks in terms of deaths and injuries. They applied a PL distribution to victim/event rates and used the PL to predict mortality due to terrorism through the year 2080. The authors claimed that these PL models could be used to improve strategies "to assess, prevent and manage terror-related risks and consequences".
Clauset et al. [34] studied the frequency and the number of casualties (deaths and injuries) of terrorist attacks since 1968. They observed a scale-invariant behavior, with the frequency being an inverse power of the casualties. This behavior was independent of the type of weapon, economic development, and distinct time scales. The authors presented a new model to fit the frequency of severe terrorist attacks, since previous models in the literature failed to produce the heavy tail in the PL distribution. Their model assumed that the severity of an occurrence was a function of the execution plan, and that selection tools were better suited to model competition between states and nonstate actors. Finally, the researchers claimed that periodicity was a common feature in global terrorism, with a period of roughly 13 years.
Bohorquez et al. [12] studied the quantitative relation between human insurgency, global terrorism, and ecology. They introduced a new model to explain the size distribution of casualties or the timing of within-conflict events. They considered insurgent populations as self-organized groups that dynamically evolved through decision-making processes. The main assumptions of the model were (i) being consistent with work on human group dynamics in everyday environments, (ii) having a new perception of modern insurgencies as fragmented, transient, and evolving, and (iii) using a decision-making process about when to attack based on competition for media attention. The authors applied a PL distribution to the Iraq and Colombia wars, with a parameter value close to α = 2.5. This value agreed with the coefficient α = 2.48 ± 0.07 obtained by Clauset et al. [34] for global terrorism. A PL fit to the Spanish and American Civil wars revealed a smaller PL parameter value (around α = 1.7). The authors claimed that their model suggested a remarkable link between violent and nonviolent human actions, due to its similarity to financial market models.

Power Law Behavior in Catastrophic Phenomena
In this section we investigate the statistical distributions of random variables that represent the number of human casualties in several human-made and natural hazards. Data from the EM-DAT International Disaster Database (http://www.emdat.be/) and the Global Terrorism Database (GTD) (http://www.start.umd.edu/gtd/) are analyzed. The EM-DAT database contains information on more than 18000 worldwide natural and technological disasters, from 1900 to the present. The EM-DAT is maintained by the Centre for Research on the Epidemiology of Disasters (CRED) at the School of Public Health of the Université Catholique de Louvain, located in Brussels, Belgium [35]. The GTD is an open-source database that includes information on more than 98000 worldwide terrorist attacks, from 1970 up to 2010 [31].
PLs are observed in several natural and man-made systems. Examples of single and double PLs in real data are given in Figures 1 and 2, respectively. The former represents the complementary cumulative distribution of the severity of industrial accidents in Central/South America over the period 1900-2011. The adopted measure to quantify the severity of an event is the total number of fatalities. The depicted graph corresponds to a rank/frequency log-log plot. To construct the graph, we first sort the data (i.e., the accidents) in decreasing order according to their severity, and number them consecutively, starting from one [36]. Then a normalization of the values is carried out, meaning that the number of fatalities ($x$-axis) is divided by the corresponding highest value, and the rank ($y$-axis) is divided by the rank of the smallest event. Finally, PLs are adjusted to the data using a least squares algorithm. All the log-log plots presented in this paper are made following this procedure.
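The construction just described (sort, rank, normalize, least squares in log-log scales) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `rank_size_fit` and the synthetic data are hypothetical, and ordinary least squares via `numpy.polyfit` is assumed as the "least squares algorithm".

```python
import numpy as np

def rank_size_fit(fatalities):
    """Build the normalized rank/size data and fit one PL by least squares.

    Returns (x, y, c_tilde, alpha) such that y is approximately c_tilde * x**(-alpha).
    """
    sizes = np.sort(np.asarray(fatalities, dtype=float))[::-1]  # decreasing severity
    ranks = np.arange(1, sizes.size + 1)        # rank 1 = most severe event
    x = sizes / sizes[0]                        # divide by the largest size
    y = ranks / ranks[-1]                       # divide by the rank of the smallest event
    # Ordinary least squares on the log-log data: ln y = ln c_tilde - alpha * ln x
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return x, y, float(np.exp(intercept)), float(-slope)

# Synthetic check (hypothetical data): sizes built so that rank is an exact power of size.
n, alpha_true = 200, 1.5
demo_sizes = (np.arange(1, n + 1) / n) ** (-1.0 / alpha_true)
_, _, c_tilde, alpha = rank_size_fit(demo_sizes)
print(alpha, c_tilde)
```

On real casualty counts the fit is only approximate, and tied sizes receive consecutive ranks in this sketch.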
Figure 2 corresponds to the distribution of casualties caused by earthquakes in Central/South America in the period 1900-2011, an example of a distribution that can be approximated by a double PL.
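A double PL can be fitted in several ways, and the text does not say how the two regimes were separated. The sketch below assumes one simple option: scan every admissible split of the log-log data, fit a least-squares line to each side, and keep the split with the smallest total squared error. The names `fit_loglog`, `double_pl_fit`, and `min_pts` are hypothetical.

```python
import numpy as np

def fit_loglog(lx, ly):
    """Least-squares line through (lx, ly); returns slope, intercept, squared error."""
    slope, intercept = np.polyfit(lx, ly, 1)
    resid = ly - (slope * lx + intercept)
    return slope, intercept, float(resid @ resid)

def double_pl_fit(x, y, min_pts=3):
    """Fit two PLs y ~ c * x**(-alpha), split where the total squared error is smallest.

    Returns (x_break, (c1, alpha1), (c2, alpha2)); part 1 covers the smaller sizes.
    Needs at least 2 * min_pts points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    lx, ly = np.log(x[order]), np.log(y[order])
    best = None
    for k in range(min_pts, lx.size - min_pts + 1):   # candidate split index
        s1, i1, e1 = fit_loglog(lx[:k], ly[:k])
        s2, i2, e2 = fit_loglog(lx[k:], ly[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, k, (float(np.exp(i1)), float(-s1)),
                    (float(np.exp(i2)), float(-s2)))
    _, k, pl1, pl2 = best
    return float(np.exp(lx[k])), pl1, pl2

# Synthetic data (hypothetical): exponents 0.5 and 2.0, continuous at x = 0.1.
x_demo = np.logspace(-3, 0, 40)
y_demo = np.where(x_demo < 0.1, x_demo ** -0.5, 0.1 ** 1.5 * x_demo ** -2.0)
x_break, (c1, a1), (c2, a2) = double_pl_fit(x_demo, y_demo)
print(x_break, a1, a2)
```

The exhaustive scan costs one pair of line fits per candidate split, which is negligible for data sets of a few hundred events.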
We analyzed the data available at the EM-DAT database in terms of disaster type ($DT_i$) and disaster location ($DL_j$), with $i = 1, \ldots, 11$ and $j = 1, \ldots, 6$ categories, respectively: $DT_i$ = {Drought, Earthquake, Epidemic, Extreme temperature, Flood, Industrial accident, Mass movement wet, Storm, Transport accident, Volcano, Wildfire}; $DL_j$ = {Africa, North America, Central & South America, Europe (including Russia), Asia (not including SE Asia), Oceania (including SE Asia)}. The period of analysis was 1900-2011 for every case. The total number of combinations (location/type) is 11 × 6 = 66. Nevertheless, for 14 cases, there is insufficient data to compute reliable statistical distributions. For the remaining 52 cases, taking the number of casualties as the variable of interest, we obtain statistical distributions that can be approximated by either a single PL (SPL, 16 cases) or a double PL (DPL, 36 cases), similar to the ones depicted in Figures 1 and 2.
In Figure 3 we depict the locus of the parameters $(\tilde{C}_i, \alpha_i)$, $i = 1, 2$, corresponding to the analyzed cases. As can be seen, an interesting pattern emerges, where the lines that connect the pairs of points that characterize the DPLs have nearly identical orientation. This geometrical pattern reflects a relationship between the two parts of the DPL distributions (DPL1, the part closer to the head, and DPL2, the part closer to the tail). Beyond the observation that $\alpha_2 > \alpha_1$ in all cases, further investigation of the reason for this behavior is needed.
We pursued our study with the analysis of the GTD database. First, the events associated with human casualties were grouped by year ($Y_k$), from 1980 up to 2010 (except 1993, for which no data are available): $Y_k \in \{1980, \ldots, 2010\} \setminus \{1993\}$, $k = 1, \ldots, 30$. We found that all the statistical distributions can be approximated by DPLs. In Figures 4 and 5, the time evolution of the parameters of the DPLs, $(\tilde{C}_i, \alpha_i)$, $i = 1, 2$, is shown. Regarding the parameters $\tilde{C}_i$, it can be seen that they have similar behavior, although $\tilde{C}_2$ varies more than $\tilde{C}_1$ and is always smaller than it.
With respect to $\alpha_i$, we have a similar evolution but, in this case, the parameter $\alpha_2$ is always greater than $\alpha_1$. As severe terrorist attacks correspond to points closer to the tail of the distribution, DPL2, which is characterized by a larger α, this means that those events are more similar to one another than the smaller events (which correspond to DPL1).
To complement the analysis with respect to the date of the occurrences, the parameters $(\tilde{C}_i, \alpha_i)$ of the PLs, corresponding to $Y_k$, were plotted (Figure 6). As can be seen, a pattern similar to the one described previously (Figure 3) is observed.
We have also studied the distributions of the casualties in terrorist attacks that occurred in the period 1970-2010, but with respect to other criteria, namely, the type of weapon used, the region where the event took place, the target, and the type of attack. Each criterion was then divided into categories, indexed by 1, . . ., 6, 1, . . ., 13, [...]; the categories listed for the type of attack are [...] Bombing/Explosion, Facility/Infrastructure Attack, Hijacking, Hostage Taking (Barricade Incident), Hostage Taking (Kidnapping), and Unarmed Assault. Most cases are characterized by DPLs. However, in a few situations an SPL fits the data better. The main results are summarized in Table 1. Moreover, we observed that the parameters corresponding to all distributions characterized by DPLs display a pattern similar to the ones mentioned previously (Figures 3 and 6), where the lines connecting the slopes and intercepts of DPL1, $(\alpha_1, \tilde{C}_1)$, to those of its companion DPL2, $(\alpha_2, \tilde{C}_2)$, for the same data set, have nearly identical orientation in the $(\tilde{C}, \alpha)$ Cartesian space.

Entropy of Catastrophic Phenomena
In this section we analyse the entropy of data collected from the GTD database, that is, data related to terrorism. To calculate the entropies we construct histograms of relative frequencies, using bins of width one (one casualty), and approximate the probabilities $p_i$ by the relative frequencies. We present results obtained for terrorist events grouped by year ($Y_k$), as defined in the previous section. Nevertheless, it should be noticed that similar results are obtained for all other human-made and natural hazards.
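The histogram step can be illustrated with a short sketch. It assumes that casualty counts are non-negative integers, so that a bin of width one holds exactly the events with a given count; the function name `unit_bin_probabilities` and the toy sample are hypothetical.

```python
import numpy as np

def unit_bin_probabilities(casualties):
    """Relative frequencies over bins of width one.

    Entry i of the result is the fraction of events with exactly i casualties.
    """
    counts = np.bincount(np.asarray(casualties, dtype=np.int64))
    return counts / counts.sum()

# Toy sample of four events (hypothetical values).
p_demo = unit_bin_probabilities([1, 1, 2, 3])
print(p_demo)
```

Empty bins receive probability zero and therefore drop out of any entropy sum of the form $-\sum_i p_i \ln p_i$.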
Clausius [37] and Boltzmann [38] were the first authors to define entropy in the field of thermodynamics. Later on, Shannon [39] and Jaynes [40] applied their results to information theory [41].
The most celebrated entropy is the so-called Shannon entropy $S$, defined by
$$S = -\sum_{i=1}^{W} p_i \ln p_i.$$
The Shannon entropy represents the expected value of the information $-\ln p_i$. In particular, for the uniform probability distribution we have $p_i = W^{-1}$ and the Shannon entropy takes its maximum value $S = \ln W$, yielding Boltzmann's famous formula, up to a multiplicative factor $k$ denoting the Boltzmann constant. Thus, in thermodynamic equilibrium, the Shannon entropy can be identified as the "physical entropy" of the system. Rényi and Tsallis entropies are generalizations of Shannon's entropy and are given by, respectively,
$$S_q^{(R)} = \frac{1}{1-q} \ln\left(\sum_{i=1}^{W} p_i^{q}\right), \qquad S_q^{(T)} = \frac{1}{q-1}\left(1 - \sum_{i=1}^{W} p_i^{q}\right).$$
Both the Rényi and the Tsallis entropies reduce to the Shannon entropy when $q \to 1$. Tsallis entropy was applied to diffusion equations [42] and Fokker-Planck systems [43]. Rényi entropy has an inverse power law equilibrium distribution [44] and satisfies the zeroth law of thermodynamics [45]. The two-parameter Sharma-Mittal entropy [46] is accepted as a generalization of the Tsallis, Rényi, and Boltzmann-Gibbs entropies, for limiting cases of the parameters [47].
Recently, more general entropy measures have been proposed in the literature, where the additivity axiom has been relaxed. For instance, Ubriaco [48] proposed the following formula for the fractional entropy:
$$S_q^{(U)} = \sum_{i=1}^{W} \left(-\ln p_i\right)^{q} p_i,$$
which has the same properties as the Shannon entropy except additivity. Applications of entropy in distinct complex systems can be found in [49-57].
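The four entropy measures discussed above can be compared numerically. The sketch below uses the standard textbook forms of the Shannon, Rényi, Tsallis, and Ubriaco entropies with natural logarithms; the function names are hypothetical, and bins with zero probability are discarded before evaluation.

```python
import numpy as np

def _positive(p):
    """Keep only the non-empty bins of a probability vector."""
    p = np.asarray(p, dtype=float)
    return p[p > 0]

def shannon(p):
    p = _positive(p)
    return float(-np.sum(p * np.log(p)))

def renyi(p, q):
    """Renyi entropy of order q (q > 0, q != 1)."""
    p = _positive(p)
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

def tsallis(p, q):
    """Tsallis entropy of index q (q != 1)."""
    p = _positive(p)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def ubriaco(p, q):
    """Ubriaco fractional entropy; q = 1 gives the Shannon entropy."""
    p = _positive(p)
    return float(np.sum(p * (-np.log(p)) ** q))

p_demo = np.array([0.5, 0.25, 0.25])   # hypothetical distribution
print(shannon(p_demo), renyi(p_demo, 2.0), tsallis(p_demo, 2.0), ubriaco(p_demo, 0.5))
```

Evaluating `renyi` or `tsallis` at a value of `q` very close to one gives a number close to `shannon`, which is a convenient sanity check of the limiting behavior.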
As discussed in the previous section, the statistical distributions of real data can be approximated by either single or double PLs. In the latter case, we study not only the total entropy but also the entropy associated with each part of the distributions. Therefore, we compute the entropy associated with DPL1 and DPL2, which approximate the first and second parts of the distributions, respectively. When adopting this procedure we are restricted to the Shannon and Ubriaco entropies, as the Tsallis and Rényi entropies do not admit this decomposition into parts.
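The decomposition into parts can be illustrated for the Shannon entropy. The sketch assumes that each part keeps the probabilities of the whole distribution (no renormalization within a part), so that the two partial entropies add up to the total; the text does not spell out this detail. The names `shannon_terms`, `split_shannon`, and `break_bin` are hypothetical.

```python
import numpy as np

def shannon_terms(p):
    """Per-bin contributions -p_i ln p_i, with empty bins contributing zero."""
    p = np.asarray(p, dtype=float)
    terms = np.zeros_like(p)
    nz = p > 0
    terms[nz] = -p[nz] * np.log(p[nz])
    return terms

def split_shannon(p, break_bin):
    """Entropy of the bins below break_bin (part 1) and from break_bin on (part 2)."""
    terms = shannon_terms(p)
    return float(terms[:break_bin].sum()), float(terms[break_bin:].sum())

p_demo = np.array([0.0, 0.5, 0.25, 0.25])   # hypothetical unit-bin probabilities
s1, s2 = split_shannon(p_demo, break_bin=2)
print(s1, s2, s1 + s2)
```

The same bin-by-bin split works for the Ubriaco entropy, whose definition is also a plain sum over bins. The Rényi entropy applies a logarithm to the whole sum and the Tsallis entropy subtracts the whole sum from one, so neither separates into per-part terms in this way.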
In the sequel we present several results of the analysis, taking into account the five grouping criteria (year, type of weapon, region, target, and type of attack). Figure 12 depicts the Shannon entropy versus parameter α for all statistical distributions. The black squares (denoted "SPL") correspond to the plot of α versus the total entropy of the respective distributions, $S$. The black circles (denoted "DPL1") are the plot of $\alpha_1$ versus the entropy associated with the first parts of the distributions, $S_1$. The white circles (denoted "DPL2") represent the plot of $\alpha_2$ versus the entropy associated with the second parts of the distributions, $S_2$. As can be seen, for the distributions that behave as single PLs, higher entropies correspond to lower values of the parameter α and the two quantities are linearly related. A similar pattern is observed for the parameters corresponding to DPL1. For DPL2 the parameter $\alpha_2$ increases with entropy, but the almost linear relation between both quantities remains.
Figure 13 shows analogous results for the Ubriaco entropy. The plot corresponds to $q = 0.5$; nevertheless, similar results are obtained for other values of $q$.
It is worth noticing that (5) is similar to Poisson's law (6) of an adiabatic reversible process involving ideal gases, where the variables $p$ and $v$ represent pressure and specific volume, respectively. Equation (6) implies that $p v^{\gamma} = \text{const}$. Parameter $\gamma$ is called Poisson's coefficient, taking the values $\gamma = 5/3 \simeq 1.67$ and $\gamma = 7/5 = 1.40$ for monoatomic and diatomic gases, respectively. Additionally, it should be noticed that the absolute value of the exponent in (5) is very similar to the one observed for an ideal gas undergoing a reversible adiabatic process.
Figure 13: Ubriaco entropy versus parameter α: "SPL" corresponds to (total entropy, α); "DPL1" corresponds to (entropy of the first part, α1); "DPL2" corresponds to (entropy of the second part, α2); $q = 0.5$.
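The step from Poisson's law to the constant product $p v^{\gamma}$ can be written out. As a sketch, assume the differential form of the law for a reversible adiabatic process of an ideal gas (one common way of stating it; the choice of form is an assumption), with $f$ denoting the number of active degrees of freedom of a gas molecule, a symbol introduced only for this illustration:

```latex
\begin{align*}
\frac{dp}{p} + \gamma\, \frac{dv}{v} = 0
\;&\Longrightarrow\; \ln p + \gamma \ln v = \text{const}
\;\Longrightarrow\; p\, v^{\gamma} = \text{const},\\
\gamma = \frac{c_p}{c_v} = \frac{f+2}{f}:
\quad f = 3 \;&\Rightarrow\; \gamma = \tfrac{5}{3} \simeq 1.67,
\qquad f = 5 \;\Rightarrow\; \gamma = \tfrac{7}{5} = 1.40 .
\end{align*}
```

In logarithmic variables the law is a straight line with slope $-\gamma$, which is the same geometric form as a PL in a log-log plot.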

Conclusions
PLs have been widely reported in the modeling of distinct phenomena and have been associated with long memory behavior, self-similarity, fractal structures, and fractional calculus.
In this paper we reviewed interesting and important results on PL distributions and their applications to the modeling of the number of victims in catastrophic events. We found double PL behavior in real data of catastrophic occurrences, in particular, terrorism. We have plotted the parameters of the two PLs, $(\tilde{C}_i, \alpha_i)$, $i = 1, 2$, corresponding to certain events, and observed an interesting pattern in the chart, where the lines that connect each pair of points defining the double PLs are roughly parallel to each other. We have also computed the entropy of the data sets. This complementary analysis of the numerical data revealed additional relationships, but these phenomena have a dense and rich set of characteristics, and further research efforts are needed for a deeper understanding.

Figure 1: Rank/frequency log-log plot corresponding to the distribution of casualties caused by industrial accidents in Central/South America over the period 1900-2011 (min size = 10; max size = 2700; max rank = 66).

Figure 2: Rank/frequency log-log plot corresponding to the distribution of casualties caused by earthquakes in Central/South America over the period 1900-2011 (min size = 1; max size = 222570; max rank = 179).

Figure 3: Locus of the parameters $(\tilde{C}_i, \alpha_i)$, $i = 1, 2$, that characterize the PLs corresponding to the number of casualties in certain combinations of disaster type/location, $DT_i$/$DL_j$.

Figure 9: Total Tsallis normalized entropy, $S_q^{(T)}$, as a function of the year of the events, $Y_k$, and parameter $q$.

Figure 10: Total Rényi normalized entropy, $S_q^{(R)}$, as a function of the year of the events, $Y_k$, and parameter $q$.

Figure 11: Total Ubriaco normalized entropy, $S_q^{(U)}$, as a function of the year of the events, $Y_k$, and parameter $q$.

Table 1: PL fit to the distributions of casualties in terrorist attacks.