Hotspots Detection in Spatial Analysis via the Extended Gustafson-Kessel Algorithm

We present a new approach for detecting hotspots in spatial analysis based on the extended Gustafson-Kessel clustering method encapsulated in a Geographic Information System (GIS) tool. In the bidimensional case this algorithm yields ellipses as cluster prototypes, which we treat as hotspots on the geographic map, and we study their spatiotemporal evolution. The data consist of georeferenced patterns corresponding to the positions of Taliban attacks against civilians and soldiers in Afghanistan during the period 2004–2010. We analyze the formation of new hotspots through time, the movement of the related centroids, and the variation of the covered surface, the inclination angle, and the eccentricity of each hotspot.


Introduction
Hotspot detection is a well-known spatial clustering process whose goal is to detect the spatial areas in which specific events concentrate [1]; the patterns are the events georeferenced as points on the map, and the features are the geographical coordinates (latitude and longitude) of each event. Hotspot detection is used in many disciplines: in crime analysis [2][3][4], for analyzing where crimes occur with a certain frequency; in fire analysis [5], for studying the phenomenon of forest fires; and in disease analysis [6][7][8][9], for studying the localization and the foci of diseases. Generally speaking, density-based algorithms [10,11] are used to detect the geometrical shapes of hotspot areas more accurately; they measure the spatial distribution of patterns over the area of study, but they have a high computational complexity.
In [5,12,13] a new hotspot detection method based on the extended fuzzy C-means algorithm (EFCM) [14,15] was proposed; EFCM is a variation of the well-known fuzzy C-means (FCM) algorithm that detects cluster prototypes as hyperspheres. With respect to the FCM algorithm, the EFCM algorithm has the advantages of determining the optimal number of clusters iteratively and of being robust in the presence of noise and outliers. In [5,12,13] the EFCM is encapsulated in a GIS tool for detecting hotspots as circles displayed on the map. The pattern event dataset is partitioned according to the time of each event's detection, so that each subset corresponds to a specific time interval. The authors compare the hotspots obtained in two consecutive years by studying their intersection on the map. In this way it is possible to follow the evolution of a particular phenomenon by studying how its incidence shifts and spreads through time.
In this paper we present a new hotspot detection method based on the extended Gustafson-Kessel algorithm (EGK) [14,15] for studying the spatiotemporal evolution of hotspots. Our aim is to improve the shape of the hotspots while maintaining a good computational complexity: indeed, the EGK algorithm gives the cluster prototypes as hyperellipsoids, that is, ellipses in the bidimensional case. The EGK algorithm is an extension of the Gustafson-Kessel (GK) algorithm [16], which we briefly present.
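Let $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^n$ be a dataset composed of $N$ patterns $\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{nj})^T$, where $x_{kj}$ is the $k$th component (feature) of the pattern $\mathbf{x}_j$. The GK algorithm minimizes the following objective function:
$$J(X, U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m}\, d_{ij}^{2}, \quad (1)$$
where $C$ is the number of clusters fixed a priori, $u_{ij}$ is the membership degree of the pattern $\mathbf{x}_j$ to the $i$th cluster ($i = 1, \ldots, C$), $V = \{\mathbf{v}_1, \ldots, \mathbf{v}_C\} \subset \mathbb{R}^n$ is the set of points given by the centroids of the $C$ clusters, $m$ is the fuzzifier parameter, and $d_{ij}$ is the distance between $\mathbf{v}_i = (v_{1i}, v_{2i}, \ldots, v_{ni})^T$ and $\mathbf{x}_j$.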
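The general form of this distance is given by
$$d_{ij}^{2} = (\mathbf{x}_j - \mathbf{v}_i)^{T} \mathbf{A}_i (\mathbf{x}_j - \mathbf{v}_i), \quad (2)$$
where $\mathbf{A}_i$ is the norm matrix, defined to be symmetric and positive definite. In the FCM algorithm $\mathbf{A}_i$ is equal to the identity matrix $\mathbf{I}$. In the GK algorithm the following Mahalanobis distance [17] is used:
$$d_{ij}^{2} = (\mathbf{x}_j - \mathbf{v}_i)^{T} \left[\det(\mathbf{P}_i)^{1/n}\, \mathbf{P}_i^{-1}\right] (\mathbf{x}_j - \mathbf{v}_i), \quad (3)$$
where $\mathbf{P}_i$ is the covariance matrix of the $i$th cluster, given by
$$\mathbf{P}_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} (\mathbf{x}_j - \mathbf{v}_i)(\mathbf{x}_j - \mathbf{v}_i)^{T}}{\sum_{j=1}^{N} u_{ij}^{m}}. \quad (4)$$
The covariance matrix $\mathbf{P}_i$ provides information about the shape and orientation of the cluster. The length of the $k$th axis of the hyperellipsoid is given by the square root of the $k$th eigenvalue $\lambda_k$ of $\mathbf{P}_i$. The directions of the axes of the hyperellipsoid are given by the directions of the eigenvectors of the matrix $\mathbf{P}_i$. In Figure 1 we show an example of an ellipsoidal cluster prototype.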
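As an illustration of how the covariance matrix determines the prototype in the bidimensional case, the following minimal sketch derives the two semiaxes and the inclination of the ellipse from the entries of a symmetric 2×2 matrix, using the closed-form eigenvalues. The function name `ellipse_from_covariance` and its argument names are hypothetical and are not part of the GIS tool described in this paper.

```python
import math


def ellipse_from_covariance(pxx, pxy, pyy):
    """Return (semi_major, semi_minor, angle_deg) for the symmetric
    2x2 covariance matrix [[pxx, pxy], [pxy, pyy]]."""
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (pxx + pyy) / 2.0
    root = math.hypot((pxx - pyy) / 2.0, pxy)
    lam_max = mean + root
    lam_min = max(mean - root, 0.0)  # guard against tiny negative round-off
    # Direction of the eigenvector of lam_max, measured from the x axis.
    angle = 0.5 * math.atan2(2.0 * pxy, pxx - pyy)
    # Axis lengths are the square roots of the eigenvalues.
    return math.sqrt(lam_max), math.sqrt(lam_min), math.degrees(angle)


semi_major, semi_minor, angle_deg = ellipse_from_covariance(2.5, 1.5, 2.5)
```

For the matrix used above the eigenvalues are 4 and 1, so the sketch yields semiaxes of length 2 and 1 and an inclination of 45 degrees.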
Figure 2: Intersection of two elliptical hotspots detected for events that happened in two consecutive periods.
Using Lagrange multipliers to minimize objective function (1), we obtain the following solution for the centroid of each cluster prototype:
$$\mathbf{v}_i = \frac{\sum_{j=1}^{N} u_{ij}^{m}\, \mathbf{x}_j}{\sum_{j=1}^{N} u_{ij}^{m}}, \quad (5)$$
where $i = 1, \ldots, C$, and the $u_{ij}$ are given by
$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}. \quad (6)$$
Initially the $u_{ij}$'s and the $\mathbf{v}_i$ are assigned randomly and updated in each iteration. If $U^{(l)} = (u_{ij}^{(l)})$ is the matrix $U$ calculated at the $l$th step, the iterative process stops when
$$|U^{(l)} - U^{(l-1)}| < \varepsilon, \quad (7)$$
where $\varepsilon > 0$ is a prefixed parameter. This algorithm is sensitive to the presence of outliers and noise, and the number of clusters $C$ is fixed a priori; as in the FCM algorithm, we need to use a validity index for determining an optimal value for the number of clusters $C$. To overcome these shortcomings, in [1,16] the EGK algorithm is proposed, which is a variation of the GK algorithm in which the optimal number of clusters is obtained during the iteration process. Furthermore, the EGK algorithm is robust with respect to the presence of noise and outliers.
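The GK iteration described above (centroids, fuzzy covariance matrices, volume-normalized Mahalanobis distances, and membership update) can be sketched for bidimensional patterns as follows. This is a minimal illustration under stated assumptions, not the implementation used in the GIS tool: the function name `gk_cluster` is hypothetical, the stopping test uses the largest absolute change of a membership degree as the norm, and a small ridge term is added to the covariance diagonal to keep it invertible. The EGK extensions (automatic choice of the number of clusters, robustness to noise) are not included.

```python
import math
import random


def gk_cluster(points, c, m=2.0, eps=0.01, max_iter=100, seed=0):
    """Plain GK clustering of 2-D points; returns (centroids, covariances, U)."""
    rng = random.Random(seed)
    n_pts = len(points)
    # Random initial memberships, normalized so each pattern's degrees sum to 1.
    u = [[rng.random() + 1e-6 for _ in range(n_pts)] for _ in range(c)]
    for j in range(n_pts):
        total = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= total
    centroids, covariances = [], []
    for _ in range(max_iter):
        centroids, covariances, dist2 = [], [], []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n_pts)]
            sw = sum(w)
            # Centroid: membership-weighted mean of the patterns.
            vx = sum(wj * p[0] for wj, p in zip(w, points)) / sw
            vy = sum(wj * p[1] for wj, p in zip(w, points)) / sw
            # Fuzzy covariance matrix (ridge term is an assumption of this sketch).
            pxx = sum(wj * (p[0] - vx) ** 2 for wj, p in zip(w, points)) / sw + 1e-9
            pyy = sum(wj * (p[1] - vy) ** 2 for wj, p in zip(w, points)) / sw + 1e-9
            pxy = sum(wj * (p[0] - vx) * (p[1] - vy) for wj, p in zip(w, points)) / sw
            det = pxx * pyy - pxy * pxy
            # Norm matrix det(P)^(1/n) * P^(-1) with n = 2.
            k = 1.0 / math.sqrt(det)
            axx, axy, ayy = k * pyy, -k * pxy, k * pxx
            row = []
            for p in points:
                dx, dy = p[0] - vx, p[1] - vy
                row.append(axx * dx * dx + 2.0 * axy * dx * dy + ayy * dy * dy + 1e-12)
            centroids.append((vx, vy))
            covariances.append((pxx, pxy, pyy))
            dist2.append(row)
        # Membership update; track the largest change for the stopping test.
        change = 0.0
        expo = 1.0 / (m - 1.0)
        for j in range(n_pts):
            inv = [(1.0 / dist2[i][j]) ** expo for i in range(c)]
            total = sum(inv)
            for i in range(c):
                new = inv[i] / total
                change = max(change, abs(new - u[i][j]))
                u[i][j] = new
        if change < eps:
            break
    return centroids, covariances, u
```

The defaults `m=2.0` and `eps=0.01` mirror the default fuzzifier and error threshold mentioned for the tool; the remaining parameters are choices of this sketch.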
In this paper we propose a new approach based on the EGK clustering method for detecting hotspots and studying their spatiotemporal evolution. In the bidimensional case we obtain ellipses, which approximate hotspot areas better than the circular areas produced by the EFCM method. Figure 2 shows an example of two intersecting elliptical hotspots, obtained as clusters detected by means of the EGK method in two consecutive periods.
Figure 2 shows three different regions: (i) the area in which the hotspot A is not intersected by the hotspot B (corresponding to A − (A ∩ B) = A − B): this region can be considered as a set of geographical areas in which the previously detected event subsequently disappears; (ii) the area of intersection A ∩ B: this area can be considered a geographical area in which the event persists in the course of time; (iii) the area in which the hotspot B is not intersected by the hotspot A (corresponding to B − (A ∩ B) = B − A): this region can be considered as a set of geographical areas into which the event, not detected previously, subsequently propagates.
We can study the spatiotemporal evolution of the hotspots by analyzing the interactions between elliptical hotspots detected for consecutive periods, that is, by verifying the presence of clusters in areas in which no clusters had been detected previously and the disappearance of clusters in areas previously covered by hotspots.
In this research we present a method for studying the spatiotemporal evolution of hotspot areas of war in Afghanistan; we apply the EGK algorithm to compare the event datasets of consecutive years, which correspond to the positions of Taliban attacks against civilians and soldiers. Each event corresponds to the geolocalization of the site where an attack happened.
We study the spatiotemporal evolution of the hotspots by analyzing the intersections of hotspots corresponding to two consecutive years, the displacement of the centroids, the increase or reduction of the hotspot areas, and the emergence of new hotspots.
In Section 2 we give an overview of the EGK algorithm. In Section 3 we present our method for studying the spatiotemporal evolution of hotspots in spatial analysis. In Section 4 we present the results of the spatiotemporal evolution of hotspots. Our conclusions are contained in Section 5.
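The three regions A − B, A ∩ B, and B − A can be estimated numerically. The sketch below counts the cells of a regular grid that fall inside each region; it assumes that each hotspot is described by a tuple (center x, center y, first semiaxis, second semiaxis, inclination in degrees) in planar coordinates (for instance, kilometers in a projected reference system). The names `inside` and `region_areas` and the grid resolution are hypothetical choices of this illustration, not part of the tool.

```python
import math


def inside(ellipse, x, y):
    """True if the point (x, y) lies in the ellipse (cx, cy, a, b, angle_deg)."""
    cx, cy, a, b, angle_deg = ellipse
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    # Rotate the point into the ellipse's own axes.
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def region_areas(ell_a, ell_b, steps=400):
    """Approximate the areas of A - B, A ∩ B, and B - A on a regular grid."""
    # Bounding box large enough to contain both ellipses.
    r = max(ell_a[2], ell_a[3], ell_b[2], ell_b[3])
    xmin = min(ell_a[0], ell_b[0]) - r
    xmax = max(ell_a[0], ell_b[0]) + r
    ymin = min(ell_a[1], ell_b[1]) - r
    ymax = max(ell_a[1], ell_b[1]) + r
    hx = (xmax - xmin) / steps
    hy = (ymax - ymin) / steps
    only_a = both = only_b = 0
    for i in range(steps):
        x = xmin + (i + 0.5) * hx  # cell midpoint
        for j in range(steps):
            y = ymin + (j + 0.5) * hy
            in_a = inside(ell_a, x, y)
            in_b = inside(ell_b, x, y)
            if in_a and in_b:
                both += 1
            elif in_a:
                only_a += 1
            elif in_b:
                only_b += 1
    cell = hx * hy
    return only_a * cell, both * cell, only_b * cell
```

With these estimates, the share of hotspot A that persists in the following period can be approximated as `both / (only_a + both)`. The accuracy depends on the grid resolution; an exact polygon-based intersection in the GIS would be an alternative.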

The EGK Algorithm
In the EGK algorithm we consider clustering prototypes given by hyperellipsoids in the $n$-dimensional feature space. The $i$th hyperellipsoidal cluster prototype $V_i$ is characterized by a centroid $\mathbf{v}_i = (v_{1i}, \ldots, v_{ni})$ and a mean radius $r_i$, and we say that $\mathbf{x}_j$ belongs to $V_i$ if $d_{ij} \le r_i$.

Hotspots Detection and Evolution in Military War
Each pattern is given by the event corresponding to a place in which an attack has occurred. The two features of the pattern are the geographic coordinates of this place. We divide the event dataset into subsets corresponding to the events that occurred in a specific year or set of years. For each subset of events we apply the EGK algorithm to detect the final cluster prototypes.
The dataset is extracted from the URL http://www.acleddata.com/data/asia/; the data are the geolocalizations of Taliban attacks in Afghanistan during the period 2004-2010. The EGK algorithm is encapsulated in the ESRI/ArcGIS tool. Figure 3 shows the mask used for setting the parameters and running the EGK algorithm.
We can set other numerical fields for adding other features to the geographical coordinates. Initially we set the initial number of clusters, the fuzzifier $m$ (equal to 2 by default), and the error threshold for stopping the iterations (equal to 0.01 by default). At the end of the process the form displays the number of iterations, the final number of clusters, and the error calculated at the last iteration. The resultant clusters are shown as ellipses on the geographical map and can be saved in a new geographical layer. In Figure 4 we show the mask used for displaying the information of each elliptical prototype detected: the centroid's coordinates, the length of each semiaxis, and the orientation of the ellipse with respect to the horizontal direction on the geographical map.
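The partition of the event dataset into temporal subsets can be sketched as follows. The record layout (year, longitude, latitude), the function name `split_by_period`, and the period boundaries in the usage lines are hypothetical; the coordinates are invented for illustration and are not taken from the dataset.

```python
def split_by_period(events, periods):
    """Group (year, lon, lat) events by inclusive (first_year, last_year) periods.

    Events whose year falls in no period are ignored.
    """
    subsets = {period: [] for period in periods}
    for year, lon, lat in events:
        for first, last in periods:
            if first <= year <= last:
                # Keep only the two features used for clustering.
                subsets[(first, last)].append((lon, lat))
                break
    return subsets


# Illustrative records only (not real data).
events = [(2004, 69.1, 34.5), (2006, 65.7, 31.6), (2009, 62.2, 34.3)]
periods = [(2004, 2006), (2007, 2008), (2009, 2009)]
subsets = split_by_period(events, periods)
```

Each resulting list of coordinate pairs would then be passed to the clustering step independently.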
The final process concerns the comparative analysis of the hotspots obtained by the final clusters resulting for each subset of events.Figure 5 shows an example of the display of hotspots obtained as final clusters corresponding to three consecutive years.
In order to assess the expansion and the displacement of any hotspot, we measure the area covered by each hotspot, the distance between the centroids of two intersecting hotspots detected in consecutive periods, and the variation of the inclination angle, the eccentricity, and the lengths of both semiaxes.
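These measures follow from the standard geometry of the ellipse: with semiaxes $a \ge b$, the area is $\pi a b$ and the eccentricity is $\sqrt{1 - (b/a)^2}$. The sketch below compares two hotspots given as tuples (center x, center y, first semiaxis, second semiaxis, inclination in degrees). It assumes planar coordinates, so that the centroid distance is Euclidean, and it treats inclinations as orientations modulo 180 degrees; the function names are hypothetical.

```python
import math


def area_and_eccentricity(a, b):
    """Area and eccentricity of an ellipse with semiaxes a and b."""
    major, minor = max(a, b), min(a, b)
    return math.pi * major * minor, math.sqrt(1.0 - (minor / major) ** 2)


def compare_hotspots(old, new):
    """Compare two hotspots given as (cx, cy, semiaxis_1, semiaxis_2, angle_deg)."""
    area_old, ecc_old = area_and_eccentricity(old[2], old[3])
    area_new, ecc_new = area_and_eccentricity(new[2], new[3])
    # An ellipse's orientation repeats every 180 degrees, so wrap the
    # difference into the interval (-90, 90].
    turn = (new[4] - old[4]) % 180.0
    if turn > 90.0:
        turn -= 180.0
    return {
        "centroid_shift": math.hypot(new[0] - old[0], new[1] - old[1]),
        "area_change": area_new - area_old,
        "eccentricity_change": ecc_new - ecc_old,
        "angle_change_deg": turn,
    }


report = compare_hotspots((0.0, 0.0, 2.0, 1.0, 10.0), (3.0, 4.0, 2.0, 2.0, 170.0))
```

In this invented example the centroid moves by 5 units, the area doubles from 2π to 4π, the eccentricity drops to 0 because the second hotspot is a circle, and the orientation changes by −20 degrees.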
We present the details of the comparison of the hotspots by considering the event data that occurred in the five periods. In Figures 6-10 we show the hotspots detected. By analyzing Figures 6-8 we can deduce that in the period 2004-2008 seven hotspot areas approximated as ellipses are present; in these periods each hotspot changed its angle, width, and centroid position only slightly. In the years 2009 and 2010 a new hotspot is detected in a region bordering Turkmenistan. In Figure 10 the hotspots obtained for the two consecutive years 2009 and 2010 are overlaid. In blue (resp., red) we enumerate the hotspots corresponding
In Table 2 the first column shows the label of each hotspot; the second and third columns show the area, in km², of the hotspot detected in 2009 and 2010, respectively. The fourth column shows the intersection area of the two hotspots, and the fifth column shows the percentage of the area of the hotspot detected in 2009 that is covered by the corresponding hotspot detected in 2010, that is, the ratio "intersection area/area of the hotspot detected in 2009".
The results in Table 2 show that over 65% of the area of each hotspot detected in 2009 is also covered by the corresponding hotspot detected in 2010. Another significant result is the increase of the area of hotspot 8, which exceeds 2 × 10⁴ km² in 2010. In Table 3 we show the eccentricity of each hotspot and the distance between the centroid of each hotspot detected in 2009 and that of the corresponding one detected in 2010.
The results show that the eccentricity increases significantly in 2010 for hotspots 4 and 6, whereas it decreases for hotspot 3; the eccentricity remains almost unchanged for the remaining hotspots in 2010. Another significant result is the

Conclusions
We present a new approach for detecting hotspots in spatial analysis using the EGK clustering method encapsulated in a GIS tool. Like the EFCM algorithm, the EGK method is robust with respect to noise and outliers, and it obtains the optimal number of clusters iteratively during the process; furthermore, it has the advantage of detecting hotspots of elongated shape. In our experiments we consider the sites of Taliban attacks in Afghanistan during the period 2004-2010. The spatial dataset is partitioned into subsets in order to study the evolution of the hotspots through time. We study the evolution of each hotspot in terms of the movement of the centroids, the surface covered, the inclination, and the eccentricity.

Figure 1: Example of an ellipsoidal cluster prototype obtained with the GK algorithm.

Figure 3: Mask created in the ESRI/ArcGIS tool for managing the EGK process.

Figure 4: Mask created in the ESRI/ArcGIS tool for displaying the information of the detected cluster prototypes.

Figure 5: Spatiotemporal evolution of hotspots detected in three consecutive years.

Table 1: Results of the EGK applied to the event subsets. Columns: Year; Initial number of clusters; Final number of clusters; $|U^{(l)} - U^{(l-1)}|$.
The results show the formation, starting from 2009, of a new hotspot in the north-western zone bordering Turkmenistan. The comparison of the hotspots detected in 2009 and 2010 shows that this hotspot grows to an extension of about 2 × 10⁴ km².

Table 2: Areas of hotspots detected in 2009 and 2010.

Table 3: Eccentricity of the hotspots detected in 2009 and 2010 and distance between their centroids.