Finding Possible Precursors for the 2015 Cotopaxi Volcano Eruption Using Unsupervised Machine Learning Techniques

Cotopaxi Volcano showed an increased activity since April 2015 and evolved into its eventual mild eruption in August 2015. In this work we use records from a broadband seismic station located at less than 4 km from the vent that encompass data from April to December of 2015, to detect and study low-frequency seismic events. We applied unsupervised learning schemes to group and identify possible premonitory low-frequency seismic families. To find these families we applied a two-stage process in which the eventswere first separated by their frequency content by applying the k-means algorithm to the spectral density vector of the signals and then were further separated by their waveform by applying Correntropy and Dynamic Time Warping. As a result, we found a particular family related to the volcano’s state of activity by exploring its time distribution and estimating its events’ locations.


Introduction
Volcano monitoring is a crucial task on potentially and currently active volcanoes. Particularly, detecting transitions between states of volcano activity is of interest as it is essential for risk management and mitigation. Monitoring can be performed in several manners, but, among all the geophysical signals used, one that adequately represents the volcano's activity is the seismicity [1]. Volcanic seismicity is very varied compared to tectonic seismicity due to different physical processes at the source, and it is followed by varied event classification and terminology [2]. In general, 5 main types of events are widely recognized in volcanic environments: Volcano-Tectonic (VT), Long Period (LP), Very Long Period (VLP), Tremors (TR), and Hybrid (HB) events. While VTs are associated with fractures within the volcanic edifice, high attention has been paid to LP events as possible precursors for eruptions [3]. LPs are associated to resonance of structures within the volcanoes due to fluid movements [4] and have most of its energy in the low frequency band usually ranging from 0.3 to 3 Hz [5] or up to 5 Hz [3]. VLPs with periods from 2 to 100 s are directly related to mass movement and can appear in transitions of volcanic state of activity [6].
To promptly assess changes in volcanic activity is necessary to detect and identify these seismic-volcanic events in near real-time. However, this recognition might not be possible or might be unreliable in volcano observatories where there are limited personnel or there are no fully automated monitoring systems implemented during crisis, as large amounts of seismic data can overwhelm the monitors. This problem can be solved by using catalogs of seismic events previously recorded and subsequently train a system to detect and classify these events accordingly. This approach is known as supervised machine learning (SML) and many observatories around the world have transitioned or are transitioning to this way of operation. Some volcanoes where this approach has been studied/implemented are Galeras [7] and Nevado de Ruiz [8] in Colombia, San Cristobal and Telica [9] in Nicaragua, Colima [10] in Mexico, Merapi [10] in Indonesia, Etna and Stromboli [9] in Italy, Piton de la Fournaise [10] in France, and Deception Island [11] in Antarctica.
SML techniques applied on volcano monitoring systems have been proved useful for reliable detection and classification of seismic events, but they require a high quality catalog of volcano seismic events to be implemented robustly.

2
International Journal of Geophysics Therefore, SML implementation can be hard or impossible on volcanoes with poor quality seismic catalogs or not having any at all. Poor quality seismic catalogs are common and some reasons are fairly new seismic networks, low rate activity, poor operator classification, incorrect or nonexistent instrument, and path or site deconvolution.
In absence of high quality catalogs other machine learning techniques can be implemented to mitigate volcanic crises, instrument and systems failures, or other logistic difficulties. An effective alternative to rapidly identify and distinguish seismic events is performing unsupervised classification. Many studies have used unsupervised classification either to address several situations as a complementary tool to SML or just to look for new patterns in data. For instance, there have been works to analyze specifics types of events, given a clean dataset, such as tremors [12][13][14][15] and long period events [16] or to directly try to classify different types of volcano seismic events with no previous cleaning or labeling [17]. Unsupervised learning has the advantage that it does not need labeled data beforehand (so the data preprocessing is much faster and even in some cases removes human bias) and also that it can be applied to search for particular patterns (or families of events) in the data given by the data itself. The disadvantage is that it is much less reliable than supervised learning in finding these families and that it needs human validation after the classification has been performed. In this work, we perform an unsupervised learning process on automatically picked, not previously labeled events possessing low frequencies that occurred from April to December of 2015 at Cotopaxi Volcano, Ecuador, which included the volcano's short reawakening.
Cotopaxi Volcano (0.68056S, 78.4378W) is a ∼5897 m high active stratovolcano located in the central Andes cordillera of Ecuador. This volcano is probably the most hazardous in the country due to near populated areas, historic damaging lahars, and recurrent eruptions in the recent history. In the last decades, the volcano has presented periods of increased activity, which included augmented seismic rate, deformation, and continuous degassing. At the beginning of April 2015 the volcano showed an increase on general seismic activity comprised mostly of LPs. On June 4, the seismic activity shifted back and forth from transient events (mostly LPs) to tremors (however these tremors were not the merging of repeating LPs) [18]. On the 13th of August the transient signals appeared again in an accelerated fashion few hours before the eruption on the 14th the first after more than 70 years of quiescence. This eruption was the peak of the activity which diminished progressively into the last significant gas emission on November 2015. Minor activity was still detected through 2016 [19].
Low frequency events have been detected in Cotopaxi since 1989 [20] and have been associated with possible magma intrusions [21]. Since continuous detections of low frequency events are reported throughout the 2015 Cotopaxi reawakening, we look for families within these events in a two-staged manner. First, we search for similarities by comparing their Power Spectrums (spectral density vectors) and then, after groups with similar frequency content are obtained, we compare the waveforms with two methods, Dynamic Time Warping and Correntropy. Finally, applying these procedures, we find a family of low frequency events, which relate to the volcano's state of activity, and they could be precursors, without previously selecting or classifying the events and without heavily filtering the signals.

Data Processing and Methods
In mid-2006, five seismic stations were installed on the flanks of Cotopaxi volcano (see [22]; see Figure 6 left). These stations feature Guralp CMG-40T broadband seismometers with flat responses between 0.02 and 60 s and they are connected to Smart24D digitizers that sample the signals at 50 Hz and send the data to the Instituto Geofísico-EPN in Quito [22,23]. We only process data from BREF station as it is the closest (∼ 3 km) station to the summit and presents the highest signal-to-noise ratio. In particular, we are interested on events that have low frequency content (but are not limited to LPs or VLPs, meaning that they could contain high frequencies too such as hybrid events) and are recorded on the vertical component of BREF station from April to December 2015. To identify events with low frequency content we first apply a Butterworth fourth order bandpass-filter between 0.05 and 1 Hz to the daily seismic traces. After signals were filtered, a classic STA-LTA is performed on the filtered daily seismic traces and detection triggers were obtained from the characteristic functions with a detection threshold of 4.5 (found by exploration) to obtain 1655 events with low frequency content. With the aforementioned procedure we secure that the detected signals have a low frequency content; however, as will be explained later, we are also interested in the high frequencies that these events possess for the application of the unsupervised learning algorithms. From this process a total of 1655 events are detected on the whole period as shown in Figure 1.
On the detected signals we extend the frequency analysis to 0-10 Hz, as high frequencies can reveal fracture processes [24] and ultimately higher frequencies can help us to better discriminate events. The logic of using a broader frequency band is that events possessing both, higher and lower frequencies, might be related to mixed volcanic processes or could even be shallow LPs [1,20] and should be taken into account as possible signs of fluid movements. The first step of the clustering consists of grouping events based on their frequency similarities in the band of 0-10 Hz. To group these events, we calculate their power spectra and sample them as well as calculating the max of each interval of size 0.2 Hz. Since the 0.2 Hz step is a relatively small one, adequate information on the frequencies is preserved to be used as a differentiation criterion between events. We show these vectors separated in different colors in Figure 2. Then we concatenate both vectors to obtain a feature vector of length equal to 100. Feature vectors of similar size have been constructed in other works for the identification of families or subfamilies of events in an unsupervised manner [14][15][16][17] and have succeeded in identifying categories of events [17]. Signals were not transformed to velocity series since that processing yields no effect while performing the unsupervised learning. Another possible approach is explored by selecting  the mean frequency power over the intervals instead of the sample every 0.2 Hz. While this approach gives slightly better results after clustering has been performed (discussed later), a 95% coincidence in groupings is found after comparing them and the first procedure is maintained since a better match in frequencies is required for similar events to be grouped (averaging can act as a filter for the frequency vector).
We compare and group feature vectors through the kmeans scheme [25] using the Euclidean metric, setting the number of families (k) equal to 3. In this study, this number is chosen based on two criteria, the Davies-Bouldin index and the elbow criterion. In the first criterion, the index is smaller for k = 3 than for k = 2 or k = 4, except in some cases for the alternative approach discussed earlier in which average overfrequency intervals are chosen instead of 0.2 Hz sampling. The DB-index was slightly higher in the first approach but is consistently slower in families of size 3, supporting the choice of the first approach. The elbow criterion also shows that the number of families could be chosen as either 3 or 4, so the first option is chosen based on the DB-index. Higher number of families will automatically produce groups more similar in terms of frequency content, but this could overshadow the fact that some events might be different in overall spectral content but could have similar low frequency signals; hence we choose broader families. From this grouping out of the 1655 events, 1075 are grouped together in a family with interesting time distribution and signals ( Figure 1). This family possesses events with a wide range of frequencies but with an important content on the lower frequencies (Figure 3), hereafter referred as family 1. Also family 1 represents the 65% of the total events and permits robust statistic quantification. Further classification is made in the time domain for events within family 1 as waveform character can also be used to identify similar physical processes based on similar event shapes. In the second clustering stage, we compare signals using two techniques (for verification purposes): Dynamical Time Warping (DTW) and Correntropy. We use these two methods because they account for (dis)similarities in data which may not be linear contrary to simple cross-correlation between signals. Thus, these techniques are not used for detection purposes but to define (dis)similarities among the signals previously detected by STA/LTA and grouped by Kmeans.
DTW algorithm has been widely used in speech recognition and looks for the optimal match between time series by "warping" the data to optimally align both series [26]. The idea behind the algorithm is to find the best way to alter one time series' data points to produce the others' , with the restriction that the endpoints of the series coincide as schematically shown in Figure 4. The cost of this warping defines a measure of dissimilarity between the time series: the harder it is for the series to be matched, the more dissimilar the two series are. This nonlinear dissimilarity measure is used in supervised classification schemes in other contexts [27] as well as in seismology, outperforming cross-correlation [28].
Correntropy is a generalization of cross-correlation and is often called Generalized Correlation Function [30]. While cross-correlation only takes into account second order moments (covariance) of the data, with a Gaussian kernel, Correntropy takes into account all even moments of the data and induces a dissimilarity measure that can be used in supervised and unsupervised learning [31]. The induced metric (1) derived from the definition Correntropy (Correlation Induced Metric) with the Gaussian kernel is where X and Y are two vectors of size N and is known as the kernel bandwidth which is related to the extent of the space in which CIM acts as L0, L1, or L2 norm. The kernel bandwidth parameter was set to 0.1 in this work after performing tests from ranges 0.1 to 1.0 in steps of 0.1 (large values of this parameter induce Correntropy to behave like Correlation) and visually verifying that large values of CIM showed generally different shapes between pairs of events.
Using these two methods, dissimilarities between every pair of signals were calculated for family 1. Since Correntropy needs to be calculated for vectors of the same size, signals were aligned with respect to their onset (picked automatically) and then zero-padded to be of the same size. DTW is also applied to the same signals for performance comparisons. Dissimilarities are calculated for raw data as well as data filtered between 0.3 and 1 Hz since we are mainly interested on the low frequency content of the data. Also a normalization against the maximum amplitude was performed for each signal to only account for waveform shape similarity rather than event amplitude size. Finally, time domain comparisons difference matrices are obtained and used to apply hierarchical clustering.
Hierarchical clustering is applied to the differences matrices using Ward's method, because they verify positive definiteness [32]. Several clusters are obtained within family 1, but one stands out ( Figure 5) during 2015, as it appears in similar fashion from Correntropy's and DTW's difference matrices, hereafter referred as Cluster A. Events in Cluster A appear 94% of times in both procedures and they will be further validated as their temporal appearance coincides with key activity regimes in Cotopaxi Volcano during the 2015 eruption [18]. Cluster A consisted of 308 events.

Analysis and Results
The analysis of the high coincidence family between different methods was done in different manners: by looking at the events temporal distribution and also by looking at the shape of the events themselves.
Two approaches are used to investigate events in Cluster A: temporal distribution and waveform character. In terms of waveform character, low frequency contents of these events repeat almost identically during 2015 ( Figure 5). This constitutes evidence of a possible source that is maintained over time in the volcano, and that might indicate the base level of activity of the volcano. Another highlight is that events in Cluster A almost disappear from early June to mid-August, during increased tremor activity, but reappear in an International Journal of Geophysics accelerated manner just days before the eruption. On August 13th, just a few hours before the eruption, these events appear more than 5 times per hour. It is noteworthy that this rate is higher than the rate at any time since, although the daily rate of events is higher in late May 2015 than in August, in an hour scale events occurred in a much faster rate prior to the eruption.
In light of the temporal distribution of events in Cluster A, we decided to further explore the nature of these events by computing source locations via particle motion ( Figure 6). To do so, we only use 25 events that have the higher signalto-noise ratio in the frequency band from 0.1 to 0.6 Hz. The locations are estimated by crossing polarization directions from BREF and BTAM. The locations of events prior to the eruption are shown in red in Figure 6, while events after the eruption are shown in blue.
K-means clustering on the constructed features vectors takes few seconds to execute, while DTW and Correntropy each takes around 30 minutes in the process described.

Discussion
The time distribution of events in Cluster A and their source locations reveal a possible magma movement to surface prior to Cotopaxi eruption on August 14th, 2015. In terms of the time distribution, these events consistently appear during four mounts prior to the eruption and event appearance accelerates just hours before the onset of the eruption. In the step by step procedure depicted in Figure 1, what we see is a "narrowing" of the families, and while time distributionwise daily frequency might seem enough to monitor volcanic activity, the occurrences from the last family are less sparse compared to the clustering found by only classifying using frequencies, and the signals themselves are more similar. The subfamily appears at crucial moments related to the volcanic activity: just before a "high frequency quiet down" mentioned in [18] and just before the eruption. While a daily rate count is not good to differentiate a possible false positive activity in May-June from this family, the hourly date showed its peak hours before the eruption (with 5 fast successive events from this family within an hour, compared to a maximum rate of 3 during May-June). Since low frequency events are associated with fluid movement inside the volcano, this might indicate a process in which magma and gasses are steadily moving up in the volcano months before the eruption and even magma supply to surface accelerates towards the eruption onset. In terms of the spatial distribution of the events, a notable behavior is observed: events before the eruption were highly clustered compared to events after the eruption. This could indicate changes on source characteristics after the eruption either due to the shallowing of the source, or because the physical conditions inside the volcano edifice have changed. It is worth mentioning that these locations are consistent with VLPs found in another study which showed similar events as back as 2002 [20]. In particular, a total of 71 out of the 308 events had a cross-correlation higher than 0.7 with a particular VLP prior identified and studied previously in 2009 which has been linked to possible gas release processes at Cotopaxi volcano [21]. Also, 38 of these events are coincidental with visual identification of LPs/VLPs performed by personnel from IG and can be disregarded as being part of attenuation processes. Unluckily, due to some of the stations being off during the study, more precise means of localizing the events were not performed. These recurrent events nearly disappeared after the eruption. It is important to study these events in a longer time period as they may be used as precursors of eruptions or they can help us to better understand the base level of activity at Cotopaxi.

Conclusions
Unsupervised machine learning schemes, in a two-staged process, are applied to 2015 Cotopaxi Volcano's reawakening to find possible precursory events. Low frequency signals are first discriminated based on their spectral content and then further separated by their similarities on waveform shape. To  measure waveform similarities two nonlinear approaches are implemented, Dynamic Time Warping and Correntropy, to later form families with hierarchical clustering. Both methods show great coincidence in finding events in Cluster A, which then is explored in terms of time event distribution and source locations. Notably, Correntropy approximately found Cluster A with no need of filtering the signals. Further, events in Cluster A show well-defined differences related to key stages in the volcano's activity. This work is an example of the potency of applying unsupervised learning to datasets of events which are not previously labeled and can give insights to identifying precursors to eruptions. While the procedure is not perfect and a possible false positive could be introduced, by rapidly inspecting the events belonging to the family, one can at least identify signals that are associated to possible volcanic eruption. Further studies are needed since this procedure might need further refinement in the case of seismic swarms and noisy ambient. However, it can still provide some information with little validation time available in observatories with limited previous data and better tuning can be achieved by exploring the parameters further.

Data Availability
The ownership of the data used in this work belongs to Instituto Geofísico-EPN, Ecuador, and JICA. This data is partially public and, to access it, any request should be done to IG-EPN.

Conflicts of Interest
The authors declare that they have no conflicts of interest.