A Novel Flexible Model for the Extraction of Features from Brain Signals in the Time-Frequency Domain

Electrophysiological signals such as the EEG, MEG, or LFPs have been extensively studied over the last decades, and elaborate signal processing algorithms have been developed for their analysis. Many of these methods are based on time-frequency decomposition to account for the signals' spectral properties while maintaining their temporal dynamics. However, the data typically exhibit intra- and interindividual variability. Existing algorithms often do not take into account this variability, for instance by using fixed frequency bands. This shortcoming has inspired us to develop a new robust and flexible method for time-frequency analysis and signal feature extraction using the novel smooth natural Gaussian extension (snaGe) model. The model is nonlinear, and its parameters are interpretable. We propose an algorithm to derive initial parameters based on dynamic programming for nonlinear fitting and describe an iterative refinement scheme to robustly fit high-order models. We further present distance functions to be able to compare different instances of our model. The method's functionality and robustness are demonstrated using simulated as well as real data. The snaGe model is a general tool allowing for a wide range of applications in biomedical data analysis.


Introduction
Electrophysiological brain signals are widely studied to get insights into the inner function of the brain. e electroencephalogram (EEG), as an example, has been analyzed for decades and is particularly popular because of its noninvasiveness, wide availability, relatively small cost and excellent temporal resolution which enables capturing the fast neural dynamics. Because it is known that electrophysiological brain signals exhibit important spectral characteristics, frequency transforms are oen applied. However, since in general brain signals do not possess the statistical property of stationarity, time-frequency transforms are of special interest. Such methods are able to represent a given signal jointly in the timefrequency domain, called its time-frequency representation (TFR). ereby the signal's spectral components can be analyzed in relation to its temporal dynamics. In a wide sense, TFRs can be interpreted as images containing complex pixel intensities in general.
An important feature of biological signals, and particularly brain signals, is their inter-and intraindividual variability. at is, under �xed experimental conditions, the obtained signals exhibit heterogeneousness not only between groups of subjects, but also between subjects within the same experimental group and even within the same subject between multiple experimental trials. Existing signal analysis techniques either do not take this issue into account adequately or usually treat it by de�ning frequency bands of interest rather than single frequencies, and similarly time intervals instead of sharp instants. However, this strategy requires a priori knowledge about the variability to appropriately set the interval widths, and it is imprecise because it blindly includes all information contained in that time/frequency region.
A good example is the time-frequency coherence analysis of two given input signals, for instance, by means of the cross short-time Fourier transform [1], the cross Wigner-Ville distribution [2], or the wavelet coherence [1,3]. All of these methods relate both signals at �xed time/frequency (or scale) locations. us, if signal A exhibits the same neural activation as signal B, but signal A's pattern is shied in frequency just a little, none of the abovementioned techniques will be able 2 International Journal of Biomedical Imaging to �nd the strong similarity of A and B. Although coherence estimation and other rigid strategies have been successfully applied "for more than 30 years" [4], this issue has inspired us to develop a general �exible method of pattern analysis and corresponding feature extraction in electrophysiological TFR data.
By abstracting from the TFR images and working with the representation of a TFR pattern, numerous applications in biomedical signal processing emerge. TFR patterns quantify neural activity and therefore extract useful features for subsequent analyses. e model proposed in this work goes even further by offering interpretable parameters. TFR patterns reduce dimensionality by representing neural activity in a wide spectrotemporal region by comparably few quantities. Pattern-based outlier detection has the potential to become a useful tool for data quality assurance. In ongoing studies we are employing the presented method, for instance, to estimate functional brain connectivity by means of a pattern-based approach.
In the following, we present the developed neuroinspired interpretable model which is able to capture general timefrequency patterns. We use solely EEG data for demonstrations here, but our method is applicable to general electrophysiological signals or even to other signals showing similar behavior. Section 2 is devoted to developing our idea by extending the multivariate Gaussian model. Algorithms for robustly �tting the novel model to time-frequency representations are presented. A strategy for �nding an appropriate model order is given, and distance functions are de�ned which quantify (dis-)similarity of two given models. ese methods are tested in Section 3, where real as well as simulated data are used to demonstrate our technique's functionality and robustness.

Methods
While our technique is not restricted to speci�c timefrequency distributions, we employ the smoothed pseudo Wigner-Ville distribution [5] in this work. is is a quadratic transform estimating signal power in the time-frequency domain, whereby all quantities in this work are real numbers. e transform generates quite smooth TFRs, which means that neighboring pixel values are correlated. Although only positive values can be interpreted as signal power, the Wigner-Ville distribution introduces also negative values in general [6]. ese data properties will be taken into account by our method.
We will refer to time-frequency representations as mappings (TFR) ∶ which estimate signal power for each point in the time-frequency domain.
Of the numerous ways to quantify TFR patterns, we choose to �t a parametric surface to the data. Because of the spatial correlation inherent in the data, traditional regression assumptions about independence of observations do not hold here [7]. is absence of strong gradients in the TFR images also invalidates most image feature extraction techniques, which are oen based on edges and texture [8]. Our model, however, is especially designed for spatially correlated data; furthermore its parameters are interpretable. ese quantities are useful features which embody important information about the underlying signal and thereby considerably reduce data dimensionality. Using our method, feature extraction can be fully automated, and no training data are necessary. Nevertheless, a priori information can be incorporated fairly easily.
In the following, we propose an extension of the wellknown Gaussian model for TFR analysis.

e Gaussian
Model. e Gaussian model for multivariate data is de�ned by with being a constant additive offset, the amplitude relative to the offset, the constantdimensional mean vector, and Σ denoting a symmetric positive de�nite matrix. Positive de�niteness ensures that the argument to the exponential function is always negative; additionally we know that is bounded by zero and one for negative . erefore, the exponential factor scales the �nal amplitude between 0 and relative to the offset . e term ( − ) Σ −1 ( − ) is also known as the squared Mahalanobis distance of with respect to and Σ.
Because in our context this function represents arbitrary data in contrast to statistical distributions, will also be called the position vector, and Σ is the spread matrix, its entries are denoted spread parameters.
Gaussian models are quite robust in various ways. Firstly, the model will be shaped like a peak for all possible parameter values by imposing the constraint that Σ (and thus also Σ −1 ) is symmetric positive de�nite. ereby, the model will never be able to completely "degenerate. " Because the model is not �exible enough to �t small local variations of an expected pattern, the Gaussian model is relatively insensitive to local data outliers and is also unsusceptible to over�tting. An additional aspect of robustness is that extreme peak deformations are directly re�ected in extreme parameter values. ereby degenerated models can be easily detected or may even be prevented by imposing parameter constraints.

2.1.1.
Interpretability. e Gaussian model is well-suited to extract bivariate peaks from brain signals' TFR data, re�ecting short intervals of neural excitement in a speci�c frequency range. An instance of the above described surface, in the bivariate case, is fully identi�ed by its parameter vector e absolute peak height , the peak position ( 1 2 ) , and the peak orientation can be derived from the parameter vector. Further relevant quantities are the temporal peak onset, peak offset, and peak duration (as the difference of the previous two).

2.2.
Extending the Gaussian Model: e snaGe Model. As already mentioned, the Gaussian model's robustness comes at the cost of in�exibility. While some local effects in TFR data International Journal of Biomedical Imaging 3 can be appropriately explained by (1), more general patterns of activation do not follow peak-like shapes, as will be shown later. us a generalization of the Gaussian model would be desirable, especially concerning the ability to represent patterns of activation rather than just independent "events" in the spectrotemporal domain. At the same time, a generalized method should maintain maximum robustness in order not to degenerate easily and to prevent over�tting. e so-called Gaussian mixture modeling (GMM) is a straightforward extension [9], but this method still assumes (multiple-) peakshaped data. In the following, the smooth natural Gaussian extension (snaGe) model is presented as a �exible extension of the multivariate Gaussian model.
Before giving a formal de�nition, we explain the idea in an intuitive way, guided by Figure 1. e -variate Gaussian model can be described by its ( -dimensional peak point ( , … , , and its spread parameters controlling the exponential �attening relative to along the independent variables' dimensions. Now the idea is to not use only one, but peak points ( ∈ ℝ , … , interpolated by a smooth ( -dimensional curve of peaks. A way to think of the surface in Figure 1 modeled by snaGe is to "shape" it by sliding an -variate Gaussian model along the curve, its peak point being connected to the curve and thus varying in height ( ) and position (vector ). ereby complex smoothly "bent" patterns of data with varying amplitude (dependent variable) can be captured. e term snaGe is inspired by these snake-like forms. Analogously to the Gaussian model, an spread matrix determines the model's shape, which is a surface for . e tradeoff between robustness and �exibility can be controlled by choosing the number of peak points . Using many peak points will allow for good �ts to complex patterns but will also increase the danger of over�tting. �hoosing a small yields a robust model, but its ability to capture complex patterns will be limited. By setting , snaGe reduces to the traditional Gaussian model as a special case.
Regarding the standard Gaussian model, each function value ( ( is fully determined by the Mahalanobis distance of the point to the unique mean point . But since the snaGe model offers in�nitely many mean points, the question arises how to calculate the surface values. is issue will be addressed in the next section, where a formal de�nition is given.
������ �or�a� �e�nition� Let the "number of peak points" be denoted by ∈ , . Let , ℝ denote a smooth -dimensional curve of "means, " and let , ℝ be a smooth one-dimensional curve of "amplitudes" along . Let further the "offset" ∈ ℝ, and let Σ ∈ ℝ be a symmetric positive de�nite matrix. �e�ne F 1: Example of an instance of the snaGe model, in the bivariate case. A surface plot is shown as well as its two-dimensional projection (colored contours). three-dimensional points were smoothly interpolated to yield a "curve of peaks" (curve connecting the circles). is curve's 2d projection is ( , the black line. A surface is determined by the spread parameters (dashed 2d ellipses), controlling the shape of the exponential �attening to both sides of the curve.
Note that ( is a family of traditional Gaussian models, parameterized by the curve parameter . In order to construct a function which is independent of , we de�ne * ( arg max ∈ , e snaGe model is then given by e function * ( de�nes that traditional Gaussian model which assigns to the largest absolute amplitude relative to the offset among all members of the Gaussian family ( . is is necessary to cope with positive as well as negative ( . In TFR analysis, ( can be restricted to only positive values (see Section 2.3.1), in which case (5) in fact simpli�es to For ( as well as ( the max( function is necessary in case ( is a "near self-intersecting" curve, in the sense that ‖ ( − ( ‖ is small, but | ( − ( | is large. Figure 2 illustrates such an exemplary scenario. We assume that the diagonal entries of the spread matrix Σ, that is, the spread along each dimension, are sufficient to control a TFR pattern's "width. " erefore, we �x offdiagonal entries to zero for the sake of robustness. ereby, the Mahalanobis distance in (3) reduces to the weighted Euclidean distance.

e two curves
and are yet to be de�ned in terms of discrete parameter values. For good parameter interpretability, we choose to form both curves by interpolating points ∈ ℝ , by B-splines of degree ≤3, which yields the curve: By combining cubic B-splines with the Gaussian shape, we obtain a sufficiently smooth model which inherits both the splines' �exibility and the Gaussian standard model's robustness. Our model further inherits the B-splines' local control property; that is, varying a will affect the model only in the 's vicinity. Additionally, the degree of �exibility can be adapted to the data at hand by varying from 1 (single Gaussian peak) to arbitrary �exibility with . An instance of the -variate snaGe model with points (or of order ) is fully represented by the parameter vector of length : , , 22 , , , While the offset and the spread parameters are mainly responsible for the prediction of data values, the curve interpolating the is directly interpretable as it models the main path of peaks in the data.
In the next section we show how to �t the model to data in a robust manner.

Fitting the Model to TFR Data. Given a time-frequency representation TFR , ∈
, we aim to �nd a parameter vector so that the respective model �ts the data "best, " in the sense that it minimizes a cost function. We use the sum of squared differences of the data and the modeled surface: e parameters are implicitly represented by in the above formula. is quantity is also called the "sum of squares due to error" which is zero if the model perfectly �ts the data. SSE is a nonlinear function dependent on . Using squared differences pronounces outliers, but these are not expected to occur frequently in our smooth TFR data. In order to �nd a locally minimum solution of SSE, a nonlinear least squares algorithm implemented in the F 2: Depiction of the maximum rule. e curve of means is sharply bent. e surface value for a point * , * 2 (coordinates depicted by red lines) is chosen among all possible traditional Gaussians (black graphs visualize 1D cuts of two of these) by the maximum rule (red circles), see (6).
MATLAB Optimization Toolbox, lsqnonlin [10], is employed. Given an initial parameter vector and the cost function SSE, this optimizer produces a sequence of models . e iteration hopefully converges to a * with minimum cost, that is, best resemblance between model and data.
We attempt to provide advantageous starting conditions for the optimizer by preprocessing the TFR and by obtaining an initial parameter vector 0 which is expected to be close to the optimum with respect to SSE. Moreover, an iterative re�nement scheme is proposed to be able to robustly �t models of high order. e process of �tting is outlined by 2.3.1. TFR Preprocessing. TFR data are badly scaled, showing differences of several orders of magnitude in values of time, frequency, and signal power, which affects optimization performance [11]. To address this problem, lsqnonlin offers a way to take into account typical values for each dimension for gradient estimation. Also concerning this issue, any Euclidean distance operating on TFR data in our algorithms is weighted appropriately. Furthermore, smooth objective functions are desirable so that the low-order Taylor approximations used during optimization resemble the cost function in a relatively large neighborhood around the current point. To this end, the TFR images are smoothed and subsampled, which has the additional bene�t of faster cost function evaluations. Finally, since negative values in the time-frequency domain are not interpretable, they are usually set to zero. F 3: Illustration of an optimal horizontal image path (red line) found by a dynamic programming algorithm. e sum of signal power values along the red line is higher or equal to that of any other horizontal path. e trajectory is smoothed (black) for the robust extraction of the initial peak points' time-frequency coordinates. e background TFR image is computed from simulated data, whose three consecutive peaks of activity are correctly connected by the path. of the cost function's (unknown) global minimum, that is, a vector whose cost is already low. To this end, the constant offset and the spread parameters are initialized to , 11 ( max − min )/5 and 22 ( max − min )/5 if the underlying TFR's time and frequency axes are bounded by min , max and min , max , respectively. One may choose any sensible values alike, but the spreads should not be initialized too small in order to obtain a generalizable model. Yet, more thought has to be put in choosing the number and coordinates of the peak points ( ) . In the following we propose a way to compute initial estimates of the time, and frequency coordinates of all ( ) directly from the data.

Refine
A reasonable approximation to the unknown optimal curve of peaks can be found by tracing a path through the TFR image (TFR img) from le to right, which runs through areas of high pixel intensity. More precisely, the sum of all pixel intensities along this path should be as high as possible. is is an optimization problem in turn, yet its solution can be computed in quadratic time complexity (provided that the path's slope is bounded) by a dynamic programming algorithm, see [12]. Because a global optimum is guaranteed to be found, this strategy is insensitive to local outliers and noise. To this end, a similar approach as described in [13] is employed. Following the notation therein, we de�ne our energy function to be equal to the TFR values themselves, that is, ( ) . A horizontal path * (called seam in [13]) which maximizes (this is in contrast to [13], where minimum energy seams are computed) this simple cost function is found by dynamic programming. See Figure 3 for an example. Additionally, the paths are constrained by imposing an upper bound on their slopes. is value depends on the time-frequency resolution here and once more represents a compromise between robustness and �exibility.
Given the found path * and the desired model order, evenly spaced samples are subsequently drawn from a smooth approximating curve to obtain estimates of the �rst two coordinates of the ( ) , 1 . We choose to empirically set the ( ) s' last components, interpretable as amplitudes relative to the initial constant offset , to max( (TFR) ) − max( (TFR) ). Once a parametric representation of the data is available, its accuracy can be improved in a step-wise manner, as is presented in the following section.
������ ��e�a�i�e �e�nemen�� As already stated, the number of points ( ) controls the model's robustness which complements its ability to resemble complex patterns. erefore, the demand for a near-optimal initial parameter vector increases with the model order . Employing the optimal image path method described in the previous section yields a "reasonable" estimation, but sampling equidistant points ( ) , 1 from the resulting curve is a simpli�cation. In fact, it can be observed that the optimizer tends to concentrate the ( ) in time-frequency regions of high signal variability. For low model order , this shortcoming of our initial parameter estimation algorithm can be compensated easily by the optimization algorithm, but it may become a problem for increasingly �exible models. For this reason we propose an iterative scheme.
(1) Find initial parameters by means of an optimal path (see Section 2.3.2) for a �rst, robust model of low order . Let .
(2) Fit the model to the data to obtain optimal parameters * .
(3) Construct the optimal curve of peaks ( ) by interpolation of the peak points (see Section 2.2.1).
(4) Obtain the 1 1 peak points for a re�ned model by uniformly sampling (with respect to the spline's sites ) the curve computed in the previous step. e curve found in steps 2 and 3 will exhibit smaller gradient magnitude, that is, traversal speed, in areas of high signal variability than in other regions. We aim at maintaining the curve-de�ning points' optimum distribution found by the �tting algorithm and at enhancing the model's �exibility mainly in these areas. It turns out that simply by uniformly sampling the �tted curve (step 4) we obtain a new interpolated curve which retains these properties.
An application of this algorithm is demonstrated in Section 3.1, where the resulting sequence of nested models is evaluated.

Model Distance.
In this section we propose two functions for calculating the distance between two models ( 1 , ( 2 regarding (dis-)similarity of shape. Distance measures are necessary, for instance, to quantify how well the data exhibit an expected pattern. We will also employ these functions to assess our model's robustness.
Distance functions which are based solely on the curve of peaks ( were found to be quite effective. Other possibilities include parameter vector distances and pixelwise differences of signal power of the models' generated data. By comparison, curve-based distance functions have the advantage of being able to interrelate models of different orders. Additionally, they are not in�uenced by the less informative parameters (offset and the entries of Σ).
A popular distance measure for parametric curves is the Fréchet distance [14]. In the continuous case, the Fréchet distance of two parametric curves 1 ( and 2 ( is de�ned by _ max Here, ( and ( are monotone reparameterizations of the two curves, and ( denotes (weighted) Euclidean distance. In words, we search for those reparameterizations which make the curves the most similar with respect to maximum point-wise Euclidean distance along the curves. is maximum for these reparameterizations is returned as the two curves' continuous Fréchet distance. In practice, the discrete Fréchet distance is frequently applied, whose computation is based on dynamic programming once more [15]. In the discrete case, an additional distance function _sum ( 1 , 2 can be obtained by replacing the max function with a sum over . at way, _sum represents an average distance, being less prone to outliers in the curves.

Real Data.
We demonstrate the work�ow to determine the appropriate model order by �tting a TFR of real EEG data in this section. Typically we determine the necessary model complexity by �tting data with good signal to noise ratio (SNR) in order to prevent the overestimation of . For example, one possibility to achieve sufficient data quality is to average several TFRs which are expected to show similar patterns. e averaged TFR of real EEG data shown in Figure  5 will guide the following explanations. e depicted brain signals located in the lower frequency bands were recorded from the temporal brain region during a face recognition experiment. ese data exhibit a pattern of activity which is too complex to be captured by a traditional Gaussian model. F 5: TFR of real EEG data which is �tted by iteratively re�ned models. Multiple local peaks are visible which are connected by a path of increased activity. is forms a complex pattern.
e iterative scheme described in Section 2.3.3 is employed to �t models of increasing �exibility to the highquality data. An optimal path (see Section 2.3.2) estimates the initial parameters for the �rst, least �exible model of order 0 . A minimum value of 0 = 3 is necessary to model bent patterns. Since the appropriate is still unknown, a sufficiently large number max = 7 is chosen for the re�nement. At each re�nement stage the respective model is evaluated, and in the end the most suitable * ∈ 0 , … , max , 0 ≤ * ≤ max (11) is chosen as the model order for future �ttings on lowerquality data. Model evaluation is realized by three measures. ese are the cost function value (SSE, see Section 2.3), the coefficient of determination 2 and its adjusted version 2 adj [16]. Although the use of quantities based on the coefficient of determination is discouraged for nonlinear models [17], they are applied here nonetheless for two reasons. ey are found to perform well for our purposes, and the proposed alternatives (AIC [18] and BIC [19]) are not easily applicable here. is is because the assumption of normally distributed residuals oen does not hold, which is supported by a highly signi�cant Shapiro-Wilk test [20] at = yielding 10 −3 for this experiment. Figures 6, 7, and 8 illustrate the results. e results show that for the data at hand a model of order * = 4 is sufficient to capture the variability.
Having determined the maximum model complexity on high-quality data, such a model can now be �tted to the rest of the data. If the TFRs are not expected to vary substantially, like when �tting a model to signals from several nearby sensors, a previous �t may serve as the initial model. However, if, for instance, multiple data segments of the same sensor should be �tted, the TFRs' patterns may vary strongly. In this case, initial parameters should be chosen depending on the data by using the method of optimal paths described in Section 2.3.2. Since in this example * = 4 is a quite moderate number, the iterative re�nement may also be skipped. However, in general we would start with 0 = 3 or 0 = 4 and re�ne up to the determined * , as proposed in Section 2.3.3.

Synthetic Data.
We want to assess our model's robustness by simulating data and measuring how strongly the model is affected by additive Gaussian noise. To this end, arti�cial data are created in the time domain, and their TFRs are computed to which our model will be �tted.

Description of the Simulated Data.
We created a signal consisting of three consecutive oscillations, representing an alpha-theta-alpha EEG pattern at 10 Hz/4 Hz/10 Hz respectively over a time span of 2.5 seconds. e simulated sample rate is 250 Hz. A plot is shown in Figure 9. ese data are quite challenging for our model because three distinct peaks emerge in the TFR which could be more appropriately modeled by a mixture of independent Gaussian peaks. However, we want to demonstrate the �exibility of the snaGe model which should also be able to cope with patterns of this form.
For the following experiments we chose to start the �tting with a model of order to account for the pattern�s complexity and perform one re�nement step. Initial parameters are estimated by �nding optimal paths, which means that no a-priori information about the known optimal model is passed to the �tting procedure other than the number of ( to use. We de�ne the optimal model by �tting the noise-free simulation in the same way. e distance measures from Section 2.4 are used to determine how well the simulated pattern is found.

Noise Experiment.
In this experiment we added Gaussian noise, which is appropriately �ltered with respect to the sampling frequency, to the simulated data in the time domain. Signals exhibiting signal to noise ratios of −15 dB up to +10 dB were generated in steps of 2.5 dB. At each SNR, ten distinct noise realizations are created to obtain representative results. is independent noise in the time domain will produce correlated noise in the time-frequency domain due   to smoothing. erefore, the pattern shown in Figure 9 will be distorted. is experiment serves to assess how strongly our algorithm is affected by pattern variability, respectively, to investigate its robustness. Small pattern distortions should ideally only slightly alter the optimal model, re�ecting its robustness and avoidance of over�tting. We further want to �nd out to what degree our model is able to �nd the simulated pattern at all. We note here that adding noise increases the TFRs' maximum amplitudes exponentially which strongly affects the comparability of different models. Without normalization, one would observe an exponentially decreasing distance for increasing signal to noise ratio. But this would merely re�ect the decreasing data amplitudes and contain no information about the quality of �t. �owever, normalizing maximum data values are not an appropriate option either, because, for negative SNRs, this would keep the noise constant while exponentially shrinking the pattern's pixel intensities. Even if the optimal model was perfectly recovered from the noisy simulation, high distances would arise. Only if signal power is excluded from model distance estimation, the returned values are useful representatives of how well the pattern was found. e snaGe's robustness to noise with respect to the pattern's power is therefore not regarded here. is is done by setting the third dimension of the path of peaks to zero during Fréchet distance computation. Figure 10 visualizes the results. Both curves of mean distance consistently decrease with improving data quality. Convergence to the optimal model seems to require high signal to noise ratios. At 7.5 dB, the distance measures' variances fall off, re�ecting the point of reliable pattern extraction. Apparently, the noise and interferences introduced in this experiment considerably impair the �tting process. In Figure  11, this issue is exemplarily investigated. At the positive SNR of 5 dB, where distance variances across the noise realizations are still high, the �t which exhibits the largest distance is plotted. e pattern was in fact found, but only in a different way than was expected. is leads to high Fréchet distances. Nevertheless, this example shows that the impact different kinds of noise may have on the �tting process.
To get a better feel for the average ability to �t the pattern under the in�uence of noise, see Figure 12. At each noise level, the ten �tted models are averaged by computing the mean parameter vector. Shown is a sequence of mean models which progressively look more similar to the true pattern. In fact, the average �tting capability concerning both the positioning of the peak points in the time-frequency domain and the estimation of surface values is better than expected aer having studied Figure 10. Apparently, although the mean distances are still decreasing at negative signal to noise ratios, they are already small enough for successful pattern extraction on average. An example is the subplot corresponding to SNR = 0 dB in Figure 12, which already clearly resembles the simulated pattern.is experiment shows that interferences between the desired signal and additive noise affect the �tting process quite strongly in the worst case. Positive signal to noise ratios of at least 7.5 dB are found to be necessary for reliable pattern extraction in this investigation. However, successful data modeling is also possible at lower SNRs, as is seen in the average case.

Discussion
e snaGe model is especially suited for time-frequency representations of electrophysiological signals because of their (expected) nonnegativity, smoothness and their patterns following a path of peaks. However, our robust model is able to cope with data which do not exactly meet these requirements.
In order to retain robustness, we imposed several restrictions on our model, like neglecting offdiagonal spread parameters and holding the spread matrix constant over the curve of peaks. An interesting question remains how the model's �exibility and robustness would be affected if these constraints were dropped. Little effort would be necessary to include the stated extensions.
As is typical for nonlinear optimization problems, the choice of initial parameters is crucial to obtain satisfying results. erefore, a-priori knowledge about the optimal model can be incorporated by starting the optimization with a model which was previously �tted to similar data. Moreover, an algorithm based on an optimal path is developed to estimate initial model parameters directly from the data. Its robustness stems from the guarantee to �nd the globally optimal path. However, this method is limited to positive peak polarity by trying to maximize the path's average amplitude. Additionally, the technique will not be able to �nd initial models which exhibit multiple contemporary components. In such a case, the nonlinear optimization algorithm, which was found to work well, must compensate. Further strategies for the estimation of initial parameters would be desirable. In particular, the extraction of the curve interpolation points ( from the optimal path possesses potential for improvement. An open question is how we should deal with the spatial correlation of both the dependent variable and the residuals in a statistical inferential context. Further work is necessary to facilitate statistical testing, for instance, to assess the null F 11: Fitted model (white curve) at SNR = 5 dB, which has the largest distance _sum to the target pattern across all ten noise instances. e noisy TFR is shown in the background. e signal was successfully extracted, yet a high-distance results due to a different connection of the three peaks compared to the pattern.
hypothesis that an expected pattern is not contained in the data.
Concerning the presented measures of model distance, the Fréchet distances were found to be very useful to assess model similarity in our experiments involving simulated noise. eir distinct advantage is their independence of model order and the disregard of the less interpretable parameters. On the other hand, spurious high distances could be observed when in fact the pattern was found. is can be attributed to the fact that the simulated data exhibit three independent peaks, which is a violation of the snaGe's assumption of a connected path of peaks. erefore, a combination of Fréchet values and a pixel-wise distance function based on the models' generated data seems advantageous.
When applied to noisy time signals, the snaGe model adapts too well to the corresponding smooth time-frequency representations. Because the optimized cost function does F 12: Mean snaGe models per simulated SNR. Models were averaged across the ten noise realizations by direct parameter vector averaging. e curve of peaks (white line) is shown as well as the models' predicted data ( ( . Compare with Figure 9.
not take into account information about the expected pattern, the model simply tries to capture the TFR data as accurately as possible. Data preprocessing and TFR interference suppression are therefore extremely important. Adding penalty terms to the cost function and/or providing explicit initial parameters are ways to point the optimizer in the right direction. However, even without specifying a-priori knowledge, the model was able to �nd the simulated pattern for low signal to noise ratios in the average case.

Conclusion
e analysis of time-frequency representations of electrophysiological signals calls for �exible methods accounting for inter-and intraindividual data variability. We present the �exible, robust, and interpretable model snaGe, which extends the established Gaussian model. Its ability to extract 3D features from time-frequency representations of electrophysiological data is demonstrated. However, the model applies to general multivariate data which exhibit similar behavior.
In this work, several techniques to improve the model �tting performance are described. We show how to estimate start parameters directly from the data. An iterative scheme to re�ne optimized models is proposed so that high-order models can be robustly �tted.
Experiments with real as well as simulated data demonstrate the snaGe model's robustness and �exibility. �nder the in�uence of severe noise, the developed technique is best suited for patterns which are too complex to be appropriately captured by a Gaussian model, but still simple enough to facilitate robust �ts.
To summarize, due to its robustness and �exibility the snaGe model possesses the potential to become a bene�cial tool for practical EEG/MEG analysis, including functional brain connectivity analysis, outlier detection, time-frequency denoising, and feature extraction.