An Efficient Method for Detection of Outliers in Tracer Curves Derived from Dynamic Contrast-Enhanced Imaging

The presence of outliers in tracer concentration-time curves derived from dynamic contrast-enhanced imaging can adversely affect the analysis of the tracer curves by model-fitting. A computationally efficient method for detecting outliers in tracer concentration-time curves is presented in this study. The proposed method is based on a piecewise linear model and implemented using a robust clustering algorithm. The method is noniterative and all of its parameters are estimated automatically. To compare the proposed method with existing Gaussian model-based and robust regression-based methods, simulation studies were performed by generating tracer concentration-time curves using the generalized Tofts model and kinetic parameters derived from different tissue types. Results show that the proposed method and the robust regression-based method achieve better detection performance than the Gaussian model-based method. Compared with the robust regression-based method, the proposed method achieves similar detection performance with much faster computation.


Introduction
Dynamic contrast-enhanced imaging using computed tomography (DCE-CT) or magnetic resonance imaging (DCE-MRI) is a functional imaging method that can be used for in vivo assessment of tumor perfusion and allows for quantitative estimation of physiological parameters pertaining to tumor microcirculation [1, 2]. Analysis of DCE imaging data can be performed by fitting a tracer kinetic model to the tissue tracer concentration-time curves estimated from the DCE images.
It is well recognized that noise in the DCE imaging dataset poses a major problem for model-fitting. This is often further compounded by movement artefacts, as the dynamic images are acquired over a period of time. Sudden body movements during imaging can result in distinct spikes in the sampled tracer concentration-time curve, on top of the smaller background fluctuations due to image noise. Such spikes in the data are called outliers (Figure 1) and they can affect the outcome of model-fitting dramatically. Outliers can also appear in the tracer concentration curves as artefacts of prior data processing steps, for example, image registration error. In DCE-MRI, tracer concentration is estimated by additional calculations performed on the MR signal, and outliers can appear due to certain computational operations, such as division by small numbers contaminated by noise. Outliers are commonly encountered in tracer concentration-time curves sampled from DCE imaging, and screening of concentration curves for outliers before model-fitting is recommended.
Visually, an outlier appears to be "out of place" and does not conform to the mass of the data. If one has prior knowledge of the data, a model can be used to describe the majority of the data, and outlier detection can be carried out by hypothesis testing of the suspicious data point against the model. A frequently used model for hypothesis testing of outliers is the Gaussian model [3, 4]. Let {y(t_i) | i = 1, ..., N} denote a set of data sampled at timepoints t_i. The Gaussian model assumes that the data are sampled from a Gaussian distribution N(μ, σ²) with mean μ and standard deviation σ:

y(t_i) = μ + ε_i, ε_i ∼ N(0, σ²). (1)

For a random variable following the Gaussian distribution, the probability of sampling a value outside the interval μ ± λ_α σ is 2(1 − α), where λ_α = √2 erf⁻¹(2α − 1) and α ∈ (0, 1) is the quantile level of the standard normal distribution. Hence, hypothesis testing for outlier detection can be constructed simply as a thresholding step:

|y(t_i) − μ| > λ_α σ, (2)

and y(t_i) would be considered an outlier if (2) holds true.
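As a concrete illustration, the Gaussian thresholding test (2) can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation; the function name and the choice λ = 2.5 are illustrative, with μ and σ replaced by their sample estimates.

```python
import numpy as np

def gaussian_outlier_mask(y, lam=2.5):
    """Flag samples outside mu +/- lam*sigma under a Gaussian model
    fitted to the whole series, i.e. the thresholding test (2)."""
    mu, sigma = np.mean(y), np.std(y)
    return np.abs(y - mu) > lam * sigma

rng = np.random.default_rng(0)
y = rng.normal(5.0, 0.2, size=50)   # background fluctuations
y[10] += 4.0                        # one distinct spike
mask = gaussian_outlier_mask(y)     # only index 10 is flagged
```

Note that the spike inflates the sample estimates of μ and σ themselves, which is precisely the weakness of the plain Gaussian test that motivates the robust variants discussed below.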
In practice, the Gaussian model may be oversimplified: the observed data could be derived from a physical process whose underlying dynamics are more complex than being constant over time (approximated by the mean μ). A natural extension is the truncated power series

y(t) = β_0 + β_1 t + ⋯ + β_p t^p + ε, (3)

where {β_0, ..., β_p} are the model parameters and ε is an error term following the Gaussian distribution N(0, σ²). To estimate the model parameters β = (β_0, ..., β_p), one can minimize

min_β Σ_{i=1}^{N} [y(t_i) − ŷ(t_i)]², (5)

where ŷ(t_i) is the value predicted by the model at t_i. Setting the derivatives to zero and solving the resulting linear equations yields the well-known ordinary least squares (OLS) solution. However, the OLS estimator is unreliable if the data are heavy-tailed or contain outliers. A more robust solution can be derived through the inclusion of weights:

min_β Σ_{i=1}^{N} w(r_i) [y(t_i) − ŷ(t_i)]², (6)

where w(r_i) is a weight function of the residue r_i = y(t_i) − ŷ(t_i), designed to be small for large residues. By this design, outliers have less influence in determining the solution. This technique has been referred to as the weighted least squares (WLS) method and, in robust statistics, is more commonly called robust regression [5-8]. Previous studies have addressed the design of the weight function; an example is the Welsch function

w(r) = exp[−(r/c)²], (7)

where c is a scaling parameter stipulated by the user.
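The weighted least squares formulation is typically solved by iteratively reweighted least squares (IRLS). The following sketch illustrates this with the Welsch weight; the function names, the fixed iteration count (standing in for a proper convergence check), and the scaling constant c are illustrative choices, not values from the paper.

```python
import numpy as np

def welsch_weights(r, c):
    # Welsch weight function: decays smoothly to zero for large residues.
    return np.exp(-(r / c) ** 2)

def robust_polyfit(t, y, degree=1, c=1.0, n_iter=20):
    """Robust regression by IRLS: alternate a weighted least squares fit
    with recomputation of the Welsch weights from the residues."""
    X = np.vander(t, degree + 1)          # columns t^degree, ..., t, 1
    w = np.ones_like(y)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta                  # residues of the current fit
        w = welsch_weights(r, c)          # downweight large residues
    return beta, r

t = np.linspace(0.0, 1.0, 30)
y = 2.0 * t + 0.5
y[7] += 5.0                               # one gross outlier
beta, r = robust_polyfit(t, y, degree=1, c=0.5)
```

After a few iterations the outlier's weight is essentially zero, so the recovered slope and intercept match the clean line and the outlier stands out in the residues. The repeated least squares solves in this loop are exactly the computational burden the proposed method is designed to avoid.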
Computation is a major issue in practical implementation of robust regression because of the need to iteratively alternate between the steps of least squares estimation and weight calculation, until convergence of solution.In a DCE imaging dataset, multiple concentration-time curves have to be processed in order to generate parametric maps of physiological parameters.A more efficient outlier detection method is warranted.
In this study, a new method for outlier detection is presented, which assumes that the underlying data dynamics can be approximated by a piecewise linear model. A robust clustering algorithm is designed to efficiently construct the piecewise linear model, which is then subjected to hypothesis testing to determine the presence of outliers. As the piecewise linear model is derived in a noniterative manner, the proposed method is much more efficient than conventional robust regression-based methods.

Methods
2.1. Piecewise Linear Model. Graphically, a piecewise linear function in one dimension consists of line segments; a simple piecewise linear function (Figure 2) is given by

f(t) = ℓ_1(t) for t ≤ t_0, and f(t) = ℓ_2(t) for t > t_0, (8)

where ℓ_k(t) = β_{0,k} + β_{1,k} t, k = 1, 2, is a line segment with slope β_{1,k} and intercept β_{0,k}. The function can be either continuous or discontinuous at the point t_0 (Figure 2). In this study, the detection process runs in a manner similar to a moving-window filter that analyzes the local neighborhood of each data point; within such a neighborhood, a piecewise linear model is a reasonable (sufficient) approximation of the underlying dynamics. Hence the piecewise linear model forms the basic assumption on the observed data:

y(t) = f(t) + ε + o(t), (9)

where ε ∼ N(0, σ²) and o(t) models the outlier. It should be pointed out that (9) is just one way to model the non-Gaussian noise arising in practice, for which the true generation mechanism and the true distribution of the noise are unknown. There exist other approaches to non-Gaussian noise modeling, for example, the Gaussian mixture model [9] and the Laplacian distribution model [10].
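The observation model (9) can be made concrete with a short simulation sketch. The segment parameters, breakpoint, noise level, and outlier position below are illustrative choices, not values from the paper.

```python
import numpy as np

def piecewise_linear(t, seg1, seg2, t0):
    # seg1 and seg2 are (intercept, slope) pairs for t <= t0 and t > t0;
    # the function may be continuous or discontinuous at t0.
    (b01, b11), (b02, b12) = seg1, seg2
    return np.where(t <= t0, b01 + b11 * t, b02 + b12 * t)

rng = np.random.default_rng(1)
t = np.arange(0.0, 10.0, 0.5)
# Upslope followed by a plateau, continuous at t0 = 5.
f = piecewise_linear(t, seg1=(0.0, 2.0), seg2=(10.0, 0.0), t0=5.0)
eps = rng.normal(0.0, 0.1, t.size)   # Gaussian noise term in (9)
o = np.zeros_like(t)
o[6] = 3.0                           # a single outlier, the o(t) term
y = f + eps + o                      # observed data per model (9)
```

An upslope-then-plateau shape of this kind loosely mimics the local behavior of a tracer concentration-time curve, which is why the piecewise linear assumption is plausible within a small moving window.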

Numerical Implementation for Outlier Detection: A Clustering Approach.
If one were to follow the conventional regression-based approach to outlier detection, one would fit the data with the piecewise linear model and perform hypothesis testing by comparing the discrepancy between the observed data and the value predicted by the fitted model. In the following, a clustering approach is presented so that regression and outlier detection are carried out in one sweep.
The basic idea of data analysis using piecewise linear models is that the data can be classified into two or more groups or clusters, each consisting of elements which share similar features. For example, if the deviation of a data point from a particular cluster centroid is smaller than a threshold, the data point can be assigned to that cluster. For outlier detection, the most interesting part is to find the cluster to which the suspicious data point likely belongs, after which a simple linear fitting suffices for the desired hypothesis testing. Bearing this in mind, the detection process consists of two stages: clustering and testing.

Clustering.
As the local neighborhood of each data point is assumed to follow the piecewise linear model, which is composed of two line segments, the local neighborhood can be classified into two groups or clusters. To account for the presence of possible outliers, an additional cluster is allocated, leading to three possible clusters in total. The elements in each cluster are arranged in ascending order of time.
To estimate the signal deviation within a cluster, the average of the absolute first-order differences of the cluster elements is computed and denoted d_k, k = 1, 2, 3. If the jth element of cluster C_k is (t_{k(j)}, y(t_{k(j)})), then the absolute first-order difference between the jth and the (j+1)th elements is

Δ_{k(j)} = |y(t_{k(j+1)}) − y(t_{k(j)})|. (10)

Each cluster C_k thus consists of the following three records: its elements (in ascending temporal order), the number of elements n_k, and the deviation d_k. (11)

A key consideration in the clustering process is to determine whether a new data point (t_i, y(t_i)) complies with the existing elements of a cluster. If (t_i, y(t_i)) were assigned to a particular cluster C_k, the incremental absolute deviation of the cluster would be

Δ_inc = |y(t_i) − y(t_{k(n_k)})|, (12)

which is utilized for calculating a distance from the original cluster deviation d_k:

dist(t_i, C_k) = |Δ_inc − d_k|. (13)

If the distance is less than a predetermined threshold λ_nbs,

dist(t_i, C_k) < λ_nbs, (14)

then (t_i, y(t_i)) is assigned to C_k and the records of the cluster are updated accordingly. In the situation that (14) holds true for multiple clusters, the cluster containing the element with the nearest temporal distance (i.e., minimum |t_i − t_{k(j)}|) from (t_i, y(t_i)) is chosen. An example illustrating the clustering process is given in Figure 3. The data point of interest is marked with a cross and eight neighboring points (i.e., a half-window size of four) are selected to assist in inferring whether it is an outlier. The clustering starts by assigning the first data point from the left to the first cluster, as shown in Figure 3(a). Clustering then proceeds from left to right; the next three points are found to be close to the first cluster and are therefore classified into it, as shown in Figure 3(b). When the fifth point (i.e., the point of interest) is encountered, its distance to the first cluster is too large; therefore, a new cluster is generated to accommodate this point, as shown in Figure 3(c). Similarly, the four points to the right of the point of interest are found to be close to the first cluster. Consequently, there are two clusters in this example, and the final clustering result is shown in Figure 3(d).
The pseudocode of the clustering algorithm is given in Pseudocode 1.
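A possible Python rendering of the clustering stage follows. The compliance test used here, |Δ_inc − d_k| < λ_nbs with Δ_inc the jump from the cluster's last element, is one plausible reading of the distance metric described in the text; the dictionary layout, the fallback branch, and the tie-breaking details are illustrative assumptions.

```python
import numpy as np

def cluster_neighborhood(t, y, lam_nbs, max_clusters=3):
    """Greedy left-to-right clustering of a local neighborhood into at
    most three clusters (the third reserved for outliers). Each cluster
    keeps its element indices and d, the running mean of absolute
    first-order differences between consecutive elements."""
    clusters = []
    for i in range(len(y)):
        best = None  # (cluster index, temporal gap) of best complying cluster
        for k, c in enumerate(clusters):
            j = c["idx"][-1]
            delta = abs(y[i] - y[j])           # incremental absolute deviation
            if abs(delta - c["d"]) < lam_nbs:  # compliance test, cf. (14)
                gap = abs(t[i] - t[j])         # temporal tie-breaker
                if best is None or gap < best[1]:
                    best = (k, gap)
        if best is not None:
            c = clusters[best[0]]
            j, n = c["idx"][-1], len(c["idx"])
            # Update the running mean of absolute first-order differences.
            c["d"] = (c["d"] * (n - 1) + abs(y[i] - y[j])) / n
            c["idx"].append(i)
        elif len(clusters) < max_clusters:
            clusters.append({"idx": [i], "d": 0.0})
        else:
            # No empty cluster left: fall back to the nearest-deviation cluster.
            k = min(range(len(clusters)),
                    key=lambda k: abs(abs(y[i] - y[clusters[k]["idx"][-1]])
                                      - clusters[k]["d"]))
            clusters[k]["idx"].append(i)
    return clusters

# Figure 3-style example: a flat neighborhood with one spike at the center.
t = np.arange(9.0)
y = np.array([1.0, 1, 1, 1, 5, 1, 1, 1, 1])
clusters = cluster_neighborhood(t, y, lam_nbs=0.5)
```

On this example the spike at index 4 is isolated in its own cluster while the eight flat points share the first cluster, matching the outcome depicted in Figure 3(d).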

Hypothesis Testing for Outlier Detection. After data clustering, the next step is to select the right cluster for data fitting. Usually the cluster with the largest number of elements is selected; if two clusters have an equal number of elements, the cluster containing the element with the nearest temporal distance is chosen. After data fitting, a value ŷ(t_i) can be predicted and compared with the observed data. If the discrepancy is larger than a threshold λ_err,

|y(t_i) − ŷ(t_i)| > λ_err, (15)

then an outlier is detected.
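The testing stage can be sketched as follows. The cluster is passed as a plain index list, the line is fitted by ordinary least squares, and the nearest-temporal-distance tie-break is omitted for brevity; these simplifications and the function name are illustrative.

```python
import numpy as np

def is_outlier(t, y, cluster_indices, i, lam_err):
    """Fit a line to the chosen cluster and flag sample i when the
    prediction error exceeds lam_err, as in the testing stage."""
    # Pick the cluster with the most elements, as the text prescribes.
    cluster = max(cluster_indices, key=len)
    ts, ys = t[cluster], y[cluster]
    slope, intercept = np.polyfit(ts, ys, 1)  # simple line fit suffices
    y_hat = intercept + slope * t[i]
    return abs(y[i] - y_hat) > lam_err

t = np.arange(9.0)
y = np.array([1.0, 1, 1, 1, 5, 1, 1, 1, 1])
clusters = [[0, 1, 2, 3, 5, 6, 7, 8], [4]]   # e.g. from the clustering stage
flag = is_outlier(t, y, clusters, 4, lam_err=1.0)
```

Here the line fitted to the dominant (flat) cluster predicts ŷ(t_4) ≈ 1, so the spike y(t_4) = 5 is flagged while its neighbors are not.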

Threshold Parameters.
There are two threshold parameters, λ_nbs and λ_err, involved in the algorithm; both can be determined from the statistics of the dataset, as follows. λ_nbs is used for assessing cluster compliance by comparing the incremental deviation due to each additional data point with the overall cluster deviation. Note that the underlying signal is usually not constant over time, which adds an intrinsic signal deviation, δ_signal, on top of the noise deviation, δ_noise. As for δ_noise, it is related to the standard deviation σ of the noise. In a similar fashion to [11], σ is estimated from the median of absolute deviation (MAD) of the first-order differences {Δy(t_i)}:

σ̂ = (1.4826/√2) median|Δy(t_i) − median{Δy(t_i)}|, (17)

where the value 1.4826 ensures an unbiased estimate when the noise follows a Gaussian distribution, and the division by √2 transforms the standard deviation of the noise in {Δy(t_i)} into that of the noise in {y(t_i)}. For convenience, let σ̃ denote the standard deviation of the noise in {Δy(t_i)}, which relates to σ̂ by σ̃ = √2 σ̂.
Then, an estimate of λ_nbs is obtained as

λ_nbs = δ_signal + α_nbs σ̃, (19)

where α_nbs plays the same role as λ_α in the hypothesis testing approach mentioned earlier. α_nbs is set to 1.5 in this study, which implies the assumption that about 93.3% of the noisy data in {Δy(t_i)} are inliers.
As λ_err is used for characterizing y(t_i) − ŷ(t_i), which pertains to the noise in {y(t_i)}, the estimation of λ_err is straightforward:

λ_err = α_err σ̂, (20)

where α_err is set to 2.5√2.
As pointed out in [12, 13], statistical methods tend to be oversensitive to noise in signal hypothesis testing, particularly when the sample size is limited. In such cases, a simple remedy is to add a small positive number ε to the threshold parameters. Then (19) and (20) are modified as follows:

λ_nbs = δ_signal + α_nbs max(σ̃, σ_c) + ε,
λ_err = α_err max(σ̂, σ_c) + ε, (21)

where the cut-off noise scale σ_c is set to 0.5 and ε is a small percentage (5%) of the average signal intensity.
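Putting the estimates together, the threshold computation might look as follows. The exact combination of δ_signal, the cut-off σ_c, and the additive ε is one plausible reading of the text and should be treated as an assumption; the constants follow the values stated above.

```python
import numpy as np

def estimate_thresholds(y, a_nbs=1.5, a_err=2.5 * 2 ** 0.5,
                        sigma_c=0.5, eps_frac=0.05):
    dy = np.diff(y)                                   # first-order differences
    # Robust noise scale: 1.4826 * MAD is unbiased under Gaussian noise;
    # dividing by sqrt(2) maps the noise level of dy back to that of y.
    sigma_hat = 1.4826 * np.median(np.abs(dy - np.median(dy))) / np.sqrt(2)
    sigma_tilde = np.sqrt(2) * sigma_hat              # noise scale of dy itself
    delta_signal = abs(np.median(dy))                 # intrinsic signal deviation
    eps = eps_frac * np.mean(np.abs(y))               # small additive guard term
    lam_nbs = delta_signal + a_nbs * max(sigma_tilde, sigma_c) + eps
    lam_err = a_err * max(sigma_hat, sigma_c) + eps
    return lam_nbs, lam_err

# Noiseless ramp: sigma_hat is ~0, so the cut-off sigma_c takes over.
lam_nbs, lam_err = estimate_thresholds(np.arange(100) * 0.1)
```

All quantities are derived from the data itself, which is what makes the method parameter-free from the user's point of view.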

Monte Carlo Simulation
Computer simulation experiments were performed to evaluate the performance of the proposed outlier detection method under various noise conditions. Similar to [14], the arterial input function C_a(t) was modeled as a sum of gamma-variate functions:

C_a(t) = Γ(t) + 0.2 Γ̃(t), (22)

where Γ(t) describes the initial peak of the arterial input function and Γ̃ is a gamma-variate function normalized (i.e., scaled) to have the same maximal value as the initial peak Γ, such that the second term in (22) describes the first recirculation peak, reaching a height of about 20% of the initial peak. Following [14], values of the various parameters in the gamma-variate functions are listed in Table 1, and the sampling interval of the arterial input function was set to 1.6 s. Tissue concentration-time curves C_tiss(t) were generated using the generalized Tofts model [16]:

C_tiss(t) = C_a(t) ⊗ [v_p δ(t) + K^trans exp(−K^trans t / v_e)], (23)

where ⊗ denotes convolution and δ(t) is the Dirac delta function. Values of the kinetic parameters (K^trans, v_e, v_p) used for simulating these tracer curves are shown in Table 2.
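A discrete implementation of the tissue model is sketched below. The bolus shape and the kinetic parameter values are illustrative stand-ins only; the actual gamma-variate parameters are those of Table 1, which is not reproduced here.

```python
import numpy as np

def tofts_tissue_curve(t, ca, ktrans, ve, vp):
    """Generalized Tofts model: C_tiss(t) = Ca(t) convolved with
    [vp*delta(t) + Ktrans*exp(-Ktrans*t/ve)]; the delta-function term
    reduces to the additive vp*Ca(t) component."""
    dt = t[1] - t[0]
    irf = ktrans * np.exp(-ktrans * t / ve)     # tissue impulse response
    conv = np.convolve(ca, irf)[: t.size] * dt  # causal discrete convolution
    return vp * ca + conv

t = np.arange(0.0, 300.0, 1.6)                  # 1.6 s sampling, as in the text
# Hypothetical single-bolus shape standing in for the gamma-variate sum (22).
ca = 5.0 * (t / 10.0) * np.exp(1.0 - t / 10.0)
ct = tofts_tissue_curve(t, ca, ktrans=0.005, ve=0.2, vp=0.05)
```

The dt scaling turns the discrete sum into an approximation of the continuous convolution integral; finer sampling of C_a(t) would reduce the discretization error.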
Gaussian noise N(0, σ_g²) was added to the C_tiss(t) curves, with σ_g varying from 0.3 to 2.1 (in steps of 0.3) to simulate various noise conditions. For each Gaussian noise level σ_g, 1000 simulation runs were performed. In each run, the presence of outliers was simulated by adding salt-and-pepper noise [17] at randomly selected time points in C_tiss(t), where the concentration values were increased or decreased (randomly) by 80% of the maximum concentration value in C_tiss(t). The percentage of outliers in each simulated C_tiss(t) curve was kept constant at 10%.
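The outlier-injection step of the simulation can be sketched as follows; the RNG seed and the stand-in curve are illustrative, while the 10% fraction and the ±80%-of-maximum amplitude follow the text.

```python
import numpy as np

def add_outliers(ct, fraction=0.10, amplitude=0.80, rng=None):
    """Salt-and-pepper style outliers as in the simulation: a random
    `fraction` of time points is shifted up or down by `amplitude`
    times the maximum of the curve."""
    rng = np.random.default_rng(rng)
    y = ct.copy()
    n_out = max(1, int(round(fraction * ct.size)))
    idx = rng.choice(ct.size, size=n_out, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_out)   # up or down, at random
    y[idx] += signs * amplitude * np.max(ct)
    return y, np.sort(idx)

ct = np.sin(np.linspace(0, np.pi, 50))            # stand-in tissue curve
y, idx = add_outliers(ct, rng=0)
```

Returning the injected indices gives the ground-truth outlier locations needed to score a detector afterwards.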
Since the locations of the outliers are known in each simulation, the performance of an outlier detector can be studied by computing quantitative measures from the numbers of true positives #(TP), true negatives #(TN), false positives #(FP), and false negatives #(FN), as follows [18].

(1) Sensitivity, also called the true positive rate:

Sensitivity = #(TP) / (#(TP) + #(FN)). (24)

(2) Specificity, also called the true negative rate:

Specificity = #(TN) / (#(TN) + #(FP)). (25)

To compare the performance of the proposed outlier detection method with existing methods, two well-established outlier detectors were also implemented in the Monte Carlo simulations: the Hampel and Welsch methods [4, 11], which are, respectively, based on the Gaussian model and the robust regression approach mentioned before. The Hampel method replaces the mean and standard deviation in (2) with the median and MAD, respectively, to make the inference more resistant to outliers. The Welsch method detects outliers using the thresholding condition (4) after a robust regression model has been derived.
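The two performance indexes are straightforward to compute from boolean masks of detected and true outlier positions; the short example masks below are illustrative.

```python
import numpy as np

def sensitivity_specificity(detected, truth):
    """Sensitivity = #TP/(#TP+#FN); specificity = #TN/(#TN+#FP),
    with boolean masks marking detected and true outlier positions."""
    detected = np.asarray(detected, bool)
    truth = np.asarray(truth, bool)
    tp = np.sum(detected & truth)     # outliers correctly flagged
    fn = np.sum(~detected & truth)    # outliers missed
    tn = np.sum(~detected & ~truth)   # inliers correctly passed
    fp = np.sum(detected & ~truth)    # inliers wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)

truth    = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0], bool)
detected = np.array([0, 0, 1, 0, 1, 1, 0, 0, 0, 0], bool)
sens, spec = sensitivity_specificity(detected, truth)
```

Here two of the three true outliers are found (sensitivity 2/3) and one of the seven inliers is wrongly flagged (specificity 6/7).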
All computations were performed using Matlab (The MathWorks, Natick, MA, USA) running on a desktop personal computer with a 3 GHz CPU and 8 GB of memory.

Results and Discussion
4.1. Simulation Results. Figure 4(a) shows a simulated noisy tissue concentration-time curve C_tiss(t) in which outliers are marked with red squares, representing the ground truth. The dashed line in Figure 4(a) shows the simulated C_tiss(t) curve prior to the addition of noise. Figures 4(b)-4(d) present the outliers detected by the Hampel method, the Welsch method, and the proposed piecewise linear method, respectively. Compared with the ground truth in Figure 4(a), all methods successfully detected the outliers located in the relatively flat regions of the C_tiss(t) curve. However, of the six outliers in the upslope region of C_tiss(t), the Gaussian-based Hampel method detected none, the robust regression-based Welsch method detected five (with three false positives), and the proposed method also detected five (with one false positive).
Figures 5-7 summarize the performance (i.e., sensitivity and specificity) of the Hampel, proposed, and Welsch methods for different tissue types. As illustrated in Figures 5-7, the proposed method generally achieved higher sensitivity than the Hampel method (by about 6-11%) at similar levels of specificity. The performance of the proposed method, in terms of sensitivity and specificity, is close to that of the Welsch method.
The average processing times of the different methods are given in Table 3. The Hampel and Welsch methods require about 5 and 130 ms, respectively, to process one C_tiss(t) curve, while the proposed method takes about 10 ms. Thus, the Hampel method and the proposed method are much faster than the Welsch method.

Parameter Sensitivity.
For the proposed method, the threshold parameters were fixed during the simulation experiments. Here, we evaluate the sensitivity of the method to these parameters. The tracer concentration-time curves used in this evaluation were simulated using kinetic parameters typical of acute multiple sclerosis. Figure 8 plots the performance indexes (sensitivity on the left vertical axis and specificity on the right vertical axis) with respect to the threshold λ_err under noise conditions σ_g = 0.3 and σ_g = 2.1. When σ_g = 0.3 (Figure 8(a)), both sensitivity and specificity vary within a tight range, between 0.996 and 1 for sensitivity and between 0.992 and close to 1 for specificity, which provides much freedom in determining the threshold. When σ_g = 2.1 (Figure 8(b)), the sensitivity drops below 0.9 once the threshold exceeds 20, and the specificity increases nearly monotonically from 0.99 to 1, suggesting that a threshold in the wide range between 1.5 and 10 times σ_g would be reasonable. Figure 9 shows the parameter sensitivity with respect to λ_nbs, which relates to the clustering process of the proposed method. When σ_g = 0.3, both the sensitivity and the specificity are greater than 0.99 for λ_nbs ∈ [0.5, 3]; in particular, both performance indexes are greater than 0.995 when λ_nbs > 1.5. When σ_g = 2.1, the specificity is greater than 0.994 for λ_nbs ∈ [0.5, 3], and the sensitivity drops below 0.95 for λ_nbs > 2.9.
The effect of the half-window size on the performance of the proposed method is shown in Figure 10 for the same simulation condition as in Figure 9, which suggests that a moving window size (two times the half-window size plus one) of 9 or 11 would be suitable.

The threshold parameters are derived from objective statistics of the data, allowing automation of the method under different noise conditions without user intervention. Monte Carlo simulation experiments over various tissue types show that the proposed method compares favorably with two well-established techniques in terms of computation speed and performance (sensitivity and specificity). Specifically, the proposed method achieves detection accuracy similar to the robust regression-based method at a computation speed close to that of the Gaussian model-based method.

Concluding Remarks
Although the method has been designed to detect outliers in tracer curves from DCE imaging, it is expected to be applicable in other scenarios where two main assumptions hold: (1) the underlying data can be approximated by a piecewise linear model and (2) outliers are present together with Gaussian noise that is independent and identically distributed.

Figure 1: Example of an outlier in a tissue tracer concentration-time curve sampled from DCE-CT images (HU: Hounsfield unit).

Figure 3: An example illustrating the clustering process, where "x" marks the data point of interest, the local neighborhood consists of 4 points to the left and 4 to the right of "x", and circles and squares represent the two clusters: (a) initialization of the first cluster; (b) the next three data points classified into the first cluster; (c) the fifth data point initialized to the second cluster; (d) the remaining points classified into the first cluster.

Figure 4: Example of a simulation run with outlier detection by different methods.

Figure 8: Parameter sensitivity with respect to the threshold λ_err for the proposed method under noise conditions σ_g = 0.3 and 2.1, where the left vertical axis represents the sensitivity and the right vertical axis represents the specificity.

Figure 9: Parameter sensitivity relating to the clustering threshold λ_nbs for the proposed method under noise conditions σ_g = 0.3 and 2.1, where the left vertical axis represents the sensitivity and the right vertical axis represents the specificity.

Although the model in (3) describes a polynomial of power p, it is a linear model with respect to the model parameters {β_0, ..., β_p}. If the model parameters can be estimated from the observed data, then the model predicts a value ŷ(t_i) at time t_i, and the residue r_i = y(t_i) − ŷ(t_i) follows N(0, σ²). Thus, hypothesis testing for outlier detection can similarly be constructed as

|y(t_i) − ŷ(t_i)| > λ_α σ. (4)
Pseudocode 1: The clustering algorithm.

Initialization: three empty clusters, each with deviation 0
Loop over all data points i from 1 to n:
  if C_1 is empty, then assign point i to C_1
  elseif C_2 is empty:
    if point i complies with C_1, then assign it to C_1
    else assign it to C_2
  elseif C_3 is empty:
    if point i complies with C_1 or C_2, then assign it accordingly to C_1 or C_2
    else assign it to C_3
  else assign point i accordingly among C_1, C_2, and C_3

Let Δy(t_i) denote the first-order difference of the input signal; then δ_signal is estimated by

δ_signal = |median{Δy(t_i)}|. (16)

Table 2: Kinetic parameters utilized in the simulation study.

Table 3: Processing times (in milliseconds) of the different outlier detection methods.
Figure 7: Comparison of outlier detection under different Gaussian noise conditions, where the tracer concentration-time curves are simulated using kinetic parameters typical of chronic multiple sclerosis.
Salient features of the proposed method include (1) a robust clustering method for the classification of data; (2) a piecewise linear model used to describe the underlying data dynamics, from which distance measures of individual data points can be derived; and (3) outlier detection formulated as hypothesis testing of the distance measures against objective statistical thresholds.