Study on Parameter Optimization for Support Vector Regression in Solving the Inverse ECG Problem

The typical inverse ECG problem is to noninvasively reconstruct the transmembrane potentials (TMPs) from body surface potentials (BSPs). In the study, the inverse ECG problem can be treated as a regression problem with multi-inputs (body surface potentials) and multi-outputs (transmembrane potentials), which can be solved by the support vector regression (SVR) method. In order to obtain an effective SVR model with optimal regression accuracy and generalization performance, the hyperparameters of SVR must be set carefully. Three different optimization methods, that is, genetic algorithm (GA), differential evolution (DE) algorithm, and particle swarm optimization (PSO), are proposed to determine optimal hyperparameters of the SVR model. In this paper, we attempt to investigate which one is the most effective way in reconstructing the cardiac TMPs from BSPs, and a full comparison of their performances is also provided. The experimental results show that these three optimization methods are well performed in finding the proper parameters of SVR and can yield good generalization performance in solving the inverse ECG problem. Moreover, compared with DE and GA, PSO algorithm is more efficient in parameters optimization and performs better in solving the inverse ECG problem, leading to a more accurate reconstruction of the TMPs.


Introduction
The inverse ECG problem is to obtain myocardial transmembrane potential (TMPs) distribution from body surface potentials (BSPs), thus noninvasively imaging the electrophysiological information on the cardiac surface [1,2]. Generally, approaches to solving this inverse ECG problem can be relied on potential-based model, including epicardial, endocardial, or transmembrane potentials, which is used to evaluate the potential values on the cardiac surface [3] at certain time instants. Moreover, the cardiac electrophysiological information is closely associated with the transmembrane potentials (TMPs) of the myocardial cells. Compared to body surface potentials (BSPs) recordings, TMPs can provide more detailed and complicated electrophysiological information. In this study, we focus on implementing the reconstruction of TMPs from BSPs.
To study inverse ECG problems, various numerical methods have been proposed. In the last decades, regularization methods have been engaged for dealing with the inherent ill-posed property. The regularization techniques include truncated total least squares (TTLS) [4], GMRes [5], and the LSQR [6], which require an appropriate selection of regularization parameters so as to relax the ill-posedness of the inverse ECG problem and to produce a well-posed problem. However, the robustness and the quality of the inverse solution are not always guaranteed; despite that they can more or less deal with the geometry and measurement noises for the inverse ECG problems. In addition, during the solution procedure, the inverse ECG problem can be treated as a regression problem with multi-inputs (BSPs) and multi-outputs (TMPs). Therefore, an alternative method, support vector regression (SVR) method [7], was proposed to solve the inverse ECG problem. Compared with conventional regularization methods, SVR method can produce more accurate results in terms of reconstruction of the transmembrane potential distributions on epi-and endocardial surfaces [1,8]. SVR is an extension version of support vector machine (SVM) to solve nonlinear regression estimation problems, which aims at minimizing an upper bound of the generalization error instead of the training error [9] by adhering to the principle of structural risk minimization (SRM).
Although SVR is a powerful technique to solve the nonlinear regression problem, it has received less attention in relation to the inverse ECG problems, due to the fact that SVR algorithm is sensitive to users' defined free parameters. The involved hyperparameters of SVR model consist of penalty parameter , insensitive loss function parameter , and the parameter for kernel function. Inappropriate parameters in SVR can lead to overfitting or underfitting problems. How to properly set the hyperparameters is a major task, which has a significant impact on the optimal generalization performance and the SVR regression accuracy [10], especially when it comes to predicting the diagnosis of cardiac diseases, because even a slight improvement of prediction accuracy could have a significance impact on the patient's diagnosis [11,12].
In early development stage of the algorithm, a grid search optimizing method [13] and cross-validation method [8] are employed to optimize the hyperparameters. However, these methods are computationally expensive and data intensive. Recently, a number of new algorithms have been proposed for the optimization of the SVR parameters [14,15]. For example, Wang et al. [16] has proposed a hybrid load forecasting model combining differential evolution (DE) algorithm and support vector regression to deal with the problem of annual load forecasting. In the analysis of predicting tourism demand, the study [17] applies genetic algorithm (GA) to seek the SVR's optimal parameters and then adopts the optimal parameters to construct the SVR models. The experimental results demonstrate that SVR outperforms the other two neural network techniques. Another study [18] tries a new technology, particle swarm optimization (PSO), to automatically determine the parameters of SVR, then applies the hybrid model (PSO-SVR) to grid resource prediction, and the experimental results indicate high predictive accuracy.
In this paper, the above mentioned optimization algorithms (GA, DE, and PSO) are all adopted to dynamically optimize the hyperparameters of SVR model in solving the inverse ECG problem. For convenience, the SVR model with GA parameter selection is referred to as GA-SVR method, and the other two are termed DE-SVR and PSO-SVR, respectively. In this paper, we attempt to investigate which one is the most effective in reconstructing the cardiac TMPs from BSPs, and a full comparison of the performance for solving the inverse ECG problem will be evaluated.

Brief Overview of the SVR.
In this section, the basic SVR concepts are concisely described; for detailed description, please see [19,20]. Suppose a given training data of elements {( , ), = 1, 2, . . . , }, where denotes the th element in -dimensional space; that is, = { 1 , . . . , } ∈ , and ∈ is the output value corresponding to . According to mathematical notation, the SVR algorithm builds the linear regression function as the following form: where and are the slope and offset coefficients; ( ) denotes the high-dimensional feature space, which is nonlinearly mapped from the input space . And the previous regression problem is equivalent to minimizing the following convex optimization problem [see (2)]: In this equation, an implicit assumption is that a function essentially approximates all pairs ( , ) with precision, but sometimes this may not be the case. Therefore, by introducing two additional positive slack variables and * , the minimization is reformulated as the following constrained optimization problem [shown in (3) where the parameter is the regulator which is determined by the user, and it influences a tradeoff between an approximation error and the weights vector norm || ||; and * are slack variables that represent the distance from actual values to the corresponding boundary values of -tube. According to the strategy outlined by Schölkopf and Smola [10], by applying Lagrangian theory and the KKT condition, the constrained optimization problem can be further restated as the following equation: Here and * are the Lagrange multipliers. The term ( , ) is defined as the kernel function. The nonlinear separable cases could be easily transformed to linear cases by mapping the original variable into a new feature space of high dimension using ( , ). The radial basis function Computational and Mathematical Methods in Medicine 3 (RBF) was applied in the study, which has the ability to universally approximate any distribution in the feature space. With an appropriate parameter, RBF usually provides a better prediction performance, so it is adopted in this study as shown in (5): where and are input vector spaces and 2 is the bandwidth of the kernel function.
In the above equations, there exist three hyper-parameters to be determined in advance, that is, the penalty parameter , insensitive parameter , and the related kernel function parameters 2 . They heavily affect the regression accuracy and computation complexity of SVR. The penalty parameter controls the degree of punishing the samples whose errors go beyond the given value. The insensitive parameter controls the width of the -insensitive zone used to fit the training data. The value of can enhance the generalization capability; with the increase of , the number of support vectors will decrease, and the algorithmic computation complexity will also reduce. The bandwidth of the kernel function has a great influence on the performance of learning machine.
In this study, three optimization methods, that is, GA, DE, and PSO, are presented to determine the optimal hyperparameters of the SVR model. According to [15], the general range of , 2 , and has been given. In the trial operation, we narrowed it to avoid blindness in the optimization process. In this paper, the set of hyperparameter ( , 2 , ) is initialized in the given range ∈ [0, 10000], 2 ∈ [0, 2], and ∈ [0, 0.0001], where optimization methods (GA, DE, and PSO) are to seek the global optimal solutions.
In order to calculate the error uniformly, we adopt the same fitness function which plays a critical role in measuring these algorithms performance. The fitness function is determined as follows: where is the number of training data samples, is the actual TMPs of train data, and is the predicted TMPs. The solution with a smaller fitness of the training dataset has a better chance of surviving in the successive generations. The main tool, LIBSVM, was used for training and validating the SVR model [21].

Genetic Algorithm (GA) Optimization
Method. The GA [22] is a biologically motivated optimization technique guided by the concepts of biological evolution and Darwin's principle of survival of the fittest. It is a computer model of an evolution of a population of artificial individuals. In this study, for the specific optimizing problem of hyperparameters ( , 2 , ), the process is defined as follows.
Step 1 (initialize population). The population size NP is equal to 20; the dimension of parameter vectors is 3.
The termination criterion is set as follows: the number of iterations is set as 30, and the fitness tolerance value is set as 0.001. The probabilities of selection (Stochastic universal sampling), crossover (multipoint crossover) and mutation (mutation operator of the Breeder Genetic Algorithm) that were used herein are 0.9, 0.8, and 0.05, respectively.
Step 2 (encode chromosomes). According to the possible range of parameters , 2 , and given before, the GA utilizes binary encoding method, and each parameter is encoded by 20 bits of binary number. Therefore, the search space is defined as the solution space in which each potential solution is encoded as a distinct chromosome.
Step 3. The parameters , 2 , and of each individual are used to build the SVR model. With the BSPs of the training data, the cardiac TMPs are reconstructed. Then the performances of individuals in the generation are evaluated by the specific objective fitness function according to (6). The individual with the minimum fitness value will be selected, and then the chromosome of selected individual is preserved as the best result.
Step 4. The basic genetic search operators comprise selection, mutation, and crossover, which are applied orderly to obtain a new generation where the new individual ( , 2 , ) with the best performance is retained.
Step 5. The new best fitness value will be compared to that of the best result, and then select the better one to update the best result.
Step 6. This process will not come to an end until the termination criterion is met, and the best chromosome is presented as the final solution; otherwise, go back to Step 4.

Differential Evolution (DE) Optimization Method.
Differential evolution [23] is a population-based and parallel direct search method which is used to approximate the global optimal solution. As is mentioned above, the optimization of hyperparameter ( , 2 , ) can be transformed into solving the minimization of the fitness function Min: ( ), = , = 1, 2, . . ., and NP is three-dimensional parameter vectors including ( , 2 , and ). Subsequently, each generation evolves by employing the evolutionary operators involving mutation, crossover, and selection operations to produce a trail vector for each individual vector [24], and the detailed evolutionary strategies can be described as follows.
Step 1 (initialize DE parameters). The population size NP is set as 20 [25], and the dimension of parameter vectors is 3. The termination criterion is set as follows: the number of iterations is set as 30, and the fitness tolerance value is set as 0.001. The mutation factor is selected in [0.5, 1], and the crossover rate CR is selected in [0, 1].
Step 2 (initialize population). Set = 0. Generate an NP * D generation which consists of individuals ( , 2 , and ) with uniform probability distribution random values within the control variable bounds of parameter space.
Step 3. With the parameters, train the SVR model, and forecast the training datasets. Then calculate the fitness target of all the individuals in the generation, and record the minimum values of the fitness function and set ( , 2 , and ) correspondingly as the best .
Step 4 (mutation operation). For each target vector , , an associated mutant vector is generated according to The random indexes 1 , 2 , 3 , and 4 have to be distinguished from each other and from the current trial index .
Step 6 (selection operation). After the fitness values of the new generation being calculated, then selecting the new best individual as best, +1 , the selection operation is performed. To decide whether or not it could replace the best , the fitness value of ( best, +1 ) is compared to that of ( best ). The operation is given as follows: Step 7. Termination condition checking: if max iterations is met or the fitness value of ( best ) is reached within the fitness tolerance value, return the recorded global optimal parameter ( , 2 , and ); otherwise, go to Step 4.

Particle Swarm Optimization (PSO) Optimization
Method. The Particle swarm optimization (PSO) is a bioinspired stochastic optimization technique which was basically developed through simulating social behaviour of birds and insects that keep living by maintaining swarm actions. A particle is considered as a bird in a swarm consisting of a number of birds, and all particles fly through the searching space by following the current optimum particle to find the final optimum solution of the optimization problem [26]. The detailed experimental procedure for selecting the best hyperparameters ( , 2 , and ) is listed as follows.
Step 1. Particle initialization and PSO parameters setting [27]: set the PSO parameters including the scope of , 2 , and ; the number of particles is 20; particle dimension is 3. The termination criterion is set as follows: the number of iterations is set as 30 and the fitness tolerance value as 0.001.
Step 2. Randomly generate primal population of random particles ( , 2 , and ) and velocities inside the searching space.
Step 3. Construct SVR model with the parameters in each particle, and perform the prediction with the training data. Then evaluate the regression accuracy based on the defined fitness function. Then maintain the record of the best performance with the minimum error as the global best solution.
Step 4. Particle manipulations: suppose that ( ) and ( ) are the particle and velocity at step , and they are updated into their new values ( + 1) and ( + 1) at step + 1 by applying the two equations below: where ∈ [ min , max ], Δ is the unit time step value, and is the inertia weight. ( ) and ( ) are the best coordinates of particle number and the whole swarm until step , respectively. Positive constants 1 and 2 are learning rates while 1 and 2 are random numbers between 0 and 1.
Step 5. Train the SVR model using the particles ( , 2 , and ) in the new generation and forecast the training datasets. After evaluating the fitness function, pick the new best particle as the local best.
Step 6. Compare the local best and global best, and choose the better one to update the global best.
Step 7. Stop condition checking: if a termination criterion (the number of iterations) is met or the fitness value of the global best is reached within the extent predefined, return the recorded global best as the best parameter ( , 2 , and ); otherwise, go to Step 4.

Simulation Protocol and Data
Set. The SVR model is tested with our previously developed realistic heart-torso model [11,12]. In this simulation protocol, a normal ventricular excitation is illustrated as an example to calculate the data set for the SVR model. The considered ventricular excitation period from the first breakthrough to the end is 357 ms, and the time step is 1 ms, and thus 358 BSPs and TMPs temporal data sets are calculated. Sixty data sets ( and ) are chosen at times of 3 ms, 9 ms, 15 ms, . . ., 357 ms after the first ventricular breakthrough which will be used as testing samples to assess the SVR models prediction capability and robustness. The rest 298 in 358 data sets are employed as the training samples for the training and optimal parameter selection procedures. Moreover, during each ventricular excitation period, 412 potentials on the BSPs Computational and Mathematical Methods in Medicine 5 and 478 potentials on the TMPs are captured. That is to say, the matrix of BSPs is divided into the training data (298 × 412) and testing data (60 × 412), and the matrix of TMPs is also divided into the training data (298 × 478) and testing data (60 × 478). Each SVR regression model is built by using the mapping relations between BSPs training data (298 × 412) and one TMPs training point (298 × 1). Based on all the training data, 478 different regression models were built by using GA-SVR, DE-SVR, and PSO-SVR, respectively.
Using the corresponding SVR model, we can reconstruct the corresponding TMPs (60 × 1) from the BSPs testing data (60 × 412). For all testing samples, we can reconstruct the TMPs from the all 478 regression SVR models.

Overview of the SVR Parameters Optimization Framework.
As shown in Figure 1, in the first stage, the original input data is preprocessed by scaling the training data and feature extraction. In the second stage, the hyperparameters ( , 2 , and ) of SVR method are set carefully by using the GA, DE and PSO optimization methods. Moreover, GA-SVR, DE-SVR, and PSO-SVR are applied to construct the hybrid SVR model, respectively. Finally, according to the effective hybrid SVR models, we can reconstruct the transmembrane potentials (TMPs) from body surface potentials (BSPs) by using these testing samples.

Scaling the Training Set.
During preprocessing stage, each input variable is scaled in order to avoid the potential value from spanning a great numerical range and to prevent numeric difficulties. Generally, all input values including 358 BSPs and TMPs temporal data sets are linearly scaled to the range (0,1) by the following formula [see (11)]: where is the original potential value of each time , is the scaled potential value, max is the maximum potential value of each time , and min is the minimum potential value of each time .

Feature Extraction by Using KPCA.
The kernel principal component analysis (KPCA) is a nonlinear feature extraction method which is one type of a nonlinear PCA developed by generalizing the kernel method into PCA. Specifically, the KPCA initially maps the original inputs into a highdimensional feature space using the kernel method and then calculates the PCA in the high-dimensional feature space . The detailed theory of KPCA can be consulted in the paper [28]. Feature extraction is a significant task for preprocessing the original input data in developing an effective SVR model. As is investigated in our previous study [12], the SVR with feature extraction by using KPCA has superior performances to that of using PCA or the single SVR method in reconstructing the TMPs. In this paper, the kernel principal component analysis (KPCA) was proposed to implement the feature extraction. And the dimension of BSPs dataset is reduced from 412 to 200 properly, which can reduce the dimensions of the inputs and improve the generalization performances of the SVR method.

The Result of Parameters Selection.
In this study, the GA, DE, and PSO algorithms were proposed to seek the corresponding optimal hyperparameter ( , 2 , and ) of the 478 SVR models. Table 1 displays the average value and the standard deviations of three parameters in 478 SVR models by using GA, DE, and PSO optimization algorithms. From it, we can find that the gaps among these optimal parameters are obvious by using three different optimization methods. The standard deviations of are neglected because they are very small.

Comparison of Reconstruction
Accuracy. This study compares the reconstruction performance based on the proposed three intelligent optimization algorithms for selecting the hyperparameters of SVR. By using GA-SVR, DE-SVR, and PSO-SVR hybrid models, two types of experiments for reconstructing the TMPs have been conducted. The first experiment reconstructs the TMPs for one representative source point (the 50th point) on the epicardium over 60 testing times, as is depicted in Figure 2. Since the gaps among   Figure 2 and Table 2, the PSO-SVR outperforms the GA-SVR and DE-SVR methods in reconstructing the TMPs for one representative source point over all the testing times. Furthermore, the DE-SVR is superior to the GA-SVR method in reconstructing the TMPs. The second experiment reconstructs the TMPs on all the heart surface points at one time instant. These inverse ECG solutions are shown in Figure 3, in which two sequential testing time points (27 and 51 ms after the first ventricular breakthrough) are presented to illustrate the performances of the GA-SVR, DE-SVR, and PSO-SVR methods. It can be seen that, among the three parameters optimization methods, the hybrid PSO-SVR performs better than GA-SVR and DE-SVR methods because its reconstructed solutions are much closer to the simulated TMPs distributions.
Based on the simulation information of TMPs, we can evaluate the accuracy of the reconstructed TMPs of these three different intelligent optimization algorithms at the testing time by mean square error (MSE), the relative error (RE), and the correlation coefficient (CC): Here, is the simulated TMP distribution at time , and is the reconstructed TMPs. The quantities and are the The MSE, RE, and CC of the three sets of reconstructed TMPs over the 60 testing samples can be found in Figure 4. In contrast to GA-SVR and DE-SVR, the PSO-SVR method can yield rather better results with lower RE, MSE, and a higher CC. To validate the robustness of the three methods in reconstructing the TMPs, the quantities m-MSE, m-RE and m-CC, that is, the mean value of MSE, RE, and CC at different testing time instants, are presented. Moreover, std-MSE, std-RE and std-CC, that is, the standard deviation of MSE, RE, and CC over the whole testing samples, are also investigated. As shown in Figure 5, the mean value of the MSE, RE, and CC over the 60 testing samples obtained by GA-SVR, DESVR, and PSO-SVR methods are presented, and the standard deviations of those 60 MSEs, REs and CCs are also provided.

Comparison of Reconstruction Efficiency.
In solving the inverse ECG problem, seeking the optimal parameters in the settled range is time consuming. Therefore, how to successfully excogitate a relatively timesaving optimal method matters significantly in the practical application, particularly in the diagnosis of cardiac disease. Here, we figure out the mean time of the three optimization methods in selecting the optimal hyperparameters of SVR. In order to forecast the 478 TMPs over 60 testing samples on an epi-and endocardial surface, 478 sets of parameters ( , 2 , and ) have to be selected in each optimization method and with the testing data, 478 models are built accordingly. In this study, the mean time for selecting the optimal parameters by adopting GA, DE, and PSO is 208.27 s, 342.53 s, and 130.16 s, respectively. It can be seen that PSO can lead to a convergence more quickly and lessen more time than GA and DE. Therefore, according to the comparison of their efficiencies, PSO-SVR method significantly outperforms the GA-SVR and DE-SVR in the reconstruction of the TMP, which is the most efficient one among these three optimization methods.

Conclusion
In the study of the inverse ECG problem, SVR method is a powerful technique to solve the nonlinear regression problem and can serve as a promising tool for performing the inverse reconstruction of the TMPs. This study introduces three optimization methods (GA, DE, and PSO) to determine the hyperparameters of the SVR model and utilizes these models to reconstruct the TMPs from the remote BSPs. Feature extraction using the KPCA is also adopted to preprocess the original input data. The experimental results demonstrate that PSO-SVR is a relatively effective method in terms of accuracy and efficiency. By comparison, PSO-SVR method is superior to GA-SVR and DE-SVR methods, whose reconstructed TMPs are close to the simulated TMPs. Besides, PSO-SVR is much more efficient in determining the optimal parameters and in building the predicting model. According to these results, the PSO-SVR method can serve as a promising tool to solve the nonlinear regression problem of the inverse ECG problem.