A Computer-Aided Heart Valve Disease Diagnosis System Based on Machine Learning

Cardiac auscultation is a noninvasive, convenient, and low-cost diagnostic method for valvular heart disease, and it can detect heart valve abnormalities at an early stage. However, the accuracy of auscultation relies on the expertise of cardiologists, and doctors in remote areas may lack the experience to diagnose correctly. Therefore, it is necessary to design a system to assist with the diagnosis. This study proposes a computer-aided heart valve disease diagnosis system, comprising a heart sound acquisition module, a trained diagnostic model, and software, which can diagnose four kinds of heart valve diseases. A training dataset containing five categories of heart sounds was used: normal, aortic stenosis, mitral stenosis, mitral regurgitation, and mitral valve prolapse. A convolutional neural network, GoogLeNet, and weighted k-nearest neighbour (WKNN) were used to train two models separately. For the GoogLeNet model, time-series heart sound signals were converted into time-frequency scalograms by the continuous wavelet transform to match the input format of the network. For the WKNN model, features from the time domain and the frequency domain were extracted manually, and feature selection based on the chi-square test was then performed to obtain a better group of features. Moreover, we designed software that lets doctors upload heart sounds, visualize the heart sound waveform, and use the model to obtain a diagnosis. Both trained models were assessed using accuracy, sensitivity, specificity, and F1 score. The results showed that the model trained with the modified GoogLeNet outperformed the WKNN model, with an overall accuracy of 97.5%. Its average accuracy, sensitivity, specificity, and F1 score for diagnosing the four kinds of heart valve diseases were 98.75%, 96.88%, 99.22%, and 97.99%, respectively.
The computer-aided diagnosis system, with a heart sound acquisition module, a diagnostic model, and software, can visualize the heart sound waveform and show the reference diagnostic results. This can assist in the diagnosis of heart valve diseases, especially in remote areas, which lack skilled doctors.


Introduction
Cardiovascular diseases (CVDs) are one of the leading causes of death, causing an estimated 17.9 million deaths each year according to the World Health Organization (WHO) [1]. They include coronary heart disease, valvular heart disease, rheumatic heart disease, and other conditions; valvular heart disease accounts for about 20% of cases and is a large contributor to the burden of disease [2]. There are four heart valves, namely the aortic, mitral, tricuspid, and pulmonary valves. The heart valves close alternately to prevent the regurgitation of blood, and the sound of valve closure can be heard using a stethoscope. If a valve is abnormal, this is reflected in the heart sound. A cardiologist can diagnose heart valve diseases by auscultation based on murmurs in the heart sounds.
Since 2020, owing to the COVID-19 pandemic, medical staff have had to wear protective suits to prevent infection when carrying out treatment and diagnosis, which makes it impossible to use a conventional stethoscope. Some experts suggested using handheld ultrasonic devices instead of a stethoscope [3]. Although an ultrasound examination can provide more information on heart conditions, it requires ultrasonic equipment and skilled cardiologists to interpret the images. Auscultation therefore remains of great value in the diagnosis of valvular heart disease.
However, auscultation places very high demands on the expertise of clinicians. It takes a long time to master auscultation techniques, which can only be improved by practising on a large number of patients [4]. Moreover, it is more difficult to train a doctor to be expert in auscultation in developing areas than in developed areas.
Since machine learning is increasingly used in medical fields [5], it is helpful to use machine learning for heart disease diagnosis based on heart sounds.
There are mainly three steps in training a model to diagnose heart valve diseases by machine learning: (i) signal preprocessing, (ii) feature extraction and selection, and (iii) classification. Generally, the signal preprocessing step includes signal denoising and segmentation. Then features are either extracted manually or extracted automatically by a neural network. Finally, the model is trained by a machine learning algorithm to diagnose diseases.
In the early years, many researchers emphasized signal segmentation to divide heart sounds into S1 (the first heart sound), systolic, S2 (the second heart sound), and diastolic segments [6][7][8][9][10][11]. However, accurate segmentation relies on the ECG signal to locate the boundaries of heart sounds. Other segmentation methods based on signal processing, such as Shannon energy and average power spectrum, cannot achieve accurate segmentation, which has a negative effect on classification. Recently, studies have shown that extracting features directly without segmentation can also achieve good classification performance [12][13][14][15][16]. Thus, we proposed a model without segmentation. Table 1 shows some related works on heart sound classification.
In other studies, heart sounds are usually classified only as normal or abnormal, which cannot be used to diagnose specific heart diseases [45,46]. In this study, we trained two models to diagnose heart valve diseases, one based on a convolutional neural network and one based on a classic machine learning algorithm, weighted k-nearest neighbour. The two models were assessed using accuracy, specificity, sensitivity, and F1 score, and the model trained with the modified GoogLeNet was selected after model assessment. It can automatically classify the heart sounds of healthy people and of four valvular conditions: aortic stenosis (AS), mitral stenosis (MS), mitral regurgitation (MR), and mitral valve prolapse (MVP).
In addition, we designed a computer-aided heart valve disease diagnosis system. The framework of this system has four parts: collecting heart sounds and uploading the audio file; preprocessing the signal; diagnosing based on the trained classification model; and showing the diagnosis on the app. We not only trained a model with high accuracy for valvular heart disease classification but also designed a diagnostic system that can assist in clinical diagnosis.

Materials
The heart sound data in this study were obtained from a public dataset [21]. This dataset contains 1000 heart sound recordings in 5 classes: normal heart sound, aortic stenosis (AS), mitral stenosis (MS), mitral regurgitation (MR), and mitral valve prolapse (MVP). Each heart sound signal is sampled at 8000 Hz. Figure 1 shows an example of each of the 5 classes of heart sounds.

Transfer Learning for Heart Valve Disease Diagnosis.
Training deep neural networks usually requires large datasets and substantial computing resources. This study has only 1000 heart sound recordings, which is too few to train a deep neural network from scratch and may affect the accuracy of the model. In this case, we can start from a pretrained network, a method called transfer learning.
GoogLeNet is a convolutional neural network for image classification based on the inception architecture. It has 22 layers with 9 inception modules. Each inception module contains convolution kernels of different sizes, from 1 × 1 to 5 × 5, and a pooling layer. Even for the same image, the size of the convolution kernel affects the result of the convolution. With the inception architecture, the network can learn whether to rely on larger or smaller kernels by adjusting their weights, which helps it classify images with high accuracy.
In this study, GoogLeNet was used to train the heart valve disease classification model, mainly through the steps shown in Figure 2.

Signal Preprocessing.
GoogLeNet requires RGB input images with a size of 224-by-224-by-3. To meet this requirement, the heart sound signal, which is a one-dimensional time series, must be preprocessed. A time-frequency representation is used to convert the one-dimensional heart sound signal into a scalogram, which is stored as a three-channel RGB image [22,47].
Before the heart sound signals are transformed into images, each signal is normalized using equation (1), where μ and σ are the mean and standard deviation of the signal. The time-frequency representation is based on the continuous wavelet transform (CWT). The steps for transforming a heart sound signal into a scalogram are (i) create a CWT filter bank for the signal based on the Morse wavelet; (ii) perform the CWT using the filter bank created in step (i) to obtain the scalogram; and (iii) resize the scalogram to 224-by-224-by-3 to fit the input size of GoogLeNet.
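The normalization and CWT steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: equation (1) is assumed to be the usual z-score, (x - μ)/σ; a complex Morlet wavelet stands in for the Morse wavelet; the direct-summation transform is far slower than a filter bank; and the names `zscore`, `morlet`, and `cwt_magnitude` are hypothetical. The resize to 224-by-224-by-3 and the colour mapping are omitted.

```python
import cmath
import math
from statistics import fmean, pstdev


def zscore(signal):
    """Normalise a signal to zero mean and unit standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal)  # population standard deviation (an assumption)
    return [(x - mu) / sigma for x in signal]


def morlet(t, w0=6.0):
    """Complex Morlet wavelet, used here as a stand-in for the Morse wavelet."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(signal, fs, freqs, w0=6.0):
    """Return |CWT| as a list of rows, one row per analysis frequency."""
    n = len(signal)
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * math.pi * f)  # scale (seconds) whose centre frequency is f
        row = []
        for b in range(n):  # b is the time shift in samples
            acc = 0j
            for k in range(n):
                t = (k - b) / fs / scale
                if abs(t) < 5.0:  # the Gaussian envelope is negligible beyond this
                    acc += signal[k] * morlet(t, w0).conjugate()
            row.append(abs(acc) / (fs * math.sqrt(scale)))
        rows.append(row)
    return rows


fs = 200  # toy sampling rate; the dataset itself is sampled at 8000 Hz
tone = [math.sin(2 * math.pi * 10 * k / fs) for k in range(200)]  # 10 Hz test tone
scalogram = cwt_magnitude(zscore(tone), fs, freqs=[5, 10, 20])
```

For the 10 Hz test tone, the row for the 10 Hz analysis frequency carries the largest magnitude near the middle of the signal, which is the behaviour a scalogram relies on to show where energy sits in time and frequency.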
The scalogram is an image whose horizontal axis represents time, whose vertical axis represents frequency, and whose colour represents the magnitude at the corresponding time and frequency. The scalograms of the 5 categories of heart sound signals are shown in Figure 3.

After the scalograms of all categories of heart sound signals are generated, they are divided into a training set and a validation set. The former is used to train the model, and the latter is used to verify whether the model performs well. In this case, there are 800 scalograms in the training set and 200 scalograms in the validation set.
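The 800/200 division can be sketched as follows. The text does not say how the split was drawn, so this sketch assumes a stratified random split with 200 recordings per class; the class abbreviations, the function name `stratified_split`, and the fixed seed are all illustrative.

```python
import random
from collections import defaultdict

CLASSES = ["N", "AS", "MS", "MR", "MVP"]


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices so each class keeps the same validation share."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return train, val


# Assumption: 1000 recordings, 200 per class.
labels = [c for c in CLASSES for _ in range(200)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 800 200
```

Stratifying keeps 40 validation samples per class, so per-class indicators are computed on equally sized groups.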

Training Model with GoogLeNet.
GoogLeNet is a kind of CNN designed for image classification. It is widely used in medical image classification tasks [48]. However, it is seldom used in heart sound classification tasks because GoogLeNet requires 224-by-224-by-3 RGB images as input, while heart sound signals are one-dimensional time series.
In this case, we transformed the original heart sound signals into scalograms to match the input format of GoogLeNet. Besides, GoogLeNet is designed to classify 1000 categories of images, so we tuned the parameters of a few layers, such as the fully connected layer and the output layer, to make the network match our training set. Firstly, we set the dropout probability of the final dropout layer in the network to 0.6. Secondly, we changed the output size of the fully connected layer "loss3-classifier" to 5, which corresponds to the number of categories in our dataset. Thirdly, we replaced the output layer with a new classification layer to classify the different categories of heart sounds.
The other training options were set as follows: a learning rate of 0.0001, a mini-batch size of 15, and a maximum of 20 epochs. Stochastic gradient descent with momentum was used as the optimizer. The modified architecture of GoogLeNet is shown in Figure 4.
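To make the optimizer setting concrete, the sketch below applies the stochastic gradient descent with momentum update rule to a toy one-parameter problem, using the stated learning rate of 0.0001. The momentum value of 0.9 is an assumption (the text does not give it), the quadratic objective is purely illustrative, and the count of 54 mini-batches per epoch assumes the last partial batch is kept; some frameworks drop it.

```python
import math


def sgdm_step(params, grads, velocity, lr=1e-4, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v - lr * g, then p <- p + v."""
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    params = [p + v for p, v in zip(params, velocity)]
    return params, velocity


def loss(w):
    """Toy objective with its minimum at w = 3."""
    return (w[0] - 3.0) ** 2


def grad(w):
    return [2.0 * (w[0] - 3.0)]


steps_per_epoch = math.ceil(800 / 15)  # 800 training scalograms, mini-batch size 15
w, v = [0.0], [0.0]
for _ in range(20 * steps_per_epoch):  # 20 epochs
    w, v = sgdm_step(w, grad(w), v)
print(steps_per_epoch, loss(w) < loss([0.0]))
```

With such a small learning rate, the parameter moves steadily toward the minimum without overshooting, which is the usual motivation for a conservative rate when fine-tuning a pretrained network.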

Classic Machine Learning Algorithm.
Classic machine learning algorithms include k-nearest neighbour (KNN), support vector machine (SVM), decision tree (DT), etc. They differ from neural networks in that features must first be extracted manually from the signals before an algorithm is used to train the model. The performance of a model may be strongly affected by feature selection.
The main steps of training a heart valve disease diagnosis model using classic machine learning algorithms include feature extraction, feature selection, classification, and model assessment, which are introduced in the following sections.

The extracted features are listed in Table 2. In the time domain, 10 features are extracted manually. RMS is the square root of the mean square of the signal. The shape factor is the RMS divided by the mean of the absolute value. The skewness and kurtosis are the third and fourth standardized moments of the signal, which are shown in equations (2) and (3). The peak value is the maximum absolute value of the signal. The impulse factor is the peak value divided by the mean of the absolute value of the signal. The crest factor is the peak value divided by the RMS. The clearance factor is the peak value divided by the squared mean of the square roots of the absolute values of the signal.
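The time-domain definitions above translate directly into code. The sketch below covers the eight features described in the text; the function name and dictionary keys are illustrative, and skewness and kurtosis are assumed to be the uncorrected (population) standardized moments, since equations (2) and (3) are not reproduced here.

```python
import math


def time_domain_features(x):
    """Compute the time-domain features described in the text for one signal."""
    n = len(x)
    mean = sum(x) / n
    abs_x = [abs(v) for v in x]
    mean_abs = sum(abs_x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std (assumption)
    peak = max(abs_x)
    mean_sqrt_abs = sum(math.sqrt(a) for a in abs_x) / n
    return {
        "rms": rms,
        "shape_factor": rms / mean_abs,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / std ** 4,
        "peak": peak,
        "impulse_factor": peak / mean_abs,
        "crest_factor": peak / rms,
        "clearance_factor": peak / mean_sqrt_abs ** 2,
    }


# A unit square wave: every amplitude-ratio feature equals 1 and skewness is 0.
feats = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

The square-wave check is a convenient sanity test because every ratio collapses to 1, so any mistake in a numerator or denominator shows up immediately.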
In the frequency domain, 16 features are extracted manually. SNR is the ratio of signal power to noise power, where noise is measured by the RMS value. The signal-to-noise and distortion ratio is the ratio of total signal power to the total power of noise and distortion. THD is the ratio of total harmonic component power to fundamental component power.
MFCC is widely used as a signal feature in speech recognition tasks. Heart sound signals are also audio signals, so MFCC can be used as a feature to classify different categories of heart sound signals. MFCC features are extracted in several steps. Firstly, a window function such as the Hamming window is applied to reduce spectral leakage. Secondly, a fast Fourier transform (FFT) is performed to obtain the spectrum of the signal. Thirdly, the spectrum is passed through a set of triangular filters spaced on the mel scale. The relation between the mel scale and frequency is defined in equation (4). Finally, a discrete cosine transform (DCT) of the mel spectrogram is performed to obtain the MFCCs. In this case, we preserve 13 coefficients to represent the signal.

The performance of the model will probably improve after selecting the proper features. In this study, the chi-square test is used to test whether there is a significant difference between the expected frequencies and the observed frequencies of the extracted features in different categories.
The chi-square test is one of the most widely used nonparametric tests, and it is suitable for data that do not satisfy the assumptions of parametric tests, such as a normal distribution. The chi-square test gives the p value based on the degrees of freedom and the chi-square value, where the degrees of freedom equal the number of categories minus one.
The predictor importance score is calculated from the p value as shown in equation (5). The smaller the p value, the higher the importance score. The p value indicates the significance of the association between a feature and the categories. Features with higher importance scores are more important to the model. The predictor importance scores of the top ten features are shown in Figure 5. The boxplots of the top five features are shown in Figure 6. It can be seen that the distributions of those features differ considerably among the categories. Finally, the fifteen features with the highest predictor importance scores were selected.
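The ranking step can be sketched as below. Equation (5) is not reproduced in this text, so the sketch assumes the common convention score = -log(p), which matches the stated behaviour that smaller p values give higher scores. The function names, the contingency table, and the p values are all hypothetical illustrations, not results from this study.

```python
import math


def chi_square_statistic(observed):
    """Pearson chi-square statistic; rows are feature bins, columns are classes."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


def importance_score(p_value):
    """Assumed form of the importance score: -log(p), infinite when p is 0."""
    return math.inf if p_value == 0 else -math.log(p_value)


def top_k(p_values, k):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(p_values, key=lambda name: importance_score(p_values[name]), reverse=True)
    return ranked[:k]


stat = chi_square_statistic([[10, 20], [20, 10]])  # hypothetical 2x2 table
p_values = {"MFCC1": 1e-30, "kurtosis": 1e-12, "SNR": 0.2}  # hypothetical p values
selected = top_k(p_values, 2)
```

In the study itself, the same ranking would be applied to all 26 extracted features and the top fifteen kept.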

Table 2 (recovered rows, frequency domain): signal-to-noise ratio (SNR); 12, signal-to-noise and distortion ratio; 13, total harmonic distortion (THD); 14-26, Mel frequency cepstral coefficients (MFCC1-MFCC13).

Classification with Weighted K-Nearest Neighbour.
In KNN classification, a sample is classified into the category chosen by majority voting among its k nearest neighbours. WKNN is a modified version of KNN. KNN classification has a shortcoming on skewed datasets: examples from a more frequent class tend to be more common among the k nearest neighbours. WKNN is designed to overcome this problem by giving each neighbour's vote a weight of 1/d, where d is the distance to that neighbour. Therefore, a more distant neighbour has a smaller weight, while a closer neighbour has a bigger weight.
An example of misclassification by the KNN algorithm is shown in Figure 7. Intuitively, the predicted sample (white square) belongs to the green category. However, if we choose k equal to five, the predicted sample will be classified into the red category by voting among the five nearest neighbours, because the red samples outnumber the green ones. After the distance weights are assigned, the nearby green samples receive larger weights, and the predicted sample is classified into the right category.
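The situation described above can be reproduced with a small sketch. The coordinates are made up to mimic the figure (two close green samples, three distant red ones), and the function names and the small `eps` guard against division by zero are illustrative choices.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train, query, k):
    """Plain KNN: majority vote among the k nearest neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def wknn_predict(train, query, k, eps=1e-12):
    """Weighted KNN: each of the k nearest neighbours votes with weight 1/d."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        votes[label] += 1.0 / (math.dist(point, query) + eps)
    return max(votes, key=votes.get)


# Hypothetical layout: two green samples close to the query, three red ones far away.
train = [
    ((0.5, 0.0), "green"), ((0.0, 0.5), "green"),
    ((2.0, 0.0), "red"), ((0.0, 2.0), "red"), ((-2.0, 0.0), "red"),
]
query = (0.0, 0.0)
print(knn_predict(train, query, k=5))   # red   (3 votes against 2)
print(wknn_predict(train, query, k=5))  # green (weight 4.0 against 1.5)
```

With k = 5 the unweighted vote is 3 to 2 for red, while the 1/d weights give green a total of 2/0.5 = 4 against 3/2 = 1.5 for red.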

Model Assessment.
Four model assessment indicators are used to evaluate the performance of our model: accuracy, specificity, sensitivity, and F1 score [49]. Accuracy reflects the overall rate of correct classification. Specificity reflects the level of misdiagnosis: the higher the specificity, the lower the misdiagnosis rate. Sensitivity reflects how well patients are detected: the higher the sensitivity, the more patients the model can detect. The F1 score is defined here as the harmonic mean of specificity and sensitivity, combining the information of both. A higher F1 score corresponds to higher values of both specificity and sensitivity.
The four indicators are computed from the numbers of true positives, true negatives, false positives, and false negatives. A true positive means that a person with the disease gets a positive result. A true negative means that a person without the disease gets a negative result. A false positive means that a person without the disease gets a positive result. A false negative means that a person with the disease gets a negative result.
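A sketch of how the indicators follow from these counts is given below, using a one-vs-rest reading of a multiclass confusion matrix. The accuracy, sensitivity, and specificity formulas are the standard ones. The F1 line follows the definition stated in this paper (harmonic mean of specificity and sensitivity), which differs from the more common F1 based on precision and recall. The function names and the 3-class confusion matrix are hypothetical.

```python
def one_vs_rest_counts(confusion, cls):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, tn, fp, fn


def indicators(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and F1 as defined in this paper."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Paper's definition: harmonic mean of specificity and sensitivity.
    f1 = 2 * sensitivity * specificity / (sensitivity + specificity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Hypothetical 3-class confusion matrix, 10 samples per true class.
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
result = indicators(*one_vs_rest_counts(confusion, cls=0))
```

For class 0 in this toy matrix the counts are TP = 8, TN = 19, FP = 1, and FN = 2, giving an accuracy of 0.9, a sensitivity of 0.8, and a specificity of 0.95.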

Design of Heart Sound Acquisition Module.
A heart sound acquisition module was designed and built to acquire heart sounds in clinical settings. It consists of a chest piece, rubber tubes, and a 3.5 mm microphone connected to a computer by an audio cable. The structure of the designed module is shown in Figure 8. The chest piece is a kind of resonator, which nonlinearly amplifies the sound generated by the heart valves before it is transmitted through the rubber tubes. The sound signal is acquired through the microphone and imported into the diagnosis system, which classifies it as a normal heart sound or as a kind of valvular heart disease.

Software of the Diagnosis System.
After the best model for classifying heart sounds was trained and selected, a heart valve disease diagnosing system was designed to meet the requirements of clinical application. The framework of the diagnosing system is shown in Figure 9. The software of the diagnosis system was designed with the MATLAB GUI (graphical user interface) tools and converted to an EXE file, which can be installed and executed on different Windows terminals with or without MATLAB installed, making it robust and easy to use. Once the audio file of a heart sound signal is chosen, the waveform of the signal is displayed on the right side of the app, which makes the heart sound more intuitive. When the user clicks the "Diagnose" button, the program preprocesses the input signal, resampling it to 8000 Hz to match the training set, and converts the signal to its scalogram. Then the scalogram is classified by the trained GoogLeNet model. Finally, the result of the diagnosis is shown on the app. Figure 10 shows the interface of the diagnosing system, with the classification of an aortic stenosis heart sound as an example.

Results
Figure 11 shows the confusion matrices on the validation set, which contains 200 heart sound signals, for the two models trained by GoogLeNet and WKNN. The blue cells are correctly classified cases, while the pink cells are misclassified cases. Most heart sounds are correctly classified. Table 3 shows the model assessment indicators for the two trained models. Accuracy, sensitivity, specificity, and F1 score are calculated separately for each kind of heart valve disease to assess the diagnostic performance.
The model trained by GoogLeNet perfectly separates healthy people from patients with valvular disease. For diagnosing the four kinds of heart valve disease, the average accuracy, sensitivity, specificity, and F1 score are 98.75%, 96.88%, 99.22%, and 97.99%, respectively.
The model trained by WKNN also has high accuracy in diagnosing heart valve disease, but it is a little lower than that of the trained GoogLeNet model. The average accuracy, sensitivity, specificity, and F1 score for classifying the four valvular diseases are 94.63%, 86.25%, 96.72%, and 91.11%, respectively.

Evaluation of Heart Valve Disease Diagnosing System
Eighteen heart sound recordings were collected from healthy people and valvular disease patients and downsampled to 8000 Hz: six normal heart sounds and twelve heart sounds from valvular diseases. The normal heart sounds were recorded from the mitral valve area of six healthy participants. The twelve valvular disease heart sounds were recorded from four participants, two each with MR and AS. For each of these participants, heart sounds were collected from three auscultation areas: the mitral valve area, the Erb area, and the aortic valve area. Three examples of each category of heart sounds are shown in Figure 12. The audio files were imported into the diagnosis system, and the time-frequency scalograms were generated based on the CWT. Then the GoogLeNet model trained on heart sound scalograms was used to classify the imported heart sounds. The trained model correctly classified five of the six normal heart sounds and eleven of the twelve heart sounds associated with valvular diseases. Thus, the diagnosing system achieved an overall accuracy of 88.89% (16 of 18). The classification accuracy for normal and valvular disease heart sounds was 83.33% and 91.67%, respectively.
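The reported percentages follow directly from the counts given above, as this short check shows. The dictionary keys are illustrative labels for the two groups.

```python
# Counts taken from the evaluation described above.
correct = {"normal": 5, "valvular": 11}
total = {"normal": 6, "valvular": 12}

per_group = {group: 100 * correct[group] / total[group] for group in total}
overall = 100 * sum(correct.values()) / sum(total.values())

print(round(per_group["normal"], 2), round(per_group["valvular"], 2), round(overall, 2))
# 83.33 91.67 88.89
```

With only eighteen recordings, each misclassified sound shifts the overall accuracy by about 5.6 percentage points, so these figures are best read as a small-sample check of the system.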

Discussion and Conclusion
In this study, both the model trained by the convolutional neural network GoogLeNet and the model trained by the classic machine learning algorithm WKNN had high accuracy in separating healthy people from heart valve disease patients. In terms of detecting valvular heart disease, a comparison of the four indicators (accuracy, sensitivity, specificity, and F1 score) shows that the model trained by GoogLeNet performs better than the model trained by WKNN, especially in diagnosing mitral regurgitation. In addition to the trained model, we proposed a complete heart valve disease diagnosis system. Heart sounds can be acquired by the heart sound acquisition module and diagnosed by uploading the recorded heart sound. Moreover, we collected three kinds of heart sounds (normal, MR, and AS) from healthy participants and valvular disease patients to verify our diagnosing system, and the experiment showed an overall accuracy of 88.89%. The proposed diagnosis system can collect heart sounds and diagnose four categories of valvular heart disease with high accuracy. It can assist doctors in diagnosing heart valve diseases and may greatly improve the accuracy of diagnosis in remote areas, which lack skilled cardiologists. In the future, we can collect more categories of heart sounds to train the model to diagnose more types of cardiovascular diseases, which may have a significant impact on reducing the uneven distribution of medical resources.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.