A New Classification Method in Ultrasound Images of Benign and Malignant Thyroid Nodules Based on Transfer Learning and Deep Convolutional Neural Network



Introduction
In recent years, the incidence of thyroid cancer has continued to rise. As a malignant tumor of the head and neck, it continues to threaten people's health [1]. It is reported that, in the United States, thyroid carcinoma is expected to become the third most common cancer among women, with approximately 37 cases per 100,000 people [2].
The thyroid nodule is a symptom of thyroid-related disease.
The nodule may be caused by the growth of thyroid cells or a thyroid cyst. The thyroid tissues surrounding a scattered thyroid nodule lesion can be clearly distinguished in images [3,4]. If nodules can be judged as benign or malignant at an early stage, even malignant nodules can be cured. In addition, accurate differentiation provides an effective basis for proper subsequent clinical treatment. Moreover, an accurate early diagnosis can also reduce the medical risk suffered by patients and the large health care costs caused by needle biopsy.
Currently, there are two major methods for examining the nature of thyroid nodules: ultrasound image analysis and computed tomography imaging analysis. Between them, ultrasound imaging is cheap and common in hospitals, which is why ultrasound image analysis is more widely used. However, in the ultrasound image, a malignant thyroid nodule with prominent histopathological components and blurred boundaries usually adheres to other tissues, making its morphology difficult to distinguish. This requires an efficient image classification method to improve accuracy and reduce the misdiagnosis rate. In past studies, radiographers have summarized ultrasonographic features of thyroid nodules according to their characteristics, which function as signs of cancer. However, thyroid nodule diagnosis relying on these characteristics is time-consuming and poorly robust. To this end, the results of computer-aided diagnosis systems based on ultrasound images still have to be judged by doctors. A fully automatic computer-aided diagnosis system consists of image preprocessing (such as denoising), ROI extraction, and classification. Nowadays, most research focuses mainly on image denoising and ROI extraction. At present, it is still difficult to make a judgment from ultrasound images alone: the low quality and noise pollution of ultrasound images make them extremely challenging to classify. Tsantis et al. [5] proposed an SVM classifier to divide thyroid nodules into high-risk and low-risk malignant tumors. Ma et al. [6] presented a noninvasive and automatic approach for differentiating benign and malignant thyroid nodules based on support vector machines (SVM). Acharya et al. [7] proposed a wavelet-transform-based filter for this classification. Shukla et al. [8] utilized an artificial neural network for dealing with thyroid disease. Prochazka et al. [9] proposed a dual-threshold binary decomposition method for the same task.
Rapid progress in the automatic classification of medical image data has also been made with these methods. Thandiackal et al. [10] identified skin lesions through pretrained classical classification networks. Convolutional Neural Network (CNN) models are a type of deep learning architecture introduced to achieve the correct classification of breast cancer [11]. Deep models that use limited chest CT data to distinguish malignant from benign nodules have been proposed [12–15], as has a classification algorithm for thyroid nodule ultrasound images based on a DCNN [16]. Nevertheless, these methods are currently defective in the following aspects. Needless to say, transfer learning has played an important role in the ultrasound imaging diagnosis of thyroid cancer. However, few-shot learning, i.e., making predictions based on a limited number of samples, remains a challenging problem [17–20]. Also, data labelling is a task that requires a lot of manual work [21–23]. Finally, an inappropriate model and imbalanced training data make it difficult to achieve better classification accuracy [24–30].
Therefore, in view of the above problems, this paper does the following work. In response to the abovementioned few-shot learning problem, a TV model is introduced for the automated preprocessing of the original data collected by various institutions; some image marks made by doctors also need to be removed.
The original image is then expanded by data augmentation for the purpose of supplementing inadequate training samples. Also, to select a suitable transfer learning model, the GoogLeNet model was established for the experiments on thyroid ultrasound image data sets. The results showed that the model enhanced the accuracy of nodule classification. Finally, in response to the imbalanced-training-data problem mentioned above, this paper puts forward secondary transfer learning conducted on a public thyroid database and the actual data sets collected by hospitals, which improves the classification accuracy. The structure of this paper is presented as follows. In Section 1, the motivation is given and the relevant literature is examined. Section 2 presents the traditional CNN structure and describes TV-based image restoration. Section 3 describes the network structure of the proposed method based on GoogLeNet. Section 4 shows the experimental results, including the application of the proposed method to the diagnosis of thyroid cancer. The research results are summarized in Section 5.

Traditional CNN Structure.
This kind of network structure is usually called a CNN: a feed-forward neural network characterized by local connections, weight sharing, and other properties [31]. It inputs the extracted features into the fully connected network, whereby the parameters are optimized. In their research, Moon et al. used ultrasound images for cancer diagnosis. The difference from the previous method is that they used a variety of data sets and combined different CNN algorithms for fusion diagnosis. It was found that the accuracy rates on different data sets were 91.1% and 94.62%, respectively [32]. Kim et al. used deep learning methods for the intelligent diagnosis of breast ultrasound images; by calculating different performance standards, the AUC value was 89% [33]. There are also some methods that use a three-dimensional convolutional neural network structure. Through experiments with different performance standards, the accuracy rate was found to reach 96.7%.
The convolutional layer is an effective means of extracting image features. The image is input to the convolutional layer, which performs the feature extraction task. Each feature is extracted into a feature map by the convolutional layer, and the weights are updated through continuous backward propagation during training. The computing formula of the convolution layer is

X_j = f(∑_i x_i ∗ K_ij + b),

where X_j is the output neuron cell, x_i is the input signal of each network cell, f is the activation function, K_ij is the convolution kernel, and b is the offset. After the features are extracted through the convolution layer, the output feature map reaches the pooling layer for feature selection and information screening.
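As a concrete illustration of this formula, the following is a minimal NumPy sketch (not the authors' code) of a single convolutional layer's forward pass, with one input feature map, ReLU assumed as the activation f, and a 3 × 3 kernel K:

```python
import numpy as np

def conv2d_forward(x, kernel, bias=0.0):
    """Valid-mode 2D convolution of one feature map followed by ReLU.

    x      : (H, W) input feature map
    kernel : (kh, kw) convolution kernel K_ij
    bias   : scalar offset b
    """
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the local receptive field plus the offset b
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)  # f = ReLU (an assumption)

# Example: a small image patch with an averaging 3 x 3 kernel
x = np.random.rand(32, 32)
k = np.ones((3, 3)) / 9.0
feat = conv2d_forward(x, k)
```

With a 3 × 3 kernel and no padding, a 32 × 32 input yields a 30 × 30 feature map.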
The pooling formula is shown in the following equation:

X_j = f(β_j · L(x_j) + b),

where β_j is the weight coefficient and L is the sampling function [34–36].
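A matching sketch of the pooling step, assuming L is 2 × 2 mean down-sampling and f is the identity (both assumptions, since the paper does not fix them):

```python
import numpy as np

def pool_forward(x, beta=1.0, bias=0.0):
    """Mean pooling X_j = f(beta * L(x_j) + b), with a 2x2 window and f = identity."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # trim odd edges
    x = x[:h, :w]
    pooled = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # L: down-sampling
    return beta * pooled + bias

p = pool_forward(np.arange(16, dtype=float).reshape(4, 4))
```

Each 2 × 2 block is replaced by its mean, halving both spatial dimensions.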
After pooling, the data is input to the fully connected layer, which is equivalent to traditional forward neural propagation. The fully connected end of the convolutional network only transmits signals to other fully connected layers. The traditional CNN structure is shown in Figure 1.
In the traditional CNN structure, forward propagation is adopted to build the network, and backward propagation is adopted to train the network parameters. The loss function, learning rate, and moving average are used for network optimization. Cross entropy with regularization serves as the loss function in the CNN.
The cross entropy formula is shown as

H(y, y′) = −∑_i y_i log y′_i,

where y is the standard answer and y′ is the predicted value. An exponentially decaying learning rate is adopted, i.e., the magnitude of each parameter update decreases over training. The formula of the parameter update is given by

w ← w − r · f′(w),

where r is the learning rate and f′ is the gradient of the loss function.
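The loss and update rule above can be sketched as follows; the exponential-decay schedule r = r0 · d^(step/decay_steps) is an assumption, since the paper does not state its exact schedule:

```python
import math

def cross_entropy(y, y_pred, eps=1e-12):
    """H(y, y') = -sum_i y_i * log(y'_i); y is the one-hot standard answer."""
    return -sum(t * math.log(p + eps) for t, p in zip(y, y_pred))

def decayed_lr(r0, step, decay_rate=0.96, decay_steps=100):
    """Exponentially decaying learning rate (assumed schedule and constants)."""
    return r0 * decay_rate ** (step / decay_steps)

def sgd_step(w, grad, r):
    """Parameter update w <- w - r * f'(w)."""
    return [wi - r * gi for wi, gi in zip(w, grad)]

loss = cross_entropy([0.0, 1.0], [0.2, 0.8])   # penalizes low confidence on the true class
w_new = sgd_step([1.0, 1.0], [0.5, -0.5], decayed_lr(0.1, step=0))
```

The loss equals −log 0.8 ≈ 0.223 here; a more confident correct prediction drives it toward zero.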

TV-Based Image Restoration.
The data sets collected for this experiment were small and needed to be augmented. In the present study, the data set was augmented only by rotation and translation.
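The rotation-and-translation augmentation can be sketched as follows; this is a generic NumPy sketch, not the authors' pipeline, with right-angle rotations and zero-filled pixel shifts assumed:

```python
import numpy as np

def augment(image, max_shift=10):
    """Generate rotated and translated copies of an ultrasound image."""
    samples = [np.rot90(image, k) for k in range(4)]          # 0/90/180/270 degree rotations
    for dy, dx in [(-max_shift, 0), (max_shift, 0), (0, -max_shift), (0, max_shift)]:
        shifted = np.zeros_like(image)                        # zero-filled border after the shift
        src = image[max(0, -dy):image.shape[0] - max(0, dy),
                    max(0, -dx):image.shape[1] - max(0, dx)]
        shifted[max(0, dy):max(0, dy) + src.shape[0],
                max(0, dx):max(0, dx) + src.shape[1]] = src
        samples.append(shifted)
    return samples

aug = augment(np.random.rand(240, 240))   # 240 x 240 is the crop size used in Section 4
```

One input image yields eight augmented samples, all of the original size.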
The current data contained manual marks, as shown in Figures 2(a) and 2(c). Manual marks mainly refer to the marks made by professionals on the lesion area in the ultrasound image, which destroy part of the texture and affect the accuracy and integrity of the image in the area to be analyzed.
This also impacts the subsequent training. Therefore, restoring the image was essential. In 2002, Shen et al. [37] extended the TV model to image inpainting and proposed an image inpainting method based on the TV model. The Total Variation- (TV-) based self-adaptive image restoration was adopted to estimate the value of each pixel after restoration:

G_O = ∑_{p∈Λ} h_Op · G_p + h_OO · G_O,

where G_O represents the pixel of the current point O to be restored, G_p represents the pixels of the neighboring points p of the current point O, and h_Op and h_OO are the weight coefficients, which are mainly determined by w_p:

h_Op = w_p / (λ(O) + ∑_{q∈Λ} w_q),   h_OO = λ(O) / (λ(O) + ∑_{q∈Λ} w_q),   w_p = 1 / |∇g_p|,

where |∇g_p| is the magnitude of the gradient at the neighboring point p. For the left (west) neighbor, for example, the gradient is approximated by

∇g_W ≈ (G_W − G_O, (G_NW − G_SW)/2),

where G_W, G_NW, and G_SW are the pixels of the left, upper-left, and lower-left neighboring points of the current pixel, and λ(O) is the value of the parameter λ at point O.
Finally, as shown in Figure 2, the image was well restored, to the extent that its texture was similar to the surrounding texture. The same method was applied to restore the pixels of Figures 2(a) and 2(c).
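The neighbor-weighted restoration step can be sketched as follows. This is a simplified single-pass version of the scheme above, assuming 4-connected neighbors, central-difference gradients, and λ(O) = 0 inside the inpainting region (the marked pixel's own value then gets zero weight); it illustrates the weighting only, not the full iterative TV filter:

```python
import numpy as np

def restore_pixel(img, i, j, lam=0.0, eps=1e-6):
    """Estimate the marked pixel (i, j) as a weighted mean of its 4 neighbors.

    Weights follow w_p = 1 / (|grad g_p| + eps); lam plays the role of
    lambda(O), which is 0 inside the region to be inpainted.
    """
    neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    ws, vals = [], []
    for (ni, nj) in neighbors:
        # central-difference gradient magnitude at the neighbor
        gy = (img[ni + 1, nj] - img[ni - 1, nj]) / 2.0
        gx = (img[ni, nj + 1] - img[ni, nj - 1]) / 2.0
        ws.append(1.0 / (np.hypot(gx, gy) + eps))
        vals.append(img[ni, nj])
    ws, vals = np.array(ws), np.array(vals)
    h = ws / (lam + ws.sum())            # h_Op
    h_oo = lam / (lam + ws.sum())        # h_OO
    return float((h * vals).sum() + h_oo * img[i, j])

img = np.full((5, 5), 100.0)
img[2, 2] = 255.0                        # a bright manual mark to be removed
restored = restore_pixel(img, 2, 2)
```

On this toy image the mark is replaced by the uniform surrounding intensity, 100.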

Proposed Network Structure.
The CNN model based on GoogLeNet was established to realize the diagnosis of thyroid nodule classification. The process is shown in Figure 3. Initially, TV-based preprocessing was performed on the thyroid nodule images. Subsequently, the CNN model was trained to extract the features of images of various sizes.
Thereafter, transfer learning was implemented based on the open-source database and the database actually collected.
The features were integrated, and dual-softmax-assisted forward propagation was conducted. In the end, a softmax classifier was adopted to classify the features. The diagnosis of thyroid nodule classification was thus completed.

GoogLeNet CNN Structure.
GoogLeNet adopts the inception structure proposed in Going Deeper with Convolutions [38]. Generally, a CNN structure simply enlarges the network, which has two disadvantages, namely, overfitting and an increase in the amount of computation. The network depth and width can be increased while reducing the parameters, and the reduction of parameters turns the full connection into a sparse connection. However, under the dense-matrix optimization mode, this kind of change does not bring a qualitative improvement in the amount of computation. The inception structure combines a sparse structure with high computing performance.
The inception structure is shown in Figure 4. The use of convolution kernels of various scales yields receptive fields of various sizes, and the final stitching integrates these scales. Different kernel sizes were set for alignment, such as 1 × 1 and 3 × 3. Also, the convolution stride = 1 and the pad = 0, 1, 2, respectively, so that the outputs could be directly stitched together. However, as the use of a 5 × 5 convolution kernel still generated a large amount of computation, the 1 × 1 convolution kernel was utilized to reduce the dimension. The specifically improved inception structure is shown in Figure 5.
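The effect of the 1 × 1 dimension reduction can be illustrated with a simple cost count. The sizes below are illustrative, not taken from the paper; the sketch shows why the reduced 5 × 5 branch is far cheaper, and why, with stride 1 and "same" padding, the branch outputs can be concatenated along the channel axis:

```python
def conv_cost(in_ch, out_ch, k, h, w):
    """Multiply-accumulate count of a k x k convolution with 'same' padding."""
    return in_ch * out_ch * k * k * h * w

# Naive 5x5 branch on an illustrative 28 x 28 x 192 input
naive = conv_cost(192, 32, 5, 28, 28)

# Same branch with a 1x1 dimension reduction to 16 channels first
reduced = conv_cost(192, 16, 1, 28, 28) + conv_cost(16, 32, 5, 28, 28)

# Concatenation: with stride 1 and 'same' padding every branch keeps the
# 28 x 28 spatial size, so only the channel counts add up
branch_channels = [64, 128, 32, 32]      # 1x1, 3x3, 5x5, and pool-projection branches
total_channels = sum(branch_channels)
```

Here the reduced branch needs roughly a tenth of the multiply-accumulates of the naive one.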

Improved GoogLeNet Structure.
The GoogLeNet network model is stacked from Inception modules. Being a network of relatively large depth, it has a problem with effectively propagating gradients backward through all layers. For this task, the performance of shallower networks shows that the features generated by the intermediate layers of the network should be very discriminative.
The discriminative ability of classifiers at the lower stages can be expected to improve by adding auxiliary classifiers. This is considered a method that overcomes the problem of the vanishing gradient. The auxiliary classifiers take the form of small CNNs placed above the outputs of the inception modules. These auxiliary networks are discarded at inference time. Subsequent control experiments show that the influence of the auxiliary networks is almost the same, and one of them is adequate to achieve the same effect.
Dropout determined what percentage of fully connected nodes was shut off in a training cycle. Dropout improved the model's generalizability by preventing nodes from overlearning the training data. Average pooling was finally adopted in the network to replace the fully connected layer. Furthermore, in order to prevent the gradient from vanishing, the network was provided with two additional softmax outputs for the forward propagation of the gradient. The structure of the inception module is shown in Figure 5. The computation was performed after the number of channels was reduced through the 1 × 1 convolution to aggregate the information, effectively making use of computing power.
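The dropout mechanism described above can be sketched as inverted dropout; the 40% rate here is an assumption for illustration, not the paper's stated value:

```python
import numpy as np

def dropout(x, rate=0.4, rng=None, training=True):
    """Inverted dropout: zero out `rate` of the activations and rescale the
    survivors by 1/(1 - rate), so the expected activation is unchanged;
    a no-op at inference time."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

a = np.ones((4, 4))
d = dropout(a, rate=0.4)
```

Each surviving unit is scaled to 1/0.6, so no rescaling is needed at inference.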
The integration of multidimensional features by combining the convolution and pooling of different scales also contributed to a better effect in terms of recognition and classification. By making the network wide rather than merely deep, it avoided the problem of dispersion of the training gradient. The global average pooling adopted by GoogLeNet solved the typical problem of the complicated and weakly generalized parameters of the traditional CNN network. The samples were divided as shown in Table 1 separately, and the remaining 20 percent was reserved for validation.
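Global average pooling, which replaces the fully connected layer, reduces each feature map to its spatial mean, so the classifier's parameter count no longer depends on the input resolution. A minimal sketch (the 1024 × 7 × 7 shape is illustrative):

```python
import numpy as np

def global_avg_pool(features):
    """(C, H, W) feature maps -> (C,) vector: one mean per channel, no weights to learn."""
    return features.mean(axis=(1, 2))

f = np.random.rand(1024, 7, 7)           # illustrative final feature maps
v = global_avg_pool(f)
```

The 1024 × 7 × 7 stack collapses to a 1024-dimensional vector fed to the softmax classifier.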

Experiments
The thyroid nodule ultrasound image data used was obtained from hospitals. After data augmentation, there were 2,763 images of malignant cases and 541 images of benign cases, a total of 3,304 images. All images were cropped to a size of 240 × 240. The images were extracted from thyroid ultrasound video sequences captured by the ultrasonic apparatus at a frequency of 12 MHz. The TI-RADS score was given by a professional physician after image diagnosis. 3,123 images were used for the training of the improved models. 541 images, as a test data set, were then randomly divided into 5 groups to test the above three models. The benign and malignant samples were each divided into a validation set, a test set, and a training set. The specific partition scheme is shown in Table 1.
The overall distribution is shown in Table 1.
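The per-class split described above can be sketched as a stratified split; the 70/10/20 fractions here are hypothetical placeholders, since the actual fractions are given in Table 1:

```python
import random

def stratified_split(benign, malignant, train=0.7, val=0.1, seed=0):
    """Split each class separately so train/val/test keep the benign:malignant ratio."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in (("benign", benign), ("malignant", malignant)):
        items = items[:]                     # do not mutate the caller's lists
        rng.shuffle(items)
        n_train = int(len(items) * train)
        n_val = int(len(items) * val)
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits

s = stratified_split(list(range(541)), list(range(2763)))   # the class sizes from this study
```

Splitting within each class keeps the roughly 1:5 benign-to-malignant ratio in every subset.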

Comparative Analysis of Experimental Results.
The comparison of accuracy between different models is important. Table 2 shows that our model (Improved Inception) achieved higher accuracy than the common GoogLeNet model, and it exhibited the highest accuracy rate in determining whether a thyroid nodule had changed pathologically.
The confusion matrices and performance standards obtained for the LeNet5, VGG16, GoogLeNet, and GoogLeNet (improved) models are shown in Figure 6.
The LeNet5 architecture correctly predicted 860 out of 1000 images and incorrectly predicted 140. The VGG16 architecture correctly predicted 920 out of 1000 images and incorrectly predicted 80. Although the GoogLeNet architecture correctly predicted 960 of the 1,000 images, it incorrectly predicted 40 of them. The most successful class of the GoogLeNet (improved) architecture is the ordinary class.
The GoogLeNet (improved) architecture correctly predicted 970 out of 1000 images and incorrectly predicted 30.
True Positive Rate (TPR) is shown on the vertical axis of Figure 6, and False Positive Rate (FPR) is shown on the horizontal axis; this graph is the ROC curve. Figure 6 shows the classification accuracy percentages of the compared models as 96.65%, 97.81%, 97.32%, and 95.97%, with an AUC of 0.97 for the proposed algorithm. The loss values of the different CNN models are shown in Table 3, and that of the GoogLeNet (Improved Inception) model was the smallest.
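The accuracy, TPR, and FPR quoted above come directly from confusion-matrix counts. For the improved model's 970-correct/30-wrong split, for example (the TP/TN/FP/FN breakdown below is a hypothetical illustration, since the per-class counts are given in Figure 6):

```python
def rates(tp, fp, tn, fn):
    """Accuracy, True Positive Rate, and False Positive Rate from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)          # y-axis of the ROC curve
    fpr = fp / (fp + tn)          # x-axis of the ROC curve
    return accuracy, tpr, fpr

# 970 correct / 30 wrong out of 1000, with an assumed breakdown of the errors
acc, tpr, fpr = rates(tp=790, fp=10, tn=180, fn=20)
```

Whatever the breakdown, TP + TN = 970 out of 1000 always yields an accuracy of 0.97.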
The trend over continuous iterations is shown in Figure 7. Table 5 shows the time consumed by the different models to diagnose the same test image. As shown in Table 5, the LeNet5 model exhibited the shortest time to diagnose the thyroid ultrasound image, and the GoogLeNet model the second shortest.

Joint Training and Secondary Transfer Learning.
In GoogLeNet, the transfer is from the MNIST data set to the thyroid images. Generally, it is believed that the transfer effect is worse when the two data sets differ greatly than when they are similar. MNIST, as a natural-image data set, greatly differs from medical images. Therefore, joint data training was conducted herein, based on the public database and the data provided by the cooperating organization of this paper. Because of the lack of samples, the joint database was treated as a whole in the training, which further expanded the overall database.
In transfer learning, the small-sample database was used as the target domain, and a large quantity of labelled data was used as the source domain. In the previous experiments, 2,210 images of malignant cases and 553 images of benign cases, a total of 2,763 images, were collected from hospitals.
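The secondary transfer can be sketched as a staged fine-tuning schedule. This is a conceptual sketch only; the layer names and the freezing policy are assumptions, not the authors' exact configuration:

```python
def transfer_plan(layers, n_frozen):
    """Mark the first n_frozen layers as non-trainable for a fine-tuning stage."""
    return {name: i >= n_frozen for i, name in enumerate(layers)}

layers = ["conv1", "conv2", "inception3", "inception4", "inception5", "softmax"]

# Stage 1: pretrain on the large source domain (public database) -- everything trains
stage1 = transfer_plan(layers, n_frozen=0)
# Stage 2: secondary transfer to the small target domain (hospital data):
# keep the generic low-level filters frozen, adapt only the high-level layers
stage2 = transfer_plan(layers, n_frozen=3)
```

The intuition is that low-level filters transfer across domains, while the high-level layers are re-fit to the small target data set.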

Analysis of Experimental Results
Table 4 shows the difference in performance between the secondary transfer learning, the primary transfer, the data joint training, and the VGG16-based system. The results showed that, for small medical data sets, the secondary transfer significantly improved the system performance. Figure 8 shows the comparison of transfer and non-transfer learning based on LeNet5, VGG16, and Inception V3. With α = 0.05, the p value of the VGG16 model is less than 0.001, and the t value = −28.71. Because the p value is less than α, there is enough evidence to reject the null hypothesis. The p value of the LeNet model is 0.05 and the t value = −1.66, which shows that there is not enough evidence to reject the null hypothesis at α = 0.05. However, between LeNet and VGG16, the average value of VGG16 is higher, and the average values of the other two groups, GoogLeNet and GoogLeNet (improved), are higher than those of LeNet and VGG16.
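The t values quoted above follow the standard two-sample t statistic; below is a minimal sketch in Welch's form, which may differ from the exact test the authors used, with illustrative accuracy samples:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic: (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# e.g. accuracies without vs with transfer learning (illustrative numbers)
t = welch_t([0.90, 0.91, 0.89, 0.90, 0.91], [0.95, 0.96, 0.95, 0.94, 0.96])
```

A large negative t, as for VGG16 above, indicates that the transfer-learning group scores significantly higher.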
For data sets with similar categories, the data joint training also showed close agreement with the experimental results of the secondary transfer.
The data joint training and secondary transfer were effective in further improving the system performance when introducing transfer learning.
This provides a reference for the classification of small data sets and medical image data sets.

Conclusions
In the present study, the thyroid ultrasound images were preprocessed by the TV-based self-adaptive image restoration method. Subsequently, the CNN models were established with the corresponding loss function, learning rate, moving average, and optimization algorithm set for optimization. Three models, namely, the LeNet5 model, the VGG16 model, and the GoogLeNet model, were trained to diagnose benign and malignant thyroid nodules. Thereafter, the accuracy rate of each model was obtained through the tests.
Although all three trained models completed the recognition task, to verify the best CNN model for diagnosing such ultrasound images, we collected a large amount of image data for training and testing. In the comparison studies, it was found that the GoogLeNet (Improved) model exhibited the highest accuracy rate in determining whether a thyroid nodule had changed pathologically.
The average accuracy rate of the GoogLeNet model was up to 96.04%; furthermore, GoogLeNet (Improved) achieved a classification accuracy of 97%, with a loss value of 0.3844. This shows that the GoogLeNet model can diagnose whether the patient's thyroid is diseased or normal. In the end, data joint training and secondary transfer learning were performed on the open-source data sets and the thyroid ultrasound image data collected from the hospitals, which further improved the classification accuracy.
In the experiments in this paper, deep learning was applied to auxiliary medical diagnosis. Our next step is to gradually optimize and improve the model, so as to ensure a high accuracy rate of the results.
The image classification and diagnosis method based on deep learning will provide a reference to doctors diagnosing such diseases, help them improve diagnosis efficiency and accuracy, greatly save manpower, and provide new concepts for the ultrasound diagnosis of thyroid nodules in the future.

Table 1 :
Distribution of samples in training and validation of database 1.

Table 2 :
Classification of testing samples by different models.

Table 3 :
Loss values of different CNN models.

Table 4 :
Comparison of evaluation index obtained by different methods.

Table 5 :
Time consumed by different CNN models to diagnose thyroid ultrasound images.