Diagnosis of Thyroid Nodules Based on Image Enhancement and Deep Neural Networks

The diagnosis of thyroid nodules at an early stage is a challenging task. Manual diagnosis of thyroid nodules is labor-intensive and time-consuming. Meanwhile, due to the difference of instruments and technical personnel, the original thyroid nodule ultrasound images collected are very different. In order to make better use of ultrasound image information of thyroid nodules, some image processing methods are indispensable. In this paper, we developed a method for automatic thyroid nodule classification based on image enhancement and deep neural networks. The selected image enhancement method is histogram equalization, and the neural networks have four-layer network nodes in our experiments. The dataset in this paper consists of thyroid nodule images of 508 patients. The data are divided into 80% training and 20% validation sets. A comparison result demonstrates that our method can achieve a better performance than other normal machine learning methods. The experimental results show that our method has achieved 0.901961 accuracy, 0.894737 precision, 1 recall, and 0.944444 F1-score. At the same time, we also considered the influence of network structure, activation function of network nodes, number of training iterations, and other factors on the classification results. The experimental results show that the optimal network structure is 2500-40-2-1, the optimal activation function is logistic function, and the best number of training iterations is 500.


Introduction
e incidence of thyroid cancer has been increasing the fastest among all solid malignant tumors. e relatively high incidence that continues to grow makes thyroid cancer one of the most common endocrine malignancies worldwide, currently listed as seventh most common cancer in women and fifteenth most common cancer in men [1]. e research of thyroid cancer has become a topic of widespread concern in medical community and society [2]. e diagnosis of thyroid cancer is a challenging task. Most of thyroid cancers are shown as thyroid nodules, which are frequently detected incidentally during the diagnostic imaging of the neck [3][4][5]. High-resolution ultrasound is the gold standard test for the identification of thyroid nodules.
yroid disease is generally diagnosed by visual inspection of ultrasound images.
yroid nodule imaging improves on diagnosis of thyroid diseases by providing ultrasound images. Based on thyroid nodules ultrasonographic characteristics such as internal composition, echogenicity, calcification, margins, and size, radiologists can report on the risk of thyroid nodules being malignant using a standardized scoring system [6]. When using high-resolution ultrasound, the prevalence of thyroid nodules is as high as 19-68% in a randomly selected population. Since most of the nodules are benign and the percentage of malignant nodules is relatively low (7-15%), it is very important to distinguish benign and malignant thyroid nodules [7].
Most of thyroid nodules are heterogeneous with various internal components, which confuse many radiologists and physicians with their various echo patterns in thyroid nodules ultrasonography. In order to improve the diagnosis rate and reduce the loss caused by misdiagnosis, computeraided diagnosis (CAD) has been developed to help doctors discriminate nodules from benign or malignancy. In 2007, Savelonas et al. focused on the directional patterns of the image they extracted using radon transformation and proposed a computer-aided diagnosis system based on support vector machine on the data of 66 patients, achieving maximal classification accuracy of 0.89 [8]. In 2011, Ding et al. studied the classification of thyroid nodules using the support vector machine classifier and 125 thyroid nodules consisting of 56 malignant and 69 benign patients [9]. en, in 2019, Prochazka et al. used 40 thyroid nodules (20 malignant and 20 benign) to extract several features, such as histogram parameters, fractal dimension, and mean brightness value. e authors used random forests and support vector machine to differentiate nodules into malignant and benign classes based on these features [10]. In 2020, Harshini et al. used linear discriminant analysis and support vector machine to distinguish thyroid nodules and compared the results of two methods [11]. Sathyapriya and Anitha selected the optimal features using Dynamic Mutation based Glowworm Swarm Optimization (DMGSO) algorithm and used the Long Short-Term Memory (LSTM) scheme to classify the thyroid nodules in 2020 [12]. In 2020, Ma et al. used five common machine learning methods to distinguish thyroid nodule; meanwhile, they also considered the impact of four different distance measures on the performance of the related models [13]. In 2020, Miao et al. used logistic regression to analyze the variables that significantly affected malignant nodules on the data of 811 patients with a total of 1290 pathologically confirmed nodules (506 benign and 784 malignant). e sensitivity and specificity of ultrasound thyroid imaging reporting and data system classification results for benign and malignant tumors were calculated [14]. In 2020, Ataide et al. aimed to reduce subjectivity in the current diagnostic process by using geometric and morphological (G-M) features that represent the visual characteristics of thyroid nodules to provide physicians with decision support [15].
Numerous CAD deep learning-based systems have been studied for automated thyroid detection in recent years. In 2017, Chi et al. used deep convolutional neural network to extract features from thyroid ultrasound images and sent the extracted features of the thyroid ultrasound images to a costsensitive random forest classifier to classify the images [16]. In 2019, Ouyang et al. compared the classification performance of five nonlinear and three linear machine learning algorithms for the evaluation of thyroid nodules. ey obtained that nonlinear machine learning algorithms demonstrated similar AUCs compared with linear algorithms [17]. In 2019, Zhang et al. developed a thyroid nodules diagnostic model and performed better than radiologist diagnosis based on conventional US only and based on both conventional US and real-time elastography [18]. In 2020, Wang et al. compared the performance of radiomics and deep learning based methods for the classification of thyroid nodules from ultrasound images and found that the deep learning based method can achieve a better performance than using radiomics [19]. In 2020, Song et al. proposed a hybrid multibranch convolutional neural network based on feature cropping method for feature extraction and classification of thyroid nodule ultrasound images [20]. In 2020, Liang et al. developed a multiorgan CAD system based on convolutional neural networks for classifying both thyroid and breast nodules and investigated the impact of this system on the diagnostic efficiency of different preprocessing approaches. e authors proposed four models based on convolutional neural networks and found that the convolutional neural networks model using segmented images for classification achieved the best result [21]. In 2020, Zhang et al. proposed a novel method using two combined classification modules to separate and classify thyroid nodules images and experimental results show that the method leads to excellent performance [22]. In 2020, Xie et al. designed a deep neural network to classify whether a thyroid nodule is benign or malignant and proposed a structure which combines local binary pattern with deep learning [23]. In 2020, Yuan illustrated a method which involves the combination of the deep features with the conventional features together to form a hybrid feature space. ey compared the ResNet18, a residual convolutional neural network with 18 layers, with the Res-GAN, a residual generative adversarial network [24]. In 2021, Liu et al. developed an information fusion-based joint convolutional neural network for the differential diagnosis of malignant and benign thyroid nodules. e information fusion-based joint convolutional neural network contains two branched convolutional neural networks for deep feature extraction [25]. In 2021, Vadhiraj et al. compared the support vector machine and artificial neural network classification algorithms based on their accuracy score, sensitivity, and specificity in the data of 99 thyroid nodule cases (33 benign and 66 malignant) [26]. In 2021, Li et al. designed an end-to-end thyroid nodule automatic recognition and classification system based on convolutional neural networks for CT images. e classification output accuracy reaches 0.8592 in the test set [27]. In 2022, Luong et al. used the linear, nonlinear, and nonlinear-ensemble machine learning methods to predict malignancy of indeterminate thyroid nodules and produced an accuracy of 0.791 [28]. Nair and Pande gave the analysis of different machine learning techniques used in the detection and prediction of thyroid disease in 2022 [29]. More information on ultrasonic image classification of thyroid nodules using deep learning method can be found in literature [30]. ese results based on ultrasound images have good inspiration for our research. e medical diagnosis system based on deep learning heavily depends on the amount of medical images. Sufficient data can ensure the performance of deep learning algorithm. However, thyroid nodule images data is very rare and expensive in the field of medical image processing. e lack of data has become a challenging problem in the classification of thyroid nodules using deep learning. In order to solve this problem, a deep neural networks classification model based on image enhancement technology is proposed in this paper. e introduction of image enhancement technology enables us to make full use of the limited image data information. e accuracy, precision, recall, and F1-score of the model would be important determinants of the clinical utility of the computer-aided diagnosis system. is paper uses the above four metrics as indicators to measure the performance of the classification models. e experimental results show that our proposed method achieves better performance compared with the existing thyroid nodule diagnosis systems in terms of accuracy, precision, recall, and F1-score.

Materials and Methods
Based on the previous literature and the opinions of doctors and experts, an accurate thyroid nodule detection and classification model is proposed based on deep neural networks and image enhancement.

Dataset.
A digital database of thyroid ultrasound images is established. ese thyroid ultrasound images were collected at the third hospital of Hebei Medical University in China. e dataset images were divided into two categories: benign and malignant. e classification of thyroid nodules is the result of diagnosis according to the shape, nodule composition, shape, margin, calcifications, and TI-RADS score of thyroid nodules. A total of 508 patient ultrasound examinations were present in the dataset. Among these ultrasound images, there were 415 with nodules classified as benign and 93 with nodules classified as malignant. e specific original ultrasonic images data are shown in Figure 1.

Image Analysis and Preprocessing.
Ultrasound images are collected from different doctors and equipment; image preprocessing is very necessary. ese pieces of equipment include Philips IU22, Philips HD15, and Siemens Acuson SEQUOIA 512; the size of the images also ranges from 600 × 450 pixels to 768 × 576 pixels. All images had to be resized and changed to the same distance scale. In order to make better use of the images for the diagnosis of thyroid nodules, we preprocessed the ultrasound images in three steps. e first step of our image preprocessing is image segmentation, which is a process of segregating an image into numerous segments. Segmentation is required to change the representation of an image for easy analysis without altering meaningful information. Image segmentation is also an important step to extract redundant information from ultrasonic images. Figure 2 exhibits the segmentation of an image. e second step is to resize the images. It is resized to have the same distance scale which is the physical space represented by each pixel in the image. is step is used to improve the quality of the image by removing the undesired distortion from the image. e resizing of the image is also an important guarantee for the unity of the image data. In this study, the size of the image is uniformly adjusted to 200 × 200 pixels.
Image standardization is the third step, also the final step of preprocessing. Image standardization is the process of centralizing the data by removing the mean value. According to the convex optimization theory and the data probability distribution, data centralization conforms to the law of data distribution, which makes it easier to obtain the generalization effect after training. Data standardization is one of the common methods of data preprocessing. Figure 3 shows the changes of the image before and after image standardization.
2.3. Image Enhancement. Image enhancement algorithms are often used to adjust the brightness, contrast, saturation, and hue of the image, improve the definition of the image, and reduce the image noise. e purpose of image enhancement is to obtain the useful information of the image. Image contrast enhancement is the most important in image enhancement.
ere are two methods of image contrast enhancement: direct contrast enhancement method and indirect contrast enhancement method. Histogram equalization is a common indirect contrast enhancement method. e histogram of an image represents the probability density function (PDF) value of the pixel values in the image over the entire grayscale range. If most of the pixels are concentrated in the low gray area, the image will appear dark, but if they are concentrated in the high gray area, it will appear bright. Histogram equalization is to adjust the grayscale distribution of the image to make the distribution on the grayscale of 0∼255 more balanced. Histogram equalization uses the cumulative function to adjust the gray value to elevate the contrast of the image, thus improving the visual effect of the image. Figure 4 shows the changes of the image and the histogram before and after histogram equalization.

Deep Neural Networks (DNNs).
Neural network is an extension based on the perceptron, and a DNN is a feedforward, artificial neural network that has more than one layer of hidden units between its inputs and outputs. According to the location of different layers, the neural network layers of DNN can be divided into three categories: input layer, hidden layer, and output layer, as shown in Figure 5. e first layer is the input layer, the last layer is the output layer, and the middle layer is the hidden layer. Layers are fully connected; that is, any neural unit in layer n must be connected with any neural unit in layer n + 1. Each hidden unit j typically uses the logistic function to map its total input from the layer below, x i , to the scalar state, y j , that it sends to the layer above. e logistic function is given by where x i � b j + i y i w ij , b j is the bias of unit j, i is an index over units in the layer below, and w ij is the weight on a connection to unit j from unit i in the layer in the following.
Of course, there are other activation functions here. e logistic function is the most commonly used as the activation function.
DNN can be trained by back-propagating derivatives of the cost function that measures the discrepancy between the target outputs and the actual outputs produced for each Computational Intelligence and Neuroscience training sample. For the large training sets, it is more efficient to use the strategy of "minibatch" for training deep neural networks model. at is to say, we compute the derivatives on a small, random "minibatch" rather than the whole training set, before updating the weights in proportion to the gradient.

Classification of yroid Nodules.
Image classification task of thyroid nodules includes two parts: training and testing. After preprocessing the image data, we get the data that can be used directly. In order to make better use of image information, we carry out the image enhancement.
en, 80% of the data is used as training data and the other  e classification accuracy, precision, recall, and F1-score are used as the standard to evaluate the classification model. At the same time, we also compare the classification results with the general machine learning classification algorithm, for example, K-nearest neighbors, decision tree, naive Bayesian model, support vector machine, logistic regression, and reinforcement learning. e process of thyroid nodule processing and classification based on deep neural networks is shown in Figure 6. the performance with those of the general machine learning classification algorithms. At the same time, we also consider the effects of different network structures, the number of iterations, and network activation functions on classification performance. In this paper, our focus is the accuracy of the model in building computer-aided diagnostics that could classify thyroid nodules as benign or malignant. However, it is also essential to check the performance and efficiency of the model. In order to evaluate the performance of the model more comprehensively, we consider the four following metrics: accuracy, precision, recall, and F1-score.
(1) Accuracy: (2) Precision: (3) Recall: (4) F1-score: where TP is True Positive, FP is False Positive, TN is True negative, and FN is False Negative. Accuracy is one of the most important indexes in the performance metrics. Accuracy refers to how well the model can classify thyroid nodules correctly. Accuracy is the proportion of correctly predicted cases in the total cases. Precision is the proportion of positive cases determined by the classifier. Recall is the proportion of predicted positive cases in the total positive cases. F1-score is the harmonic average of precision and recall. Precision and recall have the same weight in F1-score.

Accuracy-Based Deep Neural Networks.
In this study, we construct a thyroid nodule classification model based on deep neural networks. Classification of thyroid nodules using deep neural networks at the best setting reaches accuracy of 0.901961. Precision and F1-score also reach 0.894737 and 0.944444; in particular, recall reaches 1. e best network has four layers in our experiment, the first layer is the input layer, the last layer is the output layer, and the two middle layers are the hidden layers; that is, there are two hidden layers. We scale the image with 200 × 200 pixels to an image with 50 × 50 pixels and transform an image with 50 × 50 pixels into a 2500-dimensional vector for deep neural networks. erefore, the number of nodes of the input layer is 2500. e node numbers of the two hidden layers are 40 and 2. Because it is a binary classification problem, the output layer has only one node. e activation function of the network node is the logistic function. e maximum number of iterations during model training is set to 500. Due to the difference between image acquisition channels and doctors, the image quality is relatively low. In particular, the recognition of some images is very difficult, and even manual recognition is difficult. In this case, the accuracy of deep neural networks classification model is satisfactory.

Influence of Different Neural Networks.
In this section, we consider the influence of different neural networks. We first consider the influence of network structure on Computational Intelligence and Neuroscience classification performance. For convenience, 2500-200-100-1 network denotes a four-layer network. 2500 is the number of network nodes in the first layer, that is, the number of network nodes in the input layer. 200 and 100 are the numbers of nodes in the second layer and the third layer; these two layers are hidden layers. 1 is the number of the output layers (the last layer). Table 1 gives the performance comparison of DNN with different network structures. e network has more layers and network nodes, which means that the network will need more training time. From our experimental results, it is not true that the more complex the network structure, the better the performance of the model. It requires us to make more attempts to find the optimal network structure. We mainly consider three-layer, fourlayer, and five-layer network structures. Six network structures are studied for each type of networks. We can find that the optimal network structure is four layers, 2500-40-2-1. Although the accuracy of a three-layer network (2500-100-1) also reaches 0.901961, the recall rate and F1-score are lower than those of the 2500-40-2-1 network. e recall and F1-score of the 2500-40-2-1 network are the highest in our experimental network structures; in particular, the recall reaches 1.
Next we discuss the influence of different activation functions. In order to consider only the influence of activation function, we set the same network structure, the number of iterations, and the learning rate. e network is the four-layer network (2500-40-2-1). e number of iterations and learning rate are set to 500 and 0.001, respectively. Logistic, tanh, and identity functions are considered. Logistic function is also sigmoid function. e formula of logistic function is seen in (1). e formulas of tanh and identity functions are as follows: e performance comparison of DNN with different activation functions is shown in Table 2. It is obvious that logistic activation function has the best performance. e four performance metrics of logistic function are all higher than tanh and identity activation functions. e performance of tanh activation function is slightly higher than that of identity function.
Finally, we consider the influence of the number of iterations during training. In the process of neural network model training, we need to iterate many times in order to achieve better training effect. e selection of iteration times is very important for model training. Too many iterations easily leads to a large amount of redundant training time and overfitting; on the contrary, too few iterations leads to poor training effect. e performance comparison of DNN with different maximum number of iterations is given in Table 3. It is easy to find that the overall classification performance of the model improves with the increase of the number of iterations. Although in the process of improvement one or two indicators will decline, when the number of iterations reaches 500, the performance of the model is the best. e classification accuracy reaches 0.901961, and the precision, recall, and F1-score reach 0.894737, 1, and 0.944444, respectively. After 500 iterations, even if the number of iterations increases, the classification performance remains unchanged. On the contrary, it will cause a violent increase in training time.

Comparison with Other Methods.
In this section, we compare the performance of deep neural networks with those of other common machine learning classification methods. Table 4 lists the comparison of deep neural networks and other six methods: K-nearest neighbors, decision tree, naive Bayesian model, support vector machine, logistic regression, and reinforcement learning. In the six methods, we choose the best results of each method for comparison. In K-nearest neighbors, we choose K � 3, which is also the best classification performance of K-nearest neighbors. In naive Bayesian method, we consider the Gaussian model, which is the most popular naive Bayesian model. Linear kernel function is used in support vector machine. We consider two reinforcement learning methods: hard voting and soft voting.
e performance of DNN is obviously the best among all methods.
e accuracy of DNN reaches 0.901961. e precision, recall, and F1-score reach 0.894737, 1, and 0.944444, respectively. All four performance metrics of deep neural networks are the highest in our experimental methods. In other six methods, reinforcement learning (soft voting), K-nearest neighbors, and support vector machine have better classification performance. e recall of soft voting reaches 1; however, the accuracy, precision, and F1score performance metrics are significantly lower than those of deep neural networks. e accuracy of K-nearest neighbors is closest to that of deep neural networks; the accuracy is 0.852941. e naive Bayesian model has the worst classification performance.

Influence of Image Enhancement.
e image enhancement method used in this paper is histogram equalization. Histogram equalization is to adjust the grayscale distribution of the image to make the distribution on the gray more balanced. Equalization helps to extend the image histogram. After equalization, the gray level range of the image is wider and the contrast of the image is effectively enhanced. Table 5 shows the performance comparison of deep neural networks with and without image enhancement. We can easily find the influence of image enhancement on classification performance. After image enhancement, all four performance metrics of deep neural networks have been improved greatly. e accuracy increases from 0.715686 to 0.901961. Other classification algorithms have similar results. e classification performance of the model is improved with different degree after image enhancement. 8 Computational Intelligence and Neuroscience

Conclusions
In this paper, we proposed a thyroid nodule diagnosis method based on deep neural networks and image enhancement. Experimental results show that our method has superior performance as compared to other common classification algorithms. Because the experimental data are collected by different doctors from the hospital, some images are very blurred. In order to maintain the integrity of the data, we did not abandon the blurred images. erefore, the classification performance of the classification model is relatively low in our experimental data. e classification accuracy still reaches 0.901961; in particular, the recall reaches 1. At the same time, we considered the influence of network structure on the classification results. We found that it is very difficult to find the optimal network structure. e determination of the optimal network structure is related not only to the network itself but also to the data to be processed. At present, we can only determine the optimal network structure through the experimental results. e experimental results show that the optimal network structure is 2500-40-2-1 in this paper. To determine the optimal network structure from the perspective of theoretical analysis is our next work.
We also considered the influence of activation function and the number of training iterations. e optimal activation function is logistic function and the best number of iterations is 500. In the last part of this paper, we considered the influence of image enhancement on classification performance. e accuracy before image enhancement is 0.715686 and the accuracy after image enhancement reaches 0.901961. e precision, recall, and F1-score have been improved after image enhancement.
Data Availability e datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.