With the development of artificial intelligence, auxiliary diagnosis models based on deep learning can assist doctors to a certain extent. However, most traditional methods ignore the latent information in medical images, such as lesion features. Some recent research treats the extraction of this information as a learning task within the network, but this requires a large amount of finely labeled data, which is expensive to obtain. In response to this problem, this paper proposes an Adversarial Lesion Enhancement Neural Network for Medical Image Classification (ALENN), which locates and enhances the lesion information in medical images using only weakly annotated data, so as to improve the accuracy of the auxiliary diagnosis model. The method is a two-stage framework consisting of a structure-based lesion adversarial inpainting module and a lesion enhancement classification module. The first stage repairs the lesion area in the images, while the second stage locates the lesion area and uses the lesion-enhanced data during modeling. Finally, we verified the effectiveness of our method on the MURA dataset, a musculoskeletal X-ray dataset released by Stanford University. Experimental results show that our method can not only locate the lesion area but also improve the effectiveness of the auxiliary diagnosis model.
In December 2012, a study [
Bone is the densest tissue in the human body and contrasts clearly with the surrounding tissues. There is also an obvious contrast between the cortical and cancellous parts of the bone itself, so conventional X-ray examination can be used to diagnose common bone diseases. In addition, thanks to advances in imaging technology and equipment, hospitals have produced large amounts of medical imaging data, and these precious data support many lines of research. It is therefore of great significance to use computer-aided diagnosis technology to classify musculoskeletal diseases quickly and accurately from the large number of existing medical images.
Many machine learning methods have been applied to medical image data classification tasks, including K-Means clustering [
Based on the problems above, this paper proposes an Adversarial Lesion Enhancement Neural Network for Medical Image Classification (ALENN), which automatically recognizes the lesion area in an image under the supervision of category annotations alone, and thereby improves the prediction accuracy of the auxiliary diagnosis model. The method has two stages. The first stage is a structure-based lesion adversarial inpainting module, which repairs the lesion area in the image; the second stage is a lesion enhancement classification module, which identifies the lesion area and feeds the lesion-enhanced data into the diagnostic model during training. The core of the first stage is structural information, which represents the relatively fixed semantics of human anatomy in medical images. We believe that better restoration results can be obtained by splitting image restoration into structural semantic restoration and texture detail restoration. The core of the second stage is a sliding window: by sliding an occluded region over the image, the most salient abnormal area can be found. Finally, we verified the effectiveness of this method on the MURA dataset [
Related work on traditional and recent auxiliary diagnosis, as well as existing research on the MURA dataset, is introduced in the second section of this paper. The proposed method is described in the third section. Experimental results and analysis are presented in the fourth section. A summary and outlook are given in the fifth section.
CNN-based medical image analysis method has shown excellent performance in many challenging tasks (disease classification [
The MURA dataset is the largest public musculoskeletal image dataset available currently. Many scholars have conducted numerous experimental studies on this dataset, including the use of traditional machine learning methods and deep learning algorithms. Among them, Pawan et al. [
Because current deep learning models on medical image datasets (e.g., MURA) make inadequate use of lesion information, and because deep learning itself lacks interpretability, there is still much room to improve the reliability and credibility of auxiliary diagnosis models. Aiming at these problems, this paper proposes an Adversarial Lesion Enhancement Neural Network for Medical Image Classification (ALENN). The method includes two main modules: a structure-based lesion adversarial inpainting module and a classification module based on lesion information fusion. The overall two-stage structure diagram is shown in Figure
The overall structure of the two-stage ALENN module.
The X-ray image of the elbow reflects the density difference among the different tissues of the elbow, and the density distribution in the negative data is obviously different from that in the positive data, as shown in Figure
Data distribution in positive and negative samples.
At the same time, in order to better restore the original semantic information of the image, and considering that medical images contain relatively fixed human structures (bones and muscles), this module extracts the structural information of the data so that the model attends to the relatively fixed structure rather than being limited to variable texture information. Consequently, inspired by the research [
For the extraction of structure information, we assume that an image is composed of structure and texture, and we use the relative total variation (RTV) in [
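The structure/texture split can be sketched as follows. Note that this is a minimal illustration of the image = structure + texture assumption only: a plain box blur stands in for RTV smoothing, which the paper actually uses and which preserves structural edges far better.

```python
import numpy as np

def box_blur(image, k=5):
    """Box blur used below only as a stand-in for the edge-preserving
    relative total variation (RTV) smoothing cited in the text."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def decompose(image, k=5):
    """Split an image into a smooth 'structure' layer and a residual
    'texture' layer, following the image = structure + texture assumption."""
    structure = box_blur(image.astype(np.float64), k)
    texture = image - structure
    return structure, texture
```

By construction the two layers sum back to the original image, so restoration can proceed structure-first and then add texture detail.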
However, the authors of [
In the architecture of this paper, the role of the generative adversarial network is to provide the lesion area in the elbow X-ray image for the final classification model. We additionally make an assumption about the MURA data: the distribution difference between positive and negative images results from the lesion. If the lesion area in a positive image is completely occluded, the remaining unoccluded part is similar in distribution to the negative data. The structure information obtained in Section
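The occlusion step described above can be sketched as follows. The (top, left, height, width) box format and the two-channel image-plus-mask input layout are illustrative assumptions, not taken from the paper, though they mirror common inpainting-network conventions.

```python
import numpy as np

def make_occluded_input(image, box):
    """Occlude a rectangular candidate-lesion region and stack a binary
    mask channel, mimicking the typical input of an inpainting generator."""
    y, x, h, w = box
    mask = np.zeros_like(image, dtype=np.float64)
    mask[y:y + h, x:x + w] = 1.0
    occluded = image * (1.0 - mask)      # candidate lesion removed
    return np.stack([occluded, mask])    # (2, H, W) generator input
```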
This section follows the definitions of
In addition to training the generator so that the overall model can reconstruct structural information, it is also necessary to train the generator so that the model can gradually repair texture information on top of the structural information. Because this paper assumes that image = structure + texture, once the texture information has been restored on the basis of the structure information, the image can be regarded as fully restored. Therefore, the output of the generator
As in the structure-repair stage, the discriminator still needs to predict the authenticity of the generator's output image, and the output at this time is expressed as
The GAN in this paper requires two-step training. The model first learns to repair the structure image, from which complex texture semantics have been removed; once this repair process converges, the model then learns to repair the detailed texture information on top of the structure. In the first training step, in order to fit the true distribution of the image structure
Similar to the first training process, the loss function for training the GAN on the texture-detail repair is as follows:
The two stages of adversarial losses are combined through hyper-parameters
We set
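The exact weighting is not recoverable from this excerpt; in generic form, a two-term combination of the structure-stage and texture-stage adversarial losses can be written as

```latex
\mathcal{L}_{adv} = \lambda_{s}\,\mathcal{L}_{adv}^{\,struct} + \lambda_{t}\,\mathcal{L}_{adv}^{\,texture}
```

where $\lambda_{s}$ and $\lambda_{t}$ are the weighting hyper-parameters referred to above. This is an illustrative form only, not the paper's exact objective.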
This section introduces the second stage of ALENN, the medical image-assisted diagnosis module based on lesion information enhancement. An auxiliary diagnosis model built only on the original data can complete the diagnosis task reasonably well. However, owing to the lack of interpretability of deep learning, such a model can easily fall into a local optimum, and its convergence speed is limited by the initialization parameters and optimization strategy. Hence, at this stage, the relative error between positive and negative data and the similarity among positive data are used to artificially amplify the distribution difference. More specifically, after a positive image is restored in the first stage, corresponding pseudo-negative data is obtained, while real negative data can still be regarded as negative after being repaired. On this basis, we can enhance the lesion area in the data and use the result as the modeling data for the auxiliary diagnosis model, making the model sensitive to differences in distribution.
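The sliding-window search behind this stage can be sketched as follows: slide an occluding window over the image and keep the window whose occlusion changes an abnormality score the most. Here `score_fn` is any callable mapping an image to a score, and the window size and stride are illustrative choices, not the paper's settings.

```python
import numpy as np

def most_salient_window(image, score_fn, win=16, stride=8):
    """Return the (top, left, height, width) window whose occlusion
    causes the largest drop in the abnormality score."""
    base = score_fn(image)
    best_drop, best_box = -np.inf, None
    h, w = image.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            occluded = image.copy()
            occluded[y:y + win, x:x + win] = 0.0   # mask candidate area
            drop = base - score_fn(occluded)        # big drop => salient
            if drop > best_drop:
                best_drop, best_box = drop, (y, x, win, win)
    return best_box
```

As the conclusion notes, this exhaustive scan is the main source of the method's time complexity: the classifier is re-evaluated once per window position.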
The lesion-fusion-assisted diagnosis proposed in this paper is an overall framework; optimizing the specific network structure is beyond the scope of this paper. In other words, the proposed method is applicable to most classification networks. We therefore select the basic VGG-19 model as the backbone of this framework. Next, we introduce the overall process of lesion information fusion in the second stage step by step. To facilitate the presentation, we define the subsequent variables. Input image
Consequently, under the supervision of category labels, we have completed the extraction of lesion information in medical images based on the difference of conditional probability distribution
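One plausible way to carry out the enhancement described above is sketched below: treat the absolute difference between an image and its inpainted (pseudo-negative) version as a lesion map and use it to amplify the original intensities. The normalization and the `alpha` weighting are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def enhance_lesion(original, inpainted, alpha=0.5):
    """Amplify the regions where the inpainted pseudo-negative image
    differs from the original, making the lesion more salient for the
    downstream classifier. Assumes intensities in [0, 1]."""
    lesion_map = np.abs(original - inpainted)
    if lesion_map.max() > 0:
        lesion_map = lesion_map / lesion_map.max()   # normalise to [0, 1]
    return np.clip(original + alpha * lesion_map * original, 0.0, 1.0)
```

For a true negative image, the inpainted result barely differs from the input, so the map is near zero and the image passes through almost unchanged; for a positive image, only the repaired lesion region is boosted.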
The MURA dataset is the largest public dataset of musculoskeletal image currently, jointly released by the Department of Computer Science, Medicine and Radiology of Stanford University. It contains a total of
In order to better evaluate the proposed method, we use the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the performance of the repair module; the larger these two indicators, the more similar the two images. Accuracy, sensitivity, specificity, recall, F1-score, and Kappa score are used to evaluate the performance of the classification module.
PSNR and SSIM are the most widely used objective measures for evaluating repaired images. The calculation formulas are as follows:
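The standard definitions of these two metrics, consistent with the description above, are:

```latex
\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl[I(i,j)-K(i,j)\bigr]^{2},
\qquad
\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right),
```

```latex
\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^{2} + \mu_y^{2} + c_1)(\sigma_x^{2} + \sigma_y^{2} + c_2)}
```

where $I$ and $K$ are the two $m \times n$ images being compared, $\mathrm{MAX}_I$ is the maximum possible pixel value, $\mu$, $\sigma^2$, and $\sigma_{xy}$ are the (local) means, variances, and covariance, and $c_1$, $c_2$ are small stabilizing constants.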
In this formula,
In addition, if FP, FN, TP, and TN denote the numbers of false positives, false negatives, true positives, and true negatives, respectively, the calculation formulas for accuracy, recall, precision, F1-score, and Kappa score are as follows:
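The standard formulas for these metrics, in terms of the four counts just defined, are:

```latex
\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN},\qquad
\mathrm{Precision} = \frac{TP}{TP+FP},\qquad
\mathrm{Recall} = \frac{TP}{TP+FN},
```

```latex
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}},
\qquad
\kappa = \frac{p_o - p_e}{1 - p_e},
```

where $p_o$ is the observed agreement (the accuracy) and $p_e$ is the agreement expected by chance.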
In this formula,
To verify the effectiveness of the first stage of the proposed two-stage lesion enhancement classification method, it is first necessary to qualitatively analyze the repair effect of the generative adversarial network. This section performs a visual analysis of the negative and positive elbow data in MURA. Since the GAN in this paper is modeled on negative data, in theory it approximates the distribution of negative data. Conversely, when positive data is used as input, the GAN repairs the "abnormal" area in the positive data according to the negative distribution. The repair and visualization results for negative and positive data are shown in Figure
The results of repair and visualization of negative and positive data. (a) Original image. (b) Difference. (c) LE. (d) Color map.
After qualitative analysis, we will quantitatively analyze the differences between the two data types, as shown in Table
The differences between the two data types.
| | PSNR | SSIM |
| --- | --- | --- |
| Negative | 34.7566 | 0.9708 |
| Positive | 34.6099 | 0.9657 |
To demonstrate the effectiveness and adaptability of the proposed method, this section applies it to several classification networks. VGG-19, ResNet-50, DenseNet-121, and Inception-v3 are used to conduct the experiments and analyze the results. The experimental results are shown in Table
The experimental results.
| Backbone | With LE | Accuracy | Precision | Recall | F1-score | Kappa |
| --- | --- | --- | --- | --- | --- | --- |
| VGG-19 | | 82.8 | 76.82 | 94.47 | 84.73 | 65.5 |
| VGG-19 | ✓ | 85.59 | 81.11 | 93.19 | 86.73 | 71.13 |
| ResNet-50 | | 81.72 | 77.78 | 89.36 | 83.17 | 63.38 |
| ResNet-50 | ✓ | 83.23 | 78.97 | 91.06 | 84.59 | 66.39 |
| DenseNet-121 | | 81.51 | 78.54 | 87.23 | 82.66 | 62.96 |
| DenseNet-121 | ✓ | 83.01 | 82.77 | 83.83 | 83.3 | 66.01 |
| Inception-v3 | | 82.15 | 78.36 | 89.36 | 83.5 | 64.24 |
| Inception-v3 | ✓ | 83.01 | 78.68 | 91.06 | 84.42 | 65.96 |
It can be seen that, compared with using the classification network alone, combining the classification network with the LE information yields better results. For each backbone, the method increases accuracy by about 1.67% on average, with the largest improvement on VGG-19. The ROC curves of ResNet and VGG illustrate this from another angle, as shown in Figure
The ROC curve of ResNet and VGG.
With the recent improvement of deep neural network technology, more methods focus on exploring the hidden information inside data. Taking computer vision as an example, many studies couple the task of mining semantic information inside images with the optimization of the neural network structure. Among them, medical imaging has become an important target of image semantic mining due to its high semantic consistency. However, the interpretability of deep learning remains an open topic: the features learned by neural networks cannot be explained intuitively. In other words, the correlation between optimizing the network structure and the effectiveness of semantic information mining remains to be verified. Therefore, this paper proposes an Adversarial Lesion Enhancement Neural Network for Medical Image Classification, which treats the extraction of this hidden semantics as a separate stage, decoupled from the auxiliary diagnosis model. The purpose is to (1) clearly demonstrate the effectiveness of semantic information extraction, (2) provide a portable, highly adaptable auxiliary diagnostic module, and (3) localize lesions using only coarse-grained labels. Finally, this paper demonstrates the effectiveness of the first-stage structure-based lesion adversarial inpainting module on the public MURA dataset and, on this basis, shows that the combined use of the two-stage modules improves the auxiliary diagnosis model. A remaining shortcoming is the relatively high time complexity of the sliding window, which is one direction for future optimization of this research.
The X-ray images used to support the findings of this study have been deposited in the MURA repository.
The authors declare that they have no conflicts of interest.