Feasibility of Using Improved Convolutional Neural Network to Classify BI-RADS 4 Breast Lesions: Compare Deep Learning Features of the Lesion Itself and the Minimum Bounding Cube of Lesion

To determine the feasibility of using a deep learning (DL) approach to identify benign and malignant BI-RADS 4 lesions with preoperative breast DCE-MRI images and to compare two 3D segmentation methods. Patients admitted from January 2014 to October 2020 were retrospectively analyzed. Breast MRI examination was performed before surgical resection or biopsy, and the masses were classified as BI-RADS 4. The first postcontrast images of the DCE-MRI T1WI sequence were selected. Two 3D segmentation methods were used for the lesions: one was manual segmentation along the edge of the lesion slice by slice, and the other was the minimum bounding cube of the lesion. Then, DL feature extraction was carried out; the pixel values of the image data were normalized to the 0-1 range. The model was established based on the blueprint of the classic residual network ResNet50, retaining its residual module and converting the 2D convolution module to 3D. At the same time, an attention mechanism was added: the attention module, which originally fit only the 2D image convolution module, was transformed into a 3D-Convolutional Block Attention Module (CBAM) to adapt to 3D-MRI. After the last CBAM, the algorithm flattens the output high-dimensional features into a one-dimensional vector and connects 2 fully connected layers, before finally setting two output results (P1, P2), which represent the probabilities of benign and malignant lesions, respectively. Accuracy, sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), recall rate, and area under the ROC curve (AUC) were used as evaluation indicators. A total of 203 patients were enrolled, with 207 mass lesions including 101 benign lesions and 106 malignant lesions. The data set was divided into the training set (n = 145), the validation set (n = 22), and the test set (n = 40) at a ratio of 7 : 1 : 2; fivefold cross-validation was performed. For the minimum bounding cube of the lesion and the 3D-ROI of the lesion itself, respectively, the mean AUC was 0.827 and 0.799, the accuracy was 78.54% and 74.63%, the sensitivity was 78.85% and 83.65%, the specificity was 78.22% and 65.35%, the NPV was 78.85% and 71.31%, the PPV was 78.22% and 79.52%, and the recall rate was 78.85% and 83.65%. There was no statistical difference in AUC between the model based on the lesion itself and the model based on the minimum bounding cube (Z = 0.771, p = 0.4408). The minimum bounding cube based on the edge of the lesion showed higher accuracy and specificity and a lower recall rate in identifying benign and malignant lesions. Lesion 3D-ROI segmentation using a minimum bounding cube can more effectively reflect the information of the lesion itself and the surrounding tissues. Its DL model performs better than the model based on the lesion itself. Using the DL approach with a 3D attention mechanism based on ResNet50 to identify benign and malignant BI-RADS 4 lesions was feasible.


Introduction
Breast cancer is a serious threat to women's health and has become the world's most common cancer [1]. Early detection, early diagnosis, and early treatment can improve both survival and prognosis of breast cancer patients [2][3][4]. Greenwood et al. [5] have reported that breast MRI plays an important role in screening and assessing the extent of ductal carcinoma in situ (DCIS) and predicting its potential invasiveness. The degree of early enhancement reflects the vascular richness and blood perfusion of the lesion and can therefore reflect the characteristics of the lesion. According to the breast imaging report and data system (BI-RADS) guideline of the American College of Radiology (ACR), the likelihood of malignancy for BI-RADS 4 lesions ranges from 2% to 95% [6]. Lesions with BI-RADS 4 classification are difficult to define clearly, because their imaging signs are overlapping and intricate. These lesions, whether benign or malignant, are all classified as BI-RADS 4 and are recommended for invasive procedures such as needle biopsy to obtain pathological evidence [7][8][9]. Therefore, a comprehensive understanding and improved evaluation methods of benign and malignant breast lesions are urgently needed to reduce invasive operations and the burden on patients.
In recent years, with the rapid development of artificial intelligence-assisted diagnosis systems, deep learning has emerged as a subfield of machine learning [10][11][12][13]. Its application in medical imaging has attracted much attention, along with its wide use in image recognition, segmentation, and analysis [14]. Several studies [15,16] have attempted to increase the number of layers of CNNs from the 5 convolutional layers of the AlexNet network [17] to the 19 layers of the VGG network. Theoretically, a deeper network leads to better performance, but the increase in network depth also brings additional problems that in turn reduce performance. The main reasons for the performance reduction were vanishing gradients (gradients shrink during backpropagation, which weakens the error signal) and exploding gradients (large error gradients accumulate until the loss function diverges), both caused by the increase in the number of network layers. The residual module was proposed by Khalili and Wong [15]; it could effectively solve the aforementioned problems and has become the standard configuration of CNNs.
CNNs learn a large number of features. Some features are not important for the final result, while others play a key role in the prediction and thus deserve more attention. Based on this idea, Woo et al. [18] proposed the Convolutional Block Attention Module (CBAM). The so-called greater attention means giving higher weight to those key features. In this study, the efficiency of feature extraction and classification of BI-RADS 4 breast lesions with two segmentation methods was compared using a DL model with a 3D attention mechanism, so as to verify the feasibility of using an improved convolutional neural network.
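The weighting idea behind CBAM can be illustrated with a minimal NumPy sketch of a 3D variant. It follows the general design described by Woo et al. (channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention from a convolution over channel-wise average and max maps). The function names, the reduction ratio, the kernel size, and the random weights are assumptions made for illustration; this is not the network used in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1) for a feature map of shape (C, D, H, W).

    w1 (C//r, C) and w2 (C, C//r) form a shared two-layer MLP, which is
    equivalent to two 1 x 1 x 1 convolutions applied to the pooled descriptors.
    """
    avg = feat.mean(axis=(1, 2, 3))
    mx = feat.max(axis=(1, 2, 3))

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """Per-voxel weights in (0, 1); kernel has shape (2, k, k, k) with odd k."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, D, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p), (p, p)))
    # windows has shape (2, D, H, W, k, k, k): one k^3 neighborhood per voxel.
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (k, k, k), axis=(1, 2, 3)
    )
    response = np.einsum("cdhwijk,cijk->dhw", windows, kernel)
    return sigmoid(response)


def cbam_3d(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a (C, D, H, W) map."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None, None]
    return feat * spatial_attention(feat, kernel)[None]
```

Because both attention maps lie strictly between 0 and 1, the module rescales features without changing the shape of the feature map, which is why it can be placed after a residual block without altering the rest of the architecture.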

Materials and Methods
2.1. Study Cohort and Imaging Protocol. Patients who underwent breast MRI examinations at Nantong First People's Hospital from January 2014 to October 2020 were retrospectively collected. A total of 296 patients with breast lesions were enrolled in the study. Inclusion criteria: (1) the diameter of the lesion was greater than 1 cm, or the lesion was visible to the naked eye on at least two consecutive slices; (2) the image quality was high without obvious artifacts or distortion; (3) the lesions were all masses that showed irregular margins, inhomogeneous enhancement, or ring enhancement on MRI and were classified as BI-RADS 4 by the radiologist. Exclusion criteria: (1) the breast mass showed no enhancement; (2) radiotherapy/chemotherapy or invasive operations such as biopsy had been performed before breast MRI; (3) the characteristics of the lesion and the pathological diagnosis were not clear.
All MRIs in this study were acquired using a Siemens 3.0 T magnetic resonance scanner (Verio; Siemens, Erlangen, Germany) with a 16-channel phased array breast-specific coil. The patients were placed in the prone position with headfirst entry; the breasts hung naturally in the breast coil, and the nipple remained at the center of the coil. The scan sequence parameters were as follows: DCE T1-weighted axial fat-suppressed 3D spoiled gradient echo: TR 4.67 ms, TE 1.66 ms, flip angle 10°, FOV 340 mm × 340 mm, slice thickness 1.2 mm, scanning of 6 phases without interval, scan time 6 min 25 s, high-pressure syringe injection of 15-20 mL contrast agent Gd-DTPA based on body weight (0.2 mL/kg) at a flow rate of 2 mL/s, and then injection of the same amount of normal saline to flush the tube. After the 25 s injection, scanning was triggered, and each phase was collected for 1 min. The first phase was nonenhanced, and phases 2-6 were enhanced. Our study focused on the phase 2 images, which were named the DCE-MRI T1WI first postcontrast sequence.
2.2. 3D-ROI Lesion Segmentation. All DCE-MRI T1WI first postcontrast images of breast masses that met the inclusion criteria were imported into the image processing software ITK-SNAP 3.8.0 in DICOM format, and the lesions were manually segmented by an attending physician with 8 years of experience in breast MRI diagnosis and reviewed by a chief physician with more than 10 years of experience in breast MRI diagnosis: (1) based on the ROI of the lesion itself (Figures 1 and 2), the 3D-ROI segmentation method was used to manually delineate the boundary of the lesion slice by slice along the edge of the lesion, including cystic degeneration, necrosis, and calcification within the lesion; (2) based on the minimum bounding cube, the maximum diameter of the lesion was projected onto the 3 coordinate axes of the image to determine its coverage range along the x, y, and z axes, and the bounding box of the lesion was finally obtained (Figures 3 and 4).
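The projection step in method (2) can be sketched as follows, assuming the slice-by-slice manual segmentation is available as a binary 3D mask and that the "cube" is an axis-aligned box whose sides follow the lesion's extent on each axis. The function names and the toy mask are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def min_bounding_box(mask):
    """Return (start, stop) voxel indices of the smallest axis-aligned box
    that contains every nonzero voxel of a 3D binary mask (stop is exclusive)."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("mask contains no lesion voxels")
    start = coords.min(axis=0)
    stop = coords.max(axis=0) + 1
    return tuple(int(v) for v in start), tuple(int(v) for v in stop)


def crop_to_box(volume, start, stop):
    """Crop a 3D volume to the box given by (start, stop)."""
    slices = tuple(slice(a, b) for a, b in zip(start, stop))
    return volume[slices]


# Toy example: an irregular "lesion" inside a small volume.
mask = np.zeros((6, 8, 10), dtype=bool)
mask[2, 3, 4] = True
mask[3, 5, 8] = True
mask[2:4, 4, 5:7] = True
volume = np.arange(mask.size, dtype=float).reshape(mask.shape)

start, stop = min_bounding_box(mask)
patch = crop_to_box(volume, start, stop)
```

The cropped patch keeps every voxel inside the box, so it contains the lesion together with whatever surrounding tissue falls between the lesion edge and the box faces.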

Lesion Feature Extraction.
There are two methods of feature extraction. One is to take the minimum bounding cube of the lesion (including the lesion and part of the peritumoral area), and the other is to take only the lesion itself and set the pixel values of the nonlesion area to 0. The minimum bounding cube is the smallest circumscribed cube containing the lesion. In addition, before being input to the CNN, the pixel values of the image data are normalized to the 0-1 range. The formula is as follows:

$$x = \frac{X - X_{\min}}{X_{\max} - X_{\min}},$$

where $x$ represents the normalized image pixel value, $X$ represents the original image pixel value, and $X_{\max}$ and $X_{\min}$ represent the maximum pixel value and the minimum pixel value of the minimum bounding cubes of all lesions, respectively. In this study, a total of 207 masses were obtained, of which 106 were malignant and 101 were benign. The data set was divided into the training set (n = 145), the validation set (n = 22), and the test set (n = 40) at a ratio of 7 : 1 : 2, and fivefold cross-validation was performed.
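A minimal sketch of the two inputs is given below. It assumes the minimum and maximum are taken over the bounding-cube patches of all lesions, as the text states, and that the nonlesion voxels are zeroed after normalization; the order of these two steps is an assumption, and the function names and toy values are hypothetical.

```python
import numpy as np


def global_min_max(patches):
    """Minimum and maximum pixel value over the bounding-cube patches of all lesions."""
    lo = min(float(p.min()) for p in patches)
    hi = max(float(p.max()) for p in patches)
    return lo, hi


def normalize(patch, x_min, x_max):
    """Min-max normalization of one patch to the 0-1 range."""
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return (patch - x_min) / (x_max - x_min)


def lesion_only(patch, mask):
    """Keep voxels inside the lesion mask and set all other voxels to 0."""
    return np.where(mask, patch, 0.0)


# Toy example with two tiny "patches".
patches = [np.array([[[100.0, 300.0]]]), np.array([[[200.0, 500.0]]])]
x_min, x_max = global_min_max(patches)
cube_inputs = [normalize(p, x_min, x_max) for p in patches]

# Lesion-itself input for the first patch: only the second voxel is lesion.
mask = np.array([[[False, True]]])
lesion_input = lesion_only(cube_inputs[0], mask)
```

Using one global range for all lesions keeps relative signal intensity comparable between patients, whereas a per-patch range would stretch every lesion to the full 0-1 interval.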

Model Establishment.
The model was established based on the blueprint of the classic residual network ResNet50 [19], retaining its residual module but changing the convolution module to a 3D convolution module. At the same time, an attention mechanism was added: the attention module, which originally fit only the 2D image convolution module, was transformed into a 3D-Convolutional Block Attention Module (CBAM) to adapt to 3D-MRI, as shown in Figure 5. CBAM includes a channel attention module and a spatial attention module, which together address the question of which channels and which positions play decisive roles in the final prediction [18]. The input module, residual module, channel attention module, downsampling module, and fully connected module constitute the main modules of the network. Among them, the residual module was mainly used to extract features, the CBAM module was mainly used to give higher weight to key features, and the downsampling module was used to reduce the size of the feature map and to increase the number of channels in the feature map. Blocks are used (Figure 5) to reflect the size change of the feature map. After the last CBAM, the algorithm flattens the output high-dimensional features into a one-dimensional vector and connects 2 fully connected layers. Finally, two output results (P1, P2) are set, which represent the probabilities of benign and malignant lesions, respectively. The lesion is classified as benign if P1 > P2; otherwise, the lesion is classified as malignant. Lesion classification network parameters are shown in Table 1. The network uses the cross-entropy cost function as the loss function and stochastic gradient descent (SGD), with a weight decay of 0.0001 and a momentum of 0.9, as the optimizer. The batch size is 16. A dynamic learning rate strategy is used during training: the initial learning rate is 0.1, which is relatively large, and it is halved every 25 epochs.
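The learning rate schedule and the decision rule described in this subsection can be sketched in a few lines. The sketch assumes zero-based epoch numbering and assumes that P1 and P2 are obtained by a softmax over two raw network outputs; neither detail is stated in the text, and the function names are hypothetical.

```python
import math


def learning_rate(epoch, initial_lr=0.1, drop_every=25, factor=0.5):
    """Step schedule: start at initial_lr and halve it every drop_every epochs.

    Assumes epochs are counted from 0.
    """
    return initial_lr * factor ** (epoch // drop_every)


def softmax(logits):
    """Numerically stable softmax over a short list of raw outputs."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return 'benign' if P1 > P2, otherwise 'malignant'."""
    p1, p2 = softmax(logits)
    return "benign" if p1 > p2 else "malignant"
```

With these defaults, epochs 0-24 use a rate of 0.1, epochs 25-49 use 0.05, and so on. Note that a tie (P1 equal to P2) falls into the malignant class under the stated rule.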
"res_conv" is a residual convolution block which contains shortcut connection, and "res conv * N" means the block has N convolution blocks that share the same parameters. 3D_CBAM uses 1 × 1 × 1 convolutions to adjust the channel numbers of the current feature map.

Wireless Communications and Mobile Computing
lesions in the breast, 14 patients with incomplete examination or perfusion scan breast MRI, and 11 patients with breast lesions combined with nonmass enhancement lesions. Eventually, 203 patients were enrolled for analyses (Table 1). The patients were 17-86 years old, with an average age of 48.5 ± 13.1 years. Among them, there was only one male patient, aged 54 years. There were 105 patients with malignant lesions, with an average age of 55.5 ± 11.3 years, and 98 patients with benign lesions, with an average age of 41.0 ± 10.6 years. A total of 207 masses were included in the study (Table 2). As shown in Table 3, model 1 (based on the lesion itself) achieved a mean AUC of 0.799, accuracy of 74.63%, sensitivity of 83.65%, specificity of 65.35%, NPV of 71.31%, PPV of 79.52%, and recall rate of 83.65%, while model 2 (based on the minimum bounding cube) achieved a mean AUC of 0.827, accuracy of 78.54%, sensitivity of 78.85%, specificity of 78.22%, NPV of 78.85%, PPV of 78.22%, and recall rate of 78.85%. There was no statistical difference in AUC between the model based on the lesion itself and the model based on the minimum bounding cube (Z = 0.771, p = 0.4408). The minimum bounding cube based on the edge of the lesion showed higher accuracy and specificity and a lower recall rate in identifying benign and malignant lesions.
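For reference, the evaluation indicators can be computed from a confusion matrix as in the sketch below, which uses the standard definitions and assumes that malignant is treated as the positive class. The counts are hypothetical and are not taken from this study. The second function converts a Z statistic into a two-sided p value under a standard normal distribution; the test that produced Z is not specified in the text, so only this final conversion is shown.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Standard indicators from confusion-matrix counts (positive = malignant)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # identical to the recall rate
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def two_sided_p(z):
    """Two-sided p value for a Z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)

# Z value reported for the AUC comparison of the two models.
p_value = two_sided_p(0.771)
```

Sensitivity and recall rate are the same quantity, which is why the two indicators carry identical values for each model.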

Discussion
Deep learning in convolutional neural networks (CNNs) usually relies on manually or semiautomatically segmented labels to learn to recognize image features. Unlike MRI of abdominal and lung lesions, breast MRI is performed with the breast fixed in a dedicated breast coil and is less affected by respiratory motion, which leads to relatively higher reproducibility of the segmentation method for breast lesions.
However, the segmentation methods are quite different. Previous studies have mostly extracted the two-dimensional features of the lesion (2D-ROI) [20], selected the largest slice of the lesion or the slice with the most obvious lesion enhancement [21], and segmented along the edge of the lesion. A 2D-ROI can only represent the information covered by the current area and cannot reflect all the information of the lesion, which may affect the reliability of DL models. The use of a 3D-ROI helps in observing the lesion's overall morphology, leading to a more accurate and comprehensive reflection of the characteristics of the lesion [22]. In addition, more weight is given to the hemodynamic characteristics of the relevant lesion in the model, in line with radiologists' usual reading habits and the advantages of early enhanced MRI.

The Efficacy of a Deep Learning Model Based on the Minimum Bounding Cube of the Lesion in Breast Lesion Classification. This study used two different segmentation methods for the 3D-ROI of the lesion: one was based on the lesion itself, and the other was based on the minimum bounding cube of the lesion edge. These two segmentation methods were compared for their impact on the accuracy of the DL model. Our results revealed that the DL model based on the minimum bounding cube of the lesion edge is more accurate, with a mean AUC value of about 0.827. The reason may be that the minimum bounding cube based on the lesion edge not only contains the internal information of the lesion but also includes some tissues surrounding the lesion. Zhou et al. [23] applied 5 different input boxes (tumor alone, the smallest bounding box, and 1.2, 1.5, and 2.0 times the box) in deep learning and showed that diagnostic performance gradually decreases as the bounding box increases. The per-lesion diagnostic accuracy was the highest when using the smallest bounding box (89%), but the tumor ROIs on all slices were automatically segmented on contrast-enhanced maps by using the fuzzy-C-means (FCM) clustering algorithm with 3D connected-component labeling. This study used manually segmented images as a standard for comparison, which may be more accurate. In addition, the minimum bounding cube based on the tumor edge did not expand the box size but instead used 3D-CBAM to increase the weight of key information, in order to prevent the box from containing too much information from normal tissue, which would dilute the effective information in the overall box or reduce the resolution of the effective information of the image imported into the neural network.
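To make the notion of an enlarged input box concrete, the sketch below scales a bounding box about its center and clips it to the image. It assumes that each side length is multiplied by the factor; how the enlarged boxes were actually constructed in the cited work is not described here, and the function name is hypothetical.

```python
import math


def expand_box(start, stop, factor, shape):
    """Scale an axis-aligned box about its center by `factor` per side,
    then clip it to an image of the given shape (stop is exclusive)."""
    new_start, new_stop = [], []
    for a, b, n in zip(start, stop, shape):
        center = (a + b) / 2.0
        half = (b - a) * factor / 2.0
        new_start.append(max(0, int(math.floor(center - half))))
        new_stop.append(min(n, int(math.ceil(center + half))))
    return tuple(new_start), tuple(new_stop)
```

A factor of 1.0 returns the minimum bounding box unchanged, which corresponds to the setting used in this study; larger factors add progressively more normal tissue to the input.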
The DL model based on the minimum bounding cube of postcontrast images of the DCE-MRI T1WI sequence showed superiority in the test set, with a mean specificity of 78.22%, which was better than that of the DL model based on the lesion itself alone. The reason may be that the microenvironment around the tumor plays a critical role in tumor growth and aggressive tissue behavior [24,25]. The 3D-CBAM gives higher weight to those key features. The area around the tumor contains much valuable and hidden information about the disease, including survival predictors such as vascular activity, lymphangiogenesis, and the infiltration of lymphatic and blood vessels around the tumor, as well as immune response signals such as interstitial response and lymphocyte infiltration around the tumor [26]. As we have shown in a previous study [27], peritumoral edema is better displayed on T2WI images, where it appears as T2WI hyperintensity around the tumor. Combining this sign with the T2WI signal significantly increases the sensitivity and specificity for the differential diagnosis of benign and malignant breast tumors, and there is a positive correlation between peritumoral edema and Ki-67 expression. These results demonstrate the importance of the tissue surrounding the tumor. However, related studies are still limited at present; thus, the information about surrounding tissues has not been captured by artificial intelligence learning technology. Braman et al. [26] collected a total of 117 patients and extracted omics features after marking the breast tumors and surrounding areas (2.5-5 mm area around the tumor) using breast images from DCE-MRI-T1WI. Their results showed that the omics features of surrounding tissues helped to predict pCR and that the combined use of intratumoral and peritumoral characteristics led to better prediction accuracy, which as a whole may help guide the personalized treatment of locally advanced breast cancer. This indicates that extracting the information contained in the tissue around the tumor has a high clinical application value.

The Diagnostic Efficacy of the Deep Learning Model Based on First Postcontrast Images of DCE-MRI T1WI Sequence in Benign and Malignant Breast Lesions. The deep learning model based on the minimum bounding cube of DCE-MRI postcontrast images has high specificity in the classification of benign and malignant breast lesions. We speculate that this may be related to the early hemodynamic information of the lesion: a previous study of ours showed that DCE-MRI can not only reveal a tumor's morphological changes but also reflect its microvascular perfusion, angiogenesis, grade, and malignancy, which is useful for evaluating the effect of tumor treatment and prognosis. The degree of early enhancement reflects the abundance of blood vessels and the blood perfusion of the lesion [28]. Malignant lesions grow fast and have multiple large, immature blood vessels and a large number of arteriovenous anastomoses.
In addition to the high diagnostic accuracy of the minimum bounding cube based on the edge of the lesion, we also found that the method is relatively simple and easy to use, as it only requires finding the largest level of the lesion in the three dimensions of the image through image processing software. At this level, the minimum rectangle that can cover the outermost edge of the lesion is used, and finally, the minimum bounding cube containing the lesion is generated by the computer traversal method. In contrast, the 3D-ROI based on the lesion itself needs to be delineated slice by slice along the edge. On nonenhanced sequence images, the edge of the lesion is sometimes unclear, leading to a lack of local edge information of the lesion.

Conclusion
In summary, based on the segmentation method of the minimum bounding cube at the edge of the lesion, postcontrast images of the DCE-MRI T1WI sequence were extracted, and a DL model was established. This model can combine the information inside the lesion with that of the included peritumoral area to improve the diagnostic efficacy in differentiating benign from malignant breast lesions. Using the DL approach with a 3D attention mechanism based on ResNet50 to identify benign and malignant BI-RADS 4 lesions was feasible.

Limitations of This Study
This study was a small-sample single-center study, and the results obtained need to be confirmed by future large-sample multicenter investigations. Only mass lesions were included in the study; thus, whether the segmentation method is equally applicable to nonmass lesions remains to be tested. The inclusion/exclusion criteria are quite stringent and exclude many of the lesions that a radiologist reading breast MRI will routinely come across. The study only used first postcontrast images of DCE-MRI T1WI for segmentation by the minimum bounding cube of the lesion; whether the method fits image segmentation on other sequences remains to be examined. Another limitation is that this study only compared two lesion segmentation methods; thus, future investigation is needed to test whether other ROIs containing the peritumoral area may be better.

Data Availability
All data generated or analyzed during this study are available from the corresponding author Wei Xing upon reasonable request.

Ethical Approval
The retrospective study was approved by the Ethical Review Board of Nantong First People's Hospital (No. 2020KY236) and was conducted according to the Declaration of Helsinki principles.

Consent
All patients signed informed consent.

Conflicts of Interest
The authors declare that they have no conflict of interest.