Explainable Transfer Learning-Based Deep Learning Model for Pelvis Fracture Detection

Pelvis fracture detection is vital for diagnosing patients and making treatment decisions for traumatic pelvis injuries. Computer-aided diagnostic approaches have recently become popular for assisting doctors in disease diagnosis, making their conclusions more trustworthy and error-free. Inspecting X-ray images with fractures needs a lot of time from experienced physicians. However


Introduction
The pelvis is a complex and functionally important bone structure that contributes directly to human movement and child delivery [1]. The human pelvis is located in the lower abdomen, between the spine and the lower limbs [2]. It comprises the right and left innominate bones, the sacrum, and the coccyx. The innominate bones consist of the pubis, ischium, and ilium. The sacrum and coccyx are part of the axial skeleton and are variably fused vertebrae [3][4][5][6], as shown in Figure 1.
Managing patients with pelvis fractures, which account for 3% of skeletal injuries, is one of the most complex aspects of trauma care [8]. The functions of the pelvis are (1) to protect and support the abdominal and pelvic organs [6], (2) to provide attachment points for muscles, (3) to transmit weight from the upper body to the lower limbs [3], (4) to enable locomotion, and (5) to enable childbirth. As a result, the pelvis has great clinical significance [1].
Pelvis fractures occur mainly in motor vehicle accidents and sports, or after minor falls in people with fragile bones, as in osteoporosis. Often, pelvis fractures are associated with other severe injuries, which can lead to acute bleeding and damage to the surrounding internal organs and soft tissues [5,9]. Pelvis fractures are considered a major cause of mortality. According to a study of patients with pelvis fractures in the United States [10], pelvic and abdominal bleeding are the main causes of mortality in the first six hours. In addition, the mortality rate in injured patients with pelvis fractures is 5-20% in all emergency cases [11]. Given these factors, pelvis injuries should be diagnosed urgently. X-ray imaging is the most common, routine, and inexpensive modality used in emergency units for the early detection of injuries. X-ray images should be carefully evaluated to detect any fractures in the pelvis [12]. Inspecting X-ray images for fractures takes a lot of time from experienced physicians. However, many hospitals lack experienced radiologists to deal with these images [13]. AI-based systems are widely used to help radiologists and physicians detect fractures. In a recent study, Rainey et al. [14] showed that using an AI-based system reduces the time radiologists spend reviewing medical images by 20%. Therefore, building an AI model to support physicians in interpreting pelvis X-ray images can reduce radiologists' stress, decrease errors, and improve patient care. Despite advances in AI, very few methods have been proposed for detecting pelvis fractures.
Kitamura [15] used a deep learning technique to identify pelvis fractures on X-ray images and reported results of 0.70 and 0.85 for the posterior pelvis and acetabular categories, respectively. Yamamoto et al. [16] proposed a method for detecting pelvis fractures using a 3D CNN on CT images. The accuracy, specificity, recall, and precision on the test data were 69.5%, 77.7%, 56.4%, and 61.1%, respectively. Ukai et al. [17] used DCNNs to detect pelvis fractures in CT images, where the AUC was 0.824 with 0.805 recall and 0.907 precision. The AUC with a single orientation was 0.652.
Deep learning has recently become a promising approach for extracting features from input images using various models. These models stack several layers of neurons, each of which extracts fine-grained information from the input image while combining the features of earlier layers. Such models are known as convolutional neural networks (CNNs). We aim to support physicians in diagnosing pelvis injuries, especially in emergencies. However, because of their black-box nature, deep learning models cannot be used on their own for high-risk judgments such as automated pelvis fracture diagnosis. An explainable artificial intelligence (XAI) framework is therefore necessary to support deep learning models. XAI is a set of procedures and methods that makes it possible for different users to comprehend and trust the outcomes produced by machine learning algorithms. XAI can be pre- or post-hoc. Additionally, "model-agnostic" refers to a class of broadly applicable explainers that are not created for a particular ML technique.
The main contributions of this paper can be summarized as follows: (1) building a new deep learning model based on ResNet50 for detecting pelvis fractures; (2) creating an XAI model using the Grad-CAM framework to explain why a deep learning model predicts pelvic fractures and to improve model accuracy, which can raise user confidence and boost the diagnostic system's safety; and (3) validating and evaluating the performance of the proposed model on real-case X-ray images.
This paper is organized as follows: Section 2 provides the methods used for pelvis fracture detection. Section 3 presents the proposed algorithm. Section 4 discusses the results obtained and the case study. Section 5 presents the conclusion and future work.

Methods
Deep learning-based methods are widely used in medical computer-aided diagnosis systems [18]. ResNet [19], Inception [20], Xception [21], and EfficientNet [22] networks have gained popularity in classifying medical images. Transfer learning, which reuses a previously trained model to solve a new problem, improves classification capabilities, especially with small datasets [23,24]. Furthermore, the problematic "black-box" nature of deep learning models necessitates the development of AI that can be explained (XAI). Explaining how the neural network carries out its categorization task lets users and subject-matter experts examine the elements behind a prediction.
Additionally, in the current study we provide an XAI framework for the pelvis fracture classification problem that employs class activation maps. It is crucial to ensure that the neural network has learned the correct characteristics of the classes considered rather than local noise in the dataset. If the network has erroneously learned local noise, it will misidentify some cases, with potentially fatal consequences, when tested with pelvis fractures other than those present in the dataset.
Thus, the suggested XAI framework verifies that the neural network has learned the correct characteristics and increases confidence in its predictions. AlexNet, GoogleNet, and ResNet50 are black-box deep learning models. Transfer learning was applied to these models, and XAI was used to obtain a trusted model for medical purposes [25,26].

AlexNet.
AlexNet is a pioneering deep convolutional network developed by Krizhevsky et al. [27]. It comprises five convolutional layers, two normalization layers, three max-pooling layers, two fully connected layers, and a SoftMax layer. The concept of spatial correlation in an image frame was investigated using convolutional layers and receptive fields. A GPU was used to increase performance.

ResNet50.
ResNet stands for residual network, a network that learns residual mappings. ResNet50 [19] is a version of the ResNet model with 48 convolution layers, 1 max pool layer, and 1 average pool layer. Shortcut connections are used in ResNet's architecture to address the vanishing gradient issue, as shown in Figure 2.
A residual block, used repeatedly throughout the network, serves as the fundamental ResNet building block. The network learns the mapping x ⟶ F(x) + G(x), as opposed to x ⟶ F(x) alone. When the dimensions of the input x and the output F(x) are the same, G(x) = x is an identity function, and the shortcut connection is known as an identity connection. An identity mapping is easy to learn this way, because during training it is simpler to drive the weights of the intermediate layers to zero than to push them to one. When the dimensions differ, ResNet considers two mapping types. The first is a nontrainable mapping (padding), in which the input x is padded with zeros so that its dimension matches that of F(x). The second is a trainable mapping (Conv layer), in which G(x) is obtained from x using a 1 × 1 Conv layer. Throughout the network, the spatial dimensions are maintained or decreased, the depth is maintained or doubled, and the product of width and depth after each convolutional layer is maintained.
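As a minimal sketch of the identity shortcut described above, the following Python snippet computes F(x) + G(x) with G(x) = x on plain lists. The names `residual_block` and `zero_branch` are hypothetical and chosen only for illustration; a real ResNet block applies convolutions, batch normalization, and nonlinearities, which are omitted here.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + G(x) with the identity shortcut G(x) = x."""
    fx = f(x)
    if len(fx) != len(x):
        # The identity connection only applies when dimensions match;
        # otherwise padding or a 1 x 1 convolution would be needed.
        raise ValueError("identity shortcut needs matching dimensions")
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x: Vector) -> Vector:
    # Hypothetical residual branch whose weights have all been driven to zero.
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
y = residual_block(x, zero_branch)  # the block reduces to the identity mapping
```

With the residual branch zeroed out, the block passes its input through unchanged, which illustrates why an identity mapping is easy for such a block to represent.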

GoogleNet.
Google's research team proposed GoogleNet, also known as Inception V1 [20]. The goal behind the GoogleNet architecture is to have filters of various sizes that operate at the same level. With this concept, the network gets wider rather than deeper. Each inception module can capture different levels of salient features. The 5 × 5 conv layer captures global features, whereas the 3 × 3 conv layer is more likely to capture scattered (distributed) features. The max-pooling operation captures the low-level features that are distinctive in a neighborhood. All these features are extracted and concatenated at a given level before being passed to the following layer.

XAI-Based Methods.
Despite the challenge of identifying which features of a model's input drive its decisions, deep neural networks (DNNs) are an essential machine learning technique. Such diagnosis is crucial in various real-world areas, from law enforcement to healthcare, to ensure that factors appropriate for the usage environment influence DNN decisions. As a result, research on methods that explain a DNN's judgments has grown into a vibrant and expansive field. Competing definitions of what it means to "explain" a DNN's activities and to evaluate an approach's "ability to explain" add to the field's complexity [26].
In deep neural networks, a gradient is a vector whose components are the partial derivatives of the function f(x) and which points in the direction of that function's greatest rate of increase. Grad-CAM uses the class-specific gradient information that flows through a generic convolutional network to produce localization maps of the significant regions of the image. By displaying visualizations that support output predictions, Grad-CAM makes black-box models more transparent. In other words, Grad-CAM combines class-discriminative capabilities with pixel-space gradient visualization. Grad-CAM can be used with a wide range of CNN architectures, including CNNs with fully connected layers, structured outputs, or multimodal inputs, CNNs used in reinforcement learning, and networks such as AlexNet, ResNet, GoogleNet, and VGGNet. We therefore used Grad-CAM to explain and visualize the ability of the proposed method to localize the significant region.

The Proposed Algorithm
In this research, feature extraction is carried out using the GoogleNet, ResNet50, and AlexNet networks. These networks are pre-trained on the ImageNet dataset. The filters of the network layers identify input features, such as colors and shapes.
The pre-trained network is then used to classify the pelvis images in a new dataset as fractured or normal. Except for the final three layers (fully connected layer (FCL), SoftMax (SM), and classification), the training parameters from the original pre-trained model are frozen [28].
The network's newly added layers are then trained using the images from the new dataset. These layers are integrated with the previously trained layers of the pre-trained network to predict the new classes. Consequently, only a few dense layers are newly trained.
As a result, compared to training a CNN from scratch, the training process can be completed relatively quickly, and very little training data are required. The new FCL, SM, and classification output layers are subsequently trained using the extracted features [29].
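The layer replacement and freezing described above can be sketched in a framework-independent way. The snippet below is only an illustration under stated assumptions: the `Layer` class, the `adapt_for_transfer` helper, and all layer names are hypothetical stand-ins, not the authors' MATLAB implementation or any library's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    trainable: bool = True


def adapt_for_transfer(layers: List[Layer], new_head: List[Layer]) -> List[Layer]:
    """Drop the last three layers, freeze the rest, and append a new head."""
    backbone = [Layer(layer.name, trainable=False) for layer in layers[:-3]]
    return backbone + new_head


# Hypothetical pre-trained network ending in FCL, SoftMax, and classification.
pretrained = [
    Layer("conv1"), Layer("conv2"), Layer("pool"),
    Layer("fc1000"), Layer("softmax"), Layer("classification"),
]
# New head for the two-class problem (fracture versus normal).
head = [Layer("fc2"), Layer("softmax_new"), Layer("classification_new")]

network = adapt_for_transfer(pretrained, head)
trainable_names = [layer.name for layer in network if layer.trainable]
```

Only the three new layers remain trainable, which is why training needs comparatively little data and time.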
The stochastic gradient-descent method with momentum (SGDM), essentially an enhanced form of SGD with fixed learning parameters, is used for fine-tuning. SGDM builds up velocity along every dimension, including those whose gradients remain constant [30,31]. All experiments use the same hyperparameter settings. Figure 3 shows the transfer learning process of GoogleNet, AlexNet, and ResNet50.
International Journal of Intelligent Systems

Grad-CAM-Based Method for XAI.
Using class activation maps, we create an XAI framework for the pelvis classification problem. We employ the gradient-weighted class activation mapping (Grad-CAM) approach to validate that the proper pelvic regions of the input are activated when an image is assigned its label. Given a network net, an image X, and a class label, Grad-CAM returns the gradient-weighted class activation map of the change in the classification score of X when net evaluates the score for that class. We use this map to validate that the network is focusing on the appropriate areas of an image and to explain network predictions. The Grad-CAM interpretability technique uses the gradients of the classification score with respect to the final convolutional feature map. The portions of an image that have a large value on the Grad-CAM map have the greatest effect on the network score for that class.
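To make the computation concrete, here is a minimal pure-Python sketch of a Grad-CAM map, assuming the feature maps of the final convolutional layer and the gradients of the class score with respect to them are already available as nested lists. The function name `grad_cam` and the toy inputs are hypothetical; in practice these tensors come from a deep learning framework, and the resulting map is upsampled to the image size.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(feature_maps: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """ReLU of the gradient-weighted sum of feature maps."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # One weight per feature map: the spatial average of its gradients.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # Keep only locations with a positive influence on the class score.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy example: two 2 x 2 feature maps with constant gradients +1 and -1.
A = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
dY = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(A, dY)
```

In this toy case the first feature map supports the class and the second opposes it, so only the locations where the first map dominates survive the ReLU.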

Experimental Results
An IBM-compatible computer with a Core i7 CPU, 16 GB of DDR RAM, and an NVIDIA GeForce MX150 graphics card was used for the research. The application was executed in MATLAB 2022 (64-bit). The performance of three distinct transfer learning models, AlexNet, GoogleNet, and ResNet50, was compared on the dataset. The experimental findings and the analysis of our models on Kaggle-sourced data are presented in this section.
The models were trained for 10 epochs with a batch size of 32. Training accuracy, training error, validation accuracy, and validation error were calculated for each epoch. We used a categorical cross-entropy loss function and a stochastic gradient-descent with momentum (SGDM) optimizer with a learning rate (LR) of 0.001. A learning-rate drop factor schedule was applied to the LR to speed up training and bring the optimizer closer to the global minimum.
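The SGDM update can be sketched as follows. This is an illustrative sketch only: the learning rate of 0.001 comes from the text, whereas the momentum value of 0.9 and the toy objective f(w) = w² are assumptions introduced here, and `sgdm_step` is a hypothetical helper name.

```python
from typing import List, Tuple


def sgdm_step(
    weights: List[float],
    velocity: List[float],
    grads: List[float],
    lr: float = 0.001,      # learning rate stated in the text
    momentum: float = 0.9,  # assumed value, not given in the text
) -> Tuple[List[float], List[float]]:
    """One SGD-with-momentum step: v <- m*v - lr*g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = w^2 with gradient 2w, starting from w = 1.
w, v = [1.0], [0.0]
for _ in range(3):
    w, v = sgdm_step(w, v, [2.0 * w[0]])
```

Because the gradient keeps the same sign over these steps, the velocity grows in magnitude and each step moves the weight further than plain SGD would.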
We dynamically reduced the LR every four epochs based on the validation accuracy to retain the faster computation that a high LR provides. The data were shuffled between epochs, and if the validation loss did not decrease after four epochs, we multiplied the LR by 0.1 using the "LearnRateDropFactor" option [32]. The dataset's name in Kaggle is "ChestPelvisCSpineScans." It contains 876 images and is 501 MB in size. The images are organized into two groups. The first group includes 404 normal images (Figure 4). The second group includes 472 pelvis fracture images (Figure 5).
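As a rough illustration of the learning-rate schedule, the sketch below assumes a piecewise schedule that multiplies the LR by the drop factor 0.1 after every four epochs, regardless of the validation loss. That unconditional behavior is an assumption made to keep the example simple, and `piecewise_lr` is a hypothetical helper name.

```python
def piecewise_lr(
    epoch: int,
    initial_lr: float = 0.001,
    drop_factor: float = 0.1,
    drop_period: int = 4,
) -> float:
    """Learning rate for a 1-based epoch under a piecewise drop schedule."""
    drops = (epoch - 1) // drop_period  # number of drops applied so far
    return initial_lr * drop_factor ** drops


# Learning rates for the 10 training epochs.
schedule = [piecewise_lr(epoch) for epoch in range(1, 11)]
```

Under these assumptions, epochs 1 to 4 use the initial rate, epochs 5 to 8 use one tenth of it, and epochs 9 and 10 use one hundredth.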

Results and Discussion.
The performance of the proposed method was assessed using quantitative and qualitative measures. Accuracy, sensitivity, specificity, and precision were computed as quantitative measures, while the ROC curve was used as a qualitative measure [33]. The numbers of false positives ($f_p$), false negatives ($f_n$), true positives ($t_p$), and true negatives ($t_n$) from the confusion matrix are used to compute the following measures:

$$\mathrm{Accuracy} = \frac{t_p + t_n}{t_p + f_p + f_n + t_n}, \qquad \mathrm{Specificity} = \frac{t_n}{f_p + t_n}. \tag{1}$$

In the first experiment, we removed the last three layers from AlexNet and added new ones for classifying the pelvis as fractured or normal. We resized all images in the dataset to 227 × 227 × 3 to match the AlexNet input layer. The dataset is divided into 70%, 15%, and 15% for training, validating, and testing the refined AlexNet.
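The quantitative measures can be computed directly from the four confusion-matrix counts, as in the following sketch. Sensitivity and precision use their standard definitions, t_p / (t_p + f_n) and t_p / (t_p + f_p); the counts in the example are made-up illustrative values, not results from this study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and precision from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (fp + tn),  # true negative rate
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
```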
Due to the class imbalance, each class's performance metrics are calculated separately. The average of these measurements is then determined. Figure 6 shows the confusion matrix for training and testing the refined AlexNet using the pelvis dataset, while Table 1 provides an overview of the average accuracy, sensitivity, specificity, and precision values.
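One way to read the per-class averaging is as a macro average: each class is treated in turn as the positive class, its metrics are computed, and the results are averaged with equal weight. The sketch below follows that reading, which is an assumption about the procedure; the confusion matrix values and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def per_class_counts(cm: List[List[int]], k: int) -> Tuple[int, int, int, int]:
    """tp, fp, fn, tn for class k; rows are true classes, columns predictions."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def macro_average(cm: List[List[int]]) -> Dict[str, float]:
    """Average sensitivity, specificity, and precision over all classes."""
    sums = {"sensitivity": 0.0, "specificity": 0.0, "precision": 0.0}
    for k in range(len(cm)):
        tp, fp, fn, tn = per_class_counts(cm, k)
        sums["sensitivity"] += tp / (tp + fn)
        sums["specificity"] += tn / (tn + fp)
        sums["precision"] += tp / (tp + fp)
    return {name: value / len(cm) for name, value in sums.items()}


# Hypothetical confusion matrix: row 0 = normal, row 1 = fracture.
cm = [[40, 10], [5, 45]]
averages = macro_average(cm)
```

Averaging per class in this way gives the smaller class the same weight as the larger one, which is the usual motivation for it under class imbalance.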
In the second experiment, we adapted ResNet50 by removing the last three layers and adding three new layers for the fracture and normal classification. All images have been resized to 224 × 224 × 3 to match the input layer of ResNet50. Figure 7 shows the confusion matrix for training and testing the refined ResNet50 using the pelvis dataset. Because of the imbalance between the classes, Table 1 provides an overview of the average accuracy, sensitivity, specificity, and precision values. Figure 8 shows the receiver operating characteristic (ROC) curve for the refined AlexNet and ResNet50. Figure 9 contains three curves for the proposed models AlexNet, ResNet50, and GoogleNet. These curves visualize the performance measures for the three proposed methods and plot the true positive rate (sensitivity) against the false positive rate (1-specificity). As shown, the performance measures of AlexNet were the lowest, confirming the values obtained from the confusion matrix. ResNet50 performed better than AlexNet: its curve indicates that sensitivity and specificity improve compared with the values obtained from the AlexNet confusion matrix. Finally, GoogleNet obtained the best measures, as indicated by the ROC curve, which visualizes the values obtained in the confusion matrix.
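For readers unfamiliar with how an ROC curve is built, the sketch below sweeps a decision threshold over classifier scores and records the false positive rate and true positive rate at each threshold, then estimates the area under the curve with the trapezoidal rule. The scores and labels are made-up illustrative values, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """(FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical fracture scores; label 1 = fracture, 0 = normal.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

A curve that rises quickly toward the top-left corner, and therefore has a larger area, corresponds to a model that reaches high sensitivity at a low false positive rate.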

XAI Framework.
In AI and machine learning, XAI is a new and developing field. Building human trust in the choices made by artificial intelligence models is vital, and it can only be achieved by making black-box ML models more transparent. Explainable AI frameworks are tools that attempt to explain how a model works and generate reports about its behavior.
Deep learning networks are frequently referred to as "black boxes" because they provide no means of determining which component of an input was responsible for a prediction or what the network has learned. These models frequently fail spectacularly, without warning or explanation, when they make incorrect predictions. Class activation mapping is one method for obtaining visual explanations of the predictions made by convolutional neural networks. Incorrect, seemingly nonsensical predictions often have sensible explanations. We used class activation mapping to see whether a certain part of an input image confused the network and caused it to make an inaccurate prediction. For this purpose, we used Grad-CAM.
The Grad-CAM method, which yields class activation maps, is used to create the XAI framework for the pelvis classification task [26]. Grad-CAM creates a map of weights, highlighting the key areas in the input that the CNN used to predict its class label. Grad-CAM leverages the gradient values flowing into the final convolutional layer to create these class activation maps.
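For reference, the standard Grad-CAM formulation in [26] can be written as follows. Here $y^c$ is the score for class $c$, $A^k$ is the $k$-th feature map of the final convolutional layer, and $Z$ is the number of spatial locations in that map; this is the general definition from [26], not a formula specific to the present model.

```latex
\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_k \alpha_k^c A^k \right)
```

The weights $\alpha_k^c$ measure how important each feature map is for class $c$, and the ReLU keeps only the regions that contribute positively to the class score.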
Selvaraju et al. [26] describe Grad-CAM in depth. We chose a few pelvis samples that our CNN correctly identified and then used Grad-CAM to obtain their class activation maps. Grad-CAM can be slightly modified to produce explanations that highlight the regions that would lead the network to change its prediction.
Therefore, removing concepts from those areas would increase the model's confidence in its prediction. This type of explanation is referred to as a counterfactual explanation. Specifically, we negate the gradient of $y^c$ (the score for class $c$) with respect to the feature maps $A$ of a convolutional layer. As a result, the significance weights $\alpha_k^c$ are now [26] as follows:

$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j -\frac{\partial y^c}{\partial A_{ij}^k},$$

where $A^k$ is the $k$-th feature map and $Z$ is the number of spatial locations in it.

This procedure is shown in Figure 10. The average activation obtained indicates the pelvis regions in the image that the CNN used to predict that specific class correctly. The analysis is done on the average activation along the acquired pelvis segments. Creating an XAI framework for the pelvis classification task is important for medical purposes. Figure 11 shows that the proposed method can detect the pelvis part from the whole image and illustrates how it classifies the normal and fractured pelvis. The XAI results show that the proposed method can detect the pelvis part correctly and classify a new image.

Case Study.
To evaluate and demonstrate the proposed method's ability to detect pelvis fractures, we collected X-ray images for 15 cases from the radiology center, giving 15 X-ray images in total. Figure 12 shows anteroposterior images of a normal pelvis. The images are considered normal due to the absence of fracture features (loss of bone continuity or fissure lines) and of dislocation in the form of loss of pelvic alignment or sacroiliac joint separation.
In conclusion, all these images present a pelvis that is normal with regard to fractures and dislocations. Figure 13 shows anteroposterior views of fractured pelvises of different types. The images are considered fractured because they meet fracture criteria in the form of complete loss of bone continuity and separation of bone ends. We used ResNet50 and GoogleNet to classify these cases. We observed that the proposed method based on ResNet50 and transfer learning fails to classify three pelvis fracture cases. In contrast, the proposed method using GoogleNet and transfer learning fails in only one normal case and one fracture case.

Conclusions and Future Work
In this study, we proposed an explainable artificial intelligence (XAI) framework for pelvis fracture detection. The proposed technique has been shown to provide fast and accurate detection of fractures in pelvis X-ray images. The proposed system aims to support physicians in diagnosing pelvis fractures, especially in emergencies where inexperienced radiologists and physicians cannot deal with these images. We used a dataset containing 876 X-ray images (472 pelvis fractures and 404 normal images) to train the model. The results show an accuracy of 98.5%, a sensitivity of 98.5%, a specificity of 98.5%, and a precision of 98.5%. In the future, in addition to pelvis fracture detection, a system can be developed that classifies the major fracture types of the pelvis, such as fracture of the iliac bone, fracture of the sacrum, and fracture of the symphysis.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.