Deep Learning Algorithms for Detection and Classification of Gastrointestinal Diseases

to diagnose all the generated images in a short period and with high accuracy. The novelty of the proposed methodology lies in developing a system for the diagnosis of gastrointestinal diseases. This paper introduces three deep learning networks, GoogleNet, ResNet-50, and AlexNet, and evaluates their potential for diagnosing a dataset of lower gastrointestinal diseases. All images are enhanced and denoised before they are inputted into the deep learning networks. The Kvasir dataset contains 5,000 images divided equally into five types of lower gastrointestinal diseases (dyed-lifted polyps, normal cecum, normal pylorus, polyps, and ulcerative colitis). In the classification stage, pretrained convolutional neural network (CNN) models are fine-tuned through transfer learning to perform the new task. The softmax activation function receives the deep feature vector and classifies the input images into five classes. All CNN models achieved superior results. AlexNet achieved an accuracy of 97%, sensitivity of 96.8%, specificity of 99.20%, and AUC of 99.98%.


Introduction
Cancer is the most common cause of death in the world, and gastrointestinal cancer is the most frequently occurring type.
The World Health Organization (WHO) estimates that 1.8 million people die annually from gastrointestinal diseases [1], and gastrointestinal cancer is the fourth leading cause of death in the world. Gastrointestinal cancer grows from gastrointestinal polyps, which are abnormal tissue growths on the mucosa of the stomach and colon. The polyps grow slowly, and symptoms only appear when they are large [2]. However, polyps can be prevented and cured if detected at an early stage [3].
Video endoscopy plays an important role in increasing the early diagnosis of polyps in the gastrointestinal tract and reducing mortality [4]. Endoscopy can determine the severity of ulcerative colitis by detecting mucosal patterns that include spatial differences in mucosal colour and texture (the degree of roughness of the mucosal surface in the gastrointestinal tract) [5]. Hundreds of images can be extracted from a gastrointestinal video, but disease appears in only a few images, and no clinician can devote the amount of time needed to inspect all the images. Consequently, the accuracy of the diagnosis depends mainly on the experience of the doctor; experts are able to diagnose polyps in up to 27% of cases [6]. Therefore, during the examination, polyps may remain undetected and lead to future malignancies.
The limitations of the radiologist and other human factors can lead to a false diagnosis, so a computer-aided automated method would be valuable for diagnosing polyps with high accuracy and at the early stages of cancer. Artificial intelligence techniques have shown great potential in various medical fields for helping humans to visualize disease that cannot be discovered with the naked eye [7][8][9]. For example, artificial intelligence techniques can extract complex microimaged structures from endoscopy images and identify key features. Clinically, artificial intelligence techniques can distinguish between neoplastic and nonneoplastic tissues. Techniques are also available for extracting texture features to evaluate the risk of gastric cancers [10,11]. Colonoscopy images have been used to classify colitis by extracting texture features [12,13]. However, the challenges in extracting the features of gastrointestinal images limit the diagnostic accuracy [14].
Machine learning techniques have been used to extract colour, texture, and edge features from endoscopic images for disease diagnosis, but this handcrafted feature extraction depends on trial and error [15,16]. Convolutional neural networks (CNNs) have begun to solve these feature engineering limitations, and the use of CNNs in supervised learning has greatly improved medical image diagnostics [17]. CNNs have proved their ability to extract features by moving feature engineering into the learning process [18]. Deep learning algorithms have outperformed experts in medical image diagnostics [19]. Hence, computer-aided diagnostics using deep learning techniques for endoscopic images have the potential to achieve diagnostic accuracy that is better than that obtained by trained specialists [20].
Karkanis et al. [21] presented a technique for extracting colour features based on wavelet decomposition for diagnosing colon polyps. Many studies have applied machine learning techniques to diagnose gastrointestinal images, with features extracted through approaches that include polyp-based local binary, grey-level co-occurrence matrices (GLCMs), wavelets, context-based features, and edge shape and valley information [22][23][24].
The system proposed by Tajbakhsh et al. [25] has achieved better performance than other methods. Challenges remain for extracting handcrafted features, such as light reflection, camera angle, and polyp structure. CNN techniques are powerful extractors of deep features, and CNNs have achieved promising results in recent years in diagnosing medical images. Zhang et al. [26] introduced a polyp detection system based on a Single Shot MultiBox Detector (SSD), where they reused missing features from the max pooling layers and added them to feature maps to increase the detection and classification accuracy. Godkhindi and Gowda [27] presented a CNN system for detecting polyps from CT colonography images. Their algorithm segments the colon from the CT image and isolates it from the rest of the organs. It then diagnoses a colon polyp by extracting the shape features. Ozawa et al. [28] presented a system for detecting colorectal polyps by SSD and reported promising results in diagnosing these polyps. Pozdeev et al. [29] presented a two-stage polyp segmentation and automated classification system. The first stage uses the global features of endoscopic images to classify the presence or absence of a tumor, while the second stage includes segmentation by CNN. Wan et al. [30] have introduced biomedical optical spectroscopy techniques to detect gastrointestinal cancers in early stages. This type of spectroscopy has the potential to provide structural and chemical information and has many advantages, including noninvasiveness, a reagent-free protocol, and a nondestructive procedure. Ribeiro et al. [31] have proposed the use of CNNs for diagnostics of the colon mucosa to uncover colon polyps for early-stage colon cancer classification.
CNN extracts features by exploiting the input pixels to handle distortions caused by different light conditions. Min et al. [32] developed a computer-aided system to diagnose linked colour imaging by extracting colour features from lesions. The system classified images as adenomatous polyps and nonadenomatous polyps and achieved satisfactory results. Song et al. [33] also developed a computer-aided system for diagnosing colorectal polyp histology by CNN techniques; their network classified the polyps into three types: serrated polyp, deep submucosal cancer, and benign adenoma mucosal or superficial submucosal cancer.
The main contribution of the present paper is the provision of a computer-aided detection method for lower gastrointestinal diseases with modified criteria for extracting deep shape, colour, and texture features and adapting them to a transfer learning technique for fine-tuned and contoured transfers. Extensive experiments were conducted to select pretrained models to diagnose lower gastrointestinal diseases. New models were developed for transferring features extracted from a nonmedical deep learning dataset and adapting them to the new dataset.
The remainder of the paper is organised as follows. Section 2 describes the background and motivations. Section 3 discusses the materials and methods used to diagnose a dataset. Section 4 presents the analysis and results of the study and compares the proposed systems with those from previous studies. Finally, the conclusions are presented in Section 5.

Background and Motivations
This section provides the fundamentals of gastrointestinal diseases and an overview of deep learning for diagnosing medical images.

Overview and Status of Gastrointestinal Diseases.
Computer-aided early detection of disease is an important research field that can improve healthcare systems and medical practice around the world. The Kvasir dataset, which contains gastrointestinal images, is classified into three clinically important findings, three significant anatomical landmarks, and two categories of endoscopic polyp removal.
The gastrointestinal tract is affected by many diseases, with 2.8 million new cases and 1.8 million deaths caused annually by oesophageal and stomach cancers. The gold standard for gastrointestinal examination is endoscopy.
The upper gastrointestinal examination, involving the stomach, oesophagus, and upper part of the small intestine, is done by gastroscopy, while the colon and rectum are examined by colonoscopy. Both of these examinations are done as real-time videos with high resolution. Endoscopy equipment is expensive and requires extensive experience and training. Endoscopic detection and removal of lesions in their early stages, followed by appropriate treatment, are important for preventing colorectal cancer. Doctors vary in their abilities to detect colorectal cancer, and this may affect colorectal cancer diagnosis if a doctor's ability to evaluate the images is limited. An accurate diagnosis of the type of disease is also important for treatment and follow-up. Therefore, automatic diagnostics would be very welcome. Automatic diagnosis of pathological findings could contribute to the evaluation and identification of gastrointestinal cancers, thereby improving the efficiency and use of medical resources.

Deep Learning.
CNNs are computational systems designed for the purpose of pattern recognition. CNNs have entered a number of fields, including healthcare [34], and have an important role in the diagnostics of images obtained in early disease stages. Image recognition and diagnostic accuracy are two tasks in which CNNs excel compared to human experts. A CNN has three types of layers: convolutional layers, pooling layers, and fully connected layers [35]. CNNs are better able than traditional networks and other networks such as RNNs to deal with images because they combine these layers [36,37]. The basic idea underlying CNNs is the application of two-dimensional filters to two-dimensional images. This is combined with a transfer learning technique in which training starts from the best pretrained models and the last three layers are replaced to learn the weights for the problem to be solved. CNN features are extracted from the dataset that the network is trained on, so experts do not need to manually extract the features [38]. The strength of a CNN comes from its ability to learn the representative features in its training dataset. Convolutional layers work in a layered fashion, where the output of each layer feeds the next layer and the process continues until precise features are obtained.

GoogleNet.
GoogleNet, developed by Google researchers, is a CNN model, sometimes called Inception V1; it consists of 22 layers (27 layers including the pooling layers). The GoogleNet architecture was the winner in the image classification challenge at the ILSVRC 2014. GoogleNet is used in many fields, including computer vision tasks, as well as in medical image classification. Figure 1 illustrates the architecture of the GoogleNet used to classify 5,000 images into five diseases from the lower digestive system. The GoogleNet architecture consists of 27 layers, including layers that do not contain parameters. They begin with the input layer, which receives RGB images with a size of 224 × 224 pixels. The first convolutional layer contains two 7 × 7 filters; these are among the largest filters compared to the other layers. This layer reduces the size of the input image and is followed by a max pool layer with 3 × 3 filters, a convolutional layer with a 3 × 3 filter, and then a max pool layer with 3 × 3 filters. The output is inputted into a two-layer block inception module, followed by a max pool layer with 3 × 3 filters and then another four-layer block for an inception module. This is followed by a max pool layer with 3 × 3 filters, a two-layer block inception module, and then a max pool layer with 3 × 3 filters. The average pooling layer has a size of 7 × 7 pixels. Stride is used to determine the amount of filter shift on the input image.
The Dropout technique is used to prevent overfitting. In our work, Dropout was set at 40%, which means that 40% of the neurons are deactivated in each iteration, so that different parameters are used in each iteration.
The fully connected layer received 9216 features and produced 4,096 features. The softmax layer produces five classes: dyed-lifted polyps, normal cecum, normal pylorus, polyps, and ulcerative colitis.
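The 40% Dropout setting described above can be sketched as follows. This is a minimal illustration that assumes the common "inverted dropout" formulation, in which surviving activations are rescaled by 1/(1 - rate); the function name `dropout` and the toy feature vector are hypothetical and are not taken from the authors' MATLAB implementation.

```python
import random


def dropout(activations, rate=0.4, rng=None):
    """Inverted dropout (assumed formulation): zero each activation with
    probability `rate` and rescale survivors by 1 / (1 - rate)."""
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Toy feature vector (hypothetical); a fixed seed keeps the run repeatable.
rng = random.Random(0)
features = [1.0] * 10_000
dropped = dropout(features, rate=0.4, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(f"fraction of units dropped: {zero_fraction:.3f}")
```

Because a different random subset is dropped on every call, each training iteration effectively updates a different sub-network, which is the regularising effect the text refers to.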

ResNet-50.
ResNet-50 is a residual CNN model consisting of 177 layers. ResNet-50 was the winner in the image classification challenge in 2015. ResNet-50 is the backbone of many computer vision tasks. Figure 2 illustrates the architecture of the ResNet-50 used to classify 5,000 images divided into five diseases from the lower digestive system. The ResNet-50 architecture consists of 16 blocks that contain 177 layers, divided into the input layer, which receives RGB images with a size of 224 × 224 pixels, and 49 convolutional layers, which use different types of filters [39]. The convolutional layers extract deep features from the input images and store them in deep feature vector maps. There is one average pooling layer and one max pooling layer; these two layers reduce the feature vector map dimensions. Batch normalisation then helps the network to choose the learning rate correctly. The Rectified Linear Unit (ReLU) activation function that follows the convolutional layers only passes positive outputs and converts the negative values to zero. The fully connected layer receives 9216 features and produces 4096 features, and the second connected layer produces 1000 features. The softmax layer produces the five classes: dyed-lifted polyps, normal cecum, normal pylorus, polyps, and ulcerative colitis.
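To make the terms "residual" and ReLU concrete, the sketch below shows a single residual connection, y = ReLU(F(x) + x), on a plain list of numbers. It is an illustration only: `toy_transform` is a hypothetical stand-in for the convolutional layers inside a block, and the real ResNet-50 blocks operate on feature maps, not short vectors.

```python
def relu(values):
    """Pass positive values through and convert negative values to zero."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Add the block input back onto the transformed output, then apply ReLU."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


def toy_transform(x):
    # Hypothetical stand-in for the convolutions inside a residual block.
    return [-2.0 * v for v in x]


out = residual_block([1.0, -3.0, 0.5], toy_transform)
print(out)
```

The skip connection means the block only has to learn the difference between its input and the desired output, which is what makes very deep networks of this kind trainable.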

Materials and Methods
The computer-aided automatic detection of gastrointestinal diseases is an important research field. In this section, we describe the GoogleNet, ResNet-50, and AlexNet CNN models for early and accurate diagnosis of lower gastrointestinal disease. The general structure of the gastrointestinal detection system used in this work is shown in Figure 4. Preprocessing improves images and removes noise and artifacts, while the image augmentation technology improves the training process. The convolutional layers extract the deepest and most important features from each image. The fully connected layers diagnose and classify the gastrointestinal images.

Dataset.
The dataset was collected from the Vestre Viken Health Trust (VV) in Norway, at the gastroenterology department of the Baerum Hospital, using endoscopic equipment. All images were described by experts from VV and the Cancer Registry of Norway (CRN). The CRN is the national body at the Oslo University Hospital that is in charge of screening and early detection of cancer to prevent spread. The Kvasir dataset consists of images interpreted by experts, including classes containing endoscopic procedures in the gastrointestinal tract and anatomical landmarks. The dataset contains hundreds of images that are sufficient for use in deep learning and transfer learning. The dataset is in RGB colour space and consists of images with resolutions from 720 × 576 up to 1920 × 1072 pixels. In our work, the dataset contains 5,000 images equally divided into five diseases: dyed-lifted polyps, normal cecum, normal pylorus, polyps, and ulcerative colitis. Figure 5 shows samples from the Kvasir dataset. The data are available at this link: https://datasets.simula.no/kvasir/#download.

Preprocessing and Augmentation Techniques.
Noise and artefacts arise from light reflections, photographic angles, and the mucous membranes surrounding the internal organs, and they reduce the performance of CNNs by making feature extraction more complex. Therefore, optimisation processes that improve image quality have been of interest to researchers. In this paper, gastrointestinal images were preprocessed before being inputted into the CNN models. First, the image was scaled for colour constancy, and the image sizes were changed to 224 × 224 pixels for both GoogleNet and ResNet-50 models and to 227 × 227 pixels for the AlexNet model. The mean for the three RGB channels was then calculated for the gastrointestinal images. Finally, the enhancement process was conducted through the average filter, which replaces each pixel with the average of that pixel and its neighbours; this process continues for all pixels of the image [41,42]. CNN techniques depend mainly on the volume of data. A larger set of training data generates more promising results for the model. Because medical images are scarce, data augmentation techniques are used to improve the classification accuracy of CNN models [37,43].
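The average filter described above can be sketched in a few lines. This is a minimal single-channel illustration, assuming a 3 × 3 window and that border pixels are averaged over their in-bounds neighbours only; the window size and border handling used by the authors are not stated, and the function name `average_filter` is hypothetical.

```python
def average_filter(image, radius=1):
    """Replace each pixel with the mean of its (2*radius+1)^2 neighbourhood.
    Border pixels use only the neighbours that lie inside the image."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            result[r][c] = total / count
    return result


# Toy 3 x 3 greyscale patch with one bright outlier in the centre (hypothetical values).
noisy = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = average_filter(noisy)
print(smoothed[1][1])
```

For an RGB image the same function would be applied to each of the three channels separately.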
The data augmentation technology also works to balance the dataset when the number of images differs between classes. In this paper, images of the training data were augmented through the operations of flipping, zooming, shifting, and ±rotation [44].
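A few of the augmentation operations named above can be illustrated on a small image stored as a list of rows. This is a simplified sketch under stated assumptions: rotation is restricted to 90° steps and zooming is omitted, whereas the paper's pipeline presumably uses arbitrary angles and zoom factors; all function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise (a simplification of arbitrary rotation)."""
    return [list(col) for col in zip(*image[::-1])]


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]


original = [[1, 2],
            [3, 4]]
augmented = [
    flip_horizontal(original),
    flip_vertical(original),
    rotate_90_clockwise(original),
    shift_right(original, 1),
]
print(augmented)
```

Each transformed copy keeps the label of its source image, so one training image yields several training samples.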

Convolutional Layers.
The gastrointestinal dataset contains many features, such as shape, texture, and colour.
The manual extraction of features requires substantial experience, especially when extracting images from a video, where many images do not include the disease and the disease features appear in only a few images that may be missed by the radiologist and the specialists. CNN algorithms work by extracting representative features of each disease through convolutional layers. GoogleNet contains many convolutional layers and nine inception layers, ResNet-50 has 49 convolutional layers, and AlexNet has five convolutional layers. These layers apply a set of filters and adjust the weights during the training phase to capture the deep features and pass them on to the next layer. The average and max pooling layers also reduce the size of the feature maps by representing each group of pixels by either its average or its maximum value. The convolutional layers extract representative features of each image, for a total of 9216 features per image, and represent them in feature maps that are fed to the classification layers [45]. In our study, transfer learning and fine-tuning were applied to GoogleNet, ResNet-50, and AlexNet networks pretrained on the ImageNet dataset [46]. Transfer learning is based on training a model on one dataset to solve a specific problem and then transferring that learning to another related problem [47,48]. Transfer learning works by choosing the pretrained model and the size of the problem and using what has been learned to transfer the generalisation to another task. Transfer learning also helps to avoid overfitting. In this work, transfer learning was applied to GoogleNet, ResNet-50, and AlexNet, where the weights were fine-tuned. The GoogleNet, ResNet-50, and AlexNet models were trained on the ImageNet dataset, and the learning was then transferred to the gastrointestinal dataset. The last three layers of the models were deleted and replaced with new fully connected layers. The first connected layer received 9,216 neurons and outputted 4,096 neurons, while the second connected layer received 4,096 neurons and outputted 4,096 neurons. The softmax layer produced the five classes: dyed-lifted polyps, normal cecum, normal pylorus, polyps, and ulcerative colitis.
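The final softmax step can be sketched as follows: the last fully connected layer yields one score per class, and softmax turns those scores into probabilities that sum to one. The score values below are made up for illustration, and the helper names `softmax` and `classify` are hypothetical; only the five class names come from the text.

```python
import math

CLASSES = [
    "dyed-lifted polyps",
    "normal cecum",
    "normal pylorus",
    "polyps",
    "ulcerative colitis",
]


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical scores for one image, one per class.
label, confidence = classify([0.1, 0.2, 0.3, 2.5, 0.0])
print(label, round(confidence, 3))
```

Subtracting the largest score before exponentiating does not change the result but avoids overflow for large scores.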

Optimizers (Adam).
Optimizers are used to change and tune the parameters of neural networks, such as weights and biases, together with the learning rate, in order to reduce the loss. Choosing a suitable optimizer improves the deep learning classifier and helps to speed up training. Adaptive Moment Estimation (Adam) is one of the best deep learning optimizers. Adam is a combination of both RMSProp and momentum [49].
Adam calculates an adaptive learning rate for each parameter. It keeps a decaying average of past gradients, $m_t$, as momentum does, and a decaying average of past squared gradients, $v_t$. The following equations describe how Adam maintains these estimates:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
where $g_t$ is the gradient at step $t$, $m_t$ refers to the first moment of the gradient, $v_t$ refers to the second moment of the gradient, and $\beta_1$ and $\beta_2$ indicate the decay rates.
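For illustration, one Adam update for a single scalar parameter can be sketched as below, using the first and second moment estimates with decay rates β1 and β2. The bias-correction step and the default values (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) follow the standard published Adam formulation and are assumptions here; the paper does not state which values were used. The function name `adam_step` and the toy objective are hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: decaying mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment: decaying mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction (assumed, standard Adam)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (hypothetical): minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
print(round(x, 3))
```

Dividing by the square root of the second moment is what gives each parameter its own effective learning rate.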

Experimental Results
Weights and parameters were adjusted for the GoogleNet, ResNet-50, and AlexNet CNNs in the training phase to evaluate the dataset of gastrointestinal diseases. Table 1 shows the training options for the three networks and the execution time in the MATLAB environment. The experiments were run on a sixth-generation Core i5 machine with a 4 GB NVIDIA GPU. In this paper, three experiments were conducted to evaluate a gastroenterology dataset containing 5,000 images divided equally among five diseases. The same dataset was used in all three experiments. The dataset was divided into 80% for training and 20% for testing and validation. Figure 6 shows the confusion matrix and AUC obtained from GoogleNet, ResNet-50, and AlexNet. The confusion matrix summarises all test images that are correctly classified (true negative (TN) and true positive (TP)) and incorrectly classified (false positive (FP) and false negative (FN)). The AUC also shows the ratio of TP vs. FP. Table 2 and Figure 6 show the evaluation of the dataset for the three CNN models. Accuracy, sensitivity, specificity, and AUC are calculated according to equations (3)-(6). All networks showed promising results, as indicated in Table 2.
TP represents the number of positive samples correctly classified.TN represents the number of negative samples correctly classified.FP represents the number of benign samples classified as malignant.FN represents the number of malignant samples classified as benign.
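As an illustration of how TP, TN, FP, and FN translate into the reported measures, the sketch below derives one-vs-rest counts for a class from a multi-class confusion matrix and computes accuracy, sensitivity, and specificity using their standard definitions. The 3 × 3 matrix is made-up toy data, not the paper's results; the function names are hypothetical, and AUC is not covered because it requires the classifier's scores and not just the counts.

```python
def per_class_counts(confusion, k):
    """One-vs-rest counts for class k.
    confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def binary_metrics(tp, tn, fp, fn):
    """Standard definitions of accuracy, sensitivity, and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy 3-class confusion matrix (hypothetical values).
toy_confusion = [[8, 1, 1],
                 [2, 7, 1],
                 [0, 1, 9]]
counts = per_class_counts(toy_confusion, 0)
metrics = binary_metrics(*counts)
print(counts, metrics)
```

Repeating this for each of the five classes and averaging gives per-model figures of the kind reported in the tables.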
Table 3 and Figure 7 show the classification performance of GoogleNet, ResNet-50, and AlexNet at the level of each disease. ResNet-50 and AlexNet achieved 99% accuracy in classifying dyed-lifted polyps and 95% accuracy in classifying normal cecum. GoogleNet achieved the best classification (100%) for normal pylorus. Polyps were classified at 98% by ResNet-50, while GoogleNet achieved the best performance for classifying ulcerative colitis (96%). The proposed CNNs were evaluated through several measures previously examined in the literature, as shown in Table 4. All relevant literature reported accuracy between 70.40% and 90.20%, while our proposed system reached an accuracy of 97%. Related previous studies achieved sensitivity between 70.40% and 95.16%, while our proposed system achieved a sensitivity of 96.80%. The specificity in previous studies ranged between 70.90% and 93%, while our proposed system reached a specificity of 99%. The proposed system outperformed all previous studies with regard to AUC, as our proposed system reached an AUC of 99.99%. The comparison of the proposed system against the existing models is presented in Figure 7.

Conclusion
This work provides a robust framework for classifying the gastrointestinal tract diseases in the Kvasir dataset. Deep learning techniques can reduce the probability of developing malignant diseases by aiding in early detection, while also reducing the unnecessary removal of benign tumors. Video endoscopy is the most widely used method for diagnosing gastrointestinal polyps, but many human factors lead to improper diagnosis of gastrointestinal diseases. This paper presents three deep learning models, GoogleNet, ResNet-50, and AlexNet, that can direct the doctor's focus to the most important regions that may have been missed. The dataset was divided into 80% for training and 20% for testing and validation. The images were optimised to remove noise and artifacts. Data augmentation multiplied the number of images during the training phase to improve accuracy. Convolutional layers extract features of shape, colour, and texture. Overall, 9216 features were extracted and passed into the fully connected layers that produced 1000 neurons.
The softmax layer produces five classes, classifying each image into one of the five types of gastrointestinal diseases. All three models achieved equally promising results. More advanced deep learning algorithms will be applied in future work.
AlexNet.
AlexNet is a CNN model and consists of 25 layers. AlexNet was the winner in the ImageNet classification competition in 2012, with a top-5 error rate of 15.3% [40]. Figure 3 illustrates the architecture of the AlexNet used to classify 5,000 images divided into five diseases from the lower digestive system. The architecture of AlexNet consists of 25 layers, divided into the input layer, which receives RGB images with a size of 227 × 227 pixels, and five convolutional layers that use different types of filters. The convolutional layers extract deep features from the input images. Three layers are used in max pooling; these layers reduce the dimensions of the feature vector maps. Two layers of cross-channel normalisation work on reparameterisation of the vector weights and choose the appropriate learning rate. Seven layers of the ReLU follow the convolutional layers. ReLU only outputs the positive values, while converting the negative values to zero. Three fully connected layers operate in series. The first connected layer receives 9216 features and produces 4096 features, the second connected layer produces 4096 features, and the third fully connected layer produces 1000 neurons (features). A softmax layer produces the five classes: dyed-lifted polyps, normal cecum, normal pylorus, polyps, and ulcerative colitis.
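The figure of 9216 features entering the first fully connected layer can be checked by tracing the spatial size of the feature maps through the network. The sketch below assumes the kernel sizes, strides, and padding of the standard published AlexNet, with 256 channels after the last convolution; these values are not listed in the text and are labelled here as assumptions. The function name `output_size` is hypothetical.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding): assumed values from the standard AlexNet design.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

size = 227  # input width and height in pixels, as stated in the text
sizes = {}
for name, kernel, stride, padding in LAYERS:
    size = output_size(size, kernel, stride, padding)
    sizes[name] = size

final_channels = 256  # assumed channel count after conv5
flattened = size * size * final_channels
print(sizes, flattened)
```

Under these assumptions the final feature map is 6 × 6 with 256 channels, which flattens to the 9216 features quoted for the first fully connected layer. The stride argument is the "amount of filter shift" mentioned in the GoogleNet description.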

Figure 2: Structure of a ResNet-50 for gastrointestinal disease diagnosis.

Figure 4: A general structure of gastrointestinal disease detection by deep learning.

Figure 3: Structure of an AlexNet for gastrointestinal disease diagnosis.

Normalisation helps gradient descent to converge and makes it easier to choose an appropriate learning rate. Without the normalisation process, the learning rate is more difficult to choose and training takes longer. In our work, the image normalisation process was done by subtracting the mean of the complete training set from each pixel. The variance of the dataset was then calculated and used to scale every pixel, centring the data and making the variance of each feature equal to one.
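The normalisation step can be sketched on a flat list of pixel values. One assumption is made explicit here: the sketch divides by the standard deviation (the square root of the variance), which is the usual way to obtain unit variance, although the text speaks of the variance itself. The function name `standardise` and the sample values are hypothetical.

```python
import math


def standardise(pixels):
    """Subtract the mean and divide by the standard deviation (assumed scaling)."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant image has no spread; return centred zeros.
        return [0.0] * n
    return [(p - mean) / std for p in pixels]


# Hypothetical pixel intensities with mean 5 and variance 4.
values = [2, 4, 4, 4, 5, 5, 7, 9]
normalised = standardise(values)
new_mean = sum(normalised) / len(normalised)
new_variance = sum((x - new_mean) ** 2 for x in normalised) / len(normalised)
print(normalised, new_mean, new_variance)
```

In practice the mean and spread would be computed once over the whole training set and then reused for the validation and test images.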

Table 1: Options for configuring the training parameters for deep learning networks.

Table 2: Results of diagnosing gastrointestinal diseases using deep learning models.

Table 3: Performance evaluation results for the gastrointestinal disease datasets.

Table 4: Comparison of the performance of our proposed system with models of previous studies.