Kidney Tumor Detection and Classification Based on Deep Learning Approaches: A New Dataset in CT Scans

Kidney tumor (KT) is one of the diseases that affect our society and is the seventh most common tumor in both men and women worldwide. Early detection of KT has significant benefits: it reduces death rates, enables preventive measures that limit the tumor's effects, and improves the chance of overcoming the tumor. Compared to the tedious and time-consuming traditional diagnosis, automatic detection algorithms based on deep learning (DL) can save diagnosis time, improve test accuracy, reduce costs, and reduce the radiologist's workload. In this paper, we present detection models for diagnosing the presence of KTs in computed tomography (CT) scans. To detect and classify KT, we propose 2D-CNN models. Three models address KT detection: a 2D convolutional neural network with six layers (CNN-6), a ResNet50 with 50 layers, and a VGG16 with 16 layers. The last model addresses KT classification: a 2D convolutional neural network with four layers (CNN-4). In addition, a novel dataset has been collected from the King Abdullah University Hospital (KAUH); it consists of 8,400 images of 120 adult patients who underwent CT scans for suspected kidney masses. The dataset was divided into 80% for the training set and 20% for the testing set. The accuracy of the detection models 2D CNN-6, ResNet50, and VGG16 reached 97%, 96%, and 60%, respectively. The accuracy of the 2D CNN-4 classification model reached 92%. Our novel models achieved promising results; they enhance the diagnosis of patient conditions with high accuracy, reduce the radiologist's workload by providing a tool that can automatically assess the condition of the kidneys, and reduce the risk of misdiagnosis. Furthermore, improving the quality of healthcare service and enabling early detection can change the disease's course and preserve the patient's life.


Introduction
The kidneys in the human body cleanse waste products and pollutants from the blood [1,2]. The abnormal growth of cells causes tumors (cancers), which affect people differently and cause different symptoms.
Therefore, the early detection of kidney tumors (KT) is an essential step in reducing the risk of further disease progression and, consequently, in preserving the patient's life [2,3]. Most cases do not induce symptoms, and around a third of KT cases are discovered only after spreading to other areas. They are often found when patients are being treated for other diseases. Kidney tumors can be observed incidentally on radiography and may present as masses, kidney cysts, or abdominal pain. The signs may appear to have nothing to do with the kidneys [4,5]. However, low hemoglobin, weakness, vomiting, stomach pain, blood in the urine, or high blood sugar are among the subtle symptoms or infections that KT causes. Also, anemia occurs in about 30 percent of patients with KT [6,7]. Unfortunately, tumors and solid masses that arise inside the kidneys are often cancerous. The value of determining the presence of the tumor lies in choosing the appropriate treatment method; hence, the rate of recovery from the disease may depend on the early detection of the tumor. One of the necessary tests is a computed tomography (CT) scan of the patient's abdomen and pelvis, whose characteristics are studied to judge whether the kidney has a tumor. Figure 1 shows a case of KT, a renal mass lesion in the left kidney measuring about 4 cm, alongside a 3D volume rendering of the kidneys (kidney in pink and renal cancer in blue). A tumor threatens a person's life, so many procedures aim to address it through accurate tumor diagnosis [8,9].
Deep learning (DL) is one of the most powerful machine learning technologies; it can automatically learn multiple features and patterns without human intervention [10][11][12]. DL has enabled the building of predictive models for the early diagnosis of tumor disease, building on proven pattern analysis methods. DL algorithms have outperformed traditional machine learning thanks to their highly accurate results [13][14][15], and they often match or surpass human performance. That is why they are recommended as the best method for dealing with images [16,17]. DL has gained attention in image processing, especially in the medical field, because radiology is primarily concerned with extracting useful information from images.
Object detection is the task of identifying the class instance to which an object belongs. There are several types of detection, such as single-class object detection and multiclass object detection [18]. Object detection has been widely applied to medical images because of its precision in discovering diseases of all kinds. The convolutional neural network (CNN) is widely used to extract image characteristics and detect different objects. It is a neural network that operates on the principle of weight sharing. Convolution is an integral operation that expresses how one function overlaps with another. The size and number of images, the number of working layers, and the form of the activation functions used in CNNs vary [19]. The hyperparameters of CNNs are selected experimentally, on a trial-and-error basis. Every CNN consists of several layers, the most important of which are the convolutional and subsampling (pooling) layers [20]. Figure 2 shows an illustration of the CNN architecture.
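The two core operations named above, convolution with a shared kernel and 2 × 2 subsampling, can be sketched in plain Python. This is a minimal illustration only, not the implementation used in this study; the toy image and the kernel values are hypothetical, and, as in most CNN libraries, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Toy 5 x 5 "image" with a bright vertical stripe in the middle column.
image = [[0, 0, 9, 0, 0] for _ in range(5)]
# Hypothetical 2 x 2 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d_valid(image, kernel)  # 4 x 4 feature map
pooled = max_pool_2x2(features)         # 2 x 2 map after subsampling
```

Because the same four kernel weights are reused at every position, the number of learned parameters does not grow with the image size, which is the weight-sharing principle mentioned above.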
Over the years, many variants of CNN structures have been developed to solve difficult real-world problems and obtain sufficient accuracy. In our study, we have applied VGG16 and ResNet50, besides two modified CNNs. VGG16, described by Qassim et al., is a CNN-based network model that achieved 92.7% top-5 test accuracy on ImageNet in the 2014 ImageNet competition. ImageNet contains approximately 15 million high-resolution images classified into roughly 22,000 categories. The VGG16 network consists of 16 layers that have weights, 13 convolutional (conv) layers and 3 fully connected (FC) layers, and has a learning time of 16.55 ms. The image entering the conv1 layer has a fixed size of 224 × 224. The image is passed through a stack of layers that use 3 × 3 filters and also 1 × 1 convolution filters. The convolution stride is fixed at 1 pixel so that the spatial resolution is preserved after convolution. Max-pooling is performed over a 2 × 2-pixel window. Three fully connected (FC) layers follow the stack of convolutional layers, and the final layer is the softmax layer. The configuration of the fully connected layers is the same in all networks. All hidden layers are equipped with the rectified linear unit (ReLU) activation [21]. Figure 3 shows the architecture of the VGG16 model. In addition, ResNet50 is an improved convolutional neural network developed in 2015. This network consists of 50 layers: 49 convolutional layers and one fully connected layer. Each convolution block has three convolution layers, and the network has a learning time of 12.83 ms. The image entering the conv1 layer has a fixed size of 224 × 224 [22]. Figure 4 shows the architecture of the ResNet50 model.
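The effect of the filter size, stride, and pooling window on the spatial size of a feature map can be checked with a short calculation. The sketch below uses the standard output-size formula; the padding of 1 pixel for the 3 × 3 filters is an assumption that is not stated in the text above, and the function names are illustrative.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after non-overlapping max-pooling."""
    return (size - window) // stride + 1


# 3 x 3 filter, stride 1, assumed padding 1: the 224 x 224 size is preserved.
after_conv3 = conv_output_size(224, kernel=3, stride=1, padding=1)
# 1 x 1 filter, stride 1, no padding: the size is also preserved.
after_conv1 = conv_output_size(224, kernel=1, stride=1, padding=0)
# A 2 x 2 max-pooling window halves each spatial dimension.
after_pool = pool_output_size(after_conv3)
```

With these settings only the pooling layers shrink the feature maps, so each pooling stage halves the width and height while the convolutions in between leave them unchanged.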
Moreover, this paper presents a novel dataset of renal CT scans consisting of 8,400 frames. As a preliminary study on the new dataset, a convolutional neural network (CNN) framework with six layers has been proposed to diagnose tumors. Then, a CNN framework with four layers has been proposed for classifying the tumor type. We have constrained the CNNs' training process and directed the CNNs to generate anatomically more viable predictions, mainly when the input image data are not clear enough (e.g., missing object edges and boundaries). In addition to our proposed model architectures, we have used state-of-the-art networks in the study, namely ResNet50 with 50 layers and VGG16 with 16 layers [21]. Finally, we evaluate and test our models on the new renal CT data from 2020 and 2021.
The remainder of this paper is organized as follows. Section 2 presents related work on detection in CT images. Section 3 describes the materials, including dataset collection and description. Section 4 describes the methods, including data preprocessing, augmentation, and network architectures. Section 5 shows the experiments and results. Lastly, Section 6 presents the discussion and conclusions.

Related Studies
This section discusses prior work that addresses early KT detection and classification using various machine learning and deep learning techniques based on CT scans. Ghalib et al. [23] conducted a study on renal tumor detection using deep learning approaches on CT scans. The authors developed an efficient algorithm to detect and further analyze renal cancer tumors using patient CT scans. The preprocessing technique involved identifying the noise in a CT scan and removing it with a suitable filtering technique. Image enhancement was also performed using contrast-limited adaptive histogram equalization. The classification was based on patterns of visual appearance, including contrast, size, location, surface area, color, volume, risk, specialization, and density. Based on their experimental results, the proposed model obtained high performance in classifying cases as normal or abnormal, with an average execution time of 0.85 sec.
On the other hand, Liu et al. [24] conducted a study on exophytic renal tumor detection through machine learning techniques on CT scans. They used 167 CT scans and developed a framework for kidney segmentation on non-contrast CT images using efficient belief propagation. Based on their experimental results, the proposed model obtained high performance, with sensitivity rates of 95% and 80% for exophytic and endophytic lesion detection, respectively.
Furthermore, Mredhula and Dorairangaswamy [25] conducted a study on KT detection and classification using deep learning approaches and traditional machine learning techniques on CT scans. They used 28 CT scans covering different categories of kidney tumors, acquired from their own database. They focused on implementing a semiautomatic segmentation method, noting that the segmentation of gray-level images provides information such as the anatomical structure and the region of interest needed to locate tumors. In addition, they proposed an associative neural network (ASNN) model that combined the k-nearest neighbor (KNN) technique with an ensemble feedforward neural network.
Lately, Zhou et al. conducted a study on differentiating renal tumors based on deep learning [26]. To investigate the effect of transfer learning on CT, they used 192 patient CT scans to differentiate between benign and malignant tumors and attempted to improve the accuracy by building patient-level models. The CNN architecture used was a cross-trained InceptionV3 that performed the classification task. Five image-level models were established for each of the slices. The model was evaluated using the receiver operating characteristic metric with five-fold cross-validation. The results showed a high accuracy of 97%. The researchers concluded that deep learning approaches are useful for renal tumor classification based on CT scans and recommended using 3D CT scans to achieve more accurate results.
More recently, Zabihollahy et al. [27] conducted a study on the detection of solid renal masses using deep learning approaches on CT scans. They used a semiautomated majority voting 2D-CNN, a fully automated 2D-CNN, and a 3D-CNN to distinguish RCC from benign solid renal masses on contrast-enhanced computed tomography (CECT) images. They used CT scans of 315 patients; the dataset included 77 scans of patients with benign solid renal masses and 238 scans of patients with malignant renal masses. They generated slices of the scans manually and used the CNN model to extract features from each slice. Then, the classification was performed by aggregating the CNN predictions using the majority voting technique. Based on their experimental results, the proposed model obtained high performance, with accuracy, precision, and recall of 83.75%, 89.05%, and 91.73%, respectively, for the classification between RCC and benign tumors.
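The aggregation of slice-level predictions into one patient-level decision by majority voting can be illustrated as follows. This is a generic sketch of the voting idea, not the code of Zabihollahy et al.; the label strings and the function name are hypothetical.

```python
from collections import Counter


def majority_vote(slice_predictions):
    """Return the label predicted for the largest number of slices.

    On a tie, the label that appears first in the input wins, because
    Counter.most_common keeps first-seen order for equal counts.
    """
    counts = Counter(slice_predictions)
    return counts.most_common(1)[0][0]


# Hypothetical per-slice CNN outputs for one patient.
patient_slices = ["RCC", "benign", "RCC", "RCC", "benign"]
patient_label = majority_vote(patient_slices)
```

Voting makes the patient-level decision less sensitive to a few misclassified slices than relying on any single slice.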
Also, Schieda et al. [28] conducted a study on the classification of solid renal masses using machine learning techniques on CT scans. They used CT scans of 177 patients with solid renal masses. The features were extracted through manual segmentation by radiologists from three phases of the scans: nephrographic contrast-enhanced, corticomedullary, and non-contrast-enhanced. The proposed method used the XGBoost machine learning technique to generate classifiers and, simultaneously, to search for the collection(s) of texture features that accurately discriminated between outcomes. The proposed model obtained an AUC of 0.70 in distinguishing renal cell carcinoma from benign tumors and an AUC of 0.77 in distinguishing clear cell RCC from the other types.
Finally, Yap et al. [29] conducted a study on the classification of renal masses using machine learning techniques on CT scans. They used CT scans of 735 patients with renal masses; the dataset included 196 scans of benign masses and 539 scans of malignant cases. They segmented the scans manually using the Synapse 3D tool in cooperation with two expert radiologists, and the features were extracted based on shape and texture matrices. The proposed methods used two machine learning techniques, AdaBoost and Random Forest. Based on their experimental results, Random Forest obtained high performance on both feature types, with AUC values of 0.68 to 0.75 for the classification of renal masses.

Dataset
This section focuses on the data collection process and data analysis.

Data Acquisition and Preparation.
This work presents new data consisting of images and text "metadata" obtained from KAUH hospital in Jordan. In this paper, we worked on the image data. The current study has collected scan data for renal mass cases from the hospital's database, performed by the interventional computed tomography (CT) scan service. The image set provides more than one picture from different dimensions for each patient, and this diversity of images helps us reach an accurate diagnosis. In addition, the clinical text data support our findings and help us understand the collected images. Different studies can be conducted from these miscellaneous data. The collected dataset consists of 8,400 images of 120 adult patients who underwent a CT scan for suspected kidney masses. The images are provided in DICOM format, considered the most widely used standard for the interchange and transmission of medical images worldwide. The collected data included CT scans with and without contrast material. Figure 5 shows sample CT images with and without contrast taken from the dataset.
To compare the current dataset with the available public datasets, Table 1 summarizes the public datasets of renal CT scans for diagnosing kidney tumors, showing their sizes and sources. For example, the G037-RCP dataset, exported by the Royal College of Pathologists (RCPath) located in London, combines multihealth data such as texture, images, tests, and educational information. The C4K-KiTs19 dataset, whose name abbreviates Climb 4 Kidney Cancer, was collected from the University of Minnesota Medical Centre [31]. The TCGA dataset takes its name from The Cancer Genome Atlas, a cancer program that has data samples spanning 33 cancer types [37]. In addition, the CPTAC-CCRCC program investigated 110 tumors in the TCIA dataset. The current study's dataset exceeds the other datasets in size, number of patients, and diversity of images. It is considered the first dataset collected from Jordan's King Abdullah University Hospital (KAUH). The CT scan images and metadata were collected manually under the supervision of a specialist team. There are 70 CT images for each patient. It is strongly believed that this dataset can be the basis for subsequent studies to diagnose tumors, stones, cysts, and other kidney problems, such as inflammation, infection, and hydronephrosis. The proposed dataset will be made publicly available to researchers upon request (https://github.com/DaliaAlzubi/KidneyTumor).

Data Set Annotation and Visualization.
Data were collected for adult patients between the ages of 30 and 80, 55 females and 65 males, who underwent CT imaging of the abdomen and pelvis. Of the 120 patients, 60 had tumors, classified as benign or malignant, and 60 were diagnosed as normal cases without tumors. Still, half of the normal cases suffer from cysts, hydronephrosis, or stones. Also, some of them suffer from cancers in neighboring organs such as the colon, liver, breast, lung, and stomach, and their condition requires follow-up. Therefore, they must undergo a CT scan periodically to ensure that cancer in other organs has not spread to the kidneys. In addition, some patients have undergone a total or partial nephrectomy due to RCC, and their condition must be monitored to ensure that kidney function is safe and that the tumor does not spread. Table 2 and Figure 6 show an analysis of gender for all cases in the dataset.
The clinical observations include ID, age, gender, date of the scan, patient history, symptoms, diagnosis, type of right kidney injury, type of left kidney injury, disease segmentation for both kidneys, tumor stage, and the patient's situation (tumor, completely healthy normal case, or normal case with a cyst or stone). They also include the tumor type (benign or malignant), the subtypes of the tumor, and the test.
All patients underwent a CT examination of the pelvic and abdominal area from multiple dimensions, which outlined various slices of the renal, ureter, and bladder region. These metadata were constructed and labeled manually based on the clinical reports. The data were reviewed by radiologists and the medical staff specializing in the kidneys and urinary tract. In cooperation with them, the correctness of the data structure was checked and validated. The dataset contains 20 attributes, with numerical and categorical data that describe all dataset characteristics, as shown in Table 3. In addition, the patients' data are divided into categories.
The "Tumor Type" attribute describes the tumor type with two labels: Malignant (1) and Benign (2), as shown in Table 5.
The "Taking Contrast" attribute indicates whether the patient has taken contrast material, with two labels: Yes (1) and No (2), as shown in Table 6.
The "Segmentation Injury in Right Kidney" and "Segmentation Injury in Left Kidney" attributes describe the location of the tumor: upper, middle, lower, healthy, and undefined, as shown in Table 9.
Regarding the statistical analysis of the collected data for kidney patients, 83 cases were taken in 2020 and 37 cases were taken in 2021. Figure 7 shows the age of normal and tumor cases. Contrast material is given to the person being examined by X-ray imaging to enhance the quality of the image. It makes it easier for the doctor to distinguish between healthy and injured tissues, to distinguish blood vessels, and to determine the extent of an injury [38]. Figure 8 shows the patients who had taken contrast material before the CT test: among normal cases, 35 had taken contrast and 25 had not; among tumor cases, 38 had taken contrast and 22 had not. Based on the above analysis, patients who were not given contrast had allergies, diabetes, or impaired kidney function, or were on kidney dialysis. In addition, contrast material is considered a risk factor because of its harmful effects, such as nausea, vomiting, high or low blood pressure, itching, sensitivity, shortness of breath or other breathing problems, and heart failure. Figures 9 and 10 show the classification of tumor type and subtype for all tumor cases. Of the 60 tumor cases, 38 are benign and 22 are malignant. Among the benign cases, 28 are adenomas, which can be excised and treated; nine are angiomyolipomas, which must be removed because they are considered hemorrhagic cysts; and one is a lipoma. Among the malignant cases, 11 are RCC and 11 are metastases caused by the spread of a tumor from neighboring organs. Most adenoma cases suffered from pressure, diabetes, and liver problems such as cysts and tumors. Secondary cases suffered from cancers in other organs, such as the breast, colon, ureter, and uterus, or had undergone a right kidney nephrectomy because of RCC.
The incidence of different tumor types is linked to gender differences and is also related to the treatment method, because the response to treatment is not the same. Figure 11 shows the gender and the tumor classification. Most cases of kidney tumors are in men: the percentage of males with tumors is higher than that of females. In addition, males reach a later stage of the tumor than females because the rate of smoking is higher in men than in women. Also, the tumor spreads more quickly in men than in women.
According to the statistical analysis of the gender affected by the tumor, shown in Figure 11, the results agree with the information from the National Cancer Institute (NCI), since men are more likely than women to develop tumors. The institute reports that one out of every two men, and a smaller proportion of women, will develop cancer during their lifetime [39]. Figure 12 shows a statistical analysis of the location of the tumor. For the left kidney, there are 21 cases in the upper part, 17 healthy cases, 11 cases in the lower part, and 9 cases in the middle part. For the right kidney, there are 24 healthy cases, 18 cases in the upper part, 8 undefined cases, 7 cases in the lower part, and one case in the middle part. Based on these analyses, we found that most of the tumors in the right kidney are located in the upper and lower parts, and most of the tumors in the left kidney are located in the upper, middle, and lower parts. The healthy label means that there is no tumor in this kidney; the tumor may be in one kidney while the other is healthy. The undefined label means that the kidney may have been partially or completely removed (nephrectomy), that the expert was unable to diagnose the location of the tumor, or that the scan is not clear enough to determine the exact location, either because contrast material was not taken or because the patient moved during the CT test. Figure 13 shows the tumor stage for all tumor patients: 55 cases are in the first stage, two in the second stage, two in the third stage, and one in the fourth stage. Thus, most tumors are in stage I, meaning that they can be treated, while a few cases are in a late stage, which is a threat to the patient.

Methodology
This section describes our proposed methodology for KT detection and classification using CT scans. It includes a detailed explanation of our preprocessing steps and the data augmentation techniques used, and an illustration of the architecture of the four models we built for KT diagnosis. We examine the patient's situation and determine tumor presence and tumor type, in order to reduce the harmful effects of the injury and the number of deaths. Therefore, we have collected the new dataset from KAUH that contains images and metadata. We have also used the OpenRefine tool and Tableau in the preprocessing step to obtain a cleaned dataset. Furthermore, we used a DICOM converter to change the image format, and we have chosen 70 images of the kidneys from different dimensions for each patient. Figure 14 shows the workflow of the proposed framework.
We built prediction networks (three models) for multidiagnosis, classifying four different labels in two phases. In the first phase, we classify the case as a normal case or a tumor case, while in the second phase, we classify the detected tumor as a benign tumor or a malignant tumor. Artificial neural network modeling is used, in which neurons correspond to receptive fields in a way similar to neurons in the visual cortex of the human brain. These networks are very effective for detection tasks, object categorization, image classification, and segmentation. The goal of CNNs is to learn higher-order characteristics using the convolution operation. Convolutional neural networks learn input-output relationships in which the input is an image and the output is a feature map and, ultimately, an image class label.
In this study, we started by implementing a convolutional neural network for binary classification with the labels (Normal/Tumor).
Artificial neural network modeling is very effective for detection tasks, object categorization, image classification, and segmentation. The goal of CNNs is to learn higher-order characteristics using the convolution operation. Since CNNs learn input-output relationships (where the input is an image), in convolution each output pixel is a linear combination of the input pixels [40].

        Grown to blood vessels, may spread in around
IV      Tumors spread into the adrenal gland or to other organs

Table 9: Segmentation of the injury in the right and left kidney labels description.
Label       Description
Upper       Tumor in the upper part of the kidney
Middle      Tumor in the middle part of the kidney
Lower       Tumor in the lower part of the kidney
Healthy     There is no tumor
Undefined   Partial, nephrectomy, blurred kidney, undiagnosed

We aim to implement a binary classification solution for the detection of kidney tumors. The use of a CNN in such a case helps to identify the feature map for each image engaged in the training process of the adopted CNN model. Hence, the use of the pooling layer helps to determine the size of the feature segment that we are looking for in order to extract a feature image, which will be the primary data fed into the fully connected neural network in the CNN model. As represented in Figure 15, we have two classes on which the model is trained.
The study aims to implement binary classification solutions for the detection and classification of kidney tumors. As represented in Figures 15 and 16, two categories are used for training in each phase. In the first phase, we classify the case as a normal case or a tumor case, while in the second phase, we classify the detected tumor as a benign tumor or a malignant tumor.
The attribute of interest in the first phase is the "situation," which is shown in Table 4. It comprises different values that are merged to balance the number of labels in the first task, tumor detection. We have merged the situation of the normal "healthy" case and the normal case with a cyst into the "Normal" label (Normal = 0), and the situation of tumors into the "Tumor" label (Tumor = 1). As a result, the attribute comprises new binary labels (0 and 1). Table 10 shows the new labels.
The attribute of interest in the second phase is the "tumor type," which is shown in Table 5. We represent the benign tumor with the label (benign = 0) and the malignant tumor with the label (malignant = 1). As a result, the attribute comprises new binary labels (0 and 1). Table 11 shows the new labels.
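The label merging for the two phases can be expressed as two small lookup tables. The sketch below is illustrative only: the situation strings are hypothetical stand-ins for the values listed in Table 4, and the record layout is an assumption.

```python
# Hypothetical situation values; the actual values are listed in Table 4.
SITUATION_TO_LABEL = {
    "healthy": 0,           # normal case, completely healthy -> Normal = 0
    "normal with cyst": 0,  # normal case with a cyst -> Normal = 0
    "tumor": 1,             # Tumor = 1
}
TUMOR_TYPE_TO_LABEL = {"benign": 0, "malignant": 1}


def detection_label(situation):
    """Phase 1: map a patient's situation to Normal (0) or Tumor (1)."""
    return SITUATION_TO_LABEL[situation.strip().lower()]


def classification_label(tumor_type):
    """Phase 2: map a tumor type to benign (0) or malignant (1)."""
    return TUMOR_TYPE_TO_LABEL[tumor_type.strip().lower()]


# Hypothetical metadata records for three patients.
records = [
    {"situation": "Healthy", "tumor_type": None},
    {"situation": "Normal with cyst", "tumor_type": None},
    {"situation": "Tumor", "tumor_type": "Malignant"},
]
phase1 = [detection_label(r["situation"]) for r in records]
# Only tumor cases carry a tumor type, so phase 2 uses those records alone.
phase2 = [classification_label(r["tumor_type"])
          for r in records if r["tumor_type"] is not None]
```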

Data Preprocessing and Augmentation
Preprocessing.
Each patient had a file containing a video of a CT scan, and the number of images per video varied from 200 to 900. As an initial step, the video was divided manually into frames. The clinical imaging data were stored and transmitted in the DICOM format. We converted the images from the complex DICOM format to the JPEG format, which is much smaller and easier to use [41]. After converting the image format, 70 images that showed the kidneys from different dimensions were chosen for each patient. For the metadata, we used the OpenRefine tool and Tableau for some preprocessing steps, obtaining a cleaned text dataset for data visualization. See Figure 14.

Image Normalization.
We normalize the images by reducing the channels from 3 to 1 (converting the RGB image into grayscale). We also normalize the image size; the CT window level and width were set to emphasize the renal area while suppressing information from other organs and tissue.
This normalization step is shown in Figure 17: the images are reshaped to the preferred size of 224 × 224, which allows the network to acquire adequate renal context information from CT volumes. Reducing the image size is important because an image sometimes contains a lot of information, and resizing removes redundant, unimportant information.
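The two intensity operations described above, grayscale conversion and CT windowing, can be sketched per pixel as follows. This is a minimal illustration under stated assumptions: the luminance weights are the common ITU-R BT.601 weights, and the window level and width are hypothetical values, since the settings used in this study are not given.

```python
def to_grayscale(rgb_pixel):
    """Collapse an (R, G, B) pixel to one channel with BT.601 luminance weights."""
    r, g, b = rgb_pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def apply_window(hu, level, width):
    """Clip a CT value to [level - width/2, level + width/2] and map it to 0..255."""
    low = level - width / 2
    high = level + width / 2
    clipped = min(max(hu, low), high)
    return round(255 * (clipped - low) / (high - low))


# Hypothetical soft-tissue window; not the values used in this study.
LEVEL, WIDTH = 40, 400

air = apply_window(-1000, LEVEL, WIDTH)   # far below the window -> black
bone = apply_window(1000, LEVEL, WIDTH)   # far above the window -> white
gray = to_grayscale((255, 255, 255))      # a white RGB pixel stays white
```

Values outside the window saturate to black or white, so the available gray levels are spent on the intensity range where the organ of interest lies.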

Feature Extraction.
We have used OpenCV for the image scans; it is a large open-source toolkit that supports the Python language and can detect objects in images and videos. In addition, OpenCV has an algorithm named Canny, which provides the ability to extract image edge features [42]. Figure 18 shows an example of the output of the Canny edge detector on one of the dataset images. We can notice that it provides a deep level of information, especially at the boundaries of the kidney. From this analysis, we see that before training the models, we have to use data augmentation to increase the number of dataset images per epoch in training and thereby enhance the model's generalization capability. In this way, the CNN will be able to learn this feature in its deep layers, epoch by epoch.
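In OpenCV, the detector is available as `cv2.Canny(image, threshold1, threshold2)`. To show the idea without that dependency, the sketch below implements only the gradient stage that Canny builds on, a Sobel gradient magnitude followed by a single threshold; it omits Canny's smoothing, non-maximum suppression, and hysteresis steps, so it is a simplified stand-in rather than the Canny algorithm itself. The toy image and the threshold are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(image):
    """Sobel gradient magnitude; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_mask(image, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[1 if g >= threshold else 0 for g in row]
            for row in gradient_magnitude(image)]


# Toy grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mask = edge_mask(image, threshold=100)
```

The mask is 1 only along the vertical boundary between the dark and bright halves, which is the kind of boundary information the edge map in Figure 18 highlights.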

Image Augmentation.
Preprocessing improves the accuracy of the proposed methodology by normalizing and augmenting the data. Image data augmentation is used to boost the model's learning capabilities and to generalize its performance. Augmentation is a technique for artificially increasing the size of a training dataset by producing modified versions of the images in the dataset. The data augmentation class can be specified once the data I/O interface has been initialized [43]. Data augmentation is a powerful technique for minimizing the model error rate (overfitting) by expanding the training data. The augmentation parameters used are (i) Re-scale: it rescales the pixel values (0 to 255) to a fixed range. (ii) Shear range: the image is distorted along an axis, mostly to create or rectify perception angles. We used 0.2 on the original image. (iii) Zoom range: the amount of zoom is 0.2 on the original image. (iv) Horizontal flip: we flipped the images horizontally. Figure 19 shows an example of generating four new images based on the original CT. These techniques are commonly used to create synthetic data and train large neural networks, and they make our proposed models more robust by avoiding overfitting when training the deep learning model. Also, when data are scarce because the number of patients in the dataset is small, the data can be expanded by rotation, reflection, and similar operations. As a result, we obtain entirely new, synthetic images using different techniques. Figure 20 shows an example of applying augmentation operations to a renal CT image.
Image data augmentation is supported in the Keras deep learning library via the ImageDataGenerator class. Usually, image data augmentation is applied only to the training dataset, not to the validation or test dataset. In addition, it may be useful to try data augmentation methods separately to see whether they lead to a measurable improvement in model performance, perhaps using a small sample dataset, a model, and a training run. After the augmentation process, an augmented dataset was obtained. The data size for the tumor detection task was 8,400 before augmentation and increased fourfold to 33,600. Likewise, the data size for the tumor type classification task was 4,200 before augmentation and increased fourfold to 16,800. These additional data increase the performance of our models, especially when the available dataset is unbalanced. Also, augmentation reduces the training time and the model error rate and improves the accuracy of classification tasks.
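In Keras, the four settings above correspond to the `rescale`, `shear_range`, `zoom_range`, and `horizontal_flip` arguments of `ImageDataGenerator`. The dependency-free sketch below illustrates two of them, rescaling and horizontal flipping, together with the dataset-size arithmetic. The rescale factor of 1/255 is an assumption (a common choice that maps pixel values to the range 0 to 1), and the function names are illustrative.

```python
def rescale(image, factor=1 / 255):
    """Multiply every pixel by a constant factor (assumed here to be 1/255)."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augmented_size(original_count, multiplier=4):
    """Dataset size after a fourfold increase, as reported in the text."""
    return original_count * multiplier


# Toy 2 x 3 grayscale image.
image = [[0, 128, 255],
         [255, 128, 0]]

scaled = rescale(image)
flipped = horizontal_flip(image)
detection_total = augmented_size(8400)        # tumor detection task
classification_total = augmented_size(4200)   # tumor type classification task
```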

Building Models.
This section describes the architecture of the detection and classification models used to predict four outputs: first, the normal case and the tumor case; then, the benign tumor and the malignant tumor. We have built four models: VGG16, ResNet50, and two modified CNNs.

VGG16 Detection Model.
The network architecture is 16 layers deep: 13 convolutional, 4 max-pooling, and 3 fully connected layers. The convolutional input layer has a shape of 224 × 224 × 3; this layer determines the input dimensions and shape. Max-pooling reduces the dimensionality of images by reducing the number of pixels in the output of the previous convolutional layer. In addition, a fully connected ANN has an input layer that reflects the size of the max-pooling output data, a hidden layer with the ReLU activation function, and an output layer with a softmax classifier that produces the prediction percentages for each class. Figure 21 shows the network structure of the VGG16 architecture.

ResNet50 Detection Model.
In the ResNet50 model, the fully connected network has an input layer that reflects the size of the max-pooling output data, a hidden layer with the ReLU activation function, and an output layer with a softmax classifier that produces the prediction percentages for each class. Figure 22 shows the network structure of the ResNet50 architecture.

CNN-6 Detection Model.
The proposed model for tumor detection consists of 6 deep CNN layers with a fully connected ANN. First is the batch normalization layer, which standardizes the inputs to a layer for each minibatch. In this way, the model can reduce overfitting and enhance classification accuracy. Then, the 2D convolution input layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. The kernel is a convolution mask that can be used for blurring, sharpening, embossing, edge detection, and so on, acting as a feature extractor on the original image with the help of the other layers. It is important to note that 1D, 2D, and 3D convolutions differ only in the number of dimensions.
Conv2D takes several parameters that specify how it works. As an example, consider modelName.add(Conv2D(32, (3, 3), input_shape=(224, 224, 3), activation="relu")). The first argument means that the Conv2D layer will learn a total of 32 filters; the max-pooling layer that follows is then used to reduce the spatial dimensions of the output feature volume. The number of filters is usually chosen as a power of two. The next parameter is the kernel size (3 × 3), which defines the kernel dimensions. The kernel size is given as a pair of integers, typically odd, specifying the height and width of the 2D convolution window applied to the original image to extract features from it. The activation parameter, here ReLU (rectified linear activation function), defines the activation function applied to this layer, defined as y = max(0, x). The activation may also be softmax, which is more suitable for the output layer when the number of target classes is small, such as 2 or 4. The softmax function turns a vector of K real values into a vector of K real values that sum to 1.
The input values can be positive, negative, zero, or greater than one, but the softmax transforms them into values between 0 and 1 so that they can be interpreted as probabilities. If one of the inputs is small or negative, the softmax turns it into a small probability, and if the input is large, it turns it into a large probability; the result always remains between 0 and 1. The input shape is the primary parameter for this layer; it defines the structure of the input image, formulated as Image Width × Image Height × Image Channels, in this example 224 × 224 × 3. Next is the MaxPooling2D layer (2 × 2); max-pooling is an operation that is typically added to CNNs after individual convolutional layers. When added to a model, max-pooling reduces the dimensionality of images by reducing the number of pixels in the output of the previous convolutional layer.
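The behavior of the two activation functions described above can be checked with a short standard-library sketch. The function names `relu` and `softmax` and the example input vector are illustrative choices, not taken from the authors' code; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math


def relu(x):
    """Rectified linear activation: y = max(0, x)."""
    return max(0.0, x)


def softmax(values):
    """Turn K real values into K values in (0, 1) that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.0, 0.0, 3.5]  # arbitrary example scores for 4 classes
probs = softmax(logits)

print([relu(v) for v in logits])
print(probs, sum(probs))
```

The largest input receives the largest probability, negative inputs receive small but nonzero probabilities, and ReLU maps every negative input to zero.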
Then, the dropout layer randomly sets input units to 0 at a given rate at each step during training. Thus, it helps to reduce model overfitting. Then, a flatten layer flattens its input into a one-dimensional output; for example, a 2 × 2 input becomes four output values. Then, the dense layer, which defines a hidden layer or the output layer, takes the number of neurons and the activation function. In the output layer, the number of neurons matches the number of training classes. Figure 23 shows the network structure of the modified CNN architecture.
In summary, the proposed model for detecting the tumor consists of 6 deep CNN layers with a fully connected ANN: first, the batch normalization layer, which standardizes the inputs to a layer for each minibatch, and then the 2D convolution input layer. The kernel is a convolution mask that can be used for blurring, sharpening, embossing, edge detection, and so on. Conv2D takes several parameters that are specific to its working process. The max-pooling layer reduces the spatial dimensions of the output feature volume. Next, the dropout layer helps to reduce model overfitting. Then, the flatten layer flattens its input into a one-dimensional result. Then, the dense layer defines the feedforward network. In the output layer, the number of neurons matches the number of training classes. Figure 23 shows the network structure of the modified CNN architecture.
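The max-pooling and flatten steps can be sketched on a small feature map. This is a minimal standard-library illustration, assuming a 2 × 2 pooling window with stride 2 and an arbitrary 4 × 4 input; the function names are hypothetical and the values are not taken from the dataset.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value of each non-overlapping 2 x 2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def flatten(feature_map):
    """Concatenate the rows of a 2D map into one 1D list."""
    return [value for row in feature_map for value in row]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 3]]

pooled = max_pool_2x2(feature_map)  # 4 x 4 -> 2 x 2
flat = flatten(pooled)              # 2 x 2 -> 4 values
print(pooled, flat)
```

Pooling quarters the number of values passed on, and flattening turns the 2 × 2 result into four inputs for the dense layer.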

CNN-4 Classification Model.
The proposed model for classifying the tumor type consists of 4 deep CNN layers with a fully connected ANN: first, the 2D convolution input layer; then, the MaxPooling2D layer (2 × 2); after that, the flatten layer; and then, the dense layer. In the output layer, the number of neurons matches the number of training classes. Figure 24 shows the network structure of the CNN architecture.
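To make the data flow through such a four-layer stack concrete, the sketch below traces tensor shapes and parameter counts. It rests on several assumptions that the text does not state for CNN-4: the convolution reuses the Conv2D example given earlier (32 filters, 3 × 3 kernel, 224 × 224 × 3 input), uses stride 1 without padding, and the dense layer is the output layer with two neurons (benign and malignant). The helper names are hypothetical.

```python
def conv2d_output(height, width, kernel, filters):
    """Output shape of a stride-1 convolution without padding (assumed)."""
    return height - kernel + 1, width - kernel + 1, filters


def max_pool_output(height, width, channels, pool=2):
    """Output shape of non-overlapping pool x pool max-pooling."""
    return height // pool, width // pool, channels


def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per output unit."""
    return (inputs + 1) * units


conv_shape = conv2d_output(224, 224, kernel=3, filters=32)
pool_shape = max_pool_output(*conv_shape)
flat_units = pool_shape[0] * pool_shape[1] * pool_shape[2]

n_conv = conv2d_params(kernel=3, in_channels=3, filters=32)
n_dense = dense_params(flat_units, units=2)  # 2 classes assumed

print(conv_shape, pool_shape, flat_units, n_conv, n_dense)
```

Under these assumptions nearly all trainable parameters sit in the dense layer, which is why max-pooling before flattening matters for model size.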

Experimental and Results
This section presents the experiments with four different models: VGG16, ResNet50, and two modified CNNs. The four models were run with largely the same hyperparameters, though all had different network architectures. Results are shown in the following subsections. Table 12 shows the parameter settings for our experiments. The parameters were selected experimentally; the values in Table 12 gave us the best results. They are described in detail as follows:
(i) Steps per epoch: it defines the number of learning steps performed in each epoch. We need a high number of training steps in each epoch since our data are large; in our data augmentation, we set it based on the batch size, so the number of steps has to be larger than the batch size.
(ii) Epochs: it is the number of times the complete dataset is passed through the network, so that the same dataset is reused for training. An epoch means training the neural network with all the training data for one cycle.
(iii) Validation steps: same as steps per epoch, but in this case it defines the number of validation samples applied to the model in each epoch to reduce the loss and increase the classification accuracy.
(iv) Loss: it is the value calculated by the loss function after each iteration to quantify the error.
(v) Optimizer: it helps reduce the output error of the loss function by changing the weights and bias values in the model, and it computes adaptive learning rates for each parameter during training. In our adopted CNN tools, we have used the Adam optimizer. We experimented with Adam and other optimizers and found that Adam was more accurate; Adam is also reported to perform well on image classification problems.
(vi) Activation function: it is a function used to decide whether the neuron should fire or not, by taking the value received by the neuron and re-evaluating it.
(vii) Learning rate: it is a factor used along with the optimizer in changing the weights of the function, in order to reach high accuracy.
(viii) Batch size: it is the number of samples processed before the model is updated. It should suit the size of the data sample, but we reduced this number since our feature vector represents the image itself without region-of-interest information, so there is no need for a large batch.
(ix) Input neuron: it receives the image binary data as a feature vector. According to the image size 224 × 224, we found that 32 is the best number for our model training.
(x) Hidden neuron: it is trained on the input images to build the training model. We took the number of neurons in the input layer, multiplied it by the number of image channels, and increased the value to 128 to improve the processing time.
(xi) Output neuron: it defines the output classes that we have.
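The relation between dataset size, batch size, and the number of steps can be sketched as follows. The batch size of 32 is a hypothetical value chosen only for illustration, since the actual setting is not reproduced in this section; the image counts are the training and validation sizes of the detection task reported in the next subsection. Computing the steps as the ceiling of samples divided by batch size is a common convention, not necessarily the rule the authors used.

```python
import math


def steps_for(n_samples, batch_size):
    """Batches needed to see every sample once (common convention)."""
    return math.ceil(n_samples / batch_size)


BATCH_SIZE = 32  # hypothetical value, for illustration only

steps_per_epoch = steps_for(5376, BATCH_SIZE)   # detection training images
validation_steps = steps_for(1344, BATCH_SIZE)  # detection validation images

print(steps_per_epoch, validation_steps)
```

With these illustrative numbers, the number of steps per epoch exceeds the batch size, in line with the description in item (i).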

Adopted Experiment Dataset.
For the detection task, the collected data were randomly divided into 80% for the training set and 20% for the testing set. The training dataset was further divided into training and validation subsets, which consisted of 5376 images for training and 1344 for validation. The total number of images for testing is 1680. Table 13 presents these values with the dataset splitting percentages.
The training dataset used in the classification model for the kidney tumor type consists of 2688 images. The total number of images for validation is 672, and the total number of images for testing is 840. Table 14 presents these values with the dataset splitting percentages.
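The reported split sizes can be reproduced with integer arithmetic. The sketch below assumes that the training portion was itself divided 80/20 into training and validation subsets. That second ratio is inferred from the reported counts and is not stated explicitly in the text; the function name is hypothetical.

```python
def split_counts(total, train_percent=80):
    """Split `total` into (train, validation, test) using two 80/20 cuts."""
    train_val = total * train_percent // 100
    test = total - train_val
    train = train_val * train_percent // 100  # second cut: inferred ratio
    validation = train_val - train
    return train, validation, test


detection_split = split_counts(8400)       # tumor detection task
classification_split = split_counts(4200)  # tumor type classification task

print(detection_split)
print(classification_split)
```

Both tuples match the image counts given above for the detection and classification tasks.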

Evaluation Metrics.
Based on the preprocessed dataset, we have used the confusion matrix [44] to evaluate our networks. We have used the F-score metric, which is calculated from the precision and recall of the test phase. More details about the evaluation metrics are as follows: (i) True Positive (TP): it represents the correctly predicted positive values, which means that the value of the actual class is yes, and the value of the predicted class is also yes. (ii) True Negative (TN): it represents the correctly predicted negative values, which means that the value of the actual class is no, and the value of the predicted class is also no.

VGG16 Detection Model Training and Testing.
Training the VGG16 model on the data produced a loss value of 0.3506 and a test accuracy of 0.5938. Table 15 shows F-score diagnostic testing.
The training accuracy of VGG16 reached 0.60, while the test accuracy reached 0.5938. The loss value indicates how well or poorly a model behaves after each iteration of optimization. The value of the test loss in the VGG16 model is 0.3506, and the training time is 3 s 68 ms/step. From these results, we can conclude that the VGG16 model was weakly trained and that its behavior was poor in the testing process.

ResNet50 Detection Model Training and Testing.
Training the ResNet50 model on the data produced a loss value of 0.0806 and a test accuracy of 0.9747. Figure 26 represents the loss and accuracy for both training and validation during the training process through each epoch. The values show how stable the model is.
The model was tested on 848 samples from the normal class and 832 samples from the tumor class. For normal cases, the model classified 806 samples correctly and failed on 42 samples. For tumor cases, it classified 813 samples correctly and failed on 19 samples. Table 16 shows F-score diagnostic testing.
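As an illustration of how precision, recall, and F-score follow from such counts, the sketch below treats the tumor class as the positive class, which is an assumption made here for the example. Under that assumption, the 19 missed tumor cases are false negatives (FN) and the 42 misclassified normal cases are false positives (FP). The function name is hypothetical, and the printed values are derived only from the counts quoted above, not copied from Table 16.

```python
def binary_metrics(tp, fn, fp, tn):
    """Precision, recall, F-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f_score, accuracy


# Tumor is assumed to be the positive class.
tp, fn = 813, 19   # tumor samples: correctly / incorrectly classified
tn, fp = 806, 42   # normal samples: correctly / incorrectly classified

precision, recall, f_score, accuracy = binary_metrics(tp, fn, fp, tn)
print(round(precision, 4), round(recall, 4),
      round(f_score, 4), round(accuracy, 4))
```

Swapping the roles of the two classes would give the corresponding metrics for the normal class.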
The training accuracy of ResNet50 reached 0.96, while the test accuracy reached 0.9747. The loss value indicates how well or poorly a model behaves after each iteration of optimization. The value of the test loss in the ResNet50 model is 0.0806, and the training time is 3 s 70 ms/step.

CNN Model Classification Training and Testing.
After training the CNN model on the data described above, we obtained a loss of 0.0643 and an accuracy of 0.9777. Figure 28 represents the loss and accuracy for both training and validation during the training process through the epochs. The values show how stable the model is.
The test was performed on 531 samples from the malignant class and 234 samples from the benign class. For the malignant class, the model classified 229 samples correctly and failed on 5 samples. On the other hand, for the benign class, it classified 474 samples correctly and failed on 57 samples. Table 18 shows F-score diagnostic testing.
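The overall test accuracy implied by these counts can be computed without deciding which class is positive. The sketch below simply divides the correctly classified samples by all tested samples; the function name is hypothetical.

```python
def overall_accuracy(per_class_counts):
    """Accuracy from a list of (correct, incorrect) pairs, one per class."""
    correct = sum(c for c, _ in per_class_counts)
    total = sum(c + w for c, w in per_class_counts)
    return correct / total


# (correct, incorrect) pairs quoted in the text for the two tumor types.
counts = [(229, 5), (474, 57)]

test_accuracy = overall_accuracy(counts)
print(round(test_accuracy, 2))
```

Rounded to two decimals, the value agrees with the test accuracy reported for this model.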
The training accuracy for this model reached 0.9777, and the test accuracy reached 0.92. The training time is 1 s 64 ms/step. Table 19 shows the accuracy, training loss, number of epochs, and training time for all proposed deep learning models, where three models were used for tumor detection and one model for tumor classification. Based on the models that we built, we can say that they are promising for diagnosing kidney tumors because of their high diagnostic accuracy.

Comparison with Other Related Studies
In comparison with the previous works, the proposed methodology and the 2D-CNNs have achieved fruitful results using CT scans of kidney patients. This is the first research on the detection and classification of kidney tumors based on the new data, which can help doctors and radiologists find the appropriate treatment plan for kidney tumor patients. According to Table 20, which compares our proposed work with the previous work, our research is the first to utilize a larger dataset of CT scans. Moreover, it has outperformed the previous works in accuracy, reaching 97% for tumor detection and 92% for tumor classification.
The results of previous studies proved the power of deep learning approaches in renal tumor detection and classification tasks. Researchers in the other studies applied practical methodologies for a fair comparison and achieved satisfactory results. One of the challenges that researchers faced was the availability of data. Usually, medical image data are few in number, which leads to a high risk of overtraining and subsequently reduced performance. Some solutions that can help mitigate this problem are using smaller models and augmenting the data. Also, there is more than one study on the same dataset, which adds to the limitations of the studies. Collecting and building data is challenging and takes time and effort, especially pulling the data. In addition, it must be ensured that the data are properly structured, in continuous cooperation with specialists. Although obtaining new data is difficult, there is a need for a new, expanding dataset that supports a complete diagnosis covering all aspects, not only tumor detection and classification of tumor type but also classification of tumor subtypes, staging, and segmentation, in one operation.

Discussion and Conclusion
This paper uses four methods, VGG16, ResNet50, and two different modified 2D-CNN models, to assess the condition of patients with kidney tumors and determine the kidney tumor type. Based on renal CT scans, the extracted features helped recognize the image class (Normal/Tumor and Benign/Malignant) through training and testing. For our novel dataset, the results proved the effectiveness of our proposed 2D-CNN models, where the accuracy of the detection models VGG16, ResNet50, and 2D-CNN reached 60%, 96%, and 97%, respectively. On the other hand, the classification 2D-CNN model reached 92%. To reveal the specific characteristics of a kidney tumor, patient data need to be collected, and the process of labeling, building data, and converting the image format takes time. In addition, the images are extracted manually for each patient, and cooperation with radiologists is needed to validate the data. Several of the previous studies did not take more than one image per patient. As a result, there are limitations in the diagnosis and studies; ideally, the dataset should be more extensive. The composition of our dataset is therefore a strength, since it does not contain missing data and carries valuable information from the metadata. Besides, the images cover multiple aspects of diagnosis. There are 70 images for each patient in which kidney problems can be predicted, including tumors and stones, cysts, and other tumors in the nearby organs.
Some challenges were encountered in this study, which can be summarized as follows: manual data collection; segmenting video and converting the images from DICOM to JPEG; image selection; text data building; data labeling; missing data, where we encountered technical problems and re-collected data for some patients; overfitting problems; and the need for high-performance servers. The main contributions of this paper can be summarized as follows: originating a new dataset from a Jordanian hospital consisting of text data of clinical reports and image sequences of CT scans; a case study and statistical analysis of kidney tumor cases in one of the most important hospitals in northern Jordan; exploring the performance of the modified 2D-CNN models for the tumor detection and classification tasks; enhancing the diagnosis of patient conditions with high accuracy; reducing the doctor's and radiologist's workload; and providing them with a tool that can automatically assess the condition of the kidneys, support a better understanding of the evaluation results, and predict the presence of tumors in any patient. Besides, the results of the models can reduce the risk of misdiagnosis. Furthermore, increasing the quality of healthcare service and early detection can change the disease's track and preserve the patient's life.
Our future work includes further optimizing the detection performance and the accurate extraction of renal tumors from CT scans, as well as building classification tasks for the tumor subtypes that we have identified and other diagnostic studies, such as classifying the tumor stage and segmenting the tumor in both kidneys. We look forward to a full diagnosis based on this new data, toward a robust standard for the intelligent diagnosis of kidney tumors.

Data Availability
Data are available from the authors upon reasonable request.

Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Consent
Informed consent was obtained from all individual participants included in the study.