Region-Based Segmentation and Classification for Ovarian Cancer Detection Using a Convolutional Neural Network

Ovarian cancer is a serious disease, particularly in older women. According to the data, it is the seventh leading cause of cancer death in women and the fifth most frequent cancer worldwide. Many researchers have classified ovarian cancer using Artificial Neural Networks (ANNs). Doctors regard classification accuracy as an important aspect of decision making, since improved accuracy supports proper treatment, and early, precise diagnosis lowers mortality rates and saves lives. On the basis of ROI (region of interest) segmentation, this research presents a novel annotated ovarian image classification utilizing FaRe-ConvNN (rapid region-based convolutional neural network). The input images were divided into three categories: epithelial, germ, and stromal cells. Each image is preprocessed and segmented, after which FaRe-ConvNN performs the annotation procedure. For region-based classification, the method compares manually annotated features with features trained in FaRe-ConvNN. Because human annotation achieved lower accuracy in previous studies, this work empirically demonstrates that ML classification provides higher accuracy in disease identification. After region-based training in FaRe-ConvNN, classification is performed using a combination of SVC and Gaussian NB classifiers; this ensemble technique was employed for feature classification because of its better data indexing. To diagnose ovarian cancer, the simulation identifies the relevant portion of the input image. FaRe-ConvNN yields a precision of more than 95%: SVC reaches 95.96% and Gaussian NB 97.7%, with FaRe-ConvNN enhancing the precision of Gaussian NB. For recall/sensitivity, SVC achieves 94.31 percent and Gaussian NB 97.7 percent, while for specificity, SVC achieves 97.39 percent and Gaussian NB 98.69 percent using FaRe-ConvNN.


Introduction
One of the most common cancers in women is ovarian cancer (OC). In 2018, 295,414 women were diagnosed with ovarian cancer worldwide, resulting in 184,799 deaths. Since early-stage tumors are often asymptomatic, most women with ovarian cancer have advanced disease at the time of diagnosis, resulting in lower long-term survival [1]. Although ovarian tumors are chemosensitive and initially respond to platinum/taxane treatment, the 5-year recurrence rates in patients with advanced illness are 60 to 80 percent [2].
OC is characterized by modest early-stage symptoms and a low survival rate, and it is the most common and most dangerous gynecologic cancer. Serous, mucinous, endometrioid, and clear cell carcinoma are the four subtypes of primary epithelial ovarian carcinoma [3]. According to earlier research, one out of every 54 women may develop OC, and the 5-year survival rate for a patient diagnosed with OC is roughly 48.6% [4]. The low survival rate is largely attributable to cancer discovery at an advanced stage, with 72 percent of patients diagnosed at stage III or IV. As a result, early detection is critical. Attempts to detect OC in the preclinical stage have been made in the past, employing both medical imaging and blood markers. Although these biomarkers show promise, they have a number of drawbacks, including misclassification, sluggishness, and longer working hours [5].
Although Serum Carbohydrate Antigen 125 (CA125) is commonly utilized, its accuracy is limited because the marker lacks specificity. Ultrasound imaging, MRI, and CT scans are some of the imaging modalities used to locate and characterize tumors. Early detection of any medical condition, particularly cancer, is critical for improving survival rates. According to studies, medical imaging is one of the most successful approaches for early-stage diagnosis, prediction from brain imaging modalities, monitoring cancer stages, and follow-up after cancer therapy. Manually interpreting these medical images is time-consuming and prone to human error [6]. The normal ovary and the origins of the three types of ovarian cancer are shown in Figure 1. There are three main types of ovarian tumors.

Epithelial Tumors.
This type of tumor is derived from the cells on the surface of the ovary. It is the most common form of ovarian cancer and occurs primarily in adults.

Germ Cell Tumors.
This type of tumor is derived from the egg-producing cells within the body of the ovary. It occurs primarily in children and teens and is rare in comparison to epithelial ovarian tumors.

Stromal Tumors.
These tumors are rare in comparison to epithelial tumors, and this class of tumors often produces steroid hormones.
In addition, computer-aided diagnosis (CAD) methods are frequently employed to assist physicians and pathologists in better analyzing the outcomes of medical images. ML methods are employed in a CAD-based medical imaging strategy for cancer detection [7]. Feature extraction is a crucial stage in the machine learning approach [8]. In the literature, many feature extraction approaches have been examined and analyzed in the context of various MRI, CT, and ultrasound images. Previous work has focused on generating worthy feature descriptors and ML methods for context learning from various types of medical images. These methods have some drawbacks that limit the use of CAD-based medical diagnostic procedures. In this study, we focus on representation learning rather than a handcrafted-feature strategy to address the shortcomings of CAD-based systems. Deep learning learns from image data using hierarchical feature representation, a form of representation learning [9]. The image data itself is used to produce the high-level feature representation. With the support of massively parallel architectures and GPUs, the deep learning approach has achieved great success in applications such as image recognition, object detection, and speech processing [10]. Physicians classify patients' symptoms into one of several illness classes based on their understanding. Learning a categorization model for ovarian disorders is the learning challenge in this study. A broad classification approach was discovered through data analysis. Training data containing cases such as objects or instances is characterized using attribute vectors (features or variables), which may be quantitative or qualitative in nature. In supervised learning, mutually exclusive cases together with class data are employed for learning, where all cases share the same attribute-vector structure.

Literature Survey
Recent research [9] has demonstrated that combining genetic data with pathology images to diagnose tumors is very effective. For predicting breast cancer outcome, researchers [10] combined pathology images and genetic data; a multiple kernel learning method was used to connect the heterogeneous data of the two modalities, achieving an accuracy of 0.8022 and a precision of 0.7273. M2DP, a multi-modal task feature selection technique for cancer diagnosis, was introduced in [11]; it was tested on a breast cancer benchmark and a lung cancer benchmark, with accuracies of 72.53 percent and 70.08 percent. In [12], the authors proposed various kernel strategies for forecasting lung carcinomas by combining genomic data with pathological image features, with an accuracy of 0.8022 [13]. Using a DL model, deep features were fully extracted from both the gene and image modalities and then combined using weighted linear aggregation; the prediction accuracy was 88.07 percent. Recent developments in CNNs and other DL methods have profound implications for medical diagnostics. For histopathologic analysis of prostate cancer, the authors of [14] used a deep residual CNN; the model correctly classified image patches as benign or malignant at a coarse level 91.5 percent of the time. Using residual networks (the ResNet50 architecture), study [15] presented a method for automatically classifying brain cancers; on a patient-by-patient basis, the model accuracy was 0.97. For classifying dermoscopy images, the authors of [16] utilized deep GoogleNet Inception, with a precision of 0.677 [17]. DenseNet-161 and ResNet-50 were used in that study; the F-score of the DenseNet-161 model was 92.38 percent, while its accuracy was 91.57 percent. However, many tasks in the realm of medical applications rely on long-range dependencies [18]. RNN methods are the most popular approaches for learning from longitudinal data in depth.
LSTM [19] is an RNN variant that captures both long- and short-term dependencies within sequential input. The F-score of this method was 0.8905 [20]. Table 1 depicts a comparative analysis of the proposed and existing techniques.

Research Methodology
This section discusses the proposed technique for ovarian cancer detection based on segmentation and classification using deep learning architectures. Overfitting and other errors can occur if the training sample is too small. To enhance classification accuracy, we increased the sample size in our study by manipulating images [22]. Image enhancement and rotation are examples of such manipulation. To increase the sample size, we rotated the original input images from 0° to 270° in 90° steps around their center point.
Our two groups of data produced two separate recognition models [23]: one used the original image dataset as training data without augmentation, and the other used the augmented image dataset as training data, with a sample size 11 times larger than the original image dataset. Figure 2 depicts the architecture of this study process [24].
The input image was divided into three categories: epithelial, germ, and stromal cells. The image is first preprocessed for noise reduction and filtering. It was then manually tagged and trained using the standard training model [25]. This study used a neural network, FaRe-ConvNN, to compensate for hand annotation. Using FaRe-ConvNN, an object is detected using a trained image and a manually segmented image. Since convolution is used to detect edges, both kinds of features are annotated based on region, and image segmentation yields annotated contextual features. The accuracy of disease detection utilizing computer-assisted diagnosis is higher than that of manual detection. Gaussian NB and SVC are utilized for classification once FaRe-ConvNN is applied [26]. Through various processing techniques, or combinations of them, such as random rotations, shifts, shears, and flips, image augmentation artificially generates training images. The validation error must drop along with the training error in order to create useful deep learning models, and data augmentation serves this purpose well. Because the augmented images represent a wider range of potential image configurations, the gap between the training and validation sets, as well as any upcoming testing sets, is minimized. The suggested method seeks to enhance segmentation outcomes by creating a new MRI image dataset from an existing one. In this work, the segmentation of the ovarian imaging collection is specifically discussed: the segmentation task entails locating the pixels that belong to the ovarian cancer image and separating the nuclei from the surrounding tissue. Flipping an image horizontally or vertically rearranges its pixels while preserving its features. Images may appear at a range of angles, though they are unlikely to be upside down, so each image may be rotated by a different amount.
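As a rough illustration (not the authors' code), the rotation-based augmentation described above, turning each input image through 0°, 90°, 180°, and 270° about its center, can be sketched with NumPy:

```python
import numpy as np

def augment_by_rotation(image):
    """Return the original image plus its 90-, 180-, and 270-degree
    rotations about the center, as described above."""
    return [np.rot90(image, k) for k in range(4)]  # k=0 is the original

# Hypothetical 300x300 RGB image filled with random pixel values.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)

augmented = augment_by_rotation(img)
print(len(augmented))       # 4 images per input
print(augmented[1].shape)   # rotations preserve the image dimensions
```

Combined with flips, shifts, and shears, such transforms yield the enlarged training set the paper refers to.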
The majority of the image's pixel values then differ from those of the original image.
Noise can also be added by distributing incorrect pixel values randomly throughout the image. Each image in the training set can be enhanced using standard augmentation methods like flips and rotations without requiring manual image processing. Batches of images are pulled from the directory by "ImageDataGenerator," which then applies transformations like "vertical flip," "horizontal flip," or "rotation range." The enhanced images first go through pre-processing to improve them before computation. The pre-processing step primarily yields a collection of images prepared for various methods of image examination: it changes the applied image into a new one that is essentially identical to it, with a few minor differences. Resizing, masking, segmentation, normalization, and noise removal are some of these preprocessing procedures. This study preprocesses the applied images by downsizing them and filtering out the noise present in each image. Each image is resized to a default size of 300 × 300 pixels, and the resized images are then sent to the filtering process so that they produce better results.

Convolutional Layer.
The idea behind convolution is to extract features from an image while keeping the spatial relationship between pixels, learning features inside the image by using small, equal-sized tiles. Consider an M × N × 3 input image with K filters of size I × J in the first convolutional layer, where I ≪ N and 3 indicates the color channels. Every element from the input image and the filter matrix undergoes a mathematical operation, which results in the learned features [4]. Formally:

$y^{l}_{i,j} = \sigma\Big(\sum_{a=0}^{I-1}\sum_{b=0}^{J-1} w_{a,b}\, x^{\,l-1}_{i+a,\,j+b}\Big)$,

where $y^{l}_{i,j}$ is the output of layer $l$ and $w$ is the weight of the filter $f$ applied at layer $l-1$. To put it another way, the filter slides through all of the image's elements and multiplies each one, resulting in a single matrix called a feature map. The size of the feature map matrix is determined by the depth and stride.
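The sliding-filter computation just described can be sketched as a minimal single-channel convolution (an illustration only, not the paper's implementation; the image and kernel values are made up):

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the resulting feature map
    (valid padding, single channel, no bias)."""
    I, J = kernel.shape
    H = (image.shape[0] - I) // stride + 1
    W = (image.shape[1] - J) // stride + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = image[i*stride:i*stride+I, j*stride:j*stride+J]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

# A 5x5 toy image and a 3x3 vertical-edge-style kernel (illustrative values).
img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.]] * 3)

fmap = conv2d_single(img, kernel)
print(fmap.shape)  # (3, 3): output size depends on filter size and stride
```

Note how the output size (3 × 3 here) follows from the input size, filter size, and stride, exactly as the text states.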
Additionally, an activation function known as ReLU is commonly utilized to introduce non-linearity into CNNs, allowing them to learn nonlinear models. This rectifier is most commonly utilized since ReLU considerably enhances CNN object identification performance [27].

Pooling Layer.
Pooling is one of ConvNet's unique concepts, as previously noted. The pooling step's goal is to lower the dimensionality of every feature map by removing noisy, unnecessary convolutions and computations while keeping the majority of the critical information. There are several types, including max, sum, and average pooling, but max-pooling is the most popular and recommended [28]. In max-pooling, a spatial neighborhood is constructed, and the maximum unit is obtained from the feature map depending on the filter dimension, for example 2×2 windows. Figure 3 displays max-pooling with a 2×2 window and a stride of 2, which reduces the dimensionality of the feature map by picking the maximum of each region.
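The 2×2, stride-2 max-pooling of Figure 3 can be sketched as follows (a toy NumPy example, not the study's code):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Max-pooling: keep only the maximum of each window,
    halving each spatial dimension for size=2, stride=2."""
    H = (feature_map.shape[0] - size) // stride + 1
    W = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            window = feature_map[i*stride:i*stride+size,
                                 j*stride:j*stride+size]
            pooled[i, j] = window.max()
    return pooled

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(max_pool(fmap))  # [[6. 8.] [3. 4.]]
```

Each non-overlapping 2×2 region contributes a single maximum, so the 4×4 map shrinks to 2×2 while the strongest activations survive.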

Fully Connected Layer.
It comes directly before the output layer in a ConvNet and functions like a standard NN at the end of the convolutional and pooling layers. Each neuron in a fully connected layer is coupled to each neuron in the layer before the FC layer. The FC layer's goal is to utilize the preceding layer's output features to classify images using the training dataset. In essence, a CNN's fully connected layers act as a classifier, with the convolutional layer outputs serving as the classifier's input [29]. Figure 4 shows the unified structure of FaRe-ConvNN and the RPN. Modern object detectors include anchor boxes as a standard feature: a rectangular box is acquired for every object in an image during object detection, resulting in many boxes of varied shapes and sizes in every image.
The images are first separated into grids, because medical images are typically quite large. Each grid must also carry labeling (done by specialists). A CNN [30] is trained on each grid: each grid is provided with a mask stating "cancerous" or "non-cancerous" when it is passed in. Then, sliding through the grids, the NN is trained to recognize each grid's mask.
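The grid-splitting step above can be sketched like this (grid size and image size are illustrative assumptions, not values from the paper):

```python
import numpy as np

def split_into_grids(image, grid_size):
    """Divide a large image into non-overlapping square grids so each
    grid can carry its own "cancerous"/"non-cancerous" mask label."""
    H, W = image.shape[:2]
    grids = []
    for top in range(0, H - grid_size + 1, grid_size):
        for left in range(0, W - grid_size + 1, grid_size):
            grids.append(image[top:top+grid_size, left:left+grid_size])
    return grids

img = np.zeros((300, 300, 3), dtype=np.uint8)  # hypothetical 300x300 input
grids = split_into_grids(img, 100)
print(len(grids))  # 9 grids of 100x100 pixels each
```

Each returned grid would then be paired with its expert-provided mask before training.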

Region Proposal Network.
The feature map is the RPN's input, while the output is a series of rectangular object proposals, each with an objectness score [31]. Selective search takes 2 seconds per image to propose regions, whereas the RPN takes only 10 ms. Anchor boxes with three aspect ratios and three scales are used by FaRe-ConvNN; as a result, there are 9 anchor boxes for every pixel in the feature map. In the architecture, a simple convolution layer with a kernel size of 3 × 3 is followed by two sibling fully connected layers, implemented as 1 × 1 convolutional layers. The classification layer's output size is 2 × 9, whereas the regression layer's output size is 4 × 9. Over a feature map of size H × W, the total number of predictions [32] is therefore (4 + 2) × 9 × (H × W).
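The output bookkeeping above is easy to verify; the sketch below (the 38 × 38 feature-map size is a hypothetical example, not a figure from the paper) counts the RPN predictions:

```python
def rpn_output_sizes(feature_h, feature_w, scales=3, ratios=3):
    """Count RPN outputs: 9 anchors per feature-map pixel (3 scales x 3
    aspect ratios), with 2 classification scores and 4 regression
    offsets per anchor."""
    anchors_per_pixel = scales * ratios       # 3 x 3 = 9
    cls_outputs = 2 * anchors_per_pixel       # 2 x 9 per pixel
    reg_outputs = 4 * anchors_per_pixel       # 4 x 9 per pixel
    total = (cls_outputs + reg_outputs) * feature_h * feature_w
    return anchors_per_pixel, cls_outputs, reg_outputs, total

# Hypothetical 38x38 feature map.
a, c, r, total = rpn_output_sizes(38, 38)
print(a, c, r)   # 9 18 36
print(total)     # (4 + 2) * 9 * 38 * 38 = 77976
```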

Loss Function.
The loss function utilized in FaRe-ConvNN is the standard two-term objective of Faster R-CNN:

$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*}) + \lambda \frac{1}{N_{reg}} \sum_i p_i^{*} L_{reg}(t_i, t_i^{*})$,

where $p_i$ is the predicted objectness probability of anchor $i$, $p_i^{*}$ its ground-truth label, and $t_i$, $t_i^{*}$ the predicted and ground-truth box offsets. As previously stated, the regression offset is determined using the closest anchor box. Anchor boxes thus act as region proposals, which connects this design to the region proposal technique. At training time, not all anchor boxes contribute to the loss: positive labels are given to the anchors with the biggest IoU with the ground truth, as well as to anchors whose IoU overlap is greater than 0.7. Anchors that are neither positive nor negative serve no training purpose, and anchors that cross image borders are also ignored [33].
Consider data $y_j$ with labels $z_j$ such that $X = \{(y_j, z_j) \mid y_j \in \mathbb{R}^m, z_j \in \mathbb{R}^n, j = 1, \ldots, M\}$. A FaRe-ConvNN finds a function $f_{DNN}: \mathbb{R}^m \rightarrow \mathbb{R}^n$ that weaves through the data such that $f_{DNN}(y_j) \approx z_j$ as closely as possible, through the use of three components, among them an activation function $\rho: \mathbb{R}^{m_{i-1}} \rightarrow \mathbb{R}^{m_i}$ applied to $x^{(i-1)} \in \mathbb{R}^{m_{i-1}}$. The objective is to minimize $f(X) = \|Y - H_c X H_r^{\top}\|_F^2$ over the decision variables, subject to the given constraints. Considering an affine approximation to $f(X)$ at the current iterate $X_l$, with $\|X - X_l\|_F^2$ serving as a proximal term, yields the update rule $X_{l+1} = P_{\nu}\big(X_l - \eta \nabla f(X_l)\big)$, where $P_{\nu}$ is the proximal operator corresponding to $G$ and $\nu = \lambda\eta$, from which the proximity operator is defined.
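The anchor-labeling rules above (positive for the best-overlap anchor and for any anchor with IoU above 0.7, negative below a low threshold, ignored otherwise) can be sketched as follows; the boxes and the 0.3 negative threshold are illustrative assumptions:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_box, pos_thresh=0.7, neg_thresh=0.3):
    """1 = positive, 0 = negative, -1 = ignored (contributes no loss)."""
    ious = np.array([iou(a, gt_box) for a in anchors])
    labels = np.full(len(anchors), -1)
    labels[ious >= pos_thresh] = 1
    labels[ious < neg_thresh] = 0
    labels[np.argmax(ious)] = 1  # best-overlap anchor is always positive
    return labels

anchors = [(0, 0, 10, 10), (0, 0, 9, 9), (50, 50, 60, 60)]
print(label_anchors(anchors, gt_box=(0, 0, 10, 10)))  # [ 1  1  0]
```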
The proximity operator is the soft-thresholding function $P_{\nu}(x) = \mathrm{sgn}(x)\max(|x| - \nu, 0)$, where $\mathrm{sgn}(\cdot)$ denotes the signum function; the update that results is identical to the usual iterative ISTA. Using this gradient expression, the activation function $\psi$ is defined as a linear combination of $K$ derivatives of Gaussians (DoG) and a linear term, where the observations obey $Y_q = H_c X_q H_r^{\top} + \xi_q$ and the random noise vectors $\xi_q$ are considered independent and identically distributed. Let $c^{l} \in \mathbb{R}^{K}$ be the coefficients of the LET activation in layer $l$ [34]. By reducing the squared estimation error over all training examples, the optimal set of activation coefficients $c^{*}$ is obtained. The gradient of $J(c)$ with respect to $c$ is required for optimization; however, unless a very tiny step size is specified, optimizing $J(c)$ with vanilla gradient descent tends to diverge. We get around this problem by noting that the Hessian does not have to be computed directly: to train the network's parameters, only the Hessian-vector product is required. The search direction $\delta_c^{*}$ in the $i$-th iterate $c_i$ of Hessian-free optimization (HFO) is determined by minimizing a second-order Taylor-series approximation $\hat{J}(c)$ of the actual cost $J(c)$:

$\hat{J}(c_i + \delta_c) = J(c_i) + g_i^{\top}\delta_c + \tfrac{1}{2}\delta_c^{\top} H_i \delta_c$,

where $g_i = \nabla J(c)|_{c=c_i}$, $H_i = \nabla^2 J(c)|_{c=c_i}$, and $\delta_c$ is the search direction, selected optimally at each iteration by reducing a normalized quadratic approximation. In the fitness function used here, $d$ is the dimension of the position vectors, $r_1$ and $r_2$ are two random values in the range $[0, 1]$, $\beta$ is a constant, and $\Gamma(x) = (x-1)!$. The level set function $\phi$, defined over the image space, is a surface that is positive inside the region $\Omega$ and negative outside it; in its most general form, the level set equation is $\frac{\partial \phi}{\partial t} + F|\nabla\phi| = 0$.

Training.
A CNN should be trained on a large database of images in order to attain low error rates. Backpropagation is utilized to train the CNN by computing the gradients required for updating the network's weights.
Depending on which layer is being trained, there are a number of different steps to train the CNN [35]. The backpropagation mechanism is used in the FC layer. The squared-error loss at the output layer,

$E(y^{L}) = \tfrac{1}{2}\sum_n \sum_k \big(t^{n}_{k} - y^{n}_{k}\big)^2$,

must first be evaluated, where $t^{n}_{k}$ is the target of class $k$ for the $n$-th training example and $y^{n}_{k}$ is the actual output from the last layer. The derivative of the error function is taken as a partial derivative from the output layer [25]: for every input to the current neuron, $\partial E / \partial x^{l}_{j}$, usually known as the delta, must be determined.
Here $\sigma(x^{l}_{j})$ denotes the ReLU function $\sigma$ applied to $x^{l}_{j}$, the input to the current neuron. After this has been done for all neurons, the errors from the previous layer must be calculated.
where $w^{l-1}_{ij}$ is the weight connecting input $x^{l}_{j}$ to the next layer. Then, until the input of the first fully connected layer is reached, these two steps are repeated, training the network's higher reasoning [36], or dense layers, on one training sample. The change in weight, which is added to the old weight, is $\Delta w = -\eta\, \partial E/\partial w$, where $\eta$ is the learning rate.
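A single-neuron sketch of this procedure (squared-error loss, ReLU delta, then the learning-rate weight update; the inputs, target, and learning rate are made-up values):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def backprop_step(x, target, w, lr=0.1):
    """One training step for one fully connected neuron with ReLU:
    compute the delta dE/d(pre-activation), then apply the weight
    update w_new = w - lr * dE/dw."""
    z = w @ x                        # pre-activation (input to the neuron)
    y = relu(z)                      # neuron output
    dE_dy = y - target               # derivative of 0.5 * (y - target)^2
    delta = dE_dy * (1.0 if z > 0 else 0.0)  # ReLU derivative is 1 or 0
    grad = delta * x                 # dE/dw by the chain rule
    return w - lr * grad

x = np.array([1.0, 2.0])
w = np.array([0.5, 0.25])
w_new = backprop_step(x, target=0.0, w=w)
print(w_new)  # weights move in the direction that reduces the error
```

Repeating this update over all neurons and layers (output layer inward) is exactly the loop the text describes.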

Backpropagation: Convolutional and Max-Pooling Layers.
Backpropagation in convolutional layers differs from that in the FC layer. In FC layers, the gradient for each weight of the current layer must be computed individually; since a convolutional layer shares weights, every $x^{l-1}_{i,j}$ expression involving weight $w_{ab}$ must be included. The gradient component for an individual weight [37] is computed using the chain rule as $\partial E/\partial w_{ab} = \sum_i \sum_j \frac{\partial E}{\partial x^{l}_{ij}} \frac{\partial x^{l}_{ij}}{\partial w_{ab}}$. This entails calculating the effect on the loss function $E$ of a single change in the weight kernel.

Contrast Media & Molecular Imaging
Since the error at the current layer is already known, the deltas may be determined simply by calculating the derivative of the activation function. The activation function is $\max(0, x^{l}_{ij})$, whose derivative is one or zero, except at $x^{l}_{ij} = 0$ where it is not defined [38]. Following that, the error must be transmitted to the preceding layer. This is accomplished once more by utilizing the chain rule, yielding an equation that represents a convolution in which $w_{ab}$ has been flipped along both axes. It is also worth noting that this will not work for values at the borders of the map.
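For the max-pooling layer itself, backpropagation simply routes each pooled gradient back to the position that won the forward-pass maximum; all other positions receive zero. A toy sketch (illustrative values, not the study's code):

```python
import numpy as np

def max_pool_backward(feature_map, grad_pooled, size=2, stride=2):
    """Route each pooled gradient back to the argmax position of its
    forward-pass window; every other position gets zero gradient."""
    grad_input = np.zeros_like(feature_map)
    for i in range(grad_pooled.shape[0]):
        for j in range(grad_pooled.shape[1]):
            window = feature_map[i*stride:i*stride+size,
                                 j*stride:j*stride+size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad_input[i*stride + r, j*stride + c] = grad_pooled[i, j]
    return grad_input

fmap = np.array([[1., 3.],
                 [2., 4.]])
grad = np.array([[5.]])   # gradient w.r.t. the single pooled output
print(max_pool_backward(fmap, grad))
# [[0. 0.]
#  [0. 5.]]
```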

SVC (Support Vector Classifier).
Consider a set $x = \{x_1, x_2, \ldots, x_n\}$ with $x_i \in \mathbb{R}^p$ exhibiting a convincing pattern, grouped into a positive class and a negative class with labels $y_i \in \{+1, -1\}$. The data are then separated by the hyperplane $w^{\top}x + b = 0$, where $w$ is the weight vector and $b$ is a scalar. The problem of finding the hyperplane $H_0$ is the same as finding the optimal separating boundary with the biggest margin, expressed as minimizing $\tfrac{1}{2}\|w\|^2$ subject to $y_i(w^{\top}x_i + b) \geq 1$. This linearly separable case nicely demonstrates how data can be divided into two types; for non-separable data, the slack variable $s_i$ needs to be added so that $y_i(w^{\top}x_i + b) \geq 1 - s_i$ is obtained.
The penalty parameter $C$ helps to balance the model's complexity against the training errors. Using the Lagrange function, the constrained optimization problem [39] is rewritten without constraints as $L_p = \tfrac{1}{2}\|w\|^2 + C\sum_i s_i - \sum_i a_i\big[y_i(w^{\top}x_i + b) - 1 + s_i\big] - \sum_i \mu_i s_i$, where the non-negative variables $a_i \geq 0$ are known as Lagrange multipliers. The goal is to minimize $L_p$ with respect to $w$ and $b$ while simultaneously maximizing it with respect to $a_i$. The dual problem is solved by setting the partial derivatives of $L_p$ with respect to $w$, $b$, and $s$ to zero, subject to the constraints $0 \leq a_i \leq C$ for $i = 1, \ldots, n$ and $\sum_{i=1}^{n} a_i y_i = 0$.
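The role of $C$ is easy to demonstrate with scikit-learn's `SVC`; the two synthetic feature clusters below are hypothetical stand-ins for the extracted region features, not the study's data:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated hypothetical feature clusters (binary classes).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# C controls the slack-variable penalty: a small C tolerates more
# margin violations (simpler model), a large C penalizes training errors.
clf = SVC(C=1.0, kernel="linear").fit(X, y)
print(clf.score(X, y))  # near 1.0 for well-separated clusters
```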
Gaussian NB (Naïve Bayes).
Naïve Bayes is a simple and quick classification method that relies on the Bayes theorem, $P(A \mid B) = P(B \mid A)\,P(A)/P(B)$. This classifier assumes that each variable contributes to the outcome independently: every characteristic is treated as independent of the others, and each affects the output with the same weight. Because this independence assumption rarely holds, plain Naïve Bayes often yields poor accuracy on real-world problems. As a result, Gaussian NB is utilized, which assumes that the features follow a normal distribution: the conditional probability of each feature is presumed to be Gaussian,

$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\, \exp\!\Big(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\Big)$.
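A minimal Gaussian NB sketch with scikit-learn (again on made-up feature data, not the study's dataset) shows the normal-distribution assumption in action:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class, three-feature data (hypothetical).
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)),
               rng.normal(2.5, 1.0, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

gnb = GaussianNB().fit(X, y)

# GaussianNB fits a per-class mean and variance for every feature,
# i.e. the Gaussian conditional probability described above.
print(gnb.theta_.shape)                # (2, 3): one mean per class per feature
print(gnb.predict([[2.4, 2.6, 2.5]]))  # point near the class-1 mean
```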

Performance Analysis
This section compares the suggested strategy for ovarian cancer diagnosis with existing methods and analyses its performance. The model's performance is represented by a confusion matrix that includes true negatives, true positives, false negatives, and false positives.
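The metrics reported below all derive from those four confusion-matrix counts; a small sketch (the counts here are illustrative only, not the study's results):

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Precision, recall/sensitivity, and specificity from the four
    confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity
    specificity = tn / (tn + fp)
    return precision, recall, specificity

# Illustrative counts only (not the paper's actual confusion matrix).
p, r, s = metrics_from_counts(tp=95, tn=97, fp=4, fn=5)
print(round(p, 4), round(r, 4), round(s, 4))  # 0.9596 0.95 0.9604
```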

Database Description.
The suggested classifier was tested on single-cell blood smear samples obtained from the Cancer Imaging Archive database [40]. Cropped sections of epithelial cells, germ cells, and stromal cells can be found in the Cancer Imaging Archive database. The grey-level attributes of these samples are virtually identical across the database, but with a larger dimension.

The confusion matrix of Gaussian NB employing FaRe-ConvNN is shown in Figure 5, where rows indicate the predicted class and columns represent the actual class of the ovarian cancer data. Correctly and erroneously classified samples of the trained network are represented by the diagonal blue cells and the white cells, respectively. The right-hand column summarizes each predicted class, whereas the bottom row reflects each actual class's performance [41]. This confusion matrix plot for Gaussian NB utilizing FaRe-ConvNN reveals that the overall classification performance is 98.69 percent correct.
The confusion matrix of SVC employing FaRe-ConvNN is shown in Figure 6, with rows and columns indicating the predicted and actual classes. This SVC confusion matrix plot utilizing FaRe-ConvNN reveals that the total classification accuracy is 97.39 percent. The predicted class is represented by the right-hand column, while the performance of each actual class is represented by the bottom row. To make it easier to examine the performance, zeroes are included here [42]. This confusion matrix shows that a few pairs are frequently misidentified. The analysis of SVC and Gaussian NB with various specifications is shown in Table 2.
A graphical depiction of the settings for Gaussian NB and SVC utilizing FaRe-ConvNN is shown in Figure 7. Precision, recall/sensitivity, and specificity are the metrics that have been determined, in percent. SVC precision is 95.96 percent, whereas Gaussian NB precision is 97.7 percent, with FaRe-ConvNN enhancing the precision of Gaussian NB. For recall/sensitivity, SVC achieves 94.31 percent and Gaussian NB 97.7 percent, while for specificity, SVC achieves 97.39 percent and Gaussian NB 98.69 percent using FaRe-ConvNN. As discussed above, the Gaussian NB technique in classification utilizing FaRe-ConvNN delivers an improved predicted class in ovarian cancer detection. The parametric values acquired by various approaches are compared in Table 3. Figure 8 compares the existing techniques, including CNN [43], with the proposed approach in terms of precision, recall, and specificity.

Conclusion
In comparison to existing methodologies, the classification method using both SVC and Gaussian NB with FaRe-ConvNN delivers a precision value of more than 95%, according to the performance analysis. Utilizing the proposed FaRe-ConvNN, 97 percent to almost 99 percent precision was acquired for the predicted class with this classification technique. Based on these results, it can be stated that this OC detection and classification method is a significant contribution to the medical sector, assisting clinicians in making more precise decisions and treating patients more effectively. There is still scope for further research, particularly in experimenting with different deep learning models and optimizing their hyperparameters to achieve promising and trustworthy results. The intermediate results of the CNN can also be analyzed to derive further inferences for future research.

Data Availability
The data that support the findings of this study are available on request from the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.