A Deep Learning Fusion Approach to Diagnosing Polycystic Ovary Syndrome (PCOS)

One of the leading causes of female infertility is PCOS, a hormonal disorder affecting women of childbearing age. The common symptoms of PCOS include increased acne, irregular periods, excess body hair, and overweight. Early diagnosis of PCOS is essential to manage the symptoms and reduce the associated health risks. The diagnosis is based on the Rotterdam criteria: a high level of androgen hormones, ovulation failure, and polycystic ovarian morphology on the ultrasound image (PCOM). At present, doctors and radiologists perform PCOM detection manually on ovary ultrasound images by counting the follicles and determining their volume in the ovaries, which is one of the challenging PCOS diagnostic criteria. Moreover, physicians require further tests and checks of biochemical/clinical signs, in addition to the patient's symptoms, in order to decide on the PCOS diagnosis, and clinicians do not use a single diagnostic test or specific method to examine patients. This paper introduces a dataset that includes ovary ultrasound images together with patient clinical data, classified as PCOS and non-PCOS. Next, we propose a deep learning model that diagnoses PCOM from the ultrasound image, achieving 84.81% accuracy with the Inception model. Then, we propose a fusion model that combines the ultrasound image with clinical data to diagnose whether a patient has PCOS. The best model developed achieved 82.46% accuracy by extracting image features with the MobileNet architecture and combining them with the clinical features.


Introduction
Polycystic ovary syndrome (PCOS) is the most prevalent women's health issue, affecting 5 to 10% of women of reproductive age [1]. PCOS is an endocrine disorder and a common cause of infertility [2]. Infertility is defined as the failure of the process of releasing the egg from the ovary. There are many causes of infertility; one of them is the outgrowth of an unusual number and volume of follicles during the ovulation phase, which is considered the first symptom of PCOS [3]. Ovarian follicles are small fluid-filled cysts found inside a woman's ovary. Signs and symptoms of PCOS include increased acne, excessive body and facial hair, alopecia, overweight, infertility, and irregular or absent periods [4]. The cause of the disease is unknown, although some researchers believe that increased androgen production in ovarian theca cells is the fundamental problem [5, 6]. The diagnostic criteria for this hormonal disorder have been widely debated, but clinical validation of PCOS usually follows the criteria adopted at the Rotterdam workshop in 2003 [7]. The Rotterdam criteria are based on three aspects; if any two of them occur, PCOS may be diagnosed. The three aspects are as follows: (1) a high level of androgens (i.e., male sex hormones), (2) oligomenorrhea, and (3) polycystic ovarian morphology (PCOM). The ovary ultrasound image is one of the main tools that can be used to predict PCOS at the earliest stage; it contains essential information such as the number, volume, and position of the follicles [4].
Ultrasound has been the most popular imaging modality in the clinical examination of patients with ovarian pathology. It has several advantages over other medical imaging methods such as computed tomography (CT) and magnetic resonance imaging (MRI): it is low cost, accessible, and safer, and it provides real-time results. This imaging technique offers a great opportunity to develop deep learning models for automatic analysis, making the examination more objective and the diagnosis more accurate. Deep learning is a powerful approach used in image analysis and computer vision [8-10]. Automatic classification of PCOS at the earliest stage based on ultrasound images and clinical data helps in disease recognition.
Diagnosis of PCOS involves different criteria and symptoms that require blood tests, ultrasound examinations, and high-quality menstrual data. Because of the variety of symptoms associated with this syndrome and the absence of a single diagnostic test or method used by clinicians to evaluate patients, medical practitioners are compelled to request many clinical tests and unneeded radiological imaging. Consequently, PCOS is diagnosed by excluding irrelevant symptoms or test results, owing to the limited understanding of its complicated pathomechanism. At the same time, early diagnosis and detection of PCOS with the fewest lab tests and imaging procedures are critical, because the condition leads directly to ovarian dysfunction. In turn, it increases the hazard of infertility, abortion, or even gynecological cancer, as well as mental anguish for patients owing to the waste of time and money [11]. Currently, the diagnosis of polycystic ovarian morphology in ultrasound images is performed manually, based on specialists' accumulated knowledge and experience in recognizing the morphology and characteristics of ovary ultrasound images. A specialist's decision after examining the same case's images can sometimes be subjective and variable. Furthermore, ultrasound diagnosis accuracy is higher when conducted by professionals compared to less experienced doctors, yet specialist examiners are very limited in number [12], specifically in underdeveloped areas. Moreover, a radiologist's estimate of the follicles' numbers and sizes in the ovary images is carried out manually. Thus, detecting PCOM is a time-consuming and challenging task for radiologists because of the varying sizes of follicles and their association with veins and tissues, which also causes the images to contain artifacts and speckle noise [13]. Uncertainty in diagnosis can have a long-term impact on women's fertility and hormonal balance. Furthermore, this manual framework of diagnosis could increase examination mistakes, which causes inconvenience for the patient. Thus, it is recommended to propose intelligent computer-aided systems that offer decision support tools to gynecologists. A deep learning model that diagnoses PCOS from women's clinical data and ultrasound images would also help overcome the barriers of manual examination of ultrasound images and assessment of patient clinical data.
Recent applications of machine learning and deep learning to medical ultrasound images have included diverse tasks such as classification [14], segmentation [15], and detection [8, 16]. Ultrasound images of various regions of the body have been used in CAD to diagnose different types of life-threatening illnesses, such as breast cancer [17], hydronephrosis [18], and prostate cancer [19]. Moreover, many contributions have been made by other researchers to identify PCOS using ultrasound images [4, 13, 14, 16, 20-25]. Several machine learning and deep learning models have been implemented for ovary ultrasound image analysis in diagnosis systems, such as SVM [24], NB [22], CNN [20, 21, 25], and VGG-16 [16]. However, many studies used other kinds of data rather than ultrasound images to diagnose PCOS, such as clinical data [11, 26, 27] and ultrasound reports in text format [28]. Moreover, U-Net, VGG-16, and GoogLeNet are CNN architectures that achieved significant results for various computer vision tasks such as image classification and segmentation, as shown in [29, 30] and [16]. Consequently, a deep learning approach was adopted to ease training and achieve better performance, since deep learning does not require manual extraction of features from the images; the CNN extracts the features automatically during model training. It has also been observed that most of the previous works ([20, 21, 29-31]) used a private dataset to build their proposed model. The study of [25] used an open-source dataset, but it contained 3D images, which are not suitable for our study. Moreover, several of these studies, such as [4, 22] and [21], used a small dataset to build their model. In general, there is a shortage of work on diagnosing PCOS using ultrasound images. We also noticed that clinical and metabolic data have been used in [26, 27] and [11] to diagnose PCOS patients using machine learning algorithms. Nonetheless, the diagnosis of PCOS based on the Rotterdam criteria includes a high level of androgen hormones, ovulation failure, and polycystic ovarian morphology on the ultrasound image (PCOM). There is a lack of prior research proposing a deep learning-based fusion model able to explore the impact of clinical data along with ultrasound images to diagnose PCOS. In recent medical imaging literature, there has been a tendency toward using both health data and images in a fusion model to solve complicated problems that a single modality cannot solve, such as skin cancer [32], breast cancer [33], and Alzheimer's disease [34]. Therefore, there is still a need to improve models toward better accuracy in PCOS diagnosis using ovary ultrasound images and clinical data. All the discussed studies contribute to the process of building the proposed model to detect PCOS. The main objective of this research is to develop a computer-aided diagnosis (CAD) model for PCOM diagnosis to assist radiologists in classifying ovary ultrasound images, with the aim of reducing the false positive rate and increasing the accuracy of the model. A further objective is to explore the impact of the clinical features on PCOS diagnosis with ultrasound images using a deep learning fusion approach, to assist doctors and radiologists in making better clinical decisions. The rest of this paper is organized as follows. Section 2 discusses the proposed methodology. Section 3 presents the empirical study and the results obtained. Section 4 provides a discussion of the results, and Section 5 presents the conclusions.

Materials and Methods
This section provides a detailed description of the dataset used in this research to develop and test the diagnosis model. We then discuss the proposed model for diagnosing PCOM from ultrasound images, as well as the fusion techniques used to combine the ultrasound images with clinical data to build the PCOS diagnosis framework.

Dataset.
In this work, we used a dataset containing ovary ultrasound images and clinical data collected from King Fahad Hospital of the University (Khobar, Saudi Arabia). In cooperation with the Department of Radiology, four radiologists were assigned to this research; they reviewed a total of 1250 patient files, comprising patients with documented polycystic ovaries (250) and those who were normal or had other ovarian problems (1000). For some patients, multiple ultrasound scans were available; however, only the latest scan of each patient was used in the study. Moreover, only images in which the ovary is clearly visible were selected. For categorization, the radiologists classified the chosen images into two groups: normal morphology (non-PCOM) and those showing sonographic morphology of polycystic ovary (PCOM). Polycystic ovarian morphology was defined as an ovary containing multiple uniformly sized follicles that are peripherally placed and below 1 cm in size, as shown in Figure 1. The image dataset consists of 391 images: 127 PCOM and 264 normal (non-PCOM) ovaries.
The process of collecting the clinical data started after completing the image collection, in order to study the impact of clinical data on the diagnosis of PCOS. The aim of this process was to extract clinical information from the hospital system for the patients whose ultrasound images had been collected in the previous step. The features were selected with the help of expert opinion and by taking into account recent studies that identified the significance of those attributes for PCOS diagnosis [11, 22]. During the clinical data collection for the 391 patients whose ultrasound images were already available, we found a large amount of missing data that required further filtration. Eventually, a dataset containing 22 clinical features was obtained. Two approaches were then followed: (i) the first approach diagnoses PCOM using the ultrasound images alone; (ii) the second approach studies the impact of the clinical data together with the ultrasound images in diagnosing PCOS using a data fusion technique with deep learning models. To prepare the images and clinical data for the fusion model, the same image preprocessing techniques used in the image-only model are applied, while the clinical data require additional preprocessing methods such as handling categorical data, feature scaling, and dealing with missing data. Two joint fusion models have been developed and evaluated: (a) the first is joint fusion type II, which fuses the preprocessed clinical features as they are with features extracted from the images by the deep learning models.
(b) The second is joint fusion type I, which learns a feature representation from each modality before combining them. Different CNN architectures will be compared to find the most suitable model. The joined features from the two modalities are fed into a feedforward neural network (the classification part) to give the final diagnosis, as shown in Figure 3. The proposed techniques are divided into subsections: the first covers the transfer learning models and the others cover the related techniques such as fusion.

Transfer Learning.
A suitable dataset is essential for the effective operation of any artificial intelligence framework. Data collection and annotation can be difficult, especially for medical problems; as a result, many problems lack the large datasets that deep learning models typically require. The PCOM dataset contains only 391 ultrasound images of normal and polycystic ovaries. In this context, the idea of transfer learning (also known as knowledge transfer) arises: a CNN model developed for a specific task is reused, with its weights serving as a starting point for another problem that has a limited number of images [35]. In this form of transfer learning, a model is typically pretrained on the ImageNet dataset [36], which contains over 14 million samples, and then fine-tuned for a different problem. The following subsections describe in detail the transfer learning CNN architectures that have been used and evaluated in this study.
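As a concrete illustration, reusing a pretrained CNN's front layers as a fixed feature extractor can be sketched in Keras as follows. This is a minimal sketch, not the study's exact setup: `weights=None` keeps it runnable offline, whereas transfer learning proper would pass `weights="imagenet"` to load the pretrained ImageNet weights.

```python
import numpy as np
import tensorflow as tf

# Keep only the convolutional/pooling "front" layers (include_top=False)
# and freeze them, so the transferred weights are reused as-is.
# NOTE: weights=None is used so this sketch runs offline; transfer
# learning proper would pass weights="imagenet".
base = tf.keras.applications.MobileNet(
    weights=None, include_top=False, input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False

# Each image is mapped to a fixed-length feature vector.
batch = np.random.rand(4, 224, 224, 3).astype("float32")
features = base.predict(batch, verbose=0)  # shape: (4, 1024) for MobileNet
```

The frozen backbone then only needs a small task-specific head trained on the 391-image dataset, which is the point of transfer learning for small medical datasets.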

VGG: one of the CNN architectures proposed in the ILSVRC computer vision competition in 2014 by Simonyan and Zisserman [37]. VGG refers to the Visual Geometry Group lab at Oxford University; the main idea behind VGG is that small convolution filters with a 3 × 3 kernel give a significant improvement over large-sized kernels. The most common VGG architectures are VGG-16 and VGG-19, which contain 16 and 19 layers, respectively [38]. The architecture includes three kinds of layers: convolutional, pooling, and fully connected. VGG-16 has 13 convolutional layers and three fully connected layers, plus five pooling layers distributed after every two or three convolutional layers. The main difference between the two models is that VGG-19 has three additional convolutional layers. The model can be used directly for classification tasks. The benefit of the VGG approach is to reduce the number of parameters and achieve faster convergence [37].

Inception v3: Szegedy et al. [39] first introduced the "Inception" microarchitecture in 2014. The Inception v3 architecture is based on Szegedy et al. [40], who introduced modifications to the inception module to improve ImageNet classification accuracy and reduce computational cost. Inception networks (GoogLeNet/Inception v1) have proven more efficient than VGG in terms of the number of parameters generated by the network and the cost of memory and other resources. However, any change to an Inception network must be made carefully to ensure the computational advantages are not lost, which creates a barrier to adapting the architecture for diverse use cases and causes uncertainty about the network's efficiency. Several techniques were therefore proposed in the Inception v3 model to loosen these restrictions and ease model adaptation, including factorized convolutions, regularization, dimension reduction, and parallelized computations [40]. Inception v3 is a 42-layer deep neural network. It adopts a convolution kernel splitting technique to break large convolutions into smaller ones; this splitting decreases the number of parameters, which increases training speed while extracting spatial features more efficiently [41].

MobileNet: a CNN architecture developed by Howard et al. in 2017 for classification and detection problems [42]. The architecture targets mobile and embedded vision applications and aims to reduce the number of parameters used in training the model. MobileNet is characterized by being streamlined through depthwise separable convolutions, which help construct lightweight networks and decrease model complexity and size. A depthwise separable convolution comprises two operations: depthwise convolution and pointwise convolution [42]. Depthwise convolution applies a spatial convolution to each channel separately, so it has the same number of output channels as input channels. Pointwise convolution applies a convolution with a 1 × 1 kernel that combines the outputs of the depthwise convolution to change the dimension. The MobileNet architecture is made up of 28 layers [43].

DenseNets: densely connected convolutional networks, CNNs that use dense blocks to connect all layers directly with each other. This architecture is motivated by a problem of standard CNNs: as information travels from the input layer through many layers to the output layer, it may vanish before reaching the end of the network [44]. DenseNet handles this by having each layer receive inputs from all preceding layers and pass its own feature maps to all subsequent layers, retaining the feed-forward nature: the first layer connects to the second, third, and fourth layers; the second layer connects to the third, fourth, fifth, and so on. A DenseNet with L layers has L(L + 1)/2 connections, whereas a traditional CNN has L connections. DenseNet has several advantages owing to this connectivity, such as needing fewer parameters than a traditional CNN, enhancing feature propagation, and encouraging feature reuse [44]. DenseNet architectures are split into dense blocks; within a block the feature map dimensions remain constant, while the number of filters differs between blocks. Transition layers between the dense blocks perform downsampling to decrease the number of channels; each transition layer consists of a batch-normalization layer, a 1 × 1 convolution, and a 2 × 2 average pooling layer [45]. The DenseNet121 version was pretrained on the ImageNet dataset; the number "121" refers to the number of layers with trainable weights [46]. DenseNet121 is made up of several dense blocks containing different numbers of repetitions of two convolutions: a 1 × 1 kernel as the bottleneck layer to reduce the number of features, and a 3 × 3 kernel to execute the convolution operation, with transition layers between the dense blocks. In total, DenseNet121 has one 7 × 7 convolution, 58 3 × 3 convolutions, 61 1 × 1 convolutions, four average pooling layers, and one fully connected layer. Our proposed classifier replaces the final fully connected layer and the SoftMax activation. These architectures are used for both feature extraction and classification: the layers from the input layer to the last pooling layer (the front layers) serve for feature extraction, and the remaining layers (the fully connected layers) serve for the classification task.

Fusion Technique.
CNNs have achieved significant success in a wide range of applications, and combining multimodal fusion with CNNs is a promising area for future research. Data fusion refers to the process of combining and associating data and information from multiple modalities to provide more accurate, consistent, and complete information, so that the resulting machine learning models outperform those built on an individual data modality [47]. Data fusion approaches have been widely used in multisensor environments to combine and aggregate data from many sensors; similar techniques may also be used in other areas, such as text processing. In multisensor settings, data fusion aims to reduce detection error probability and increase dependability by combining data from various dispersed sources [48].
There are several techniques for multimodal data fusion, such as early fusion, joint fusion, and late fusion. Early fusion, also called feature-level fusion, combines the various types of input data into one feature vector, which is then fed to a single machine learning model for training, as shown in Figure 4(a). The inputs can be combined in several ways, including concatenation, pooling, or a gated unit. Early fusion type I combines the raw features, whereas early fusion type II combines extracted features, whether obtained through manual extraction, image analysis tools, or a learned representation from a neural network [49]. Joint fusion, also called intermediate fusion [50] and the approach used in this research, feeds learned feature representations from intermediate layers of neural networks into a final model. Joint fusion is similar to early fusion, except that the loss is propagated back to the neural networks responsible for extracting the features, which helps improve the feature representations at each training iteration [49]. Joint fusion type I is defined when the input features from each modality are extracted and learned before being combined; joint fusion type II does not require the feature extraction step for all input modalities.

Experiment I: Ultrasound Images.
In this part, the experiments to predict PCOM using the ultrasound image dataset are explained. The following parts give a view of the proposed framework, which goes through several phases: first, preprocessing techniques are applied to improve the dataset's quality; then, feature extraction and classification are performed using deep learning architectures. The dataset was divided into training and testing sets using an 80:20 holdout to construct and validate the prediction model. The training set was used to train the model and fine-tune its parameters, while the testing set was used to evaluate the model, tune hyperparameters, and select the most suitable model.
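The 80:20 holdout can be reproduced with scikit-learn. In this sketch the arrays are dummy stand-ins for the 391 images, and the stratification is an assumption (the paper does not state it) that keeps the PCOM/non-PCOM ratio equal in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for the 391 images (as feature arrays here) and their labels:
# 127 PCOM (1) and 264 non-PCOM (0).
X = np.zeros((391, 8))
y = np.array([1] * 127 + [0] * 264)

# 80:20 holdout; stratify=y (an assumption) preserves the class ratio
# in both the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 312 training samples, 79 testing samples
```

Fixing `random_state` makes the split reproducible across runs, which matters when comparing the six architectures on identical data.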

Preprocessing.
Preprocessing is a critical phase before feeding the data to the feature extraction stage. This step provides higher-quality data and allows the essential information contained in the image to be retrieved more readily. In the preprocessing phase, data augmentation and AHE techniques were applied. The preprocessing step starts by loading the images; each extracted ovarian image is 224 × 224 pixels. Then, pixel normalization is applied by scaling all pixel intensities to the range [0, 1]. As mentioned earlier, since the current dataset has a limited sample size, there is a need to increase the number of samples. The sample expansion was carried out using image data augmentation, a technique that artificially expands the dataset by creating modified copies of images through random geometric transformations such as flipping, cropping, rotating, and random erasing. Moreover, data augmentation ensures that the model does not see and use the same batch of inputs during each training iteration, which helps reduce overfitting [51]. The augmented images are generated on the fly while the model is being trained.
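On-the-fly augmentation of this kind can be sketched with Keras' `ImageDataGenerator`; the transformation ranges below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# On-the-fly augmentation: every batch drawn during training is a freshly,
# randomly transformed copy of the originals, so the model never sees the
# exact same batch twice.
datagen = ImageDataGenerator(
    rescale=1.0 / 255.0,     # pixel normalization to [0, 1]
    rotation_range=20,       # random rotation (range is an assumption)
    horizontal_flip=True,    # random flipping
    width_shift_range=0.1,
    height_shift_range=0.1,
)

images = np.random.randint(0, 256, size=(16, 224, 224, 3)).astype("float32")
labels = np.random.randint(0, 2, size=(16,))
batch_x, batch_y = next(datagen.flow(images, labels, batch_size=8))
```

Because transformations are sampled anew at each call, the effective dataset size grows without storing any extra images on disk.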
Maheswari et al. [22] have shown the benefits of using AHE for noise removal; following their recommendations, we applied AHE to all images before the training phase. AHE raises the contrast level to distinguish the background from the foreground. It is also considered a practical approach to reduce speckle noise for local minima extraction, as shown in Figure 5; such speckles can obscure and reduce the contrast of diagnostically important details [22]. After the images go through this enhancement process, they are passed to the deep learning architectures.
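To make the idea concrete, a deliberately simplified tile-based adaptive histogram equalization can be written in a few lines of NumPy. This is only an illustration of the principle, not the implementation used in the study: each tile is equalized independently, whereas a full AHE/CLAHE implementation also interpolates between tiles and clips the histogram.

```python
import numpy as np

def adaptive_hist_eq(img: np.ndarray, tile: int = 8) -> np.ndarray:
    """Simplified AHE: equalize each tile of a grayscale uint8 image
    independently (no inter-tile interpolation, no contrast clipping)."""
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            block = img[i:i + tile, j:j + tile]
            hist = np.bincount(block.ravel(), minlength=256)
            cdf = hist.cumsum() / block.size  # normalized cumulative histogram
            # Map each pixel through the tile's CDF to spread its intensities.
            out[i:i + tile, j:j + tile] = (cdf[block] * 255).astype(np.uint8)
    return out

# A low-contrast gradient: each 8x8 tile gets stretched toward the full range.
img = np.tile(np.arange(64, dtype=np.uint8), (64, 1))
enhanced = adaptive_hist_eq(img)
```

In practice a library routine such as scikit-image's `exposure.equalize_adapthist` would be used instead of this hand-rolled version.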

Applying the Deep Learning Architectures.
After the preprocessing phase, CNN models can be used to produce a deep network that properly learns and extracts the features in the ovarian ultrasound images. CNN architectures are applied to predict whether the patient has polycystic ovarian morphology. Pretrained deep learning models are used to extract the image features: feature extraction is carried out in the front layers of the network, and the fully connected layers (the last few layers of the network) are modified to produce a deep network that can learn to diagnose PCOM from ultrasound images. We performed experiments using six well-known CNN architectures: VGG16, VGG19, InceptionV3, DenseNet121, DenseNet201, and MobileNet. These architectures were selected for examination and comparison on our dataset based on their performance in previous studies; moreover, they are among the most popular and widely used architectures in the healthcare field [8, 52].
For all pretrained models used in the experiment, we kept their convolutional and pooling layers as in the original architectures to extract the image features, but we did not use their fully connected output layers to perform the prediction task. Instead, the classification layer trained on the ImageNet dataset was removed, and a new fully connected layer suitable for our problem was constructed and appended on top of the architecture. During the experiments, many modifications were made to the fully connected layers to enhance performance, such as adding a dropout layer to all hidden layers with different probabilities, changing the learning rate, changing the number of dense layers, and testing different optimization methods. The best fully connected model reached is shown in Figure 6. The models were trained for 100 epochs with a batch size of 32 images. To prevent overfitting, dropout was used with a probability of 0.5; different dropout values were tested, and 0.5 gave the most suitable results. The initial learning rate, which controls how much the model changes in response to the estimated error each time the weights are updated, was 0.00001, and the "Adam" algorithm was applied for optimization. Adam computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradient [53]. Each model architecture was assessed using different metrics: accuracy, F1-score, precision, recall (sensitivity), and specificity. The code is available upon request. Due to the uneven class distribution in our dataset, we prefer to focus on the F1-score measure when selecting the best model. We also noticed that InceptionResNet, ResNet_152, and EfficientNet_B3 did not achieve good results on our dataset; accordingly, these architectures were excluded from the following experiment. Table 3 depicts the confusion matrix for the most accurate model, which applies the Inception architecture for the feature extraction and classification task. This table shows the network's overall classification rate and accuracy: the numbers of correctly classified instances are shown in the diagonal cells, whereas the numbers of misclassified cases are shown in the off-diagonal cells.
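The head replacement and training setup described above might be sketched as follows. The hidden-layer width is an assumption (the exact head is given in Figure 6), MobileNet stands in for any of the compared backbones, and `weights=None` keeps the sketch runnable offline where `weights="imagenet"` would be used in practice:

```python
import tensorflow as tf

def build_pcom_classifier() -> tf.keras.Model:
    # Pretrained backbone minus its ImageNet classification layer.
    base = tf.keras.applications.MobileNet(
        weights=None,  # weights="imagenet" in practice
        include_top=False, input_shape=(224, 224, 3), pooling="avg",
    )
    # New fully connected head appended on top of the architecture;
    # the 256-unit width is an illustrative assumption.
    x = tf.keras.layers.Dense(256, activation="relu")(base.output)
    x = tf.keras.layers.Dropout(0.5)(x)  # dropout probability 0.5
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # initial LR 0.00001
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_pcom_classifier()
# Training as described in the text:
# model.fit(train_images, train_labels, epochs=100, batch_size=32)
```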

Result of Experiment I.
This experiment considers only the ultrasound images, and therefore diagnoses only PCOM. However, it is better to extend the model to include clinical features for an accurate diagnosis of PCOS as a syndrome, since the images allow diagnosis only of the polycystic morphology.

Experiment II: Ultrasound Images + Clinical Data.
This section discusses the PCOS diagnosis model based on combining the ultrasound images with the clinical dataset to construct the fusion deep learning model. To build the multi-input fusion model, two branches are needed: one for the clinical dataset and the other for the ultrasound images. The images and clinical dataset were split into training and testing sets using an 80:20 holdout on the 285 samples. Before building the fusion model, a preprocessing phase is required to prepare the images and clinical data. For the image dataset, preprocessing starts by loading the ultrasound images at a size of 331 × 331 pixels; then the same preprocessing techniques used for the first model are applied, except for data augmentation, as discussed in Section 3.3.1. The preprocessing phase for the clinical dataset includes handling categorical data, feature scaling for the continuous data, and dealing with missing data.
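The clinical preprocessing steps named above (categorical handling, feature scaling, missing-data imputation) can be sketched with scikit-learn. The column names and imputation strategies here are illustrative assumptions, not the study's actual 22 features:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the clinical table (columns are hypothetical).
df = pd.DataFrame({
    "age": [24, 31, np.nan, 27],
    "bmi": [29.1, np.nan, 24.3, 33.0],
    "cycle_regular": ["yes", "no", "no", np.nan],
})

prep = ColumnTransformer(
    [
        # Continuous data: impute missing values, then apply feature scaling.
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), ["age", "bmi"]),
        # Categorical data: impute, then one-hot encode.
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
         ["cycle_regular"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

clinical_features = prep.fit_transform(df)  # (4 patients, 4 output columns here)
```

Fitting the transformer on the training split only (and reusing it on the test split) would avoid information leakage across the 80:20 holdout.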
To evaluate whether the fusion of image and clinical features can affect PCOS prediction, we explored and evaluated different fusion approaches for combining image features with clinical features. First, late and joint fusion were compared. Late fusion did not give any promising results, because all cases were diagnosed as negative. The empirical study then focused on joint fusion, exploring and comparing different stages at which to fuse features from the multimodal data. Two model architectures were implemented and evaluated as follows.

Joint Fusion Type II.
The first fusion model feeds the clinical features as they are into the first branch after applying the preprocessing techniques. The second branch runs a deep learning model over the image data to extract the image features; different architectures were compared in this branch, including VGG16, VGG19, InceptionV3, DenseNet121, DenseNet201, and MobileNet, to find the best model for diagnosing PCOS. The two branches are concatenated and passed to the classification part, which gives the final diagnosis, as shown in Figure 3(a). The classification part consists of two dense layers and a dropout layer with a probability of 0.2.
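A minimal sketch of this two-branch model in Keras follows. MobileNet stands in for any of the compared backbones, `weights=None` keeps the sketch runnable offline (`weights="imagenet"` in practice), and the 128-unit dense width is an assumption:

```python
import numpy as np
import tensorflow as tf

def build_joint_fusion_type2(n_clinical: int) -> tf.keras.Model:
    """Joint fusion type II: preprocessed clinical features are fused as-is
    with CNN-extracted image features."""
    # Image branch: CNN feature extractor (weights="imagenet" in practice).
    cnn = tf.keras.applications.MobileNet(
        weights=None, include_top=False, input_shape=(331, 331, 3), pooling="avg"
    )
    # Clinical branch: raw preprocessed features, fed in directly.
    clinical_in = tf.keras.Input(shape=(n_clinical,), name="clinical")
    # Fuse the two modalities by concatenation.
    fused = tf.keras.layers.Concatenate()([cnn.output, clinical_in])
    # Classification part: two dense layers and a dropout layer (p = 0.2).
    x = tf.keras.layers.Dense(128, activation="relu")(fused)
    x = tf.keras.layers.Dropout(0.2)(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model([cnn.input, clinical_in], out)

model = build_joint_fusion_type2(n_clinical=22)
pred = model.predict(
    [np.random.rand(2, 331, 331, 3).astype("float32"),
     np.random.rand(2, 22).astype("float32")],
    verbose=0,
)
```

Because the whole graph is trained end to end, the classification loss also updates the CNN branch, which is what distinguishes joint fusion from early fusion on precomputed features.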

Joint Fusion Type I.
The second approach applies dense layers in the first branch to process the clinical features, giving 250 neurons as the output of this branch. In the second branch, after the image features are extracted using the deep learning model, two dense layers are applied so that this branch also outputs 250 neurons. The learned features from the two branches are then combined using the same classification part as in the previous approach to produce the final prediction, as shown in Figure 3(b).
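The type I variant differs only in that both branches learn a 250-unit representation before fusion; a sketch under the same assumptions as before (MobileNet as a stand-in backbone, `weights=None` for offline runnability, illustrative head width):

```python
import tensorflow as tf

def build_joint_fusion_type1(n_clinical: int) -> tf.keras.Model:
    """Joint fusion type I: each modality is mapped to a learned 250-unit
    representation before the two branches are fused."""
    # Clinical branch: dense layer(s) ending in 250 neurons.
    clinical_in = tf.keras.Input(shape=(n_clinical,), name="clinical")
    c = tf.keras.layers.Dense(250, activation="relu")(clinical_in)
    # Image branch: CNN features followed by two dense layers, also 250 wide.
    cnn = tf.keras.applications.MobileNet(
        weights=None, include_top=False, input_shape=(331, 331, 3), pooling="avg"
    )  # weights="imagenet" in practice
    m = tf.keras.layers.Dense(250, activation="relu")(cnn.output)
    m = tf.keras.layers.Dense(250, activation="relu")(m)
    # Same classification part as the type II model (width is an assumption).
    fused = tf.keras.layers.Concatenate()([m, c])
    x = tf.keras.layers.Dense(128, activation="relu")(fused)
    x = tf.keras.layers.Dropout(0.2)(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model([cnn.input, clinical_in], out)

model = build_joint_fusion_type1(n_clinical=22)
```

Giving both branches the same output width keeps either modality from dominating the concatenated vector purely by dimensionality.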
These models were trained for 100 epochs with a batch size of 10. The initial learning rate was 0.001, and the "Adam" optimization algorithm was used. The code is available upon request.

3.4. Result of Experiment II

This section discusses the results obtained when applying the fusion model, which aims to combine the ovary ultrasound images with clinical data to diagnose PCOS. Two approaches are proposed for the fusion model. The first, joint fusion type II, fuses the raw clinical features with image features extracted by deep learning models such as VGG16, VGG19, InceptionV3, DenseNet121, DenseNet201, and MobileNet; the classification part is then applied using fully connected layers. In contrast, joint fusion type I fuses the learned features for images and clinical data as input to fully connected layers that make the final diagnosis. Table 4 shows the classification accuracy, precision, F1-score, recall (sensitivity), and specificity for joint fusion type II using the different deep learning models to extract the image features. Table 4 shows that joint fusion type II with image features extracted by VGG-16 or VGG-19 outperformed the other models across the metrics. Using VGG-16, the results are 77.19%, 61.54%, 71.11%, 84.21%, and 73.68% for accuracy, precision, F1-score, recall (sensitivity), and specificity, respectively. The model using VGG-19 achieved 75.44%, 80.77%, 75.00%, 70.00%, and 81.48% on the same metrics. The VGG-16 model is higher in accuracy and sensitivity, while the VGG-19 model is higher in precision, F1-score, and specificity. In medical problems, recall (sensitivity) plays an important role because the cost associated with a false negative is extremely high: if PCOS is not diagnosed at an early stage, the complications related to PCOS and the risk of developing health problems later in life increase. However, the F1-score of the VGG-19 model is better than that of the VGG-16 model by approximately 4%; as the harmonic mean of precision and recall (sensitivity), it indicates both low false positives and low false negatives. On the other hand, Table 5 presents the results achieved by joint fusion type I with the same evaluation metrics. This model achieves its best result when using MobileNet to extract the image features, with 82.46% accuracy, 84.62% precision, 81.48% F1-score, 78.57% sensitivity, and 86.21% specificity.
Figure 7 presents a detailed comparison between the best models for joint fusion types I and II. The confusion matrix summarizes the prediction results of a classification model. Joint fusion type II made a total of 57 predictions; of these, 43 are correct and 14 are incorrect. Joint fusion type I, out of its 57 predictions, made 47 correct and 10 incorrect predictions. Based on this, joint fusion type I outperformed the other model and is more suitable for our problem of diagnosing PCOS using the ovary ultrasound image with clinical data. As noted, the fusion model for ultrasound images with clinical data to diagnose PCOS gives promising results for developing CAD systems that can assist the doctor in making the right decision.
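The five reported metrics for joint fusion type I are mutually consistent with a single set of per-class counts (22 true positives, 4 false positives, 6 false negatives, 25 true negatives, i.e., 47 of 57 correct). These counts are inferred arithmetically from the metrics above rather than read from Figure 7; the sketch below recomputes the metrics from them:

```python
# Confusion-matrix counts for joint fusion type I, inferred from the
# reported metrics (47 of 57 predictions correct); not read from Figure 7.
TP, FP, FN, TN = 22, 4, 6, 25

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)          # sensitivity
specificity = TN / (TN + FP)
f1          = 2 * precision * recall / (precision + recall)

print(f"{accuracy:.2%} {precision:.2%} {recall:.2%} {specificity:.2%} {f1:.2%}")
# -> 82.46% 84.62% 78.57% 86.21% 81.48%
```

The recomputed values match the tabulated results, confirming the internal consistency of the reported figures.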

Discussion
In this research, three main experiments are performed to develop a CAD model that can assist the radiologist and gynecologist during the diagnosis of PCOS. In the first experiment, PCOM is diagnosed based on the ovary's ultrasound image. The proposed model achieved 84.81% accuracy using the Inception model, as shown in Figure 8. This model achieved good outcomes compared with the results obtained in state-of-the-art studies that detect PCOM using ultrasound images of the ovary. Srivastava et al. [16] achieved 92.11% accuracy using VGG-16, but their goal is more general and simpler because they detect only whether an ovarian cyst is present in the image, and there are many types of cysts, of which PCOS is one. The studies [20, 54] achieved 78.1% and 80.84% accuracy, respectively, both lower than the current research results. Moreover, [20, 54] followed the classic approach of feature extraction using the Gabor wavelet method followed by classification using a CNN or Elman neural network. Abdullah et al. [3] obtained a higher accuracy of 93.02% than our work, but they also applied classical techniques, using a Gabor wavelet as the feature extractor and a modified backpropagation network as the classifier; furthermore, the authors did not specify the number of images used in their experiment, which is an important and influential criterion. Cahyono et al. [21] achieved a 100% micro-average F1-score using their proposed CNN architecture, which cannot be compared with the current work because of the limited and unbalanced dataset used (a total of 40 non-PCO and 14 PCO samples).
The second experiment aims to explore the effect of the clinical features on diagnosing PCOS using ultrasound images. A fusion deep learning model is developed that fuses the ultrasound image features with clinical features to produce a final diagnosis of PCOS. The best model extracts the image features using the MobileNet architecture and joins them with the learned clinical features; these combined learned features are fed to fully connected layers that perform the classification task and provide the final diagnosis. The model achieved 82.46% accuracy, which is less than the result obtained using only image or clinical features separately, as shown in Figure 8. Nevertheless, the fusion model that combined the ultrasound images with clinical data outperformed the image-only model on the other metrics (precision, F1-score, recall (sensitivity), and specificity), which together give a more complete description of the model's performance. It is also worth noting that the diagnostic process using the ultrasound images alone diagnoses only the polycystic morphology (PCOM), which is one of the Rotterdam criteria, whereas the model that includes the images and patient information aims to diagnose PCOS itself. The achieved results therefore represent an advancement toward automated PCOS detection based on a multimodality fusion model. Most of the state-of-the-art approaches reported in the literature use only clinical images and do not consider the patient's clinical information together with the images, except the study of [22], which used images and clinical features and achieved 98.63% accuracy; that paper utilized a traditional feature extraction technique (modified furious flies) and traditional classifiers (ANN and NB) on a limited number of samples (68). The development of a fusion model for ovary ultrasound images with clinical features to diagnose PCOS is still a new field and needs further development.
Meanwhile, the findings show that progress toward automated PCOS identification is being made. The results of the fusion model are acceptable and indicate that the clinical features affect the PCOS diagnosis. However, further improvement could be attained by including more features that describe the patient's situation, such as the patient's lifestyle and diet, and features that appear on the patient, such as dark areas on the skin, acne, pimples, weight gain, and hair loss. It is still necessary to perform exhaustive testing and experiments using data from different medical centers before deploying such a system.

Conclusion
This research aims to propose a PCOS diagnosis and analysis model for CAD systems. Two datasets have been collected: one of ovary ultrasound images and another of clinical data that includes vital sign information, lab test results, and symptoms that help diagnose PCOS. Multiple deep learning frameworks are proposed to implement the AI-based CAD. The first proposed model diagnoses the PCOS morphology from ovary ultrasound images using a deep learning model for the automated diagnosis of PCOM, with the aim of reducing the false positive rate and increasing performance. Many experiments were performed toward this goal using different CNN architectures. The proposed model employs the Inception model fine-tuned with the ultrasound images dataset; fine-tuning is carried out by modifying the last layers of the Inception network, which are responsible for the classification task (fully connected layers). This model determines whether the ultrasound images show polycystic ovaries or not and obtained 84.81% accuracy, 69.57% precision, 72.73% F1-score, 76.19% sensitivity, and 87.93% specificity. The result gives a notable improvement on a benchmark dataset, which indicates a promising path toward a CAD system able to assist the radiologist in classifying ovary ultrasound images. This research has also presented a study analyzing the impact of combining image and clinical features using a deep learning model to diagnose PCOS. Two fusion models are compared and analyzed: joint fusion types I and II. The findings of this experiment show that joint fusion type I outperformed, with 82.46% accuracy, 84.62% precision, 81.48% F1-score, 78.57% sensitivity, and 86.21% specificity. To sum up, this research highlighted the relevance of clinical features in PCOS diagnosis and showed that patient clinical information is valuable for diagnosing PCOS. This automated model can aid the physician by saving the time required to assess patients and by reducing the risk associated with delayed PCOS diagnosis. Some limitations of this study could be addressed in future research. First, the number of samples in the dataset is limited because of the difficulty of collecting the images and clinical data. The study also used a dataset from a single center. In addition, one of the main limitations of this research is the lack of available computational resources for the empirical study, which in turn impacts the model's performance.
Despite the promising outcomes achieved by the proposed approaches, several aspects could be improved in the future. Increasing the number of samples in the dataset (images and clinical data) would help improve the model's performance, generalize it successfully, and reduce errors and misclassification. Although the current approach considers only one image per patient, many images from various angles can be acquired for the same patient, so additional ultrasound images could be taken into consideration. The clinical dataset could also be improved by adding more features describing symptoms observed in the patients, including acne, hirsutism and other signs of hyperandrogenism, amenorrhea, and infertility. Moreover, the deep learning model that uses ultrasound images to diagnose PCOM could be extended to classify other types of ovarian cysts, such as functional cysts, endometrioma (endometrioid) cysts, dermoid cysts, hemorrhagic ovarian cysts, and PCOS.
(b) Joint fusion type I joins the learned clinical features with the learned image features. The features are learned by applying dense layers before joining the features from each source. These learned features from the clinical data and images are fed to the final model, which applies the classification task to diagnose whether the patient has PCOS or not, as shown in Figure 3(b).

Figure 2: Diagram of the model for the PCOM diagnosis.

Figure 3: Diagram of the model for diagnosing PCOS by combining the image with the patient clinical dataset. (a) Joint fusion type II and (b) joint fusion type I.

Figure 5: Clinical view of an ultrasound ovary image before (a) and after (b) applying adaptive histogram equalization.
Figure 6:

Figure 7: Confusion matrix of the best fusion models of types I and II.
Figure 8:
features and 285 samples. The clinical dataset comprises a total of 129 PCOS cases and 156 non-PCOS cases. Patients' diagnoses in the dataset are based on laboratory tests, doctor notes, and the radiologist's examination of the ultrasound images, labeled PCOS or non-PCOS.
This section presents and discusses the results of applying the previously defined experiment to diagnose polycystic ovaries in the ultrasound images. Full deep learning models are applied for feature extraction and classification. Experiments using six well-known CNN architectures are performed: VGG16, VGG19, InceptionV3, DenseNet121, DenseNet201, and MobileNet. The performance on the test set for these deep neural network architectures is presented in Table 2. The Inception model was the best performing model, achieving 84.81% accuracy, 69.57% precision, 72.73% F1-score, 76.19% recall, and 87.93% specificity.

Table 2: Results for all models applying the feature extraction and classification task using deep learning architectures.

Table 3: Confusion matrix for the Inception model.

Table 4: Performance of all models in joint fusion type II.