Using Deep Learning with Bayesian–Gaussian Inspired Convolutional Neural Architectural Search for Cancer Recognition and Classification from Histopathological Image Frames

We propose a neural architectural search model that examines histopathological images to detect the presence of cancer in both lung and colon tissues. In recent times, deep artificial neural networks have made a tremendous impact in healthcare. However, obtaining an optimal artificial neural network model that yields excellent performance during training, evaluation, and inference has been a bottleneck for researchers. Our method uses a Bayesian convolutional neural architectural search algorithm in combination with Gaussian processes to provide an efficient neural network architecture for colon and lung cancer classification and recognition. The proposed model uses the Gaussian process to estimate the optimal architectural values, choosing each set of model parameters by exploiting the expected improvement (EI) values, thereby minimizing the number of sampled trials and suggesting the best model architecture. Several experiments were conducted, and strong performance was obtained on both validation and test data when the proposed model was evaluated, using convergence and F1-score metrics, on a dataset of 25,000 images in five different classes.


Introduction
At present, lung and colon cancers are among the most prevalent and deadliest cancers and are leading causes of cancer-related deaths globally [1]. Lung and colon cancers cause more deaths per annum than breast, ovarian, and prostate cancers combined. According to recent reports, vaping and smoking sharply increase the risk of lung cancer; non-vapers may also be affected, but their risk is minimal. Dietary habits, advancing age, obesity, and sedentary lifestyles [2] also contribute immensely to the progressive surge in the incidence of colon cancer. People with an established family history of colorectal cancer (CRC), inflammatory bowel disease, adenomatous polyposis, or hereditary nonpolyposis colon cancer are highly prone to CRC. According to studies, 20-53% of U.S. citizens above 50 years of age are projected to have adenomas, and the aged have about a 5% lifetime risk of developing adenocarcinomas [3]. Prompt detection of cancer is essential to its cure, but manual recognition and the processes involved in identification are cumbersome and dangerous, especially in the early stages. Biopsies and imaging such as CT scans are the two major diagnostic methods [4].
The microscopic investigation of unhealthy tissues, or histopathology, is critical to the early diagnosis and treatment of cancers [5]. In recent times, advances in digital microscopy have made it possible to extract relevant information from diseased tissues via whole-slide images (WSIs), using artificial neural network algorithms based on convolutional neural networks [6]. Deep artificial neural networks can independently perform pathological examinations that are usually conducted manually, by pinpointing interpretable features that carry prognostic value. In the past, deep learning networks required huge amounts of data for model training and inference, and such data were hard to gather in the healthcare field. However, recent advancements such as single-shot learning, few-shot learning, data augmentation, and architectural search methods have reduced the demand for huge datasets when training and deploying deep learning models in healthcare.
Numerous factors impede the automatic detection and classification of these cancerous tissues. For instance, images may be of low quality because of poor fixation and staining during tissue preparation, or because of autofocus failures during slide digitisation. Complex tissue structures, nuclear clutter, and variations in the morphology of the nucleus also constitute a great challenge. Specifically, colon and lung adenocarcinomas and lung squamous cell carcinomas often present asymmetrical chromatin textures that are tightly cluttered together with unclear boundaries, which makes the detection of distinct nuclei a perplexing task [7]. In addition, inconsistencies in the appearance of similar nuclei within and across data samples make the classification of specific nuclei difficult. These challenges were addressed during data preprocessing.
Because of these complex patterns in images of cancer-affected tissue, the classical design of automated cancer recognizers needs domain expert knowledge to decide which specific features to extract for training an artificial neural network algorithm. This process, also referred to as feature engineering [8], is time-consuming, labor-intensive, and error-prone. Our proposed method learns the feature representations in colon and lung cancer-affected tissues directly, thereby eliminating feature engineering. Our hypothesis is that using a Bayesian neural architectural search to estimate the classifier architecture, with patients' endpoints as the outcome, has the potential to reveal recognized prognostic morphologies and also to recognize previously unfamiliar prognostic structures.

Related Works
In recent times, researchers have applied different artificial intelligence and deep learning strategies to classify images of various health problems, including cancer, for early identification and treatment. In one related work, a collection of classifiers such as KNN (k-nearest neighbor), ANN (artificial neural network), and SVM (support vector machine), in conjunction with a Bayesian model, was used to select and learn features from a leukemia dataset [9]. Hosny et al. [10] proposed an automated framework to classify skin lesions for early cancer detection, using transfer learning on a pretrained deep learning network to achieve a reliable result. In another closely related investigation, feedforward neural networks with a deep belief network and H2O were deployed to perform cancer classification from a cancer data repository [11].
Furthermore, a deep learning pipeline for fully automated cervical cancer classification was proposed in [12]. In the proposed pipeline, two pretrained deep learning frameworks were integrated to automatically conduct cervix detection and cervical tumour classification tasks. A deep learning model was also adopted to explore the possibility of classifying cancer cells from gene expression data [13]. In a similar study, a supervised cancer classification model was proposed for the molecular subtyping of cancer cells, in particular breast and colorectal cancers [14]. In another study, a convolutional neural network-driven deep learning method was deployed to perform a multiclass breast cancer classification task [15].
In addition, one study used a three-way decision-based Bayesian deep learning approach to quantify uncertainty in skin cancer classification [16]. Diverse convolutional neural network-powered deep learning models were used to perform dermatologist-level dermoscopy skin cancer classification tasks [17]. Another investigation utilised a weakly supervised 3D deep learning model to classify and localise breast cancer lesions on MR images [18]. An optimal feature fusion from ultrasound images for breast cancer classification using a probability-driven optimal deep learning framework was introduced in [19]. A patch-based deep learning framework was introduced to perform breast cancer classification from histopathological images [20]. In this work, a rapid, deep learning-inspired framework using a Bayesian-Gaussian neural architectural search strategy is proposed. Our work is motivated by the recent drive for efficient models that are capable of rapid cancer data processing and recognition.

Bayesian Neural Architectural Search. The neural architectural search [21] focuses on automating the neural network training cycle by eliminating the hassles of manually selecting neural network architecture values (see Figure 1). Random and grid search [22], Bayesian optimisation [23], evolutionary search [24], reinforcement learning [25], and gradient descent [26] are some of the methods that have been proposed. Each of these algorithms has merits and demerits. For instance, grid search suffers from the "curse of dimensionality": training time becomes enormous because the number of parameter combinations grows drastically as more parameters are added to the model. Random search tries parameter combinations at random instead of searching every combination as grid search does. However, as the number of parameters increases, the probability of obtaining an ideal combination of parameters via random sampling approaches zero.
Bayesian optimisation using the Gaussian process algorithm provides a better alternative for zeroth-order optimisation of the expensive function evaluations necessary for artificial neural network architecture selection. In each Bayesian optimisation iteration, we train and observe a subset of the neural networks to gauge the accuracy of unknown model architectures in the search domain. This method addresses the aforementioned problems of the other search algorithms and eliminates the need to manually construct distance functions between neural networks. Bayesian optimisation normally works on the assumption that the unknown function was sampled from a Gaussian process (GP), and it keeps refining its estimate of this function as observations accumulate. In this work, the observations are the degrees of convergence obtained for the different choices of hyper-parameters we intend to optimise. To choose the hyper-parameters of the next iteration, either the expected improvement (EI) [27] over the current best result or the upper confidence bound (UCB) [28] of the Gaussian process is optimised. The efficiency of UCB and EI, in terms of the number of function evaluations necessary to attain the global optimum, has been confirmed for numerous black-box functions [29].
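The two acquisition rules named above can be illustrated with a small sketch. It assumes a maximisation setting and that the Gaussian process has already produced a predictive mean and standard deviation for each candidate; the candidate values, the function names, and the `kappa` exploration weight are illustrative assumptions, not values from this work.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for maximisation: E[max(f(x) - best, 0)] with f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return sigma * (z * normal_cdf(z) + normal_pdf(z))


def upper_confidence_bound(mu: float, sigma: float, kappa: float = 2.0) -> float:
    """UCB: predictive mean plus kappa standard deviations (kappa is an assumed weight)."""
    return mu + kappa * sigma


# Hypothetical candidates: (name, predicted accuracy, predictive standard deviation).
candidates = [("A", 0.90, 0.01), ("B", 0.88, 0.05), ("C", 0.80, 0.02)]
best_so_far = 0.90

scores = {name: expected_improvement(m, s, best_so_far) for name, m, s in candidates}
next_choice = max(scores, key=scores.get)  # candidate B scores highest here
```

In this toy case the candidate with the slightly lower mean but larger uncertainty is preferred, which is the exploration behaviour that lets EI reach a good architecture with few sampled trials.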

The Gaussian Process (GP).
A Gaussian process (GP) is an effective method for modelling the loss function in models that require optimisation, and it is a prior over functions that is closed under sampling [29]. That is, if the prior distribution of a function f is taken to be a GP with kernel k and mean 0, then the conditional distribution of f, given a sample $Z = \{(x_i, f(x_i))\}_{i=1}^{n}$ of its values, is also a GP whose mean and covariance functions can be derived analytically. Gaussian processes with generic mean functions can also be used in principle, but it is simpler and sufficient to use only zero-mean processes in this work. We achieved this by centring the function values on the data sets being processed.
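For reference, the analytic form of the conditional mean and covariance is the standard noise-free GP conditioning result, written here for the zero-mean case. The symbols $K$, $\mathbf{k}_*(x)$, and $\mathbf{f}$ are notation introduced for this sketch and do not appear in the original text.

```latex
\begin{aligned}
\mu_n(x) &= \mathbf{k}_*(x)^{\top} K^{-1} \mathbf{f}, \\
k_n(x, x') &= k(x, x') - \mathbf{k}_*(x)^{\top} K^{-1} \mathbf{k}_*(x'),
\end{aligned}
\qquad
K_{ij} = k(x_i, x_j), \quad
\mathbf{k}_*(x) = \bigl(k(x_1, x), \dots, k(x_n, x)\bigr)^{\top}, \quad
\mathbf{f} = \bigl(f(x_1), \dots, f(x_n)\bigr)^{\top}.
```

With noisy observations of variance $v$, the same expressions hold with $K$ replaced by $K + vI$.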

Acquisition Functions for Bayesian Optimisation.
Assumptions are made such that the function f(x) is drawn from the Gaussian process prior, and observations are of the form $\{x_n, y_n\}_{n=1}^{N}$, where $y_n \sim \mathcal{N}(f(x_n), v)$ and $v$ represents the variance of the noise introduced into the function observations. The prior and these data induce a posterior over functions. The acquisition function, denoted $a: X \to \mathbb{R}^{+}$, determines which point in X should be evaluated next (n) through the proxy optimisation $X_n = \arg\max_{x} a(x)$, for which many different functions have been proposed. Acquisition functions depend on the previous observations as well as on the Gaussian process hyperparameters, and this dependence is denoted $a(x; \{x_n, y_n\}, P)$. Many popular acquisition functions are available, but under the Gaussian process prior they depend on the model solely through its predictive mean function $\mu(x; \{x_n, y_n\}, P)$ and its predictive variance function $\sigma^2(x; \{x_n, y_n\}, P)$. The current best value is denoted $X_b = \arg\max_{x_n} f(x_n)$, $\beta(\cdot)$ denotes the cumulative distribution function of the standard normal, and $c(\cdot)$ denotes the standard normal density function [29]. Intuitively, a notable approach is to maximise the probability of improving on the current best result, and this is known as the probability of improvement (PI) [25]. Analytically, this can be computed as follows:

$$a_{\mathrm{PI}}(x; \{x_n, y_n\}, P) = \beta(\gamma(x)), \quad \text{where} \quad \gamma(x) = \frac{\mu(x; \{x_n, y_n\}, P) - f(X_b)}{\sigma(x; \{x_n, y_n\}, P)}.$$

The convolutional layer is the core block where the key convolutional operations are performed and important features are extracted, as shown in Figure 2. It extracts similar features from different image regions and matches them together for probabilistic decision-making. A chunk of images (x_1, x_2, x_3) is taken from the image repository and fed into the convolutional blocks (Block1, Block2, Block3, ...) for processing. Filters in the convolution layers convolve over the input image chunks to pick out key points, and their weights are learned with the backpropagation algorithm. The pooling (subsampling) layers downsample the feature maps emanating from the convolution operations. A max-pooling operation then picks the largest value from each region covered by the pooling kernel, thereby reducing the number of parameters to be computed and making the convolution operations invariant to translation, scale, size, and shape [31]. The last layer is a fully connected layer which accepts the outputs of all neurons in the previous layer and operates on them to produce the output (y_1).
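The max-pooling step described above can be sketched in a few lines of plain Python. This is a minimal illustration on a hypothetical 4 × 4 feature map with non-overlapping 2 × 2 windows; it is not the Keras layer used in the experiments, and the function name and input values are assumptions.

```python
from typing import List


def max_pool_2x2(feature_map: List[List[float]]) -> List[List[float]]:
    """Non-overlapping 2x2 max pooling (stride 2); an odd trailing row or column is dropped."""
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            # Collect the four values covered by the current 2x2 window.
            window = (
                feature_map[2 * r][2 * c],
                feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c],
                feature_map[2 * r + 1][2 * c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical feature map produced by a convolution.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 4],
]
pooled = max_pool_2x2(feature_map)  # [[4, 5], [6, 7]]
```

Each 2 × 2 window collapses to a single value, so the map shrinks to a quarter of its original size and the later layers have far fewer inputs to process.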

Materials. We present the detailed materials and resources used in training and evaluating the proposed model. We employed the open-source Keras deep learning framework with a TensorFlow backend [32] to construct, train, and evaluate the Bayesian-Gaussian driven convolutional neural architectural search model for cancer identification. All experiments were performed on a high-end PC with an 8 GB GPU card, 16 GB of internal memory, the cuDNN library, and the CUDA Toolkit.

Dataset.
The dataset used in this work consists of 25,000 colon and lung histopathological images in five classes [33]. Each class contains 5000 images placed in a separate folder, where 0, 1, ..., n denote the classes of the images. The classes of colon histopathological images are colon adenocarcinomas and benign colonic tissues, and the classes of lung histopathological images are lung adenocarcinomas, lung squamous cell carcinomas, and benign lung tissues. All patient identities have been removed, and the data are made freely available to AI researchers. The original size of all the images is 768 × 768 pixels. However, during preprocessing, we resized all the images to 150 × 150 pixels to minimise computational demand and allow the dataset to fit into our computational model. The dataset was randomly split into three parts, with 70% of the samples assigned to the training set, 20% to validation, and 10% to testing the model.
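The 70/20/10 random split can be sketched as follows. This is only an illustration of the arithmetic on sample indices; the function name, the seed, and the use of a plain shuffle (rather than, say, a stratified split) are assumptions, since the text states only that the split was random.

```python
import random


def split_indices(n_samples: int, train_frac: float = 0.7, val_frac: float = 0.2, seed: int = 0):
    """Shuffle sample indices and cut them into train, validation, and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(round(n_samples * train_frac))
    n_val = int(round(n_samples * val_frac))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, 10% here
    return train, val, test


train_idx, val_idx, test_idx = split_indices(25_000)
```

For 25,000 images this yields 17,500 training, 5,000 validation, and 2,500 test samples, which matches the 2500 reserved test samples reported in the results.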

Methods.
A baseline convolutional neural network with three convolutional layers was used for the training of the proposed Bayesian-Gaussian inspired convolutional neural architectural search. The first layer has a kernel size of 9, a stride of 1, 16 filters, and 2 × 2 max-pooling. The second has the same parameters as the first but adds a dropout rate of 0.15. The third layer has a kernel size of 9, a stride of 1, 36 filters, 2 × 2 max-pooling, and a dropout rate of 0.15. The fourth and final layer is the dynamic layer, where the neural architectural search processes are performed. We used categorical cross-entropy as the loss function and Adam as the optimiser. The best result was initialised at zero before training, which used 30 epochs and a batch size of 128. Expected improvement was used as the acquisition function, with the number of calls set at 11. Initially, we bounded the learning rate between 1e-6 and 1e-1 with a uniform prior, the number of dense layers between 1 and 10, and the number of dense nodes between 2 and 512. We set the default parameters P to a learning rate of 1e-3, 1 dense layer with 16 nodes, and the rectified linear unit (ReLU) as the activation function.
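The search space and default point described above can be written down as a small configuration sketch. The class and variable names are hypothetical, and the sampler only draws random points to show what candidates look like; in the actual search, points after the default are chosen through the acquisition function rather than at random. The source does not name the optimisation library, so none is assumed here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSpace:
    """Bounds of the three searched dimensions, as stated in the text."""
    lr_low: float = 1e-6
    lr_high: float = 1e-1
    layers_low: int = 1
    layers_high: int = 10
    nodes_low: int = 2
    nodes_high: int = 512

    def contains(self, lr: float, layers: int, nodes: int) -> bool:
        return (
            self.lr_low <= lr <= self.lr_high
            and self.layers_low <= layers <= self.layers_high
            and self.nodes_low <= nodes <= self.nodes_high
        )

    def sample(self, rng: random.Random):
        """Draw one random point (uniform prior on the learning rate)."""
        lr = rng.uniform(self.lr_low, self.lr_high)
        layers = rng.randint(self.layers_low, self.layers_high)
        nodes = rng.randint(self.nodes_low, self.nodes_high)
        return lr, layers, nodes


SPACE = SearchSpace()
DEFAULT_POINT = (1e-3, 1, 16)  # learning rate, dense layers, dense nodes
N_CALLS = 11                   # number of evaluations of the objective
ACQUISITION = "EI"             # expected improvement

rng = random.Random(0)
candidates = [DEFAULT_POINT] + [SPACE.sample(rng) for _ in range(5)]
```

Keeping the bounds in one place makes it easy to check that any proposed architecture, including the default, lies inside the search domain before it is trained.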

Results and Discussion
We used several evaluation metrics to assess the cancer identification performance of the proposed model.

Convergence and Matrix Plots. The convergence plot in Figure 3 shows the learning progression during training with respect to the number of calls. As the number of calls n increases, the model convergence improves and attains its peak between 4 and 11 calls. Figure 4 is a matrix plot illustrating the combination of the three key training dimensions.
The first and second plots in the first row of Figure 4 show the partial dependence on two dimensions, that is, the approximate change in the fitness value that results from altering those dimensions simultaneously. They represent estimates from the modelled fitness function, which is itself an approximation of the real fitness function. The partial dependence (PD) is computed by fixing an individual value for the learning rate, randomly sampling a large number of points for the remaining dimensions of the search space, and then averaging the predicted fitness values at all of these points. To show how the learning rate influences the average fitness, this process is repeated for other learning rates. The same procedure is followed for the partial dependence plots of the remaining dimensions. The sample distribution of each hyperparameter during Bayesian optimisation is shown by the histograms on the diagonal in Figure 5. The plots below the diagonal show the positions of the samples in the search space. The colour coding indicates the magnitude of the sample selections. As more samples are chosen, they tend to concentrate in particular sections of the search space. The top ten accuracies of the model architectural search process, drawn from 30 generated architectures, are shown in Table 1. From the table, the model with a learning rate of 1.85e-4, nine layers, and 142 dense nodes yielded the best result overall.
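The partial dependence procedure described above can be sketched as follows. The surrogate here is a made-up stand-in for the fitted Gaussian process, chosen only so that the example runs; the function names, sample count, and candidate learning rates are likewise assumptions.

```python
import math
import random


def toy_surrogate(lr: float, layers: int, nodes: int) -> float:
    """Hypothetical predicted fitness; stands in for the fitted GP surrogate."""
    return -(math.log10(lr) + 4.0) ** 2 - 0.01 * layers + 0.0001 * nodes


def partial_dependence(surrogate, lr_values, n_samples: int = 1000, seed: int = 0):
    """Average the surrogate over random values of the other dimensions for each fixed lr."""
    rng = random.Random(seed)
    # Sample the remaining dimensions once and reuse them for every learning rate.
    others = [(rng.randint(1, 10), rng.randint(2, 512)) for _ in range(n_samples)]
    curve = []
    for lr in lr_values:
        total = sum(surrogate(lr, layers, nodes) for layers, nodes in others)
        curve.append((lr, total / n_samples))
    return curve


curve = partial_dependence(toy_surrogate, [1e-6, 1e-4, 1e-2])
best_lr = max(curve, key=lambda point: point[1])[0]
```

Because the same random samples are reused for every learning rate, differences along the curve reflect only the effect of the fixed dimension, which is what the partial dependence plots are meant to show.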

The Confusion Matrix.
We further measure the performance of our proposed method by examining the precision, recall, and F1-score on randomly selected test samples of each class of the colon and lung tissues. Recall is the ability of the proposed model to find all the relevant cases of cancer in a given set of samples. In other words, it is the number of true positives (TP) [34] divided by the sum of the true positives and the false negatives (FN), i.e.,

$$\text{Recall}(R) = \frac{TP}{TP + FN}.$$

The colon and lung tissue data points correctly classified as positive are the true positives (TP), and those classified as negative when they are actually positive are the false negatives (FN). Precision is the ability of the proposed method to identify only the relevant colon and lung tissue data points, that is, the number of true positives (TP) divided by the sum [35] of the true positives and the false positives (FP), i.e.,

$$\text{Precision}(P) = \frac{TP}{TP + FP}.$$

False positives (FP) are instances where the model classifies data points as positive when they are actually negative. Furthermore, the F1-score is the harmonic mean of the precision and recall, expressed as follows:

$$F1 = \frac{2 \times P \times R}{P + R}.$$

Finally, the macro average is the arithmetic mean of the per-class F1-scores of the colon-lung tissue test data, and the weighted average weights the F1-score of each class by the number of test samples from that class. Analysing the performance of the proposed model on each class of the lung-colon tissue test samples, our model recorded a 98% F1-score on 489 randomly selected lung adenocarcinoma (LA) test samples, as shown in Table 2. Also, a 99% F1-score was achieved on 511 test samples of lung squamous cell carcinoma (LS) tissue and 94% on 534 benign lung (BL) test samples. Likewise, the model yielded a 93% F1-score on the 512 randomly picked colon adenocarcinoma (CA) test samples and 99% on 454 benign colonic (BC) test samples. An overall test accuracy of 97% on the 2500 reserved test samples was achieved, with both the macro and the weighted averages at 97%.
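The metric definitions above translate directly into code. The sketch below defines the three per-class metrics and then recomputes the macro and weighted averages from the per-class F1-scores and sample counts reported in the text; the TP/FP/FN counts in the first usage example are hypothetical, since the underlying confusion matrix entries are not reproduced here.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); defined as 0 when there are no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); defined as 0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a single class, for illustration only.
example_f1 = f1_score(precision(tp=90, fp=10), recall(tp=90, fn=30))

# Per-class (F1-score, number of test samples) as reported in the text.
reported = {
    "LA": (0.98, 489),
    "LS": (0.99, 511),
    "BL": (0.94, 534),
    "CA": (0.93, 512),
    "BC": (0.99, 454),
}
total_samples = sum(n for _, n in reported.values())
macro_f1 = sum(f for f, _ in reported.values()) / len(reported)
weighted_f1 = sum(f * n for f, n in reported.values()) / total_samples
```

Both averages round to 0.97 and the class counts sum to 2500, consistent with the overall figures quoted above.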
We compared our result with the one obtained using our baseline conventional convolutional neural network (CNN) model, which has 3,474,501 trainable parameters and was trained for 50 epochs, as shown in Table 3. All parameters in the baseline model are the same as in the proposed model, but without the Bayesian-Gaussian architectural search process. A closer look at Table 3 indicates that the baseline model suffers from overfitting, which can be mitigated only with some cumbersome measures, whereas our method does not require additional measures to obtain an optimal model. Our approach achieved an overall accuracy of approximately 97% (Table 2), thereby outperforming the conventional CNN method, which yielded an overall accuracy of 62%, as shown in Table 3.
Furthermore, this section compares the proposed model with related works on deep learning-based lung and colon cancer classification. Since our work is based on a new dataset, some of the related results cited in this article are not completely comparable, as the datasets used in those works differ from ours. Even so, the objective of the works is the same, and they are therefore compared in Table 4.
As shown in Table 4, our method outperformed the other referenced methods in terms of classification and recognition of cancer types. The first three works in the table used SVMs, SC-CNN, and RF, respectively, to conduct the cancer classification tasks and recorded accuracies in the range of 72% to 86%, far below our proposed method. This is followed by the model proposed in [38], which used the ResNet-50 deep learning architecture and obtained an accuracy of 93.91%. Then [39], Hatuwal and Thapa [40], and Masud et al. [41] used conventional CNN architectures on histopathological cancer image datasets to obtain classification accuracies ranging from 97.89% to 97.92%. However, our proposed method used a novel Bayesian-Gaussian architectural search process to obtain a better CNN architecture that yields superior performance in terms of accuracy and efficiency.

Conclusions
In this work, we proposed a neural architectural search model that examines a histopathological image to recognize the presence of some classes of cancer in both lung and colon digital images by learning and distinguishing the critical features in them. The method works by using key points in a given batch of data occupying a search space to suggest a suitable and efficient neural network architecture. The results from this work show that, given a sizeable histopathological image dataset, one can successfully construct an effective and efficient neural network model capable of recognizing cancer in a patient without painful and rigorous diagnostic processes. Unlike conventional artificial neural network models, this technique works without manually setting the network architecture features. In the future, we plan to increase the robustness of the model by adding more cancer classes and cases and increasing the model's efficiency and accuracy.

Figure 3: Convergence plot of the training process.

Figure 5: Matrix plot of the Bayesian optimization process.

Alternatively, in this work we chose to maximise the expected improvement (EI) over the current best result, which is closely related to the Gaussian process.

Figure 1: The proposed neural architectural search cycle with Gaussian process.

Journal of Healthcare Engineering

Table 1: Top ten search accuracies.

Table 2: Confusion matrix of the optimal model.

Table 3: Confusion matrix of the convolutional neural network.

Table 4: Comparison of the obtained results with other related methods.