Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions

Deep learning (DL) is a branch of machine learning and artificial intelligence that has been applied to many areas in different domains such as health care and drug design. Cancer prognosis estimates the ultimate fate of a cancer subject and provides survival estimates for the subject. Accurate and timely diagnostic and prognostic decisions greatly benefit cancer subjects. DL has emerged as a technology of choice due to the availability of high computational resources. The main components of a standard computer-aided diagnosis (CAD) system are preprocessing; feature recognition, extraction, and selection; categorization; and performance assessment. The falling costs of sequencing systems offer a myriad of opportunities for building precise models for cancer diagnosis and prognosis prediction. In this survey, we provide a summary of current works in which DL has helped to determine the best models for cancer diagnosis and prognosis prediction tasks. DL is a generic modeling approach that requires minimal data manipulation and achieves better results when working with enormous volumes of data. Our aims are to scrutinize the influence of DL systems using histopathology images, present a summary of state-of-the-art DL methods, and give directions to future researchers to refine the existing methods.


Introduction
Cancer is defined as abnormal cell growth that can arise in any body organ. In essence, the growth of cells in these organs becomes unregulated. These cells proliferate at a rapid rate until they are removed through a physical procedure such as surgery, or through medication, hormonal therapy, or radiation therapy, or until they disappear on their own naturally. Natural disappearance of cancer cells can happen in kidney cancers or melanomas. These cells can be screened using tools such as colonoscopy, Pap smear examination, or mammography. There are more than 150 different kinds of cancer, and there is a lack of strategies to cure them in their early stages. Cancer stem cells can give rise to stromal cells, paving a way toward potential cures. Apart from stem cells, the WNT16B protein has also been linked to resistance developed during chemotherapy. Therapies such as laser therapy and cryotherapy are among the most vibrant approaches to treating cancer. Some of the most prevalent types of cancer worldwide include lip, oral cavity, breast, cervical, and thyroid cancers. On the other hand, rare cancers such as osteosarcoma, Ewing's sarcoma, male breast cancer, gastrointestinal stromal tumors, chondrosarcoma, mesothelioma, adrenocortical carcinoma, cholangiocarcinoma, kidney chromophobe carcinoma, pheochromocytoma and paraganglioma, sarcoma, and ependymoma together make up more than 20% of cancer cases [1][2][3][4].
Cancer is a disease of genes. Replication, mitosis, and bombardment by reactive oxygen species bring continuous changes to both normal and cancer cells. This process begins at the birth of a cancer cell and continues until death. During this process, a cancer cell gains mass by recruiting stromal support cells, immune cells, and endothelial cells, which become part of the cancer mass due to factors like stress ligands and antigens. Other emblems of cancer-related cellular stress are proteotoxicity, metabolic changes, and displaced nucleic acids. Genes are organized into chromosomes, which reside in the cell's nucleus. The human body has around 20,000 genes in somatic cells, and the study of chromosomes, known as cytogenetics, has seen large strides of progress over the past several decades; it is now possible to build a 3D model of chromosomes [5][6][7].
Sugar is an important constituent of tumor metabolism that fuels the rapid growth of tumor cells. Sugars are an important part of the diet of cancer cells, and their supply supports the formation of new clones. Bacteria and microbial cells colonize the human body. Microbial cells are estimated to be as abundant as human cells, yet their collective genome is roughly 100 times the size of the human genome, providing significantly more genetic diversity. Helicobacter pylori, Chlamydia trachomatis, Salmonella enterica serovar Typhi, Fusobacterium nucleatum, enterotoxigenic Bacteroides fragilis, and Koribacteraceae are some of the most prominent bacteria associated with cancer. Apoptosis and necroptosis are two avenues of programmed cell death [8][9][10].
Cancer has long inspired fears. In the distant past, physicians related depression or melancholic humour to cancer's pathogenesis; it was believed that melancholy could give rise to a tumor, as people attributed their cancer to sadness. Recently, inflammation and nonspecific immune activation have been found to be key factors in the pathophysiology of cancer-related depression. Urban populations face increasing cancer-related risks due to factors like nutrition; infections such as sea turtle fibropapillomatosis and feline immunodeficiency virus; urban chemical pollution involving carcinogens, polychlorinated biphenyls, glutathione, and urethane-induced adenomas; light and noise pollution, such as the suppression of pineal melatonin production; changes in survival; and life history strategies [11][12][13].
Deep learning (DL), a branch of artificial intelligence, has seen phenomenal growth in recent years, allowing complex computational models to learn abstractions from data, with wide applications in speech processing, visual processing, and other domains. These methods work by discovering fine structures in large and often complex datasets using the backpropagation algorithm. Compared to DL, conventional machine learning-based methods are limited in their ability to process natural data in its raw form without preprocessing [14].
Convolutional Neural Networks (CNNs) are DL systems equipped with the power to learn invariant features. CNNs have filter banks, feature pooling layers, batch normalization layers, dropout layers, and dense layers that work in harmony to create patterns for different object recognition tasks such as detection, segmentation, and classification. CNNs have multilevel hierarchies in which the distribution of inputs changes during training. Preprocessed inputs, such as those obtained through whitening, are highly desirable to obtain better performance across tasks [15]. CNNs have many variants, such as those offering shorter connections, for example, the DenseNet architecture, which offers advantages in terms of feature circulation and a substantial reduction in parameters, yielding efficient architectures [16]. The focal and nonfocal electroencephalogram signals in the tunable Q-factor wavelet transform domain have been investigated and identified with the help of feature selection and neural network methods [17]. A recent study concerning low-density parity-check (LDPC) codes for Internet of Things networks introduced a novel technique for obtaining the first two minima of the check-node update operation of the min-sum LDPC decoder [18]. In addition, a review of future robust networks, including various scenarios for 6G, is given in [19].
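As an illustration (not taken from any cited work), the two core CNN operations named above, a filter bank applied by 2D cross-correlation and a feature pooling layer, can be sketched in a few lines of plain Python on a toy image:

```python
# Minimal sketch of the two basic CNN operations: applying one filter from a
# filter bank ("convolution" in DL usage) and 2x2 max pooling. Real CNNs stack
# many such stages and learn the kernel weights by backpropagation.

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D list `image` with a 2D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling of a feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A vertical-edge filter responds strongly where intensity changes left to right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]
fmap = conv2d(image, edge_kernel)   # 3x3 response map, peaks along the edge
pooled = max_pool2x2(fmap)          # pooling keeps the strongest response
```

The pooled response is invariant to small shifts of the edge, which is the "invariant features" property mentioned above.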
Other CNN architectures that have gained popularity in recent years are ResNet, Xception, and GoogLeNet. The need for these networks arises from the degradation in performance across tasks as networks get deeper, the need for multiscale processing, and the search for better architectures with fewer parameters [20][21][22][23].
Another issue that holds considerable importance in DL is the ability of an architecture to store information over extended time intervals. One solution proposed for this problem is Long Short-Term Memory (LSTM). LSTM architecture works by enforcing consistent error flow that is nonglobal in space and time through states of specialized units [24].
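The gating mechanism behind this error-preserving behaviour can be sketched as a single LSTM cell step; the scalar weights below are hypothetical toy values chosen only to show how a saturated forget gate preserves the cell state over time:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a scalar LSTM cell.
    Gates: input i, forget f, output o, candidate g. The cell state c is the
    specialized unit through which error flows largely unchanged over time.
    W maps gate name -> (w_x, w_h, bias)."""
    def gate(name, squash):
        w_x, w_h, b = W[name]
        return squash(w_x * x + w_h * h_prev + b)
    i = gate('i', sigmoid)
    f = gate('f', sigmoid)
    o = gate('o', sigmoid)
    g = gate('g', math.tanh)
    c = f * c_prev + i * g          # gated update of the memory cell
    h = o * math.tanh(c)            # exposed hidden state
    return h, c

# With the forget gate saturated open (large bias), the stored value survives
# many steps even when the input is zero.
W = {'i': (1.0, 0.0, 0.0), 'f': (0.0, 0.0, 10.0),
     'o': (1.0, 0.0, 0.0), 'g': (1.0, 0.0, 0.0)}
h, c = 0.0, 1.0
for _ in range(5):
    h, c = lstm_step(0.0, h, c, W)
# c stays close to its initial value of 1.0 across the five steps
```

Had the forget gate sat at 0.5 instead, the stored value would halve at every step, which is exactly the vanishing-information problem LSTM was designed to avoid.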
Another idea worth mentioning in DL is the notion of transfer learning. In transfer learning, features extracted from deep CNNs are repurposed to new and novel tasks. The need arises because generic tasks may differ by a wide margin from the original tasks due to which there may be insufficient labelled or other data to train or adapt a DL architecture to new tasks. Using transfer learning, features can be adapted to have sufficient generalization expression using simple techniques reliably [25][26][27].
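A minimal sketch of this recipe, under the simplifying assumption that a small fixed feature map stands in for the frozen pretrained CNN layers: only the new task-specific head is trained, while the "pretrained" features are reused unchanged. All names here are illustrative, not from the cited works:

```python
# Transfer-learning sketch: freeze a feature extractor, train only a new head.
# phi plays the role of the pretrained CNN trunk; a perceptron plays the role
# of the small classifier adapted to the new task.

def phi(x):
    """Frozen 'pretrained' features: raw inputs plus a product term and a bias."""
    x1, x2 = x
    return [x1, x2, x1 * x2, 1.0]

def train_head(data, epochs=200):
    """Perceptron training of the new head; phi itself is never updated."""
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y in data:            # y in {-1, +1}
            f = phi(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else -1
            if pred != y:            # mistake-driven update of the head only
                w = [wi + y * fi for wi, fi in zip(w, f)]
    return w

def predict(w, x):
    return 1 if sum(wi * fi for wi, fi in zip(w, phi(x))) > 0 else -1

# XOR is not linearly separable in the raw inputs, but it is separable in
# phi's feature space, so the frozen features generalize to the "new task".
xor = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
w = train_head(xor)
```

In practice the same pattern appears as loading a network pretrained on ImageNet, freezing its convolutional trunk, and fine-tuning only a replaced final layer on the target medical dataset.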
Finding better architectural design parameters for DL models is a problem worth considering, and reinforcement learning methods can help in this task. An inspiring example is the NASNet architecture, which searches over a number of network topologies to find repeated motifs that can be combined in series to handle inputs of varying spatial dimensions and depth [28,29].
This paper presents an overview of DL methods for the task of cancer diagnosis, prognosis, and prediction. The aim is to highlight the differences between different model constructions and to provide limitations and future perspectives for further exploration of this exciting domain.
The rest of the paper is organized as follows. Section 2 presents the rationale behind the selection of studies that are made a part of this survey article. Section 3 presents an overview of publicly available datasets for cancer research, followed by a description of current applications of DL in cancer diagnosis, prognosis, and prediction tasks in Section 4. Section 5 presents a discussion covering limitations of the existing methods, perspectives, and some directions for future work. Finally, Section 6 concludes this work and proposes avenues for further research in this domain.

Methodology
The criteria used for the selection of articles for this minireview were language and the authenticity of electronic sources. Only articles written in English were included, owing to the wide recognition of English as the language of the scientific and biomedical domains. The publication years of the articles considered range from 1997 to 2020. We used the PubMed, Web of Science, IEEE Xplore, and Science Direct platforms to conduct the search. The search terms used were "diagnosis of cancer," "prognosis of cancer," "prediction of cancer using DL," and "transfer learning models."

Publicly Available Datasets for Cancer Research Using DL Methods
In this section, we provide a brief description of publicly available datasets for cancer studies, beginning with whole-slide image datasets. The challenge dataset is made up of 500 training and 321 testing breast cancer histology whole-slide images. This dataset is designed to fulfill three purposes: the first is to predict mitotic scores, the second is to predict gene expression-based proliferation scores, and a third task, mitosis detection, was later added to the challenge.
(7) INbreast Dataset. This breast cancer dataset [34] has a total of 115 cases and is made up of full-field digital mammograms.

Current Applications of Deep Learning in Cancer Diagnosis, Prognosis, and Prediction
In this section, we will discuss some current research trends in the domain of DL for cancer diagnosis, prognosis, and prediction tasks. We will cover techniques for the prognosis/prediction of tumors, breast cancer, and other types of cancer. In addition, we will also cover techniques for the segmentation/detection of breast cancer and other types of cancer. Furthermore, we will cover different methods for the classification of breast cancer and other types of cancer. We will also cover techniques for the classification, segmentation, and detection of brain tumors. Figure 1 shows histopathological views of some of the cancer subtypes that will be covered in this review article.

Prognosis/Prediction of Tumors, Breast Cancer, Skin Cancer, Head and Neck Cancer, Brain Cancer, Liver Cancer, Colorectal Cancer, Ovarian Cancer, and Other Types of Cancer

Petalidis et al. [50] reported a gene expression dataset for astrocytic tumors. They employed an Artificial Neural Network (ANN) algorithm to combine signatures from histopathological subclasses of these tumors in order to address the need for proper grading. In this study, they found 59 genes belonging to three classes, namely, angiogenesis, lower-grade astrocytic tumor discrimination, and cell differentiation. They further report that these tumor subtypes have very high prognostic value and are missing in other studies reported in the literature. Finally, they report 11 gene-based classifiers that differentiate among primary/secondary subtypes of glioblastomas. Using a custom as well as an independent dataset, they report an accuracy of 96.15% in identifying the correct classes for these subtypes. Chi et al. [51] used morphometric features to compare prediction outcomes on two different breast cancer datasets. They report successful predictions with good and bad prognostic values, where a good prognosis means survival beyond five years and a bad prognosis suggests survival of less than five years. The authors in [52] conducted experiments on female breast carcinoma patients using a DL approach. They performed prediction using a Cox regression model and gene expression datasets, calling their approach Survival Analysis Learning with MultiOmics Neural Networks (SALMON). They report that the performance of SALMON improves when more data is used to combine and simplify cancer biomarkers and gene expressions for prognosis prediction. Shimizu and Nakayama [53] conducted experiments to identify breast cancer genes for prognosis prediction using The Cancer Genome Atlas (TCGA) database.
They identified 184 genes using artificial intelligence (AI); for that purpose, they used random forest and neural network models. Furthermore, they used a molecular score for prognosis that relies on only 23 of these genes. They confirmed that these genetic discoveries contain potential drug targets. The authors in [54] performed their experiments on malignant melanoma. They used a dataset with 1160 females and 786 males. They used an ANN architecture employing a flexible nonlinear structure for prognosis prediction of survival probabilities. They found the performance of their model to be on par with the Cox model, with the advantage of offering a flexible approach when analyzing data with a specified distributional form. Jing et al. derived biomarkers using CNNs as DL models from the glioma and glioblastoma cohorts of TCGA. They used a sampling- and filtering-based approach to improve their predictions by excluding intratumoral heterogeneity. Their model achieved a median concordance index of 0.754, surpassing other state-of-the-art approaches. The authors in [64] developed a DL-based approach using CNNs to predict the survival of mesothelioma cancer subjects. They used TCGA and a French cohort to test their approach. They achieved a concordance index of 0.656 on the TCGA cohort, surpassing the performance of human experts, and found key regions in the stroma associated with inflammation and cellular diversity. Liu et al. [65] modeled diagnostic prediction using DL models. The authors conducted their study on 27 diverse cancer types obtained from TCGA and the Gene Expression Omnibus dataset. They successfully decoded 12 CpG and 13 promoter markers.
The CpG markers that they identified achieved a sensitivity of 100% in the prediction of prostate cancer samples while promoter markers achieved 92% using cell-free deoxyribonucleic acid (DNA) methylation data. Table 2 displays a summary of the studies for the task of prognosis and prediction of cancers covered in this subsection.
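The concordance index reported by several of the prognosis studies above is the fraction of comparable patient pairs that a model ranks correctly by risk. A sketch of Harrell's pairwise computation, with made-up toy values rather than data from the cited works:

```python
# Harrell's concordance index (c-index) for survival models. A pair (i, j) is
# comparable when subject i had the shorter time AND an observed event (not
# censoring); the pair is concordant when the model gives i the higher risk.

def concordance_index(times, events, risks):
    """times: survival/censoring times; events: 1 if the event was observed,
    0 if censored; risks: model risk scores (higher = worse prognosis)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:  # i failed first
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5        # ties in risk count as half
    return concordant / comparable

# Toy cohort: subject 3 is censored at t=6 and never forms a comparable
# pair as the earlier member.
times  = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks  = [0.9, 0.3, 0.2, 0.4]
c = concordance_index(times, events, risks)   # 4 of 5 comparable pairs correct
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of 0.656 and 0.754 in context.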

Segmentation/Detection of Breast Cancer, Lung Cancer, Bladder Cancer, and Other Types of Cancer

Yap et al. [66] used DL approaches for breast lesion detection using ultrasound images. They investigated the performance of LeNet, U-Net, and a pretrained AlexNet. They conducted their experiments on two custom datasets of 306 and 163 images, termed dataset A and dataset B, respectively. Their pretrained AlexNet-based model achieved the best overall performance, with F-measures of 0.91 and 0.89 on the two datasets. The authors in [67] proposed different variants of fully convolutional networks (FCNs) for the segmentation of lesions of breast cancer subjects. They tried an AlexNet-based FCN as well as 8-, 16-, and 32-layered FCN models. To overcome the problem of data deficiency, they used transfer learning and pretraining on the ImageNet dataset. Their dataset has two classes, benign and malignant. They reported an average Dice score of 0.7626 using the 16-layer FCN on benign lesions. Their model correctly recognized 89.6% of benign lesions and 60.6% of malignant lesions. Liu et al. [68] used DL to detect breast cancer in lymph node biopsies. They used 399 slides from the Camelyon16 challenge dataset to achieve an AUC of 99% at the slide level, and a second custom dataset of 108 slides to achieve an AUC of 99.6%. As a preprocessing step, they used a color normalization procedure. The authors in [69] employed horizontal and vertical flipping, translation, and resizing operations as data augmentation methods to artificially increase the size of the datasets. Anuranjeeta et al. [70] used shape and morphological features derived from segmented images to detect cancer cells using a number of DL and machine learning-based models. They used J-Rip, logistic model tree, rotation forest, multilayer perceptron, and other models trained on histopathological images. Rotation forest performed the best in cancerous/noncancerous detection, achieving an accuracy of 85.7%.
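The flip- and rotation-based augmentation used by several of the studies in this subsection generates at most eight distinct views of a square patch: the four 90-degree rotations and their mirrored counterparts (the dihedral group of the square). A small self-contained sketch:

```python
# Dihedral augmentation: the eight rotation/flip variants of a square image
# patch, commonly used to synthetically enlarge histopathology datasets.

def rot90(img):
    """Rotate a square image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Mirror an image left-to-right."""
    return [row[::-1] for row in img]

def dihedral_augment(img):
    """Return the eight rotation/flip variants of a square image."""
    views = []
    cur = img
    for _ in range(4):
        views.append(cur)
        views.append(hflip(cur))
        cur = rot90(cur)
    return views

# For an asymmetric patch, all eight views are genuinely distinct samples.
patch = [[1, 2],
         [3, 4]]
views = dihedral_augment(patch)
```

These eight views are label-preserving for most histopathology tasks, since tissue has no canonical orientation, which is why "increasing the dataset 8 times" by rotation and flipping is such a common recipe.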
The authors in [71] used a modified region-based CNN method to efficiently detect mitosis in breast cancer histopathological images. They employed subjects from the 2014 International Conference on Pattern Recognition (ICPR) and TUPAC 2016 datasets in their study, achieving a precision of 0.76 on the TUPAC 2016 dataset. Zhou et al. [72] used a 3D deep CNN model to detect lesions in a breast cancer MRI dataset. They deployed a custom dataset of 1537 female patients and classified lesions as benign or malignant, achieving an accuracy of 83.7% for the diagnostic task and a Dice distance score of 0.501 for the detection task. The authors in [73] proposed an integrated DL architecture capable of performing classification, segmentation, and detection for the screening of breast masses as benign or malignant. They used digital X-ray mammograms from the INbreast database. Their model achieved a mass detection accuracy of 98.96% and a Dice score of 92.69% for mass segmentation. To augment the dataset, the authors applied rotation 8 times to synthetically increase its size. Nasrullah et al. [74] deployed DL-based architectures for the diagnosis of malignant nodules in lung cancer. They conducted studies on the LUNA16 and LIDC-IDRI datasets. They used a faster region-based CNN and a U-Net-styled architecture to achieve an accuracy of 94.17% on the classification task. The authors in [75] used a DL-based system built on 3D CNN architectures for screening lung cancer using CT scans. They used the LIDC-IDRI and Kaggle Data Science Bowl challenge datasets for their experiments. The authors used heavy augmentation to artificially increase the size of the datasets using methods such as rotation, scaling, translation, and reflection, achieving a Dice coefficient of 0.4 on the LIDC-IDRI dataset. Shkolyar et al. [76] deployed DL-based models for the detection of papillary and flat bladder cancer.
They used CNNs to construct an image analysis platform. They used two datasets of 100 and 54 subjects. Their model successfully detected 42 of 44 papillary and flat bladder cancers, giving a per-tumor sensitivity of 90.9%. Fourcade et al. [77] used a combination of DL and superpixel segmentation-based methods to segment full-body organs such as the brain and heart from Positron Emission Tomography (PET) images. To synthetically increase the size of the dataset, the authors deployed rotation, scaling, mirroring, and elastic deformations. Their best-performing model achieved a Dice score of 0.93. The authors in [78] deployed DL architectures to detect brain metastasis on MRI. They used data from 121 subjects in their proposed study. They used a faster region-based CNN model, achieving an area under the ROC curve of 0.79. Ma et al. [79] used a You Only Look Once v3 (YOLOv3) dense multireceptive-field CNN for thyroid cancer nodule detection. They used ultrasound images and deployed different data augmentation methods, such as color jitter and changes in saturation, exposure, and hue, on two datasets of 152 and 699 images. The number of images increased to 10845 after the application of these data augmentation schemes. The mean average precision (mAP) values reported by the authors were 90.05 and 95.23. Das et al. [80] proposed a system combining watershed segmentation, a Gaussian mixture model (GMM), and a deep neural network for the classification and segmentation of liver cancer using CT scans. Their model recognized hemangioma, hepatocellular carcinoma, and metastatic carcinoma subjects. They employed 225 CT scans in their study, achieving a Dice score of 0.9743 on the testing set for the segmentation task and an accuracy of 99.38% for the multiclass classification task. The authors in [81] proposed a DL-based model for the segmentation of histopathology images of the liver.
Their proposed DL model combined a residual block, a bottleneck block, and an attention decoder block. The authors further created a new dataset of 80 histopathology images, which they named the KMC liver dataset, and proposed a joint loss function combining Dice and Jaccard losses. They conducted their experiments on two datasets: the KMC liver and multiorgan Kumar datasets. Each image in the Kumar dataset has a dimension of 1000 × 1000, while each image in the KMC liver dataset has a dimension of 1920 × 1440. Their model achieved a Jaccard index of 0.7206 on the KMC liver dataset and 0.6888 on the Kumar dataset. Wang and Chung [82] proposed a modified U-Net-based architecture for the segmentation and diagnosis of the colon gland, employing two datasets in their study. Table 3 displays a summary of the studies for the task of segmentation and detection of cancers covered in this subsection.
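The Dice score and Jaccard index reported throughout this subsection are overlap ratios between a predicted mask and the ground truth, with Dice = 2|A∩B|/(|A|+|B|), Jaccard = |A∩B|/|A∪B|, and the identity Dice = 2J/(1+J) connecting them. A sketch on flattened binary masks (toy values, not from the cited works):

```python
# Overlap metrics for segmentation, computed on flattened binary masks
# (1 = lesion pixel, 0 = background).

def dice(pred, truth):
    """Dice score: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    return 2.0 * inter / (sum(pred) + sum(truth))

def jaccard(pred, truth):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(pred) + sum(truth) - inter
    return inter / union

pred  = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
d = dice(pred, truth)      # 2*2 / (3+3) = 0.667
j = jaccard(pred, truth)   # 2 / 4 = 0.5
```

Because Dice is always at least as large as Jaccard on the same masks, Dice scores and Jaccard indices from different papers are not directly comparable, which is worth keeping in mind when reading the numbers above.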

Classification of Breast Cancer.
Huynh et al. [92] used DL methods to classify regions of interest taken from ultrasound images. Cystic, benign, or malignant labels were assigned to each region. Two binary classification tasks were performed using pretrained CNNs: nonmalignant (benign + cystic)/malignant and benign/malignant. They used an SVM as a classifier on the CNN-derived features. On the nonmalignant/malignant classification task, they obtained an AUC of 0.90, while on the benign/malignant task, their method obtained an AUC of 0.88. The authors in [93] used CNNs as their DL approach and introduced the concept of a matching layer to convert grayscale images to red, green, and blue patterns. They used 882 ultrasound images obtained from two publicly available datasets. Using fine-tuning and the matching layer, their method approached an AUC of 0.936 on a test set of 150 cases. Byra et al. [94] used DL transfer learning approaches such as Inception version 3 and VGG19 architectures on reconstructed B-mode images, observing a decrease in classification performance; to counter this, they used data augmentation when reconstructing B-mode images, achieving better performance on breast ultrasound images. The authors in [95] combined cross-modal and cross-domain transfer learning for the benign/malignant classification task. In comparison to training from scratch and simple fine-tuning, their approach achieved better performance, with 97% accuracy on ultrasound images. Hadad et al. [96] deployed cross-modal transfer learning using mammography images, achieving an accuracy of 0.93, which is better than cross-domain transfer learning. The authors in [97] presented a study on the use of MRI in screening individuals younger than 40 years, confirming the effectiveness of MRI as a modality of choice for such diagnoses. They reported a very high sensitivity of around 93% to 100% and a variable specificity in the range of 37% to 97%. They found MRI to be effective especially after reconstructive surgery. Hu et al.
[98] developed a transfer learning methodology using an MRI modality with multiple parameters. They used different sequences, such as dynamic contrast-enhanced and T2-weighted sequences, to distinguish benign lesions from malignant ones. They used image, feature, and classifier fusion methods and achieved an AUC of 0.87 for the feature fusion scheme, which statistically outperformed the other methods. The authors in [99] proposed a methodology using Inception version 4 and residual network transfer learning architectures as well as a recurrent CNN architecture on the 2015 Breast Cancer Classification Challenge and BreakHis datasets for binary and multiclass classification tasks. They used rotation, translation, and other data augmentation methods to artificially increase the size of the datasets, achieving an accuracy of 97.57 ± 0.89% on the multiclass task. The authors in [112] deployed a weakly supervised scheme for the binary classification of benign and malignant tumors using histopathology images. They deployed the BreakHis dataset, achieving an accuracy of 92.1% at 40x magnification. An important contribution of their approach is that it removes the need for labelling the images. The authors in [38] deployed CNNs for both binary (carcinoma/noncarcinoma) and multiclass (normal/benign/in situ/invasive) classification tasks. They used the 2015 Bioimaging breast histology classification challenge dataset in their study. Their architectures were able to retrieve information at different scales. For the multiclass classification task, the authors achieved an accuracy of 77.8%, while for the binary classification task, they achieved an accuracy of 83.3%, using rotation and mirroring as data enhancement methods for both tasks. Rakhlin et al.
[113] deployed different transfer learning architectures using microscopic histological images from the ICIAR 2018 Grand Challenge dataset for binary (carcinoma/noncarcinoma) and multiclass (four classes) classification tasks. They used pretrained ResNet-50, Inception version 3, and VGG16 architectures. They deployed normalization, downscaling, cropping, and color variation as augmentation schemes, achieving a correct classification rate of 87.2% for multiclass classification and 93.8% for the binary classification task. The authors in [114] extracted smaller and larger patches at cell and tissue levels using a clustering approach and a CNN (ResNet-50 architecture), deploying the 2015 Bioimaging breast histology classification challenge dataset. For the multiclass (4 classes) classification task, the authors reported an accuracy of 88.89% using the proposed approach, with stain normalization deployed as a preprocessing step. Shallu and Mehra [115] demonstrated the use of three transfer learning architectures, VGG16, VGG19, and ResNet-50, for the classification of histological images on the BreakHis dataset. They deployed rotation as the data enhancement scheme and found a fine-tuned VGG16 with a logistic regression classifier to perform best, achieving an accuracy of 92.6%. The authors in [116] deployed CNN, K-nearest neighbour (KNN), Inception version 3, SVM, and ANN algorithms for the binary (benign/malignant) classification task. They used different schemes for preprocessing and data enhancement, such as gray scaling, channel standardization, flipping, rotation, and cropping, as well as image segmentation, to reach an accuracy of 97% using the ANN architecture. Bevilacqua et al. [117] evaluated two different frameworks for binary and multiclass classification of irregular/regular/stellar/no-opacity lesions from segmented high-resolution images.
They used ANN classifiers with hand-crafted and morphological features for the first framework. For the second framework, they used different CNN models, especially a VGG model. They reported accuracies of 84.19% on binary and 74.84% on multiclass classification tasks for the first framework, while for the second framework, they obtained an accuracy of 92.02% for the binary and multiclass classification tasks. The authors in [118] contrasted two machine learning approaches for the multiclass (8 classes) classification task using histopathological images on the BreakHis dataset. The first approach used handcrafted features while the second used a CNN as a feature extractor. They used VGG16, VGG19, and ResNet-50 as their CNN models, and rotation, translation, scaling, and flipping as data enhancement methods. The VGG16 model reached an accuracy of 93.25% at the patient level for the multiclass classification task. Spanhol et al. [119] proposed a DL model that reused a previously trained CNN model on the BreakHis dataset, achieving an F1-score of 90.3 at the subject level. The authors in [120] exploited global covariance information by incorporating a matrix power normalization procedure into a simple CNN model. This arrangement can exploit second-order statistical information, producing effective representations from histological images. On the BreakHis dataset, for the binary (benign/malignant) classification task, they achieved an accuracy of 97.92% at the subject level while employing cropping and flipping operations to synthetically enlarge the dataset. Khan et al. [121] used different transfer learning architectures (GoogLeNet, VGGNet, and ResNet) for the binary classification of benign/malignant tumor cells, deploying BreakHis and a custom dataset. For data augmentation, they used scaling, rotation, translation, and color augmentation methods, achieving a correct classification rate of 97.67%.
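The AUC values quoted throughout this subsection are the probability that a classifier scores a randomly chosen positive case above a randomly chosen negative one. A pairwise computation, equivalent to the Mann-Whitney U statistic, makes this concrete (toy scores, not from the cited works):

```python
# ROC AUC via the pairwise (Mann-Whitney) formulation: the fraction of
# positive/negative pairs in which the positive case receives the higher score.

def roc_auc(labels, scores):
    """labels: 1 for malignant, 0 for benign; scores: classifier outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)       # ties count as half
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
auc = roc_auc(labels, scores)   # 8 of 9 pairs ranked correctly
```

Unlike accuracy, this quantity is independent of any decision threshold, which is why many of the studies above report it alongside or instead of accuracy.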
The authors in [122] introduced an information-based architecture designed to exploit clinical information. Their dataset of 100 subjects contains six types of records: encounter notes, operation records, pathology notes, radiology notes, progress notes, and discharge summaries. They used fine-tuned transformer models from pretrained bidirectional encoder representations, achieving a precision of 0.976 for relation recognition. Naik et al. [123] deployed a DL model to assess estrogen status from whole-slide histopathological images. They used the Australian Breast Cancer Tissue Bank as well as TCGA datasets in their study and further deployed flipping, rotation, color jitter, and cutout regularization as augmentation methods. Their model achieved an AUC of 0.861 on the TCGA dataset and an AUC of 0.905 on the Australian Breast Cancer Tissue Bank dataset. The authors in [124] combined multiple models; their study shows the superiority of combining models over a single model. The authors in [125] deployed different transfer learning architectures for the multiclass (9 classes) classification of colorectal cancer subjects. They used VGG19, AlexNet, SqueezeNet, GoogLeNet, and ResNet-50 models on two datasets of 86 and 25 subjects, reaching accuracies of 98.7% and greater than 94%, respectively. The authors in [126] deployed a CNN architecture to extract features from Optical Coherence Tomography (OCT) images of colorectal cancer subjects. Their network was trained using 26000 OCT images representing 42 areas, achieving an AUC of 0.998. Dong et al. [127] deployed a DL method to exploit information in multiphase CT nomograms of gastric cancer subjects. They used three cohorts to test the effectiveness of their model, achieving discrimination rates of 0.821, 0.797, and 0.822 in the primary, external validation, and international validation cohorts. Woerl et al. [128] deployed a DL method to identify bladder cancer from histomorphological images.
They used two datasets of 407 and 16 subjects from TCGA and a custom cohort, respectively, achieving accuracies of 69.91% and 75% on the TCGA and custom subsets. Wang et al. [129] used weakly supervised learning exploiting image-level labels for the classification of lung cancer images. They used two datasets, one from TCGA and the other a custom dataset. To enhance the training set, color jittering, translation, flipping, and rotation were used.
Their model achieved an accuracy of 97.3% and an AUC of 85.6% on the custom and TCGA datasets, respectively. Karimi et al. [130] used a DL method combining three separate CNNs with different patch sizes for the classification of histopathological images with limited data. They used new data augmentation methods, such as elastic deformation and augmentation in the space of learned features, for the binary classification of cancerous/benign and low-grade/high-grade patches, achieving accuracies of 92% and 86%, respectively. Dascalu and David [131] used DL architectures for the binary classification of benign/malignant cases of skin cancer subjects. They used a skin magnifier with polarized light and an advanced dermoscope to construct their datasets, achieving F2-score sensitivities of 91.7% and 89.5% for the skin magnifier with polarized light and the advanced dermoscope images, respectively. The authors in [132] used DL techniques to build a skin cancer classification model for the binary and multiclass classification of malignant and benign skin tumors. They used the Kaohsiung Chang Gung Memorial Hospital and HAM10000 datasets in their study. Their model achieved an accuracy of 85.8% on the HAM10000 dataset for the 7-class classification task. On the Kaohsiung Chang Gung Memorial Hospital dataset, it achieved accuracies of 72.1% for 5-class classification and 89.5% for binary classification. Thomas et al. [133] applied interpretable DL models to classify skin cancers in a histopathological setting. They studied three types of cancers: basal cell carcinoma, squamous cell carcinoma, and intraepidermal carcinoma. They deployed a multiclass (12 classes) classification model to achieve accuracies between 93.6% and 97.9%. To solve the class imbalance problem, they used flipping and rotation as data augmentation methods to increase the size of the dataset 8 times.
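Flip-and-rotation augmentation of the kind used in [133] can expand a dataset eightfold because the four 90° rotations of a square patch, each optionally mirrored, yield the eight symmetries of the square. A minimal sketch on a toy 2D patch (function names are illustrative, not from the cited work):

```python
def rot90(img):
    """Rotate a 2D list-of-lists image 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]

def flip_h(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def dihedral_augment(img):
    """Return the 8 views: 4 rotations, each with and without a flip."""
    views = []
    cur = img
    for _ in range(4):
        views.append(cur)
        views.append(flip_h(cur))
        cur = rot90(cur)
    return views

patch = [[1, 2],
         [3, 4]]
views = dihedral_augment(patch)
print(len(views))  # 8
```

For a patch with no internal symmetry, all eight views are distinct, which is exactly the eightfold dataset expansion reported above.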
The authors in [134] developed a CNN model for the classification of melanoma and nevi. They used a dataset of 11444 images belonging to five categories and deployed novel DL techniques to train a single CNN model. In addition, they asked 112 dermatologists to grade the images. They then used a gradient boosting method to develop a new classifier for binary (benign/malignant) and multiclass (5 classes) classification tasks, achieving accuracies of 86.5% and 82.95%, respectively. Sun et al. [135] developed a DL method to classify liver cancer subjects as abnormal/normal on publicly available TCGA datasets. Transfer learning and multiple instance learning were combined for the classification of patch features. The authors used tissue extraction, color normalization, and patch extraction for the preprocessing of histopathological images. Diao et al. [136] used a transfer learning-based CNN architecture, Inception version 3, to classify nasopharyngeal carcinoma subjects into three classes: chronic nasopharyngeal inflammation, lymphoid hyperplasia, and nasopharyngeal carcinoma. They used a total of 1970 images of 731 subjects, and their model achieved a mean AUC of 0.936. Liu et al. [137] used a CNN classifier to diagnose subjects with pancreatic cancer using contrast-enhanced CT images. They used three different datasets to test the effectiveness of their approach. The first dataset, local test set 1, had 295 patients with pancreatic cancer and 256 controls for training and 75 patients with pancreatic cancer and 64 controls for validation. The second dataset, local test set 2, had 101 patients with pancreatic cancer and 88 controls, while the third, the US dataset, had 281 pancreatic cancer subjects and 82 controls. On local test set 1, local test set 2, and the US dataset, their model achieved accuracies of 98.6%, 98.9%, and 83.2%, respectively.
To augment the datasets, the authors used moving window and flipping operations. Korfiatis et al. [138] compared the performances of ResNet-18, ResNet-34, and ResNet-50 architectures for the classification of MRI scans of 155 subjects into three classes: no tumor, methylguanine methyltransferase (MGMT) methylated, and nonmethylated. The ResNet-50 architecture achieved the best performance with an accuracy of 94.9%; ResNet-34 achieved an accuracy of 80.72%, while ResNet-18 achieved an accuracy of 76.75%. The authors in [139] used two-phase training to study and mitigate class bias using a DL-based CNN model for the classification of breast cancer histological images. They conducted their experiments using the MITOS12 and 2016 Tumor Proliferation Assessment Challenge datasets. Prior to phase 1 of training, segmentation using global binary thresholding was applied. In phase 1, a CNN was trained on the segmented patches using rotation and flipping data augmentation methods as well as a blue-ratio-histogram-based k-means clustering approach. In phase 2, the dataset was again modified to reduce the effects of class imbalance, yielding an F-measure of 0.79. Campanella et al. [140] proposed a DL-based system utilizing information from multiple instances in order to help pathologists exclude information without compromising performance metrics. They used 44732 whole-slide images belonging to 15187 patients, achieving an AUC above 0.98 and 100% sensitivity for prostate cancer, basal cell carcinoma, and breast cancer metastases to axillary lymph nodes. The authors in [141] proposed two DL-based systems to detect myeloid leukemia from a leukemia microarray genetic dataset. The first system is a single-layered neural network, while the second has 3 hidden layers. They used information on 22283 genes extracted from the Gene Expression Omnibus repository.
Their models achieved accuracies of 63.33% and 96.67% for the single- and multilayered DL architectures with a normalization significance test (p > 0.05). Jeyaraj and Samuel Nadar [142] used a regression-based DL algorithm to investigate hyperspectral images to diagnose oral cancer. Their system extracted patches for classification into normal, benign, and malignant classes using the BioGPS, TCIA, and GDC datasets. For training with 100 malignant image patches, they achieved an accuracy of 91.4%, while for 500 malignant image patches, they achieved 94.5%. The authors in [143] proposed a DL method to study the relationship between genomic variations and traits. They analyzed 6083 exon sequencing sample files belonging to 12 cancer types, using TCGA and the 1000 Genomes Project. They performed both binary (cancer/healthy) and multiclass (12 classes) classification tasks using specific, total, and mixture models, achieving accuracies of 97.47%, 70.08%, and 94.7% for the specific, mixture, and total models, respectively, for the identification of cancer.
Owais et al. [144] deployed a DL-based classification framework for the diagnosis of gastrointestinal diseases from endoscopic images. They used two publicly available datasets, Kvasir and Gastrolab, and followed a two-step process: the classification network predicts the disease type in the first step, and the retrieval part then shows relevant cases in the second. They performed multiclass (37 classes) classification using a DenseNet transfer learning architecture, an LSTM architecture, PCA, and KNN methods to achieve a correct recognition rate of 96.19% on this task. The authors in [145] proposed a CNN-based DL architecture for the multiclass (4 classes) classification of acute lymphoblastic leukemia, using stained bone marrow images to achieve an accuracy of 97.78%. Kann et al. [146] deployed a 3D CNN model to identify nodal metastasis and tumor extranodal extension. Their dataset comprised 2875 CT samples, with 124 samples for validation and 131 for testing. They used a series of rotation and flipping operations to augment the datasets, achieving an AUC of 0.91. The authors in [147] proposed a DL approach to study the limited-sample training problem for holographic images, addressing the classification of healthy and cancer cell lines. They used Generative Adversarial Networks (GANs) as the data augmentation method to leverage a large number of unclassified sperm cell samples. Their model achieved an accuracy of 99% for the healthy/primary cancer/metastatic cancer multiclass classification problem. Table 5 displays a summary of the studies for the task of classification of cancers covered in this subsection.
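Many of the studies in this subsection report AUC. For a small set of scores it can be computed directly from its probabilistic definition: the chance that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counting one half. A minimal sketch:

```python
def auc_score(labels, scores):
    """AUC via the Mann-Whitney U statistic.

    `labels` are 0/1 ground-truth classes and `scores` are the model's
    predicted probabilities; ties between a positive and a negative
    score contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc_score(labels, scores))  # 0.75
```

The quadratic pairwise loop is fine for illustration; production libraries compute the same quantity from sorted ranks in O(n log n).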
4.5. Classification, Segmentation, Prediction, and Detection of Brain Tumors. Sun et al. [148] proposed a 3D fully convolutional network-based multipathway architecture to extract features from MRI images for the segmentation of brain tumor regions, using dilated convolutions in each pathway to achieve dice scores of 0.89, 0.78, and 0.76 for the whole tumor (WT), tumor core (TC), and enhancing tumor (ET), respectively, on the BRATS 2019 challenge. They used cropping, random slicing, and z-score normalization as preprocessing methods. The authors in [149] proposed a novel architecture combining a U-Net encoding-decoding subarchitecture, dilated convolutional feature-extracting layers, and a residual module. Their architecture achieved dice scores of 0.843, 0.897, and 0.906 and of 0.798, 0.902, and 0.845 on the ET, WT, and TC brain tumor subregions for the BRATS 2018 and BRATS 2019 challenges, respectively. They used normalization and cropping techniques to preprocess the images. Khan et al. [150] utilized VGG16 and VGG19 transfer learning-based CNN models, a partial least squares covariance matrix, the discrete cosine transform, and an extreme learning machine to extract and classify features on the BRATS datasets. The authors in [157] developed a 3D U-Net model for brain tumor segmentation, picking an ensemble of models to extract features from brain MRI images on the BRATS 2018 challenge for segmentation and survival prediction. They achieved dice scores of (0.7946, 0.9114, and 0.8304) on the (ET, WT, and TC) subregions for the segmentation task and an accuracy of 32.1% on the survival prediction task. The authors in [158] proposed an ensemble of deep CNN architectures integrating two and three paths of parallel models in a single model. They used 2D slices of brain MRI images from the BRATS 2013 dataset, achieving dice scores of (0.86, 0.86, and 0.88) on the (WT, TC, and ET) subregions.
As a preprocessing step, they standardized the slices using zero-mean, unit-variance normalization. Naser and Deen [159] proposed a DL approach combining a U-Net architecture, a VGG16 transfer learning architecture, and a fully connected architecture for the classification and segmentation of brain MRI images of lower-grade gliomas belonging to 110 patients. They used normalization, cropping, resizing, padding, rescaling, rotation, zooming, shifting, shearing, and flipping as preprocessing and data augmentation methods. Their approach achieved a dice score of 0.84 on the segmentation task and accuracy, sensitivity, and specificity of 92% on the binary classification (grade II/grade III) task. The authors in [160] proposed a multiscale 3D CNN architecture for the recognition and segmentation of 220 high- and 54 low-grade glioma MRI scans from the BRATS 2015 challenge dataset. As a preprocessing method, they used histogram matching to ensure consistency among gray levels. Their model achieved a dice score of 0.89 on the segmentation task and a sensitivity of 0.89 and specificity of 0.90 on the recognition task. Chang et al. [161] proposed a DL model combining average pooling and max pooling layers along with 1 × 1 kernels, further combined with conditional random fields to optimize prediction results. Using the BRATS 2013 dataset, they achieved dice scores of (0.80, 0.75, and 0.71) on the (WT, TC, and ET) subregions, with intensity normalization as a preprocessing method. The authors in [162] proposed a multiscale CNN model for the categorization of MRI scans into healthy, meningioma, glioma, and pituitary tumor categories. They used 2D MRI images acquired from local hospitals in China to conduct their experiments, achieving dice scores of (0.894, 0.779, 0.813, and 0.828) for (meningioma, glioma, pituitary tumor, and average), respectively, and an accuracy of 97.3% on the classification task.
As preprocessing and data augmentation methods, the authors used pixel standardization and elastic transformation. Table 6 displays a summary of the studies on the classification, segmentation, prediction, and detection of brain tumors covered in this subsection.
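Two recurring ingredients in the brain-tumor studies above are z-score (zero-mean, unit-variance) intensity normalization and the dice overlap score used to grade segmentations. Minimal sketches of both, operating on flattened toy intensities and binary masks:

```python
def zscore(values):
    """Standardize intensities to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def dice_score(pred, truth):
    """Dice coefficient of two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0

normed = zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(round(sum(normed) / len(normed), 10))             # 0.0 (zero mean)
print(round(sum(v * v for v in normed) / len(normed), 10))  # 1.0 (unit variance)
print(round(dice_score([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0]), 3))  # 0.667
```

In practice, z-scoring is applied per scan (or per modality) over brain voxels, and dice is computed per tumor subregion (WT, TC, ET), but the arithmetic is exactly as above.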

Discussion
The dynamics of cancer growth over time are difficult to estimate. Precise measurements can largely be made only at the end of a cancer's evolutionary cycle, once it is removed from the body. Ongoing mutations provide a rich history of clonal lineages, which lead to changes in both genotype and phenotype. Psycho-oncology is a branch of oncology that deals directly with psychological and social issues, covering both the emotional and psychobiological dimensions of cancer. However, a number of obstacles still limit its wide adoption, such as the dearth of medical practitioners as well as of assessment tools and supporting instruments. It is important to note that both psychological and psychobiological factors influence the way cancers are treated. This domain must fulfill the demands for the availability of resources, support for caregivers and patients, and the carving out of new research directions for enthusiastic researchers [163,164].
Research in AI has proven its worth in supporting medical decision-making. However, due to the opaque nature of these algorithms, their widespread adoption is still limited. Explainable AI algorithms provide a solution to this problem, although performance issues might hinder their adoption as well. Robustness, local attribution, and completeness are three key properties of an explainable AI system. One way around this problem is to find strategies that optimally merge explainable and nonexplainable AI models. Some solutions that point in this direction include winning the confidence of clinicians by marking the regions of an image that drive AI predictions; another is to attack or deceive DL models through adversarial augmentations, as this could potentially reveal the important features and discard the unimportant ones. There is a close link between interpretability and explainability: an explainable model is interpretable, but the reverse may not hold. A prediction relying on thousands of parameters is neither interpretable nor explainable [165,166].
Precise DL model predictions depend on the availability of a large corpus of data (labelled or unlabelled), and training on a relatively small dataset remains a challenge. One way to look at this problem is through the genetic evolution process. Gene transfer is the transfer of genetic information from a parent to its offspring; genes encode genetic instructions (knowledge) passed from ancestors to descendants. The ancestors do not necessarily have better knowledge, yet the evolution of knowledge across generations promotes a better learning curve for the descendants. There is a need for methods that can mimic this behaviour and use a limited number of examples to achieve the desired performance on different tasks [167]. Catastrophic forgetting is another problem limiting the performance of modern networks, as they lack the ability to learn from continuous streams of data. The quality of the feature representation considerably determines the amount of forgetting. Boosting secondary information is key to improving the transferability of features from old to new tasks without forgetting and is a promising direction for future work [168], especially for cancer diagnosis, prognosis, and prediction.
Despite the claims made by researchers, multiclass classification is an immensely difficult problem; a deeper understanding of human visual perception that moves beyond large datasets and DL is perhaps necessary to solve many domain problems [169], including cancer diagnosis, prognosis, and prediction.
Another challenge worth mentioning is finding intricate hierarchical patterns in all forms of data, labelled and unlabelled, in a way that integrates information to perform visual inference. Unsupervised and semisupervised learning can help in this direction by offering potential solutions that allow us to delve deeper into cancer pathogenesis and prediction tasks [170].
Can we use real-world images from another domain for calibration? Bridging the gap between cross-domain calibration and in-domain calibration is required to get optimal performance from neural networks. Techniques such as gram matrix similarity can be used as a criterion to select calibration datasets from a candidate pool to further improve performance [171]. This process can be used for effective feature construction in cancer diagnosis, prognosis, and prediction.
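As an illustration of the Gram matrix similarity idea, one could compare the Gram (channel-by-channel inner product) matrices of feature maps from two images and keep, for calibration, the candidates closest to the target domain. A hypothetical sketch with toy flattened feature channels (all names and data are ours, not from [171]):

```python
def gram_matrix(features):
    """Gram matrix of a set of feature maps.

    `features` is a list of C flattened channel responses; entry (i, j)
    is the dot product of channels i and j, capturing texture-style
    second-order statistics independent of spatial layout.
    """
    C = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(C)] for i in range(C)]

def gram_similarity(f1, f2):
    """Cosine similarity between two flattened Gram matrices, a possible
    criterion for ranking calibration candidates by domain closeness."""
    g1 = [v for row in gram_matrix(f1) for v in row]
    g2 = [v for row in gram_matrix(f2) for v in row]
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = sum(a * a for a in g1) ** 0.5
    n2 = sum(b * b for b in g2) ** 0.5
    return dot / (n1 * n2)

fa = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]   # target-domain features
fb = [[1.0, 0.0, 0.9], [0.1, 1.0, 0.0]]   # calibration candidate
print(round(gram_similarity(fa, fa), 3))  # 1.0 (identical statistics)
print(round(gram_similarity(fa, fb), 3))
```

Candidates from a pool would be ranked by this similarity to the target domain, and the closest ones selected as the calibration set.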
Modern DL object detection networks rely heavily on region proposal algorithms to identify object locations. However, region proposal computation is slow. Faster R-CNN addresses this by sharing convolutional layers with the object detection subsystem. This area requires further research, and there is a need for improved, computationally lightweight methods [25,26]. Cancer lesion detection can be improved through thorough research in this domain.
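Region proposals in detectors of the Faster R-CNN family are typically matched to ground-truth lesion boxes via their intersection-over-union (IoU) overlap. A minimal helper, assuming boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1; the result is
    in [0, 1], with 1 meaning a perfect match.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Two 2x2 boxes overlapping in a 1x1 corner: union 7, intersection 1.
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.143
```

In training, proposals above an IoU threshold (commonly 0.5 or 0.7) against a ground-truth box are treated as positives; the same measure underlies non-maximum suppression.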
Modern DL networks rely heavily on global image statistics. This reliance can cause problems for these systems, as shape and texture recognition is often better done at the local rather than the global level. Research in this domain can lead to better network generalization [172], holding the potential to improve cancer diagnosis, prognosis, and prediction.
Mitigating gradient explosion or decay in RNN training is a problem worth investigating: one direction is to emphasize informative inputs so as to strengthen their contribution to the hidden state, and to find computationally efficient ways of doing so by suppressing input noise or imposing novel constraints [173].
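A standard safeguard against the exploding-gradient side of this problem is clipping the gradient by its global norm before each parameter update. A minimal sketch (a common baseline, not a method from [173]):

```python
def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector whose L2 norm exceeds max_norm.

    Clipping preserves the gradient's direction while bounding its
    magnitude, preventing a single exploding step from destabilizing
    RNN training.
    """
    norm = sum(g * g for g in grad) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return list(grad)

g = [30.0, 40.0]             # L2 norm 50, far above the threshold
print(clip_by_norm(g, 5.0))  # [3.0, 4.0] (norm rescaled to 5)
print(clip_by_norm([0.3, 0.4], 5.0))  # small gradients pass unchanged
```

Vanishing gradients need different remedies (gated cells, careful initialization, or the input-weighting ideas above), since rescaling an already tiny gradient upward would amplify noise.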
Image recognition and image generation are two cornerstones of computer vision. While both are burgeoning fields, specialized techniques from the two subareas can sometimes form a dichotomy. Historically, DL was widely popularized in discriminative image classification with the AlexNet architecture and in image generation through GANs and Variational Autoencoders (VAEs). Novel data augmentation methods that force a network to pay attention to the moments extracted by the layers of a deep network are a need of the hour [174] and can improve the performance of models in cancer diagnosis, prognosis, and prediction.
Further research should also target the discovery of novel objects (such as aberrant organization, rare tumors, and foreign bodies), interpretable DL models (using influence functions or attention mechanisms), intraoperative decision-making, and tumor-infiltrating immune cell analysis. Some problems, such as the orderless, texture-like appearance of whole-slide images and color variation and artefacts, potentially hinder the performance of DL techniques [175] for cancer diagnosis, prognosis, and prediction.
Different imaging modalities such as mammography, CT, MRI, and ultrasound have helped in the staging of cancers, especially breast cancer, and have aided medical practitioners in its early identification [176]. For breast cancer, varying breast densities make masses much more difficult to detect and classify than calcifications, providing room for further research in this domain [177].
Other areas for potential research include the scarcity of data, imbalanced datasets, missing data, and the high dimensionality of patient data. Future work should focus on testing and improving methods to achieve better-performing DL models for cancer diagnosis, prognosis, and prediction tasks.

Conclusion
DL models have revolutionized the diagnosis and prognosis of cancers. They accept data in various forms and from multiple sources, and they are excellent feature extractors, characteristics that can improve cancer prognosis and prediction. Data augmentation is important for cancer diagnosis and prediction tasks to improve the final performance of systems. However, further testing and validation on larger datasets are required for clinical applications. More research on data augmentation methods, on learning in different domains such as the frequency domain, and on deploying novel architectures such as graph convolutional networks will likely improve performance further.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the present study.

Authors' Contributions
A.B.T. conceived and designed the study. Y.K.M. and M.K.A.K. performed the analysis. All authors wrote and revised the draft manuscript.