A Survey of Dental Caries Segmentation and Detection Techniques

Dental caries detection has historically been a challenging task, given the amount of information obtained from various radiographic images. Several methods have been introduced to improve image quality for faster caries detection. Deep learning has become the methodology of choice for the analysis of medical images. This survey gives an in-depth look at the use of deep learning for object detection, segmentation, and classification. It further reviews the literature on deep learning methods for the segmentation and detection of dental images. In the literature studied, methods were grouped according to the type of dental caries (proximal, enamel), the type of X-ray images used (extraoral, intraoral), and the segmentation method (threshold-based, cluster-based, boundary-based, and region-based). The main focus of the reviewed works was on threshold-based segmentation methods. Most of the reviewed papers preferred intraoral over extraoral X-ray images, performing segmentation on dental images of already isolated parts of the teeth. This paper presents an in-depth analysis of recent research in deep learning for dental caries segmentation and detection. It discusses the methods and algorithms used to segment and detect dental caries, the existing models and how they compare in terms of system performance and evaluation, the limitations of these methods, and future perspectives on how to improve their performance.


Introduction
Segmenting and detecting dental caries requires prior knowledge in several areas. There is a need to understand the various sections of the tooth and the specific position of the lesion on the tooth. An understanding of the types of dental images to be used, for instance, panoramic or bitewing radiographs, is also needed. Furthermore, the specific regions of interest should be clearly defined so that a suitable method for segmentation and detection of caries can be chosen. All of this information is required to achieve high-performance segmentation and detection of dental caries.

Tooth Anatomy.
The tooth is a small white structure found in the jawbone of animals and human beings. In human beings, the number of teeth ranges from 20 primary teeth in children to 28-32 permanent teeth in adults. Further, the tooth can be broken down into three main layers: enamel, dentin, and pulp. Caries are classified by their position on the tooth into classes that include the following.
Class III: occurs on interproximal surfaces of anterior teeth, with no incisal edge involvement.
Class IV: occurs on interproximal surfaces of anterior teeth, with incisal edge involvement.
Class V: occurs on the cervical third of the facial or lingual surface of the tooth.
Class VI: occurs on the occlusal or incisal edge, worn away due to attrition.
Apart from positional classification, caries can also be classified by the severity of the lesions on the tooth. This is done based on the amount of enamel and dentin that has been affected by the caries.
Incipient caries are caries that have a depth of less than half of the enamel of the tooth. Moderate caries are caries that are more than halfway through the enamel but do not touch the dentin. Advanced caries are caries that extend to the dentin region. Severe caries are caries that extend more than halfway through the dentin and even reach the pulp.
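The four severity grades above amount to a simple decision rule. The following is a minimal sketch of that rule, assuming the lesion depth is available as two hypothetical measurements (the fraction of the enamel thickness and the fraction of the dentin thickness that the lesion has penetrated). The function name, the inputs, and the handling of depths exactly at the halfway point are illustrative assumptions.

```python
def caries_severity(enamel_fraction, dentin_fraction=0.0):
    """Grade a lesion from its depth (hypothetical inputs, both in [0, 1]).

    enamel_fraction: lesion depth as a fraction of the enamel thickness.
    dentin_fraction: lesion depth as a fraction of the dentin thickness
                     (0.0 means the dentin is not reached).
    A depth of exactly one half is assigned to the milder grade (assumption).
    """
    if dentin_fraction > 0.5:
        return "severe"      # more than halfway through the dentin
    if dentin_fraction > 0.0:
        return "advanced"    # lesion extends into the dentin
    if enamel_fraction > 0.5:
        return "moderate"    # past half of the enamel, dentin untouched
    return "incipient"       # less than half of the enamel


print(caries_severity(0.3))        # shallow enamel lesion
print(caries_severity(1.0, 0.7))   # deep dentin lesion
```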
Caries under classes I, IV, and VI can be identified during clinical inspection, since the affected regions are visible orally. The introduction of X-rays in the medical field has greatly improved the diagnosis of various ailments. In the dental field, radiography has improved on the visual inspection of patients' teeth: X-rays enable professionals to view previously unobservable carious regions that would otherwise have gone untreated.

Dental Radiographs: X-Rays.
The amount of information needed varies with the form of treatment required for a given ailment. An X-ray, or radiograph, is an image that reveals information not visible to the naked eye. Figures 2-4 show some types of radiographs. The work in [5] explains three types of X-rays that are commonly used to diagnose dental health. Radiographs fall into two broad types, intraoral and extraoral [6].
Intraoral radiographs: the X-ray film captures radiographic images while inside the mouth. This type is subdivided further into bitewing, periapical, and occlusal radiographs.
Extraoral radiographs: the X-ray film captures radiographic images while outside the mouth. This type is subdivided further into panoramic radiographs, computed tomography (CT), and sialography.

Deep Learning Systems Application Areas
The introduction of deep learning systems has contributed to several application areas that use medical images.

Digital Microscopy and Pathology.
Deep learning methods have been very popular in these areas, especially with the growing availability of tissue specimens. In this domain, deep learning techniques have focused on three broad challenges: segmentation of large organs; detecting, segmenting, and classifying nuclei; and detecting and classifying regions of interest of lesions. Deep learning techniques have also contributed to stain and color normalization in histopathology image analysis. The work in [7] introduces a method for normalizing stains on histopathology images using autoencoders. Color normalization is further demonstrated by [8], which introduced the use of convolutional neural networks (CNNs) for tissue classification in stained images.
Brain. In this area, deep neural networks (DNNs) have been used for brain image analysis in several domains, most notably the classification of Alzheimer's disease. Other domains include the segmentation of brain tissues and anatomical structures, for instance, the hippocampus, as well as the segmentation and detection of brain tumors, lesions, and microbleeds. Other tasks, such as white matter lesion segmentation, require more anatomical information, and [9] tackled such scenarios by lowering the sampling rate of nonuniformly sampled patches to cover a larger part of the region of interest.
Chest. Deep learning has also contributed to thoracic image analysis, in both tomography and radiography. Deep neural networks have addressed detection and characterization tasks in this domain. Computed tomography (CT) scans have been used to detect several diseases, including lung diseases, from a single image. Chest radiography is the most common radiology test and therefore provides large sets of images for training systems. These systems can combine convolutional neural networks for image analysis with recurrent neural networks for text analysis.
Eye. Deep learning algorithms have also been introduced in the analysis of eye images, where CNNs have been employed to segment anatomical structures. These networks have also addressed the detection and segmentation of retinal diseases, diagnosis, and the assessment of image quality. The work in [10] shows the performance of a Google Inception v3 network for diabetic retinopathy detection and compares the results with those of seven ophthalmologists.
Musculoskeletal. In this domain, deep learning algorithms are used for the identification and segmentation of joints, bones, and soft tissues in images. The method in [11] was trained on musculoskeletal disc images and performed very well across several radiology scoring tasks.
Breast. Research on breast imaging has resulted in [12], which shows significant advances over the state of the art, achieving the performance of human readers on regions of interest (ROI). The main task is to detect breast cancer, which consists of several subtasks. These include the detection and classification of lesions, the detection and classification of microcalcifications, and cancer risk scoring of images. The availability of huge amounts of image data has made mammography easier to perform and the most common method used for breast radiography.
Abdomen. Research on the abdomen focuses on the segmentation of organs, mainly the pancreas, kidneys, liver (tumors), and bladder. CT is the main modality used for most organs, while MRI is used only for prostate analysis. The method in [13] introduces a hybrid neural network that extracts features for subsequent classification.
Cardiac. Cardiac image analysis has embraced deep learning in segmentation, tracking, classification, and assessing image quality. MRI is the modality used. Reference [14] introduced the use of neural networks to segment the left ventricle using recurrent connections in a U-net architecture. Reference [15] used neural networks to perform regression for identifying cardiac sequences.

Factors That Lead to Dental Caries
The work in [The biology, prevention, diagnosis and treatment of dental caries: Scientific advances in the United States] identifies the factors affecting caries, namely, saliva, bacteria, diet, and heredity.
Bacteria. Reference [16] shows how plaque forms on the surface of the tooth and then comes into contact with carbohydrates to form acids that dissolve the tooth structure. This concept formed the basic foundation of the understanding of dental caries. There is no single specific way in which bacteria affect the tooth; thus dental caries is hard to control. The work in [17] explains how dental caries arises from bacteria interacting with fermentable carbohydrates, which is why it is referred to as a diet-bacteria disease. Researchers have associated the caries process with fermentable carbohydrates [18] and showed a relationship between sugar and caries through the acids formed on the tooth's surface. Further, [19] demonstrated how frequent consumption of sweet snacks is related to dental caries.
Saliva. Saliva plays a critical part in the well-being of the soft and hard tissues inside the mouth [20]. A low saliva flow rate indicates a risk of dental caries. Further, [21] explained how saliva flow measurement is an important risk assessment and management measure for dental caries.

Prevention of Caries.
Dental caries can be prevented in various ways:
Sealants: [22] shows how sealants are applied over the specific carious region to stop the decay process on the tooth. This prevents food particles from collecting in the tooth pits, thus preventing caries.
Remineralization: this is done by hardening the tooth's enamel, as explained by [23], to prevent dental caries from taking place.
Fluoride: fluoride is found in drinking water, where excessive amounts cause fluorosis; [24], a practicing dentist, associated it with the enamel [3] and showed its effects on the enamel of the tooth. Research from [25] gave an optimal level of fluoride in water to prevent dental caries.
Risk assessment: this entails determining in advance the likelihood of someone developing dental caries during a specific period of time, which, as [26] explained, enables easier management. Assessment is very useful in determining whether extra diagnostic measures are required. The work in [27] explained how assessment evaluates the effectiveness of previous caries control measures and acts as a guide to future treatment planning.
Further research has shown that caries can be prevented by saliva when fluoride application is combined with regular removal of forming dental plaque by brushing the teeth. The research in [28] explained how saliva and small amounts of fluoride contributed to the hardening of the enamel and thus to a low risk of dental caries.

Diagnosis of Caries.
Diagnosis of dental caries is done by the following methods.
Clinical method: this is described by [29] and is defined as the visual detection of dental caries during an oral examination. They used separators to visualize areas of interest and applied dental floss to these areas to detect roughness of the surfaces.
Radiographic method: radiographs, also known as X-rays, were introduced in dentistry by [30]. The work in [31] further advanced the use of dental radiographs to detect caries. The work in [32] generated images of higher diagnostic quality than those obtained from conventional films to detect caries in bitewing X-rays.

Treatment of Caries.
Focus has shifted from surgical approaches to the management of caries via restoration of the tooth structure or implants. The research work in [33] emphasized prevention of the disease, remineralization steps, and minimizing access to caries-affected regions to avoid further decay.

Image Representation for Dental Segmentation and Detection
The representation of dental images is done through the segmentation of various regions of interest from the larger image to locate objects. Segmentation is, therefore, the partitioning of an image into several segments that are used to identify objects and their edges. Image segmentation can be categorized according to similarity and discontinuity properties. Discontinuity-based methods are referred to as boundary-based methods, while similarity-based methods are referred to as region-based methods [34]. The segmentation process is therefore based on dividing an image into groups with similar characteristics and features. Mathematically, a segmentation of an image R is a finite set of regions $R_1, \dots, R_n$ such that $R = \bigcup_{i=1}^{n} R_i$ and $R_i \cap R_j = \emptyset$ for $i \neq j$.

Categories of Dental Images Segmentation.
Research done by [35,36] further categorizes segmentation methods according to various characteristics such as region, entropy, shape, threshold, and pixel correlation, among others. These characteristics were derived from thermal and X-ray images to aid the analysis of specific points or regions of interest. Research studies show that dental image segmentation methods are classified as region-based, cluster-based, threshold-based, boundary-based, and watershed-based.
Region-based: it divides an image into several regions based on the similarity of pixel intensities. Reference [37] explained how segmentation of dental panoramic X-ray images assists dentists in detecting osteoporosis. The work in [38] also used the region-based approach in the segmentation of bitewing dental X-ray images.
Threshold-based: this is done by choosing a threshold value from the pixel intensities of an image. Then, pixels that exceed the threshold value are placed into one region, while those below the threshold value are placed in an adjacent region.
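As a minimal sketch of threshold-based segmentation, the function below labels each pixel of a small grayscale image as 1 if it exceeds the threshold and 0 otherwise. The function name, the nested-list image format, and the choice to place pixels equal to the threshold in the lower region are assumptions made for illustration.

```python
def threshold_segment(image, t):
    """Return a binary mask: 1 where the pixel exceeds t, else 0.

    image is a list of rows of grayscale intensities (illustrative format).
    Pixels equal to t fall in the lower region (an assumption; the text
    only specifies pixels above and below the threshold).
    """
    return [[1 if pixel > t else 0 for pixel in row] for row in image]


image = [
    [12, 40, 200],
    [15, 180, 220],
    [10, 20, 30],
]
mask = threshold_segment(image, 100)
for row in mask:
    print(row)
```

In practice the threshold is often chosen from the image histogram rather than fixed by hand; the fixed value here only keeps the sketch short.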
Cluster-based: it is the automatic grouping of image data based on certain degrees of similarity between the data. The degree of similarity depends on the problem being solved. The algorithm used to perform clustering of data uses the automatically detected groups as initial parameters. Research by [49] performed jaw lesion segmentation using the fuzzy C-means method. The work in [50] used a semifuzzy supervised clustering algorithm to segment dental radiographs.
Boundary-based: it is used to find edge or point discontinuities in images. It detects color or pixel intensity discontinuities in the gray levels of the image. Active contours are used by [51][52][53] as one of the approaches to segment images based on their boundaries. The approach performs segmentation by outlining an object in an image and is also referred to as the snake method. The level set method (LSM) is another approach for detecting boundaries in an image. It handles segmentation by performing geometric operations to detect contours with topology changes. Examples of works on boundary-based segmentation are [54,55], which used LSM to segment radiographic images.
Watershed-based: it is performed on a grayscale image and uses mathematical morphology to segment adjacent regions in an image. Watershed-based segmentation was used by [56] on bitewing dental radiographs. It was also used by [57], which combined K-means clustering with the watershed method for color-based segmentation.

Diagrammatic Representation of Dental Image Segmentation.
This section gives pictorial representations of various segmentation methods, as can be seen in Figure 5. The work in [38] proposed a novel way of finding regions of interest in both gap valley detection and tooth isolation using edge intensity curves. It used the region growing approach [58] to detect the region of interest and the Canny edge detection algorithm [59] to detect the edges of the isolated teeth.
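The region growing idea mentioned above can be illustrated with a minimal sketch. It assumes a 4-connected neighbourhood and a fixed intensity tolerance `tol` relative to the seed pixel; both are illustrative choices and are not details taken from [38] or [58].

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a region from seed=(row, col) over 4-connected neighbours.

    A neighbour joins the region when its intensity differs from the
    seed intensity by at most tol (hypothetical homogeneity criterion).
    Returns the set of (row, col) positions in the region.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


image = [
    [10, 12, 50],
    [11, 13, 52],
    [49, 51, 53],
]
print(sorted(region_grow(image, (0, 0), 5)))
```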
A dental classification method by [60] classifies periapical images via a fuzzy value, which handles dental orientation problems such as missing teeth. They also used the dental universal numbering system to categorize teeth into incisors, canines, premolars, and molars, as shown in Figure 6.
Segmentation of the images is done through multiscale aggregation [61], which deals with pixel distortions in image data. Integral projection is used to detect horizontal individual teeth, and this is further classified according to one of the four incisor, molar, premolar, and canine categories. Figure 7 shows images of the above-mentioned method.
Additionally, [62] proposed the use of a faster region-based convolutional neural network (Faster R-CNN) to detect and number teeth in periapical dental images. A filtering algorithm deletes overlapping boxes detected by the Faster R-CNN. A neural network is introduced to detect missing teeth. A rule-based teeth numbering system matches the labels of detected teeth boxes and modifies results that violate the set rules. The work in [63] explained how they achieved instance segmentation through the use of a mask region-based convolutional neural network (Mask R-CNN). This system is an extension of [64], which includes a section of convolutional networks to achieve the task of instance segmentation (Figure 8).
Features are extracted by ResNet101 and then compose the feature pyramid network (FPN). The FPN defines anchors and extracts regions of interest (ROI). The region proposal network is formed from a combination of anchors and the FPN (Figure 8). Regions of interest are then aligned to the same size, and each fixed-size feature is classified as a tooth or background (class scores) and localized by regression of the bounding box coordinates. Finally, pixels are segmented by the fully convolutional network (FCN) [23] in each detected mask, as seen in Figure 9. Table 1 shows related works grouped by various categories and methods.
A diagnostic system proposed by [66] consists of Laplacian filtering, window-based adaptive thresholding, morphological operations, statistical feature extraction, and a back-propagation neural network, which classifies a tooth surface as normal or as having dental caries. A study by [67] analyzes the feature extraction performance of the gray level cooccurrence matrix (GLCM) algorithm on dental caries images to contrast two types of caries based on the theory of G. V. Black, namely, Class 3 and Class 4 dental caries.
A U-shaped deep CNN (U-Net) model was proposed by [68] for dental caries detection on bitewing radiographs, and it was further investigated whether the model can improve clinicians' performance.
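The step in [62] that deletes overlapping detection boxes is not specified in detail here. A common choice for this kind of step is greedy non-maximum suppression based on intersection over union (IoU); the sketch below assumes that technique, with boxes given as hypothetical `(x1, y1, x2, y2)` tuples and an illustrative IoU threshold of 0.5.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_overlaps(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression (assumed technique, see lead-in).

    Visits boxes from highest to lowest score and keeps a box only if it
    does not overlap an already kept box by more than iou_threshold.
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(filter_overlaps(boxes, scores))
```

The second box overlaps the first with an IoU of about 0.68 and is therefore dropped, while the third box does not overlap either and is kept.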

Feature Extraction Techniques for Dental Images
Feature extraction on an image is done using various methods depending on its texture, pixels, and color intensity. The method by [69] evaluated the performance of various texture feature maps for recognizing demineralization caused by caries. It used intraoral image analysis that includes run-length matrices (RLM), first-order features (FOF), cooccurrence matrices, gray tone difference matrices, local binary patterns, and K-means clustering to transform images of confirmed caries cases. The performance of the different feature maps was compared to that of radiographic images by several radiologists. Feature maps are produced by extracting features from an image using various methods and techniques, which include the following.
First-order features (FOF): these describe the probability of a particular intensity occurring in an image, and the shape of this distribution is used to determine image parameters such as contrast and sharpness. For instance, for an image I with spatial resolution W × H and G intensity levels, its histogram is defined as
$$h(g) = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\delta\big(I(x,y), g\big), \quad g = 0, \dots, G-1,$$
where $\delta(a, b)$ is 1 if $a = b$ and 0 otherwise. Several quantities can be derived from the histogram, namely, the mean, variance, and entropy:
$$\mu = \sum_{g=0}^{G-1} g\,h(g), \qquad \sigma^2 = \sum_{g=0}^{G-1} (g-\mu)^2\,h(g), \qquad S = -\sum_{g=0}^{G-1} h(g)\log_2 h(g).$$
Run-length matrix (RLM): this captures runs of consecutive pixels with the same intensity in an image [70]; a coarse texture is expressed by longer runs of similar intensity, and the run counts are encoded as a matrix. Other works that have embraced this method include [71][72][73].
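The first-order features named above (histogram, mean, variance, and entropy) can be sketched in a few lines. The example assumes a normalized histogram, integer intensities in the range 0 to `levels - 1`, and a base-2 logarithm for the entropy; the function name and the image format are illustrative.

```python
import math


def first_order_features(image, levels):
    """Normalized histogram, mean, variance and entropy of an image.

    image: list of rows of integer intensities in [0, levels - 1].
    The normalization and the log base are assumptions of this sketch.
    """
    n = sum(len(row) for row in image)
    hist = [0.0] * levels
    for row in image:
        for g in row:
            hist[g] += 1.0 / n          # probability of intensity g
    mean = sum(g * p for g, p in enumerate(hist))
    variance = sum((g - mean) ** 2 * p for g, p in enumerate(hist))
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)
    return hist, mean, variance, entropy


hist, mean, variance, entropy = first_order_features([[0, 0], [1, 1]], levels=2)
print(hist, mean, variance, entropy)
```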
Gray level cooccurrence matrix (GLCM): this, according to [74], captures texture relations between adjacent pixels in the image. The matrix entries represent the probabilities of pairs of gray tones cooccurring in neighboring pixels, and the distance between the pixels is a feature parameter. Features that can be extracted from this matrix include contrast, entropy, variance, sum average, and homogeneity. Related works include [75,76].
Gray tone difference matrix (GTDM): this matrix, according to [77], describes gray tone textural properties through the difference between the intensity level of a pixel and the average intensity in its K × K neighborhood. Another method proposed by [78] combines GTDM, LBP, and K-means clustering for feature extraction.
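A minimal sketch of the GLCM described above is given below. It assumes a single pixel offset `(dx, dy)`, a non-symmetric matrix normalized to probabilities, and the commonly used definitions of contrast and homogeneity; these choices, like the function names, are assumptions and may differ from the variants used in [74].

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level cooccurrence matrix for one offset.

    Entry [i][j] is the probability that a pixel of level i has a pixel
    of level j at offset (dx, dy). Raises ValueError if no pixel pair
    fits inside the image for that offset.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset does not fit inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p):
    """Sum of (i - j)^2 * p[i][j]; large for strongly varying textures."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


def glcm_homogeneity(p):
    """Sum of p[i][j] / (1 + (i - j)^2); large for smooth textures."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


p = glcm([[0, 0, 1], [0, 1, 1]], levels=2)
print(p, glcm_contrast(p), glcm_homogeneity(p))
```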
Laws' texture energy measures: [79] proposed a method that uses a set of masks to calculate the local energy of an image. These masks detect textural characteristics such as edges, spots, levels, and ripples. Other works that use this method include [80,81].
Local binary patterns (LBP): LBP detects small structures such as edges, lines, and spots, for example on skin images [82], and represents them as binary patterns. For a center pixel $(x_c, y_c)$, a neighborhood of radius R with P evenly sampled points on the circle is specified. The local binary pattern function is given by
$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^p,$$
where $g_c$ is the intensity of the center pixel and $g_p$ is the intensity of the p-th sampled neighbor. Here, the $s(\cdot)$ operator returns 1 for positive values and 0 for negative values. Figure 10 shows a 3 × 3 local binary operation concatenating all 8 bits to output a binary number, which is converted to a decimal number. This decimal number is the LBP code and is assigned to the center pixel. Other works that use LBP include [83] and [84].
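The 3 × 3 LBP operation described above can be sketched as follows. The neighbour ordering (clockwise from the top-left pixel, first neighbour as least significant bit) and the treatment of a neighbour equal to the center as 1 are assumptions of this sketch, since conventions vary between implementations.

```python
def lbp_code(patch):
    """LBP code of the center pixel of a 3 x 3 patch.

    Neighbours are visited clockwise from the top-left corner and the
    p-th neighbour contributes 2**p when it is >= the center value
    (ordering and tie handling are illustrative assumptions).
    """
    center = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for p, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << p
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # bits 0, 4, 5, 6, 7 are set -> 241
```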
Clustering: this uses the K-means method to cluster pixels. The intensity of the pixels is taken into account, and the number of clusters must be defined in advance. Euclidean distance is the metric used to assign individual pixels to the nearest cluster. Each cluster value is set to the average intensity of the pixels assigned to it, and this procedure is iterated until a specified threshold is met [85]. Related works include [86][87][88]. Figure 11 shows the feature maps of the extraction techniques discussed.
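The K-means procedure described above can be sketched for one-dimensional pixel intensities, where the Euclidean distance reduces to an absolute difference. The initial centers, the stopping tolerance `tol`, and the iteration cap are hypothetical parameters chosen for illustration.

```python
def kmeans_intensities(pixels, centers, max_iter=100, tol=1e-6):
    """Cluster scalar intensities with K-means.

    pixels: flat list of intensities; centers: initial cluster values
    (their count fixes the number of clusters in advance).
    Iterates until no center moves by more than tol.
    Returns (final centers, cluster label of each pixel).
    """
    centers = list(centers)

    def nearest(p):
        return min(range(len(centers)), key=lambda i: abs(p - centers[i]))

    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in pixels:
            clusters[nearest(p)].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, [nearest(p) for p in pixels]


centers, labels = kmeans_intensities([10, 12, 11, 200, 205, 198], [0, 255])
print(centers, labels)
```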
According to [89], a textural feature system was used for the diagnosis of dental caries in radiographs. This system introduced other feature extraction techniques, such as the following.
Gabor filters: these were originally introduced by Gabor, D. (1946) and extract edge-like components with very high frequency in a local region of an image. They are described as the best texture descriptor due to their use in segmentation, object recognition, motion tracking, and image registration. The spatial domain Gabor function is defined as
$$g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right].$$
Here $\sigma_x$ and $\sigma_y$ are the standard deviations of the distribution along the x and y axes, and the sinusoidal frequency is denoted by W. Other related studies on Gabor filters include [90][91][92].
Local ternary patterns (LTP): according to [93], LTP extends LBP from a two-valued to a three-valued code. The ternary code is obtained by comparing neighboring pixel values with the central pixel value using a set threshold τ. Neighbors within the threshold of the central value are set to 0, those above it are set to +1, and those below it are set to −1. The threshold function is defined as
$$s(x_i, x_c, \tau) = \begin{cases} +1, & x_i \geq x_c + \tau \\ 0, & |x_i - x_c| < \tau \\ -1, & x_i \leq x_c - \tau \end{cases}$$
where τ is the set threshold, $x_c$ is the value of the central pixel, and $x_i$ are the neighboring pixels of $x_c$. Figure 12 shows the LTP code for a 3 × 3 image region. Other related works include [94][95][96].
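A minimal sketch of the local ternary pattern for a 3 × 3 region is given below. It assumes that neighbours at least τ above the center map to +1, those at least τ below map to −1, and the rest map to 0, and it splits the ternary code into an upper and a lower binary code as is commonly done. The neighbour ordering and the function name are illustrative assumptions.

```python
def ltp(patch, tau):
    """Local ternary pattern of the center pixel of a 3 x 3 patch.

    Returns (ternary, upper, lower): the list of ternary values for the
    8 neighbours (clockwise from the top-left corner), and the two
    binary codes built from the +1 and the -1 positions respectively.
    """
    center = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    ternary = []
    for value in neighbours:
        if value >= center + tau:
            ternary.append(1)
        elif value <= center - tau:
            ternary.append(-1)
        else:
            ternary.append(0)
    upper = sum(1 << p for p, t in enumerate(ternary) if t == 1)
    lower = sum(1 << p for p, t in enumerate(ternary) if t == -1)
    return ternary, upper, lower


patch = [
    [54, 57, 60],
    [50, 54, 70],
    [45, 49, 58],
]
print(ltp(patch, tau=5))
```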
Morphological gradient: the method introduced by [97] is used to increase the intensity of boundary edges in an image. This method makes it easier to observe edge boundaries and other objects clearly [98] and to identify dental caries on teeth. Other works that involve the multiple morphological gradient (mMG) method include [99][100][101]. The multiple morphological gradient consists of several elements that aid in the processing of images: gradient, multiple values, and threshold.
Morphology gradient: it increases the intensity of the edges of objects in the image by subtracting the erosion from the dilation: $(A \oplus B) - (A \ominus B)$, where A is the image and B is the structuring element. In the multiple-values element, w denotes the bit depth.
Threshold: this separates pixel values into two classes, 0 and 1, depending on a constant threshold value q. According to [99], thresholding is defined as
$$T_q(A)(x, y) = \begin{cases} 1, & A(x, y) \geq q \\ 0, & \text{otherwise.} \end{cases}$$
Here $0 < q \leq A_{max}$. Table 2 shows various works grouped by feature extraction method.
Table 1 (excerpt). Boundary-based: active contours, level set method [51][52][53][54][55]. Watershed-based: watershed [56,57].
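The dilation-minus-erosion gradient and the subsequent thresholding can be sketched as follows. The example assumes a flat 3 × 3 structuring element and handles image borders by using only the neighbours that lie inside the image; both are illustrative assumptions, as is treating pixels equal to q as class 1.

```python
def _window(image, r, c):
    """Values in the 3 x 3 window around (r, c) that lie inside the image."""
    rows, cols = len(image), len(image[0])
    return [
        image[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]


def dilate(image):
    """Grayscale dilation with a flat 3 x 3 structuring element."""
    return [[max(_window(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image):
    """Grayscale erosion with a flat 3 x 3 structuring element."""
    return [[min(_window(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def morphological_gradient(image):
    """Dilation minus erosion; large values mark object edges."""
    d, e = dilate(image), erode(image)
    return [[dv - ev for dv, ev in zip(dr, er)] for dr, er in zip(d, e)]


def threshold(image, q):
    """Binary image: 1 where the value is at least q, else 0."""
    return [[1 if v >= q else 0 for v in row] for row in image]


image = [[0, 0, 9, 9] for _ in range(4)]
gradient = morphological_gradient(image)
print(gradient[0])                 # the edge between columns 1 and 2 stands out
print(threshold(gradient, 5)[0])
```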
A novel deep convolutional neural network (CNN) combined with a Long Short-Term Memory (LSTM) model was proposed by [103] for the detection and diagnosis of dental caries on periapical dental images.

Deep Learning Methods and Algorithms
Research by [104] explains how machine learning algorithms are divided into unsupervised and supervised learning algorithms.

Learning Algorithms.
Supervised learning: the model is presented with a dataset $D = \{x_n, y_n\}_{n=1}^{N}$ of input features x and labels y. The labels can take different forms depending on the learning task; for instance, in classification y is a scalar representing a class label, while in regression it is a vector of continuous variables. In a segmentation model, y can be a multidimensional label image. Supervised training finds the model parameters θ that best predict the data given a loss function $L(y, \hat{y})$. Here $\hat{y}$ denotes the output of the model obtained by feeding x to the function $f(x; \theta)$ that represents the model.
Unsupervised learning: these algorithms are trained to find patterns in data without labels, for instance, clustering and principal component analysis methods. They can be trained with different loss functions.

Neural Networks.
These are the networks that underlie deep learning systems. Neural networks consist of neurons with some activation a and parameters $\theta = \{W, \beta\}$, where W is a set of weights and β is a set of biases. The activation is a linear combination of the input x to the neuron and its parameters, followed by an element-wise nonlinearity $\sigma(\cdot)$, referred to as the transfer function: $a = \sigma(w^T x + b)$. The transfer functions used in traditional neural networks are the sigmoid and the hyperbolic tangent function. Multilayered neural networks, known as multilayer perceptrons, consist of several layers of these transformations:
$$f(x; \theta) = \sigma\big(W^L \sigma(W^{L-1} \cdots \sigma(W^0 x + b^0) \cdots + b^{L-1}) + b^L\big).$$
Here $W^n$ is a matrix comprising rows $w_k$, with k indexing the activations in the output; n is the index of the current layer, and L is the final layer. Hidden layers are the layers between the input layer and the output layer. When a network contains multiple hidden layers, it is referred to as a deep neural network.
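The layered transformation described above can be sketched as a forward pass through a small multilayer perceptron. The sketch assumes the sigmoid as the transfer function and represents each layer as a hypothetical `(weights, biases)` pair, where each row of `weights` holds the weight vector of one neuron.

```python
import math


def sigmoid(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases):
    """One layer: a_k = sigmoid(w_k . x + b_k) for every neuron k."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Feed x through a list of (weights, biases) layers in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),   # hidden layer, 2 neurons
    ([[1.0, -1.0]], [0.0]),                     # output layer, 1 neuron
]
print(mlp_forward([1.0, 2.0], layers))
```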
The activations of the final layer, which is the output layer of the network, are mapped to a distribution over the classes $P(y|x; \theta)$ through a softmax function:
$$P(y = i \,|\, x; \theta) = \frac{\exp\big((w_i^L)^T x + b_i^L\big)}{\sum_{k=1}^{K}\exp\big((w_k^L)^T x + b_k^L\big)}.$$
Here $w_i^L$ is the weight vector of the output node associated with class i, K is the number of classes, and x on the right-hand side denotes the input to the output layer.
Stochastic gradient descent is the method used to fit the parameters θ to the dataset D. A small subset of the dataset (a mini-batch) is used for each gradient update, and fitting amounts to minimizing the negative log-likelihood:
$$\arg\min_{\theta} \; -\sum_{n=1}^{N} \log P(y_n \,|\, x_n; \theta).$$
This leads to the binary cross entropy loss for two-class problems and the categorical cross entropy for multiclass tasks. Deep neural networks gained popularity in 2006, when [105] explained how DNNs could be pretrained layer by layer and the stacked network fine-tuned to produce good performance. Currently, the most popular deep learning networks are convolutional and recurrent networks.
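The softmax mapping and the negative log-likelihood (cross entropy) loss discussed above can be sketched as follows. The example works on raw output-layer scores (logits) for a small batch; the function names and the max-subtraction used for numerical stability are choices of this sketch.

```python
import math


def softmax(logits):
    """Map raw class scores to a probability distribution."""
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(batch_logits, labels):
    """Sum over the batch of -log P(y_n | x_n), i.e. cross entropy."""
    return -sum(
        math.log(softmax(logits)[y])
        for logits, y in zip(batch_logits, labels)
    )


probs = softmax([2.0, 1.0, 0.1])
loss = negative_log_likelihood([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]], [0, 2])
print(probs, loss)
```

A gradient-based optimizer would evaluate this loss on each mini-batch and update the parameters in the direction that reduces it.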

Convolution Neural Networks.
These networks have weights that are shared in such a manner that the network performs convolutions on images. This means that the model does not need to learn separate detectors for the same object occurring at different positions in an image, which reduces the number of parameters to be learned. At each layer of the network, the input image is convolved with a set of K kernels $W = \{W_1, \dots, W_K\}$ with biases $\beta = \{b_1, \dots, b_K\}$, each generating a feature map $X_k$. The features are subjected to an element-wise nonlinear transform $\sigma(\cdot)$, and the process is repeated for every convolutional layer l:
$$X_k^l = \sigma\big(W_k^{l-1} * X^{l-1} + b_k^{l-1}\big).$$
The convolutional neural network also has pooling layers that aggregate pixels and their neighbors using an invariant function, typically the maximum operation. Pooling is applied between subsequent convolutional layers and, at the end of the convolutional stream, regular (fully connected) network layers are added, where weights are not shared. The network is trained by feeding the activations of the final layer through a softmax function. Several common CNN architectures are widely used in the analysis of medical images, including the following.
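One convolutional layer followed by a nonlinearity and max pooling can be sketched in plain Python. The sketch assumes a single input channel, a single kernel, no padding, ReLU as the nonlinear transform, and 2 × 2 pooling with stride 2. As in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += kernel[i][j] * image[r + i][c + j]
            row.append(s)
        out.append(row)
    return out


def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2; incomplete windows are dropped."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
kernel = [[1, 0], [0, -1]]      # responds to diagonal intensity changes
features = relu(conv2d_valid(image, kernel))
print(features)
print(max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
```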
General classification architectures: these include networks such as AlexNet [106], which, unlike its precursor LeNet [107], consists of five convolutional layers. AlexNet incorporated rectified linear units (ReLU) as its activation function, and this has become the most common choice in CNNs. Further, there has been interest in stacking smaller kernels instead of using single layers of kernels with a large receptive field, which requires fewer parameters. Research by [108] discussed the use of deeper neural networks with small fixed-size kernels in each layer of a 19-layer model, referred to as the VGG19 network. More complex building blocks have been introduced into deep networks to improve the efficiency of the training process and to reduce the number of parameters used. In [109], GoogLeNet is a 22-layer network that uses inception blocks with sets of convolutions of different sizes. The ResNet architecture consists of ResNet blocks which, instead of learning a function directly, learn a residual, so that the mapping in each layer stays close to the identity function [64]. The use of these deep architectures has increased due to their low memory use, which also contributed to a more recent version of GoogLeNet referred to as Inception v3 [110].
Multistream architectures: these are networks that accommodate multiple inputs as channels presented to the input layer, which are later merged at any point in the network. Multiscale image analysis has been used for the classification of brain lesions [111], and a multistream architecture [112] has been used for the segmentation of natural images. A challenge for deep learning systems in the medical imaging domain is adapting existing architectures to different input formats, for instance, 2D or 3D data. Volumetric data can be divided into slices and fed as different streams to a network to avoid a large number of parameters. Such techniques have been used to perform knee cartilage segmentation [113]. Multistream architectures can also be used for classification in the context of medical imaging [114].
Segmentation architectures: these are specific to the task of segmenting medical images. CNNs are used to classify each pixel of an image together with its neighborhood. To avoid redundant computation when classifying neighboring pixels, fully connected layers are rewritten as convolutions, which allows the CNN to take input images larger than those it was trained on and to produce a likelihood map. The resulting fully convolutional network (fCNN) can then be applied to an entire image or volume. Because of the pooling layers, however, the output has a lower resolution than the input. One technique applies the fCNN to shifted versions of the input and stitches the results together to obtain a full-resolution output [115]. The fCNN was improved by the U-net architecture, which has convolutions in its downsampling path followed by an upsampling path that increases the image size [116]. Another method added skip connections to the U-net architecture to connect the downsampling and upsampling convolution layers [115]. A similar approach was used by [117] for 3D data, and [118] incorporated residual blocks and a Dice loss layer into the U-net architecture instead of the commonly used cross entropy.
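The Dice loss mentioned above measures the overlap between a predicted mask and a reference mask. The sketch below uses a common soft Dice formulation on flattened masks with a small smoothing constant `eps`; this particular form is an assumption and may differ from the exact definition used in [118].

```python
def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient for two flattened masks.

    pred: predicted foreground probabilities (or 0/1 values).
    target: reference labels (0 or 1).
    eps avoids division by zero when both masks are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


print(dice_loss([1, 1, 0, 0], [1, 1, 0, 0]))   # perfect overlap, loss near 0
print(dice_loss([1, 1, 0, 0], [1, 0, 0, 0]))   # partial overlap
print(dice_loss([1, 0], [0, 1]))               # no overlap, loss near 1
```

Unlike pixel-wise cross entropy, this loss depends on the overlap ratio, which makes it less sensitive to the large number of background pixels in a typical medical image.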

Recurrent Neural Networks.
These networks were developed for discrete sequence analysis and can handle inputs and outputs of varying length, which makes them suitable for tasks such as machine translation. In a classification task, the model learns a distribution over classes $P(y \mid x_1, \ldots, x_T; \theta)$ given a sequence $x_1, \ldots, x_T$ as input. The RNN has a hidden state $h_t$ at time $t$, which is the output of a nonlinear mapping from the input $x_t$ and the previous state $h_{t-1}$: $h_t = \sigma(W x_t + R h_{t-1} + b)$. Here $W$ and $R$ are weight matrices that are shared over time, and $b$ is a bias term. For a classification task, several fully connected layers followed by a softmax are used to map the sequence to a distribution over the classes. RNNs have problems of memory shortage similar to those of other deep neural networks, and several techniques, such as the Long Short-Term Memory (LSTM) cell [119], have been developed to deal with this problem. A simplification of the LSTM is the gated recurrent unit (GRU) [120]. RNNs are increasingly being adopted with promising results, for instance, [121], in the human brain challenge.
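The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, not an implementation from any cited work: it assumes that σ is the logistic sigmoid, that the initial hidden state is all zeros, and that the input weight matrix is called W; the weights in the example are made-up numbers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W, R, b):
    """One update h_t = sigmoid(W x_t + R h_{t-1} + b)."""
    wx = matvec(W, x_t)
    rh = matvec(R, h_prev)
    return [sigmoid(a + c + d) for a, c, d in zip(wx, rh, b)]


def rnn_forward(xs, W, R, b):
    """Run the same weights over a sequence of any length; return the last state."""
    h = [0.0] * len(b)  # assumed all-zero initial state
    for x_t in xs:
        h = rnn_step(x_t, h, W, R, b)
    return h


# Hypothetical weights: 3 input features, 2 hidden units.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
R = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
short_sequence = [[1.0, 0.0, 0.0]] * 2
long_sequence = [[1.0, 0.0, 0.0]] * 6
print(len(rnn_forward(short_sequence, W, R, b)), len(rnn_forward(long_sequence, W, R, b)))
```

The final state has the same size regardless of sequence length, which is what allows a fixed set of fully connected layers and a softmax to be placed on top of it for classification.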
Here, the hidden representation h is smaller than the input x, which prevents the model from learning the identity function. According to [122], another solution to the same problem is the denoising autoencoder, which trains the model to reconstruct the input from a noise-corrupted version. Deep autoencoders are realized by placing autoencoder layers on top of each other, and in most cases the layers are trained individually. Examples of autoencoders include the following.
Variational autoencoders: [123] introduced the use of conditional variational autoencoders for pathology detection in medical images.

Adversarial Networks.
These are used for image generation tasks and include works from [124,125].

Deep Belief Networks.
These are a type of Markov random field (MRF) consisting of an input layer $x = (x_1, \ldots, x_N)$ and a hidden layer $h = (h_1, \ldots, h_M)$ that holds a latent feature representation. The connections between the layers are bidirectional, so the network is a generative model from which new data points can be sampled. An energy function can be defined for a state $(x, h)$ of the input and hidden layers as

$$E(x, h) = -h^{T} W x - c^{T} x - b^{T} h.$$

Here $W$ is the weight matrix connecting the two layers, and $c$ and $b$ are the bias terms. Further, the probability of the system is given by

$$P(x, h) = \frac{1}{Z} \exp\left(-E(x, h)\right).$$

Computing the partition function $Z$ is intractable, while conditional inference, that is, computing $h$ conditioned on $x$, is tractable and is given by

$$P(h_j = 1 \mid x) = \frac{1}{1 + \exp\left(-b_j - W_j x\right)},$$

where $W_j$ is the $j$th row of $W$. DBNs can be used to fuse medical images [48]. DBNs are also used to extract high-level features from medical images and classify them effectively [126].
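The tractable conditional step can be sketched as follows. The sketch assumes the standard restricted Boltzmann machine form, in which each hidden unit is switched on with probability sigmoid(b_j + Σ_i W_ji x_i); the weights, the helper names, and the use of a seeded random generator are illustrative assumptions.

```python
import math
import random


def hidden_probabilities(x, W, b):
    """P(h_j = 1 | x) for every hidden unit j, assuming the standard RBM form."""
    probabilities = []
    for row, bias in zip(W, b):
        activation = bias + sum(w * xi for w, xi in zip(row, x))
        probabilities.append(1.0 / (1.0 + math.exp(-activation)))
    return probabilities


def sample_hidden(x, W, b, rng):
    """Draw a binary hidden state h given the visible state x."""
    return [1 if rng.random() < p else 0 for p in hidden_probabilities(x, W, b)]


# Hypothetical model: 3 visible units, 2 hidden units.
W = [[1.5, -0.5, 0.0], [-1.0, 0.5, 2.0]]
b = [0.0, -0.5]
visible = [1, 0, 1]
rng = random.Random(0)  # fixed seed so repeated runs draw the same sample
print(hidden_probabilities(visible, W, b))
print(sample_hidden(visible, W, b, rng))
```

No partition function appears anywhere in this computation, which is why the conditional is cheap to evaluate even though the joint probability is not.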

Software and Hardware.
These are the processors and software programs used to run the various deep learning techniques. On the hardware side, graphics processing units (GPUs) are much faster for these workloads than the previously used CPUs, and they are programmed through GPU computing libraries such as CUDA and OpenCL. GPUs work hand in hand with the available open-source software packages that provide a platform for implementing neural network operations such as convolutions. The most popular packages include Caffe [127], TensorFlow [128], Theano [129], and Torch [130], which provide interfaces for implementing various operations in deep learning. There are also third-party packages written on top of these frameworks, such as Keras [131], and code repositories hosted on GitHub [132]. Figure 13 shows a representation of deep learning architectures.

Deep Learning Uses in Medical Images
Deep learning systems can be used for various tasks on medical images, such as segmentation, classification, detection, and registration.

Classification.
In this approach, one or multiple inputs are mapped to a single output variable. For instance, in a disease classification setup, the output is whether the disease is present or not, and the diagnosis is based on a sample from the dataset. Because medical datasets are usually small, transfer learning is commonly applied; it is defined as the use of pretrained networks to work around the need for very large datasets in deep network training. A pretrained network can either be fine-tuned on medical images or be used as a feature extractor on the image data. Both options save time in training deep networks and allow the extracted features to be analyzed faster. Object classification is usually done on a small part of the image, which is assigned to one of two or more classes. For instance, [133] used three CNNs, each taking a nodule patch as input, and concatenated their feature outputs to form a final feature vector. Multistream CNNs have been used to classify skin lesions, with each stream working on a different resolution of the image [134].

Detection.
Detection of objects and their respective regions of interest in images is an important part of diagnosis by medical practitioners. The task consists of localizing and identifying small regions in a full image. Detection systems are designed to find these regions automatically and to decrease the reading time of human experts. An example is [98], which described the use of an edge detection method to detect dental caries in dental images.

Segmentation.
Segmentation of various organs allows detailed analysis of several parameters related to volume and shape, for example, in skin or breast cancer analysis. It is always the first step before the detection process and is defined as the identification of the pixels that make up the interior or exterior contour of the object of interest. A wide variety of methods have been developed to segment medical images using deep learning. For instance, [135] performed accurate segmentation with 3D CNNs using multistream networks at different scales.

Registration.
Registration is a common image analysis task in which a coordinate transform is calculated from one image to another. This is usually done iteratively: a parametric transformation is assumed and a predetermined metric is optimized. Deep networks can benefit registration by estimating a similarity measure for two images that drives the iterative optimization, for instance, [136]. They can also be used to predict the transformation parameters directly using regression, for example, [137].
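The idea of optimizing a predetermined metric over a parametric transformation can be shown on a toy problem. The sketch below is a deliberately simplified stand-in for the optimizers used in practice: it assumes 1D signals, a one-parameter integer translation, the sum of squared differences as the metric, and an exhaustive search over candidate offsets. All names are hypothetical.

```python
def sum_squared_differences(a, b):
    """Dissimilarity metric: lower means the two signals match better."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def translate(signal, offset, fill=0.0):
    """Shift a 1D signal by an integer offset, padding with a fill value."""
    n = len(signal)
    out = [fill] * n
    for i, value in enumerate(signal):
        j = i + offset
        if 0 <= j < n:
            out[j] = value
    return out


def register_translation(fixed, moving, max_offset):
    """Try every offset in [-max_offset, max_offset] and keep the best one."""
    best_offset = 0
    best_cost = sum_squared_differences(fixed, moving)
    for offset in range(-max_offset, max_offset + 1):
        cost = sum_squared_differences(fixed, translate(moving, offset))
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost


fixed = [0, 0, 1, 2, 1, 0, 0, 0]
moving = [0, 0, 0, 0, 1, 2, 1, 0]  # same profile, displaced by two samples
print(register_translation(fixed, moving, max_offset=3))
```

In the learned variants described above, a network would either replace the hand-written metric or output the offset directly instead of searching for it.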

Databases Used by Dental Images
There is a need to build a public dataset to aid the development of algorithms for the dental imaging area. For this to happen, researchers need to release the data used in their papers, which would lead to a repository that can reliably catalogue and archive public dental imaging data. Several examples of such datasets exist, including the Digital Database for Screening Mammography (DDSM) [138], which is used for mammogram image analysis and aids in the screening of breast cancer. Several databases are used in dental imaging, and these include the following:
PASCAL VOC 2007: this is the dataset from the PASCAL Visual Object Classes challenge of 2007 [139].
Caltech101: this is a dataset of images intended to facilitate computer vision research [140]. It is also applicable to image recognition and classification and contains a total of 9,146 images split into 101 distinct object categories.
NORB: it is used for experiments in 3D object recognition from shape [141]. The dataset contains 50 toys belonging to five generic categories: four-legged animals, human figures, aeroplanes, trucks, and cars.
CIFAR-10/100: CIFAR-10 consists of 60000 32 × 32 color images in 10 classes, with 50000 training images and 10000 test images; CIFAR-100 has the same number of images split into 100 classes [142].
MNIST: the name stands for Modified National Institute of Standards and Technology, and it is a dataset of handwritten digits commonly used for training and testing in the machine learning field. It contains 60000 training images and 10000 testing images [143].
LabelMe: it is a dataset and web-based tool for image annotation [144] that was created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). It is applicable in computer vision and is dynamic, free, and open to public contribution. It contains 187240 images, 62197 annotated images, and 658992 labelled objects.
ImageNet: it is a large dataset used for visual object recognition and has more than 20000 categories, with each category containing several hundred images [145].
Summary: both the MNIST and CIFAR-10/100 datasets are available to the public, while the other datasets can be accessed by directly contacting the researchers. Dental datasets are difficult to find, and thus researchers prefer using the available public datasets. ImageNet is one of the datasets used for evaluation by several models applied to dental imaging, such as ResNets and VGGNets.

Evaluation Protocols for Dental Images
Deep learning techniques are applied to problems with very large datasets containing thousands of instances, so a way is needed to estimate the performance of a given configuration on such data and to compare it with the performance of other configurations. One such way is splitting the data [146], since very large datasets require long training times.

Splitting Data.
The data is split into training and testing portions. For instance, the Keras library for deep learning provides two ways of handling the split. One is automatic: Keras can set aside part of the training data as a validation set and evaluate the performance of the model on that set. This is done by setting the validation_split argument of the fit function to a fraction of the training dataset, such as 0.3 to hold out 30% for validation. The other is manual: a separately prepared validation set is passed to the fit function through the validation_data argument.
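What a fractional validation split does to the data can be shown without any deep learning library. The sketch below is plain Python and does not call Keras; it assumes the behaviour that the Keras documentation describes for validation_split, namely that the validation samples are taken from the end of the provided data before any shuffling. The helper name holdout_validation is hypothetical.

```python
def holdout_validation(samples, validation_split):
    """Return (training, validation), holding out the last fraction of samples."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be strictly between 0 and 1")
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


# Hypothetical dataset of 20 radiograph identifiers.
radiograph_ids = list(range(20))
training, validation = holdout_validation(radiograph_ids, validation_split=0.25)
print(len(training), len(validation))
```

Because the held-out part is simply the tail of the data, a dataset that is sorted by class or by patient should be shuffled beforehand; otherwise the validation set may not be representative.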

Manual K-Fold Cross Validation.
This is used as the standard evaluation method for machine learning techniques. The method splits the dataset into k subsets and trains the model on all of them except one, which is left out as the validation set. Evaluation is done on the left-out subset, the procedure is repeated so that each subset serves as the validation set once, and the performance is averaged across all the models created. Cross-validation is applicable to small deep learning models and is typically used with 5 or 10 folds. Another method, described in [147], is the training-testing split.
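The fold rotation described above can be sketched as an index generator. This is a minimal illustration in plain Python, assuming contiguous, unshuffled folds; the function names and the fit_and_score callback are hypothetical placeholders for whatever model is being evaluated.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_samples))
        yield training, validation
        start += size


def cross_validate(n_samples, k, fit_and_score):
    """Average the score returned by fit_and_score over all k folds."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


for train, val in k_fold_indices(n_samples=10, k=5):
    print("validate on", val, "train on", len(train), "samples")
```

Each sample is used for validation exactly once and for training k − 1 times, so the averaged score uses all of the data, at the cost of training k models.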

Training-Testing Split.
The data is split into two parts, a training set and a testing set. A model is fitted to the training set, and the fitted model is then used to make predictions on the testing set. These predictions are used to evaluate the skill of the model, hence the name training-testing split. The split gives an estimate of how well the model performs on a dataset, especially when it is presented with new data. This method is preferred when the dataset is very large and the model is slow to train. However, the skill score for the model is noisy because of the randomness of the data. This randomness makes the model flexible but less stable; for instance, training the same model twice gives different results. This can be controlled by fixing a random seed and by repeating experiments multiple times. Using a random seed simply means using the same randomness every time the model is fitted and evaluated. After the experiments have been run multiple times, the estimated model skill is averaged. A description of the confusion matrix is given in [148].
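The seed-and-repeat procedure can be sketched as follows. The example is a plain-Python illustration under stated assumptions: the "model" is a trivial majority-class baseline standing in for a real network, the data are synthetic (value, label) pairs, and all function names are hypothetical.

```python
import random
import statistics
from collections import Counter


def seeded_split(samples, test_fraction, seed):
    """Shuffle with a fixed seed, then split into (training, testing)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline_accuracy(train, test):
    """Stand-in for 'fit a model, then score it': always predict the majority label."""
    majority_label = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(1 for _, label in test if label == majority_label) / len(test)


def repeated_split_score(samples, score_fn, test_fraction=0.3, seeds=range(5)):
    """Repeat the split with several seeds and average the resulting scores."""
    scores = [score_fn(*seeded_split(samples, test_fraction, seed)) for seed in seeds]
    return statistics.mean(scores), scores


# Synthetic data: every third sample carries the positive label.
data = [(i, i % 3 == 0) for i in range(30)]
mean_score, scores = repeated_split_score(data, majority_baseline_accuracy)
print(scores, mean_score)
```

Running the script twice prints the same scores because every split is driven by its seed, while the spread across the five seeds gives an idea of how noisy a single training-testing split would have been.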

Confusion Matrix.
It is an N × N matrix, where N is the number of classes being predicted, and it is mostly used with models that output class labels. Several metrics are derived from the matrix, including the following:
Accuracy: the proportion of all predictions that are correct.
Precision or positive predictive value: the proportion of positive predictions that are correct.
Negative predictive value: the proportion of negative predictions that are correct.
Recall or sensitivity: the proportion of actual positive cases that are correctly identified.
Specificity: the proportion of actual negative cases that are correctly identified.
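For the binary case (N = 2, for example caries versus no caries), these metrics follow directly from the four cells of the matrix. The sketch below assumes labels encoded as 1 for positive and 0 for negative and returns 0.0 when a denominator is empty; the function names and the example counts are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "negative_predictive_value": ratio(tn, tn + fn),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up counts for 100 tooth regions, 13 of which contain caries.
for name, value in confusion_metrics(tp=8, fp=2, tn=85, fn=5).items():
    print(f"{name}: {value:.3f}")
```

With imbalanced counts like these, accuracy is high mainly because negatives dominate, while recall shows that a noticeable share of the positive cases is missed; this is why the metrics are usually reported together.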

F1-Score.
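This is defined as the harmonic mean of the precision and recall values for a classification task. The F1-score is given by

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}. \qquad (21)$$

The harmonic mean is preferred over the arithmetic mean because it penalizes extreme values, for instance, those in a binary classification model: a model with a precision of 1 and a recall of 0 has an arithmetic mean of 0.5 but an F1-score of 0. The F1-score can also be adjusted to increase its effectiveness by adding β as an adjustable parameter that weights recall relative to precision, to get

$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}.$$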
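A short sketch makes the role of β concrete. It assumes the standard F-beta definition, (1 + β²) · precision · recall / (β² · precision + recall), which reduces to the F1-score for β = 1; the function name and the example values are illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


precision, recall = 0.8, 0.4
print("arithmetic mean:", (precision + recall) / 2)
print("F1:", f_beta(precision, recall))
print("F2:", f_beta(precision, recall, beta=2.0))
```

In a screening setting where a missed lesion is costlier than a false alarm, a value of β above 1 would be the natural choice, since it lowers the score of models whose recall lags behind their precision.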

Conclusion
In this survey, various techniques, methods, and approaches for segmentation and detection in dental images have been discussed. Works from industry and academia have been covered, including existing algorithms, segmentation and detection methods, databases, and protocols for evaluating performance. There is huge potential in dental radiography, especially for work focused on detection in dental images. Most existing systems dwell on dental segmentation and not on feature extraction (detection). There is a need to improve existing dental detection systems, and one way to do so is by introducing automatic blob detection. Blob detection has been used in other fields of medical imaging but has not seen substantive use in dental imaging. The use of such image analysis techniques to determine the presence of caries aims to create a system that takes a human diagnostic approach, whereby dental caries are diagnosed based on visual interpretation of teeth.

Future Work
Several issues and encouraging perspectives for future study have emerged from our discussion, and they are highlighted as follows:
Data availability and reliability: deep learning networks require a large amount of data to achieve meaningful and effective performance. Due to the nature of dental images, there is a need for hybrid datasets to support good network performance. There is also a need for publicly available dental image datasets to make deep learning in the field possible.
Data standardization: many of the methods discussed here handle the preprocessing step manually, for example, by cropping the region of interest from an image. Such methods contribute to the loss of some key details from the images. Some networks divide a whole image into subregions, which slows down learning because it proceeds one subregion after the other. Methods such as downsampling may also delete important details, and their use seems to have been due to limitations in computational power. Deep learning approaches increasingly learn on whole images, without manual manipulation of images at the preprocessing stage, in order to get more general and accurate results.
Weight regularization methods: deep learning networks can also be improved by introducing weight regularization. Regularizing a network involves tuning model hyperparameters such as the learning rate and the dropout rate. Optimizing models at several key stages, such as using dropout at the segmentation stage, could improve the final result, and future work will look at how to implement such optimizations. Basically, weight regularization is introduced into networks for parameter optimization. Furthermore, introducing early stopping as a regularization technique will help to reduce overfitting, which is always a major problem with existing models.
Hybrid approaches: deep neural networks can also be improved by combining several models or methods into hybrid networks that improve overall evaluation performance. The combination can happen at any stage of the model, for instance, combining two preprocessing techniques into a single one that enhances image quality. It can also join attributes of different models into one hybrid model that enhances training, feature extraction, detection, and classification of objects. Combining different convolutional networks into one hybrid network will be a good area to explore, since it could save the long training and testing times that come with large networks made up of many convolutional networks.
Oral healthcare follow-ups from medical practitioners are very important for risk assessment and management of dental caries [149-153].

Data Availability
Deep learning networks require a large amount of data to achieve meaningful and effective performance. Due to the nature of dental images, there is a need for hybrid datasets to support good network performance. There is also a need for publicly available dental image datasets to make deep learning in the field possible.

Conflicts of Interest
The authors declare that they have no conflicts of interest.