Deep Convolutional Neural Networks for Chest Diseases Detection
Journal of Healthcare Engineering

Chest diseases are serious health problems. They include chronic obstructive pulmonary disease, pneumonia, asthma, tuberculosis, and other lung diseases. Timely diagnosis of chest diseases is very important, and many methods have been developed for this purpose. In this paper, we demonstrate the feasibility of classifying chest pathologies in chest X-rays using conventional and deep learning approaches. Convolutional neural networks (CNNs) are presented for the diagnosis of chest diseases, and the architecture of the CNN and its design principle are described. For comparison, backpropagation neural networks (BPNNs) with supervised learning and competitive neural networks (CpNNs) with unsupervised learning are also constructed for diagnosing chest diseases. All the considered networks (CNN, BPNN, and CpNN) are trained and tested on the same chest X-ray database, and the performance of each network is discussed. Comparative results in terms of accuracy, error rate, and training time between the networks are presented.


Introduction
Medical X-rays are images generally used to diagnose sensitive human body parts such as bones, chest, teeth, skull, and so on. Medical experts have used this technique for several decades to explore and visualize fractures or abnormalities in body organs [1]. This is because X-rays are very effective diagnostic tools for revealing pathological alterations, in addition to their noninvasive character and low cost [2]. Chest diseases can appear in chest X-ray (CXR) images in the form of cavitations, consolidations, infiltrates, blunted costophrenic angles, and small broadly distributed nodules [3]. By analyzing the chest X-ray image, radiologists can diagnose many conditions and diseases such as pleurisy, effusion, pneumonia, bronchitis, infiltration, nodule, atelectasis, pericarditis, cardiomegaly, pneumothorax, fractures, and many others [4].
Classifying chest X-ray abnormalities is considered a tedious task for radiologists; hence, many algorithms have been proposed by researchers to perform this task accurately [5][6][7]. Over the past decades, computer-aided diagnosis (CAD) systems have been developed to extract useful information from X-rays and give doctors quantitative insight into an X-ray. However, these CAD systems have not reached a level of reliability sufficient to make decisions on the type of condition or disease in an X-ray [2][3][4]. Thus, their role has been limited to a visualization function that helps doctors in making decisions.
A number of research works have been carried out on the diagnosis of chest diseases using artificial intelligence methodologies. In [1], multilayer, probabilistic, learning vector quantization, and generalized regression neural networks were used for diagnosing chest diseases. The diagnosis of chronic obstructive pulmonary and pneumonia diseases was implemented using neural networks and an artificial immune system [8]. In [9], the detection of lung diseases such as TB, pneumonia, and lung cancer using chest radiographs is considered. Histogram equalization in image segmentation was applied for image preprocessing, and a feedforward neural network was used for classification. The above research works have been used effectively in classifying medical diseases; however, their performance was not as good as that of deep networks in terms of accuracy, computation time, and minimum square error achieved. Deep learning-based systems have been applied to increase the accuracy of image classification [10,11]. These deep networks showed superhuman accuracies in performing such tasks. This success motivated researchers to apply these networks to medical images for disease classification tasks, and the results showed that deep networks can efficiently extract useful features that distinguish different classes of images [12][13][14][15]. The most commonly used deep learning architecture is the convolutional neural network (CNN). CNNs have been applied to the classification of various medical images because of their power to extract features at different levels from images [11][12][13][14][15].
Having reviewed the related research studies, in this paper, a deep convolutional neural network (CNN) is employed to improve the performance of the diagnosis of chest diseases in terms of accuracy and minimum square error achieved. For this purpose, traditional and deep learning-based networks are employed to classify the most common thoracic diseases and to present comparative results. Backpropagation neural network (BPNN), competitive neural network (CpNN), and convolutional neural network (CNN) are examined to classify 12 common diseases that may be found in the chest X-ray, that is, atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, and fibrosis (Figure 1). In this paper, we aim at training both traditional and deep networks using the same chest X-ray dataset and evaluating their performances. The data used in the paper are obtained from the National Institutes of Health-Clinical Center [16]. The dataset contains 112,120 frontal-view X-ray images of 30,805 unique patients. This paper is structured as follows: Section 2 presents the methodologies used for diagnosing chest diseases. A brief explanation of the BPNN, CpNN, and CNN is given. A description of the convolutional neural network used for diagnosing chest diseases and its operating principles is presented. Section 3 discusses the results of simulations of the networks used, in addition to the database description. A comparison of the performances of the networks used in simulations is given in Section 4, and Section 5 concludes the paper.

Backpropagation Neural Network (BPNN).
Backpropagation neural network (BPNN) is a multilayer feedforward neural network that uses a supervised learning algorithm known as the error back-propagation algorithm. Errors accumulated at the output layer are propagated back into the network for the adjustment of weights [16][17][18][19]. Figure 2 depicts a conventional BPNN which consists of three layers: input, hidden, and output. As seen in Figure 2, there is no backward pass of computation except the operations used in training. All the functioning operations proceed in the forward direction during simulation. The pseudocode algorithm for BPNN is given below [20].
(i) Network initialization: randomly choose the initial weights
(ii) Select the first training pair
(iii) Forward computation, which includes the following steps:
    (a) Apply the inputs to the network
    (b) Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer
    (c) Calculate the error at the outputs
(iv) Backward computation:
    (a) Use the output error to compute error signals for preoutput layers
    (b) Use the error signals to compute weight adjustments
    (c) Apply the weight adjustments
(v) Repeat the forward and backward computations for the other training pairs
(vi) Periodically evaluate the network performance
Repeat the forward and backward computations until the network converges on the target output.
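The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the single hidden layer, the per-pattern (online) updates, and every name in the code (`train_bpnn`, `n_hidden`, `lr`, `momentum`) are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bpnn(pairs, n_hidden=3, lr=0.5, momentum=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network on (input list, target list) pairs.

    Returns (predict, history), where history holds the MSE of each epoch.
    All hyperparameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    # (i) Network initialization; the last entry of each row is the bias weight.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]   # previous adjustments
    dw_o = [[0.0] * (n_hidden + 1) for _ in range(n_out)]  # (needed for momentum)

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_h]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_o]
        return h, o

    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in pairs:                      # (ii), (v) visit every training pair
            h, o = forward(x)                   # (iii) forward computation
            err = [tj - oj for tj, oj in zip(t, o)]
            sq_err += sum(e * e for e in err)
            # (iv-a) error signals; o * (1 - o) is the sigmoid derivative
            d_o = [e * oj * (1 - oj) for e, oj in zip(err, o)]
            d_h = [hj * (1 - hj) * sum(d_o[k] * w_o[k][j] for k in range(n_out))
                   for j, hj in enumerate(h)]
            # (iv-b), (iv-c) compute adjustments with momentum and apply them
            for k in range(n_out):
                for j, v in enumerate(h + [1.0]):
                    dw_o[k][j] = lr * d_o[k] * v + momentum * dw_o[k][j]
                    w_o[k][j] += dw_o[k][j]
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    dw_h[j][i] = lr * d_h[j] * v + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]
        history.append(sq_err / (len(pairs) * n_out))   # (vi) track performance
    return (lambda x: forward(x)[1]), history
```

Calling `train_bpnn` on a small set of labelled pairs returns a prediction function together with the per-epoch error, which corresponds to the learning curve that step (vi) monitors.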
To calculate the output of each neuron for an input pattern, the weighted sum of the neuron's inputs plus its bias is passed through the activation function. The output of the j-th neuron for pattern p is denoted $O_{pj}$; k ranges over the input indices, $W_{kj}$ is the weight on the connection from the k-th input to the j-th neuron, and $b_j$ is the bias weight of the j-th neuron.
The error signal at the output is computed from $T_{pj}$, the target value of the j-th output neuron for pattern p, and $O_{pj}$, the actual output value of the j-th output neuron for pattern p. The backpropagation algorithm is based on the gradient descent optimization method [20][21][22]: the derivative of the error determines how the network parameters are updated. The output neuron error signal is denoted $\delta_{pj}$. The error signal of each hidden neuron, also denoted $\delta_{pj}$, is computed from $\delta_{pk}$, the error signal of a postsynaptic neuron k, and $W_{kj}$, the weight of the connection from the j-th hidden neuron to the k-th postsynaptic neuron [21]. The weight adjustments are then calculated and applied using the learning rate c and the momentum β.
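For reference, the quantities described above are commonly written as follows in textbook treatments of backpropagation. This is a sketch under stated assumptions rather than the authors' exact equations: the sigmoid activation, the symbol $net_{pj}$, the squared-error form $E_p$, and the step index t are introduced here for illustration. The hidden-neuron line follows the index convention given in the text, where $W_{kj}$ connects hidden neuron j to postsynaptic neuron k.

```latex
\begin{align*}
net_{pj} &= \sum_{k} W_{kj}\, O_{pk} + b_j, \qquad
O_{pj} = \frac{1}{1 + e^{-net_{pj}}} \\
E_p &= \tfrac{1}{2} \sum_{j} \left( T_{pj} - O_{pj} \right)^2 \\
\delta_{pj} &= \left( T_{pj} - O_{pj} \right) O_{pj} \left( 1 - O_{pj} \right)
  \quad \text{(output neuron)} \\
\delta_{pj} &= O_{pj} \left( 1 - O_{pj} \right) \sum_{k} \delta_{pk}\, W_{kj}
  \quad \text{(hidden neuron)} \\
\Delta W_{kj}(t) &= c\, \delta_{pj}\, O_{pk} + \beta\, \Delta W_{kj}(t-1)
  \quad \text{(connection from input } k \text{ to neuron } j\text{)}
\end{align*}
```

The factor $O_{pj}(1 - O_{pj})$ is the derivative of the sigmoid; with a different activation function it would be replaced by that function's derivative.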

Competitive Neural Network.
The competitive neural network is a simple neural network that consists of two layers and uses an unsupervised learning algorithm for training. The inputs of the network are features, and the outputs are the classes. The input layer is fully connected to the output layer. Each connection between the input and output layers is characterized by a weight coefficient. In every epoch, the neurons in the output layer compete among themselves when input features are applied to the network input [23][24][25]. The competitive neural network (Figure 3) relies fundamentally on the Hebbian learning rule. The distinction is the following: in competitive learning, output neurons have to compete among themselves to get activated, and only one neuron is activated at any time, whereas in Hebbian learning more than one neuron can be activated or fired at any time.
These networks use a "winner-takes-all" strategy, where only the weights connected to the winner neuron are updated in a particular epoch, while other weights are not updated [24,25].
This learning process has the effect of increasingly strengthening the correlation between the inputs and the corresponding winner neurons during learning.
When the patterns are supplied to the input layer, the neurons in the output layer compete among themselves to be activated [23][24][25]. The rules used to update the weights of these networks are given below. For the winning output neuron k, we have
$$\Delta w_{kj} = \eta \left( x_j - w_{kj} \right),$$
where η is the learning rate, $x_j$ is the j-th input pattern, $w_{kj}$ is the weight of the connection between the j-th and k-th neurons, and $\Delta w_{kj}$ is the computed weight change.
If the k-th output neuron loses at epoch p, then
$$\Delta w_{kj} = 0.$$
The weight update for the k-th neuron at epoch (p + 1) is achieved using the following equation:
$$w_{kj}(p + 1) = w_{kj}(p) + \Delta w_{kj}(p).$$
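A winner-takes-all update of this kind can be sketched as below. The sketch assumes the standard competitive learning rule, in which the winner's weights move toward the input by a fraction equal to the learning rate, and it assumes the winner is the neuron whose weight vector is closest to the input in Euclidean distance; the text does not state how the winner is selected. The function names are hypothetical.

```python
def winner(weights, x):
    """Index of the output neuron whose weight vector is closest to x.

    Choosing the winner by squared Euclidean distance is an assumption.
    """
    dists = [sum((xj - w) ** 2 for w, xj in zip(row, x)) for row in weights]
    return dists.index(min(dists))


def competitive_step(weights, x, lr):
    """Update only the winning neuron in place and return its index."""
    k = winner(weights, x)
    # Winner: delta_w_kj = lr * (x_j - w_kj). Losers: delta_w_kj = 0 (untouched).
    weights[k] = [w + lr * (xj - w) for w, xj in zip(weights[k], x)]
    return k


def train_competitive(patterns, weights, lr=0.1, epochs=20):
    """Present every pattern once per epoch; no labels are used."""
    for _ in range(epochs):
        for x in patterns:
            competitive_step(weights, x, lr)
    return weights
```

For the network described in this paper, `weights` would hold 12 rows of 1024 values (one row per output class, one value per input pixel); the tiny sizes used when trying the sketch are for illustration only.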

Convolutional Neural Networks.
Deep learning is a machine learning method inspired by the deep structure of the mammalian brain [26]. Deep structures are characterized by multiple hidden layers that allow the abstraction of different levels of features. In 2006, Hinton et al. developed a new algorithm to train the neuron layers of a deep architecture, which they called greedy layerwise training [12]. This learning algorithm can be seen as unsupervised, greedy, single-layer training in which a deep network is trained layer by layer. Because this method proved effective, it has since been used for training many deep networks. One of the most powerful deep networks is the convolutional neural network, which can include multiple hidden layers performing convolution and subsampling in order to extract low- to high-level features of the input data [27][28][29][30]. This network has shown great efficiency in different areas, particularly in computer vision [28], biological computation [29], fingerprint enhancement [30], and so on. Basically, this type of network consists of three types of layers: convolution layers, subsampling or pooling layers, and full connection layers. Figure 4 shows a typical architecture of a convolutional neural network (CNN). Each type of layer is explained briefly in the following sections.

Convolution Layer.
In this layer, an input image of size R * C is convolved with a kernel (filter) of size a * a as shown in Figure 4. Each block of the input matrix is independently convolved with the kernel and generates a pixel in the output. The result of the convolution of the input image and the kernels is used to generate n output image features. Generally, a kernel of the convolution matrix is referred to as a filter, while the output image features obtained by convolving the kernel and the input images are referred to as feature maps of size i * i.
A CNN can include multiple convolutional layers; the inputs and outputs of subsequent convolutional layers are feature vectors. There is a set of n filters in each convolution layer. These filters are convolved with the input, and the depth of the generated feature maps (n*) is equal to the number of filters applied in the convolution operation. Note that each feature map is considered as a specific feature at a certain location of the input image [31][32][33].
The output of the l-th convolution layer, denoted as $C_i^{(l)}$, consists of feature maps. It is computed as
$$C_i^{(l)} = B_i^{(l)} + \sum_{j} K_{i,j}^{(l-1)} * C_j^{(l-1)}, \qquad (10)$$
where $B_i^{(l)}$ is the bias matrix and $K_{i,j}^{(l-1)}$ is the convolution filter (kernel) of size a * a that connects the j-th feature map in layer (l − 1) with the i-th feature map in layer l. In (10), for the first convolutional layer, $C_i^{(l-1)}$ is the input space, that is, the input image. The kernel generates the feature map. After the convolution layer, an activation function can be applied for nonlinear transformation of the outputs of the convolutional layer:
$$Y_i^{(l)} = f\left( C_i^{(l)} \right),$$
where $Y_i^{(l)}$ is the output of the activation function and $C_i^{(l)}$ is the input that it receives.
Typically used activation functions are sigmoid, tanh, and rectified linear units (ReLUs). In this paper, ReLU is used, which is denoted as $Y_i^{(l)} = \max\left(0, C_i^{(l)}\right)$. This function is popular in deep learning models because it helps reduce interaction and nonlinear effects. ReLU converts the output to 0 if it receives a negative input, while it returns the same input value if it is positive. The advantage of this activation function over saturating functions such as sigmoid and tanh is faster training: in the saturating region of those functions, the error derivative becomes very small, so the updates of the weights almost vanish. This is called the vanishing gradient problem.
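The convolution and ReLU steps described above can be illustrated with a small pure-Python sketch for a single input channel and a single filter. It is a simplification: no padding and a stride of 1 are assumed (the network used later applies padding), and, as in most deep learning libraries, the kernel is not flipped, so the operation is strictly a cross-correlation. The function names are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide an a x a kernel over an R x C image (no padding, stride 1).

    Returns a feature map of size (R - a + 1) x (C - a + 1).
    """
    a = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - a + 1):
        out_row = []
        for c in range(cols - a + 1):
            s = bias
            # Each output pixel is the weighted sum of one a x a input block.
            for u in range(a):
                for v in range(a):
                    s += kernel[u][v] * image[r + u][c + v]
            out_row.append(s)
        out.append(out_row)
    return out


def relu(feature_map):
    """Replace negative values by 0 and keep positive values unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]
```

Applying `relu(conv2d_valid(image, kernel, bias))` once per filter yields one feature map per filter, which is why the depth of the layer output equals the number of filters.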

Subsampling Layer.
The main aim of this layer is to spatially reduce the dimensionality of the feature maps extracted from the previous convolution layer. To do so, a mask of size b * b is selected as shown in Figure 4, and the subsampling operation between the mask and the feature maps is performed. Many subsampling methods have been proposed, such as average pooling, sum pooling, and maximum pooling. The most commonly used is max pooling, where the maximum value of each block is the corresponding pixel value of the output image. Note that a subsampling layer helps the convolution layer to tolerate rotation and translation among the input images.
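Max pooling with a b * b mask can be sketched as follows. The sketch assumes non-overlapping blocks (stride equal to b) and drops trailing rows or columns that do not fill a complete block; both choices are assumptions made for the example, and the function name is hypothetical.

```python
def max_pool(feature_map, b=2):
    """Non-overlapping b x b max pooling.

    Each output pixel is the maximum of one b x b block of the input.
    Rows or columns that do not fill a complete block are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r + u][c + v] for u in range(b) for v in range(b))
         for c in range(0, cols - b + 1, b)]
        for r in range(0, rows - b + 1, b)
    ]
```

With b = 2, a 32 × 32 feature map is reduced to 16 × 16, halving each spatial dimension.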

Full Connection.
The final layer of a CNN is a traditional feedforward network with one or more hidden layers. The output layer uses the Softmax activation function. Here, $w_{i,j}^{(l)}$ are the weights that are tuned by the fully connected layer in order to form the representation of each class, and f is the transfer function, which represents the nonlinearity. Note that the nonlinearity in the fully connected layer is built within its neurons, not in separate layers as in the convolution and pooling layers.
After finding the output signals, the training of the CNN is started. Training is performed using the stochastic gradient descent algorithm [34]. The algorithm estimates the gradients using a single randomly picked example from the training set. As a result of training, the parameters of the CNN are determined.
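How a Softmax output layer turns the fully connected layer's weighted sums into class probabilities can be sketched as below. The layout of the weights (one row per class), the bias terms, and the function names are assumptions for illustration; subtracting the maximum score is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax(weights, biases, features):
    """Fully connected output layer followed by Softmax.

    weights holds one row per class; features is the flattened input vector.
    """
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)
```

The class with the largest probability is taken as the prediction; for the task in this paper, the output would have 12 entries, one per disease class.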

Simulations
In this section, the simulations of the above networks are described. Note that the BPNN and CpNN networks are trained using 620 out of 1000 images, and the rest are used for testing. The CNN is trained using 70% of the 112,120 available images, and the remaining 30% are used for testing. The input images are of size 32 × 32 for the sake of reducing computation cost.
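A split such as the 620/380 division of the 1000 images can be sketched as below. The text does not say how the images were assigned to each subset, so the random shuffle with a fixed seed is an assumption, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:]


# Example: 620 of 1000 image indices for training, the remaining 380 for testing.
train_ids, test_ids = split_dataset(range(1000), 620)
```

A percentage split, such as the 70%/30% division used for the CNN, follows by passing `int(0.7 * len(items))` as `n_train`.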

Simulation of Chest Diseases Using BPNN.
The backpropagation neural network is based on a supervised learning algorithm, and such networks are very important and useful in pattern recognition problems [17,19,35]. The training of backpropagation networks includes the update of parameters in order to produce good classification results. Hence, in this paper, several experiments were conducted so that accurate results could be obtained. For this aim, different numbers of hidden neurons, learning rates, and momentum values were applied to obtain better classification results. The architecture of the designed backpropagation neural network for images of size 32 × 32 is described in Figure 5.
Since the backpropagation network uses a supervised learning algorithm, it is necessary that the training data be labelled. The training data used have been labelled according to the 12 classes of the classification task. In the training stage, different numbers of hidden neurons, learning rates, and momentum values were tried to obtain better classification results. Table 1 presents the architectures of BPNN used, denoted as BPNN1, BPNN2, BPNN3, and BPNN4. Since there are 12 classes, 12 neurons have been used in the output layer of the network. The learning curve of BPNN2, which is the network with the lowest achieved MSE (Table 1), is shown in Figure 6.

Simulation of Chest Diseases Using Competitive Neural Network (CpNN).
In this section, a competitive neural network using an unsupervised learning algorithm is used for classification of chest diseases. Because such networks do not need manually labelled training data, they save the time otherwise spent on labelling. Figure 7 shows the architecture of the network used in this paper. The competitive neural network has two layers designated for the input and output signals. The images are fed as input to the network, and the output neurons learn unique attributes or patterns of the images that differentiate one class from the others. The number of input neurons is 1024 (input image pixels), and the number of output neurons is 12 (number of output classes). The training parameters of the networks used in this paper are given in Table 2. These competitive networks are trained using images of 32 × 32 pixels. Since the network uses an unsupervised learning algorithm, there is no mean squared error goal to minimize.

Simulation of Chest Diseases Using Convolutional Neural Networks.
In this section, the design of the convolutional neural network employed for the chest X-ray medical images is presented. Suitable values of the learning parameters of the network are determined through experiments. Note that out of the 112,120 images obtained, 70% are used for training and 30% are used for validating the network. The input images of the network are of size 32 × 32. The outputs are 12 classes. The proposed CNN includes 3 hidden layers. Table 3 shows the structure of the CNN and its learning parameters. Here, "Conv" represents a convolution layer, "BN" represents batch normalization, "FM" represents feature maps, and "FC" represents a fully connected layer. Note that filters of size 3 × 3 are used in all convolution operations with padding, while all pooling operations are performed using max pooling windows of size 2 × 2.
During simulation, the size of the available training data and the system specifications for constructing a model were taken into consideration. Thus, dropout training schemes and batch normalization were employed, and an improvement in model generalization was achieved [24,25]. Note that minibatch optimization of size 100 via stochastic gradient descent is employed [34] for training. In addition, a learning rate of 0.001 and 40,000 iterations are used for training the CNN model. The extraction of different levels of features of chest X-ray images in both convolution and pooling layer 1 is given in Figure 8. Figure 8(a) shows the learned filters (or kernels) at convolution layer 1, and Figure 8(b) shows those at the pooling layer of the CNN.
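The minibatch update scheme can be illustrated on a deliberately small stand-in problem. The sketch below fits a one-variable linear model by minibatch stochastic gradient descent; it is not the CNN of Table 3. The default batch size, learning rate, and iteration count mirror the values quoted above, while the linear model, the squared-error loss, and all names are assumptions for the example.

```python
import random


def minibatch_sgd(xs, ys, batch_size=100, lr=0.001, iterations=40000, seed=0):
    """Fit y ~ w * x + b by minibatch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(iterations):
        batch = rng.sample(indices, batch_size)   # draw one random minibatch
        grad_w = grad_b = 0.0
        for i in batch:
            err = (w * xs[i] + b) - ys[i]
            grad_w += 2 * err * xs[i] / batch_size   # gradient averaged over the batch
            grad_b += 2 * err / batch_size
        w -= lr * grad_w                          # one parameter update per iteration
        b -= lr * grad_b
    return w, b
```

Each iteration looks at only `batch_size` examples, so 40,000 iterations with batches of 100 process 4,000,000 example presentations in total, whatever the size of the training set.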

Discussion of Results
The overall performances of the BPNN and CpNN are tested using 380 images. Table 4 shows the recognition rates obtained for the backpropagation networks using 32 × 32 pixels as the input image size.
It can be seen from the table that all the trained backpropagation neural networks (BPNNs) have different training and testing performances. BPNN2 achieved the highest recognition rate for both training and testing datasets compared to the other networks, that is, 99.19% and 89.57%, respectively.
Competitive neural networks that use an unsupervised learning algorithm were also trained and tested using the same images. These networks are faster to train, considering that they have no desired outputs and therefore no error computations and no backward pass of error gradients for weight updates. The simulation results of the competitive networks using different learning rates and numbers of maximum epochs are given in Table 5.
From the table, it can be seen that CpNN2 has the highest recognition rates for both training and test data. Furthermore, it can be seen that CpNN3 has a higher recognition rate than CpNN2 for the training data. Its performance on the test data is lower than that of CpNN2; that is, CpNN3 has lower generalization power than CpNN2.
Furthermore, the convolutional neural network (CNN) designed for this classification task is also tested using 30% of the available chest X-ray images, and the results are shown in Table 6.
Overall, the performance of the three employed networks in terms of recognition rate, training time, and reached mean square error (MSE) is described in Table 7.
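The two quality measures reported in these tables can be computed as sketched below. The text does not define them explicitly, so the definitions used here are assumptions: recognition rate as the percentage of correctly classified images, and MSE as the squared error averaged over all samples and output neurons. The function names are hypothetical.

```python
def recognition_rate(predicted, actual):
    """Percentage of samples whose predicted class matches the true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


def mean_squared_error(outputs, targets):
    """Squared error averaged over all samples and all output neurons."""
    total = sum((o - t) ** 2
                for out, tgt in zip(outputs, targets)
                for o, t in zip(out, tgt))
    return total / (len(outputs) * len(outputs[0]))
```

Computing the recognition rate separately on the training and test subsets gives the pair of figures reported for each network; a large gap between the two indicates weaker generalization.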
As shown in Tables 4 and 5, the networks behave differently during training and testing, which is due to the differences in the structures, working principles, and training algorithms of the three employed networks. Also in Table 7, the CNN has achieved the highest recognition rate for training and testing data compared to the other employed networks. Moreover, it can be seen that the three networks have achieved a low MSE, with the CNN scoring the lowest (0.0013). Furthermore, it is noted that the time needed for the CNN to converge is somewhat higher than that of BPNN2 and CpNN2. This is due to the depth of the structure of a convolutional neural network, which normally requires a long training time, in particular when the number of inputs is large. Nonetheless, this deep structure is the main factor in achieving a higher recognition rate compared to other networks such as BPNN and CpNN. Lastly, Figure 9 shows an example of the CNN testing paradigm.
The network first takes a chest X-ray as input and then outputs the probabilities of the classes.
A comparison of the developed networks with some earlier works is shown in Table 8. Firstly, it is seen that the shallow (traditional) networks (BPNN and CpNN) could not achieve high recognition rates compared to the deep networks, which is due to their deficiency in extracting the important features from input images. Moreover, it is noticed that the proposed deep convolutional neural network (CNN) achieved a higher recognition rate than earlier research work such as CNN with GIST features [36]. Transfer learning-based networks such as VGG16 [37] and VGG19 [37] have also been used for chest X-ray classification. They achieved lower generalization capabilities compared to the proposed network, even though these pretrained models [37] have very powerful feature extraction capabilities, since they were trained using a huge database, ImageNet [38]. Note that we compared only the works that explicitly reported their achieved accuracies. The obtained results show that applying deep CNNs to the problem of chest X-ray disease classification is promising, in that similar or easily confused diseases can be correctly classified with good recognition rates.

Conclusion
In this paper, a convolutional neural network (CNN) is designed for the diagnosis of chest diseases. For comparative analysis, a backpropagation neural network (BPNN) and a competitive neural network (CpNN) are also applied to the classification of chest X-ray diseases. The designed CNN, BPNN, and CpNN were trained and tested using chest X-ray images containing different diseases. Several experiments were carried out by training these networks with different learning parameters and numbers of iterations. In both backpropagation and competitive networks, it was observed that the input image of size 32 × 32 pixels showed good performance and achieved high recognition rates. Based on recognition rates, the backpropagation networks outperformed the competitive networks. On the other hand, the competitive networks did not require manual labelling of training data, as was needed for the backpropagation network. Furthermore, a CNN was also trained and tested using a larger dataset which was also used for training and testing of BPNN and CpNN. After convergence, it was noticed that the CNN was capable of gaining a better generalization power than that achieved by BPNN and CpNN, although the required computation time and the number of iterations were somewhat higher. This better performance is mainly due to the deep structure of the CNN, which uses the power of extracting features at different levels and resulted in a better generalization capability. The simulation result of the proposed CNN is also compared with other deep CNN models such as GIST, VGG16, and VGG19. These networks have lower generalization capabilities and accuracies compared to the proposed network. The obtained results have demonstrated the high recognition rates of the proposed CNN.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.