Comparison of Diagnosis Accuracy between a Backpropagation Artificial Neural Network Model and Linear Regression in Digestive Disease Patients: an Empirical Research

Introduction. A noninvasive diagnosis model for digestive diseases is a vital issue in current clinical research. Our systematic review aims to compare the diagnosis accuracy of the BP-ANN algorithm and linear regression in digestive disease patients, including their activation functions and data structure.
Methods. We reported the systematic review according to the PRISMA guidelines. We searched seven electronic scholarly databases for articles comparing the diagnosis accuracy of BP-ANN and linear regression. The characteristics, patient number, input/output markers, diagnosis accuracy, and results/conclusions related to the comparison were extracted independently based on the inclusion criteria.
Results. Nine articles met all the criteria and were enrolled in our review. The publication year of the enrolled articles ranged from 1991 to 2017. The sample size ranged from 42 to 3222 digestive disease patients, and all of the patients showed comparable biomarkers between the BP-ANN algorithm and linear regression. Eight articles demonstrated that the BP-ANN model is superior to linear regression in predicting the disease outcome based on AUROC results. One article reported linear regression to be superior to BP-ANN for the early diagnosis of colorectal cancer.
Conclusion. The BP-ANN algorithm and linear regression both had a high capacity for fitting the diagnostic model, and BP-ANN displayed higher prediction accuracy for the noninvasive diagnosis model of digestive diseases. We compared the activation functions and data structure of BP-ANN and linear regression for fitting the diagnosis model, and the data suggested that BP-ANN was a comprehensive recommendation algorithm.


Introduction
Digestive diseases involve the digestive tract from the esophagus to the stomach and intestines, as well as organs connected to it such as the liver and pancreas; they are complicated medical conditions involving multiple organ systems and biochemical, immunological, and pathological mechanisms [1]. Based on the 2015 China Health Statistics Annuals, the two-week prevalence and chronic disease prevalence of digestive diseases were 15‰ and 24.9‰, respectively [2]. The American Centers for Disease Control and Prevention (CDC) reported that approximately 60 to 70 million people were affected by digestive diseases of all kinds in 2001, and that 9.3% (18.9 million) of noninstitutionalized adults were diagnosed with digestive disorders annually (Summary Health Statistics for US Adults, 2001, NCHS, CDC) [3]. Economists estimated that $912,443,000 was spent in public hospitals on digestive system diseases in Australia from 2001 to 2002 (AIHW National Hospital Morbidity Database, Australia's Health 2004, AIHW) [4] (https://www.rightdiagnosis.com/d/digest/stats.htm). The five-year survival of most digestive diseases is more than 80% if patients are diagnosed or treated at an early stage [5].
In previous studies, many researchers extracted noninvasive biomarkers to construct diagnostic or predictive models at an early disease stage in digestive disease patients, especially those with tumors; these biomarkers have shown great value for medical practice because they are quick to test and convenient to sample [6]. WHO (World Health Organization) guidelines recommended that APRI and FIB-4 could be applied in HBV-related fibrosis assessment instead of invasive examinations in countries with limited medical resources [7]. The Lok score performed well in diagnosing portal hypertension using transient elastography (TE) [8]. Gurung et al. used AST/ALT (the AST-to-ALT ratio), which is elevated in the alcoholic liver disease pattern, in patients with hepatitis C who progressed to liver cirrhosis [9]. Lin et al. developed a noninvasive method for diagnosing nonalcoholic fatty liver disease and quantifying liver fat using a new quantitative ultrasound technique [10]. Characteristic information, hematological examination, biochemical detection, endoscopic ultrasonography, and pathology scores were used in mathematical modeling to predict the disease outcome accurately [11][12][13][14]. A well-designed model can estimate the complicated, undefined relationship between risk factors as input biomarkers and the probability of occurrence of digestive disease as the output variable [15].
In the most common multivariate statistical models for diagnosis or prediction, biomarkers were extracted as independent variables in order to derive the linear relationship between the biomarkers and the response variables [16][17][18]. However, these high-dimensional data collected from different visits were complicated for linear regression, and collinearity between correlated biomarkers could not be detected or solved in the linear model [19]. A sophisticated artificial neural network (ANN) algorithm, the backpropagation ANN model, was able to construct vague and nonlinear connections between the input biomarkers and the target biomarker through the simulation of complicated processing neurons. The correlation between input variables and target data could be learned by the ANN after repeated training [20,21]. The ANN model mimicked signal transmission in the human brain through a set of processing units consisting of neurons, and these neurons were interconnected via weighted connections that transmit signals in parallel and in series [22][23][24]. The most representative ANN construction consisted of three layers [25]. For clinicians and statisticians, the input layer represented the observed biomarkers from serum biochemical and auxiliary examinations [26,27]. The output layer was the indicator of clinical outcomes. The processing of the ANN was driven by input data, and decision making was achieved with minimal human adjustment [28][29][30]. In the modeling process, the training data were analyzed first, and a decision was then made through the output neurons when new input variables were entered [31].
In most ANN models, the backpropagation (BP) network was the commonly used solution for dealing with the nonlinear relationship between input variables and output variables: it constantly adapts the connection weights between neurons and the error threshold in each layer so that the output variables approach the expected outcome [32][33][34][35][36]. The BP algorithm was based on error gradient descent (Figure 1), which aims to find the minimum error by adjusting the weights of the connections between neurons in the direction of lowest error [37]. The error was estimated from the output variable and propagated backward until the weights converged to the optimum solution [38].
In linear regression, we assume that the input medical biomarkers and the clinical outcomes would be connected through a nonlinear link function. The BP algorithm modifies the weights between synaptic neurons during learning, and each connection is changed according to an error term computed for the neurons in each layer. Each computed weight is corrected using the activity of the neuron and the error term of the neuron it projects to. In this way, we could establish a complicated mathematical model between the input medical biomarkers and each matching output variable. In view of these benefits and limitations, the algorithm with the more accurate calculation and the more concise demonstration would be the optimum solution for medical decisions. However, the appropriate choice between linear regression and BP-ANN has not been reported in recent publications, and whether the BP-ANN algorithm is always more accurate than linear regression is controversial. Therefore, our research will try to discuss the following questions based on a systematic review:
(1) The characteristics of current studies using the BP-ANN algorithm and linear regression
(2) The correlations between the BP-ANN algorithm and linear regression
Figure 1: The gradient descent of error $E_k$ was calculated for the updated parameters in the BP-ANN algorithm process. Formula shown in the figure: $\Delta w_{jk} = -\partial E_k / \partial w_{jk}$.
Computational and Mathematical Methods in Medicine
In terms of data structure, linear regression could be interpreted as a one-layer perceptron neural structure model, which included input variables and output variables (Figure 2). The input layer transmitted the input clinical biomarkers directly to the output variable through the sigmoid function, and the weight calculated for each input variable is equal to the regression coefficient obtained by the least squares method in linear regression [39,40]. The activation function is a very important issue because it is the direct bridge between the input variables and the clinical outcome and greatly affects the prediction accuracy. Through the activation function, a nonlinear correlation would be transformed into a linear regression, which is an effective solution for calculating the parameters in the hidden layer [41]. Different types of activation functions may lead to different neural network parameters, and the commonly used activation functions include the sigmoid function, the tanh function, and the Gaussian function [42][43][44].
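The correspondence between a one-layer perceptron and least squares regression can be illustrated with a minimal sketch. It assumes a single input marker and an identity (linear) activation rather than the sigmoid mentioned above; the data values and the function names `least_squares` and `train_linear_neuron` are hypothetical and are not taken from the reviewed studies.

```python
# Illustrative sketch only: a single neuron with identity activation, trained by
# gradient descent on the mean squared error, recovers the least squares fit.
# Data and function names are hypothetical.

def least_squares(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def train_linear_neuron(xs, ys, lr=0.05, epochs=5000):
    """Gradient descent on the mean squared error of one linear neuron."""
    a, b = 0.0, 0.0  # a plays the role of the threshold (intercept), b of the weight
    n = len(xs)
    for _ in range(epochs):
        residuals = [(a + b * x) - y for x, y in zip(xs, ys)]
        grad_a = 2.0 * sum(residuals) / n
        grad_b = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical input marker values
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # hypothetical outcome values
ols_a, ols_b = least_squares(xs, ys)
gd_a, gd_b = train_linear_neuron(xs, ys)
print(ols_a, ols_b)
print(gd_a, gd_b)
```

Under these assumptions the learned weight and threshold agree with the regression coefficient and intercept up to numerical tolerance, which is the sense in which the regression can be read as a one-layer network.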
The input variables were fitted by linear regression before the activation function was applied, as shown in Figure 2, where Figures 2(a) and 2(b) represented the intercept and the coefficient, respectively. In the sigmoid function, variables are mapped onto an S-shaped curve with values between 0 and 1. In the tanh function, the values are mapped to between −1 and 1. In the Gaussian function, $\omega_i$ and $\rho_i$ were the center and length, respectively. As seen in equation (1), it is difficult for researchers to determine the optimal activation function because its parameters change; therefore, a machine learning algorithm is used to help optimize and fit an activation function for the neural network [45].
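The three activation functions named above can be written out as a short sketch. The sigmoid and tanh forms are standard; the Gaussian form used here, $\exp(-(x-\omega_i)^2 / (2\rho_i^2))$, is an assumption, since the exact parameterization of equation (1) is not shown in the text, and the argument names `center` and `width` are illustrative stand-ins for $\omega_i$ and $\rho_i$.

```python
import math


def sigmoid(x):
    """S-shaped curve with values strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent with values strictly between -1 and 1."""
    return math.tanh(x)


def gaussian(x, center, width):
    """Assumed Gaussian form: peaks at 1 when x equals the center."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# Evaluate each function on a few hypothetical inputs.
for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), tanh(value), gaussian(value, center=0.0, width=1.0))
```

The choice among them changes only the mapping applied to the weighted sum; the weighted sum itself is the same linear combination of the inputs in every case.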

Error Propagation in BP-ANN and Linear Regression.
The linear regression for principal component analysis fitted a straight line that crosses the hidden layer in the neural network, and the next step was to generalize this straight line to a curve. Based on the principal component analysis, the BP-ANN model could fit nonlinear principal component data. The algorithm backpropagates the mean square error (MSE) and relies on gradient descent, a method widely used in the numerical minimization of a preestablished cost function [38,46]. According to the gradient trends, the BP model could update the parameters between the hidden layers and the input layer [47]. Combined with the BP network structure, the process of error propagation started from the output layer as follows (written here in the conventional squared-error form):
$$E = \sum_k E_k = \frac{1}{2}\sum_k (d_k - o_k)^2 \quad (2)$$
where $E$ is the total error and $E_k$ is the error for the $k$th output neuron, which is the deviation between the actual output $o_k$ and the expected output $d_k$ of the $k$th output neuron. If the above error definition is extended to the hidden layer neurons, where $y_j$ represents the output of the $j$th hidden layer neuron and $\omega_{jk}$ represents the weight of the connection between the $j$th hidden layer neuron and the $k$th output neuron, then
$$E = \frac{1}{2}\sum_k \Big(d_k - f\Big(\sum_j \omega_{jk} y_j\Big)\Big)^2 \quad (3)$$
where $f$ is the activation function.
If the function is extended to the input neurons, where $x_n$ represents the output of the $n$th input neuron and $v_{nj}$ represents the weight of the connection between the $n$th input neuron and the $j$th hidden layer neuron, then
$$E = \frac{1}{2}\sum_k \Big(d_k - f\Big(\sum_j \omega_{jk}\, f\Big(\sum_n v_{nj} x_n\Big)\Big)\Big)^2 \quad (4)$$
According to the above formula, the total error $E$ of the network is a function of every connection weight $\omega_{jk}$, $v_{nj}$, so the error $E$ can be reduced by adjusting the weights of the connections.
Based on the total error, the optimal weight could be solved by calculating partial derivatives. For a hidden-to-output weight, with $o_k = f\big(\sum_j \omega_{jk} y_j\big)$,
$$\frac{\partial E}{\partial \omega_{jk}} = \frac{\partial E}{\partial o_k}\,\frac{\partial o_k}{\partial \omega_{jk}} = -(d_k - o_k)\,\frac{\partial o_k}{\partial \omega_{jk}} \quad (5)$$
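The error definition and the partial derivatives described above can be traced in a minimal three-layer sketch. It assumes the squared-error form $E = \frac{1}{2}\sum_k (d_k - o_k)^2$, a sigmoid activation in both layers, and no threshold terms; the toy inputs, initial weights, and function names are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, v, w):
    """Return hidden outputs y_j and network outputs o_k for one input vector x_n."""
    y = [sigmoid(sum(v[n][j] * x[n] for n in range(len(x)))) for j in range(len(w))]
    o = [sigmoid(sum(w[j][k] * y[j] for j in range(len(y)))) for k in range(len(w[0]))]
    return y, o


def total_error(o, d):
    """E = 1/2 * sum_k (d_k - o_k)^2 (assumed squared-error form)."""
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))


def gradients(x, d, v, w):
    """Analytic dE/dw[j][k] and dE/dv[n][j]; for the sigmoid, f'(beta) = f(beta) * (1 - f(beta))."""
    y, o = forward(x, v, w)
    delta_out = [-(d[k] - o[k]) * o[k] * (1.0 - o[k]) for k in range(len(o))]
    grad_w = [[delta_out[k] * y[j] for k in range(len(o))] for j in range(len(y))]
    delta_hid = [sum(delta_out[k] * w[j][k] for k in range(len(o))) * y[j] * (1.0 - y[j])
                 for j in range(len(y))]
    grad_v = [[delta_hid[j] * x[n] for j in range(len(y))] for n in range(len(x))]
    return grad_w, grad_v


def train(x, d, v, w, eta=0.5, epochs=200):
    """Repeat weight <- weight - eta * dE/dweight; return the weights and the error history."""
    errors = []
    for _ in range(epochs):
        errors.append(total_error(forward(x, v, w)[1], d))
        grad_w, grad_v = gradients(x, d, v, w)
        w = [[w[j][k] - eta * grad_w[j][k] for k in range(len(w[0]))] for j in range(len(w))]
        v = [[v[n][j] - eta * grad_v[n][j] for j in range(len(v[0]))] for n in range(len(v))]
    return v, w, errors


# Hypothetical toy case: two input markers, two hidden neurons, one output neuron.
x = [0.5, -1.0]
d = [1.0]
v0 = [[0.1, -0.2], [0.3, 0.4]]   # v0[n][j]: input n to hidden neuron j
w0 = [[0.2], [-0.1]]             # w0[j][k]: hidden neuron j to output k
v1, w1, errors = train(x, d, v0, w0)
print(errors[0], errors[-1])
```

In this sketch the error history decreases as the weights move against the gradient, which is the behavior the derivation relies on; a real diagnostic model would add threshold terms, many training cases, and a held-out validation set.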
Figure 2: One-layer perceptron. h(Y): activation function.

If we let $\beta_k = \sum_j \omega_{jk} y_j$, then equation (5) could be written as follows (assuming the squared-error definition of $E$, with $f'$ the derivative of the activation function):
$$\frac{\partial E}{\partial \omega_{jk}} = -(d_k - o_k)\, f'(\beta_k)\, y_j \quad (6)$$
After each iteration, the weights were adjusted by adding the change $\Delta\omega_{jk}$ or $\Delta v_{nj}$ to the original weights to minimize the total error. A parameter $\eta$, the learning rate, was used to define the weight change along the gradient descent direction, so that $\Delta\omega_{jk} = -\eta\,\partial E/\partial\omega_{jk}$ and $\Delta v_{nj} = -\eta\,\partial E/\partial v_{nj}$. Based on the above formula, the weight change is negative when the gradient is positive and vice versa, which keeps the solution moving towards the least error [39][48][49][50][51].
The methodological quality of the enrolled cohort studies was assessed with the Newcastle-Ottawa quality assessment scale, an established composite score built from 3 items: (1) representativeness of exposed and nonexposed patients and the ascertainment of exposure, (2) comparability of cohorts on the basis of the study, and (3) assessment of the outcome and whether the follow-up was long enough for the outcome to occur.

Characteristics of Included Studies.
A total of 319 articles were retrieved by the search strategy, and 43 of them were removed after screening of titles and abstracts. The remaining 276 articles were assessed for eligibility, and 267 of them were excluded. Nine articles met all the criteria and were enrolled in our review. The flowchart of the literature search and selection process is shown in Figure 3. The publication year of the enrolled articles ranged from 1991 to 2017. Seven articles were from China (including Taiwan Province), one was from Austria, and one was from Korea. The sample size ranged from 42 to 3222 digestive disease patients, and all of the patients showed comparable biomarkers between the BP-ANN algorithm and linear regression (Table 1).

Diagnosis Accuracy Comparison between the BP-ANN Algorithm and Linear Regression for Digestive Diseases.
According to our systematic review, eight articles demonstrated that the BP-ANN model is superior to linear regression in predicting the disease outcome based on AUROC results (Table 1). Other researchers [63][64][65][66] likewise demonstrated that the BP-ANN model, as a nonparametric algorithm, has strong abilities in information processing, high parallelism for nonlinear input variables, generalization, and fault tolerance; it is widely used for classification, clustering, regression, and dimensionality reduction in several disease fields. The BP-ANN model was superior to linear regression because of its extraordinary ability to deal with the hidden nonlinear relationship between input markers and the clinical outcome, which might be ignored by linear regression and statisticians [67][68][69][70].
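Because the comparison above rests on AUROC, a minimal sketch of how that quantity can be computed may help. It uses the pairwise (Mann-Whitney) interpretation: the proportion of diseased/non-diseased pairs in which the diseased case receives the higher score, with ties counted as one half. The labels and the two score lists are invented for illustration and are not results from any enrolled study; the function name `auroc` is hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = diseased, 0 = not diseased.
labels = [1, 1, 1, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]    # hypothetical predicted risks
scores_model_b = [0.6, 0.4, 0.3, 0.5, 0.45, 0.2]   # hypothetical predicted risks
print(auroc(scores_model_a, labels))
print(auroc(scores_model_b, labels))
```

With these made-up scores the first model ranks 8 of 9 pairs correctly and the second 5 of 9, so the first would be reported as the better discriminator. The pairwise form is quadratic in the number of cases and is meant for clarity rather than efficiency.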
The self-learning and adaptive capacity of BP-ANN is one of its advantages over linear regression [71]. In the two-phase process of training neural networks, reasonable rules between input and output variables could be extracted automatically through backpropagation self-learning; these rules would be remembered and then translated into the neuron weights of the network based on its adaptive capacity [72][73][74][75][76]. The self-learning phases were commonly repeated more than 10,000 times, and the weights and error threshold were optimized until the model converged [77]. For most clinicians, a well-estimated noninvasive diagnostic model or disease outcome classifier would help to make a correct decision without invasive detection. Based on these clinical demands, the BP-ANN training process was able to deal with unrecognized confounders to construct a more accurate classifier, which could transfer what it learned in training to unknown information linking input variables and clinical outcomes [78][79][80].
Meanwhile, owing to the current availability of big data in multicenter clinical research, enhanced computing power from graphics processing units, and new algorithms for fitting neural networks, a computer-aided system could handle thousands of input variables, recognize hidden information, and make more accurate decisions than linear regression [81].
The extrapolation performance of the BP-ANN model was noteworthy; from the mathematical perspective, BP-ANN is an optimization method that searches locally for the global solution [82][83][84]. In this process, the weights between neurons were adjusted gradually in the direction of local improvement, which may cause the algorithm and its weights to converge to a local extremum [85]. In addition, BP-ANN was sensitive to the initial weights of the network: differently initialized networks tend to converge to different local minima, so many researchers obtained different models after training [79][86][87][88][89].
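The sensitivity to initial weights described above can be seen even with a single weight. The sketch below is not a neural network: it applies plain gradient descent to a hypothetical one-dimensional error curve, $E(w) = w^4 - 3w^2 + w$, chosen only because it has two separate minima. The function names and the starting points are illustrative assumptions.

```python
def error(w):
    """Hypothetical nonconvex error curve with two minima."""
    return w ** 4 - 3 * w ** 2 + w


def gradient(w):
    """Derivative of the error curve with respect to the weight."""
    return 4 * w ** 3 - 6 * w + 1


def descend(w, lr=0.01, steps=2000):
    """Plain gradient descent: w <- w - lr * dE/dw."""
    for _ in range(steps):
        w -= lr * gradient(w)
    return w


w_from_left = descend(-2.0)   # initial weight on the left of the curve
w_from_right = descend(2.0)   # initial weight on the right of the curve
print(w_from_left, error(w_from_left))
print(w_from_right, error(w_from_right))
```

Both runs stop where the gradient vanishes, but they settle in different minima, and only the run started on the left reaches the lower error. This is why, in practice, networks are often trained from several random initializations and compared on validation data.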
One article reported linear regression to be superior to BP-ANN for the early diagnosis of colorectal cancer; in that study the input variables were only serum tumor markers, including CEA, CA199, and CA242, among others [46]. Compared with the BP-ANN diagnosis model, logistic regression analysis showed better results, which was related to the use of multiple parameters within a certain range [90]. The success of the linear regression method in developing an interpretative and diagnostic model required a representative and homogeneous data structure, the elimination of redundant input variables, an appropriate ratio between the number of input variables and the number of output variables, and strict validation procedures [91][92][93].

Conclusions
The noninvasive diagnostic model is one of the vital issues for digestive clinicians and statisticians. Based on our systematic review, the BP-ANN algorithm and linear regression both had a high capacity for fitting the diagnostic model, and BP-ANN displayed higher prediction accuracy in most of the enrolled studies. To elaborate, we compared the activation functions and data structure of BP-ANN and linear regression for fitting the diagnosis model, and the data suggested that BP-ANN was a comprehensive recommendation algorithm.
Building on traditional three-layer neural networks, statisticians have developed a set of deep learning algorithms with different approaches [94]. Deep learning neural networks (DNN) have extended the depth to four or more layers and performed better than traditional neural networks in diagnosis and prediction when the neural network construction becomes complex [95][96][97]. Hinton et al. used an unsupervised restricted Boltzmann machine with deep layers in the neural architecture to overcome the limitations of local minima and overfitting [98]. Also, the convolutional neural network (CNN), which comprises multiple layers of computational connections with minimal processing, performed well in recent research [99][100][101][102]. Therefore, further research may focus on the correlations between the traditional neural network and other machine learning algorithms, including deep learning neural networks, convolutional neural networks, and the support vector machine method, in order to select the appropriate algorithm for digestive diseases.

Abbreviations
ANN: Artificial neural network
AUROC: Area under the receiver operating characteristic curve
BP: Backpropagation
CAH: Chronic aggressive non-A, non-B hepatitis
CART: Classification and regression tree
CDC: American Centers for Disease Control and Prevention
CPH: Chronic persistent non-A, non-B hepatitis

Data Availability
The data used to support this study can be found in the listed references.

Disclosure
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest
The authors declare no conflict of interest.

Authors' Contributions
All authors have contributed significantly to the manuscript and all authors agree with the content. Conception/design was done by Xu Yang. Data collection and manuscript writing were done by Wei Wei and Xu Yang. Reviewing of the manuscript was done by Xu Yang.