Computational Classification and Diagnosis of Alcoholic Liver Diseases Using General Regression Neural Network

Alcoholic liver diseases cause a large number of deaths worldwide. However, computational diagnosis and classification of alcoholic liver disease have not yet been established. In this study, we used a general regression neural network (GRNN) model with high-performance classification ability to diagnose and classify alcoholic liver disease. We used tenfold cross-validation to estimate the error rate of the networks. Referring to the clinical diagnosis made by a group of experts, the results show an accuracy of 80.91% for back-diagnosis of 110 patients and an accuracy of 81.82% for predictive diagnosis of 11 patients. This study suggests that using liver function tests as the input-layer variables of a GRNN model could accurately diagnose and classify alcoholic liver diseases.


Introduction
There are many types of hepatitis, including those caused by viruses, toxic chemicals, and alcohol consumption. Long-term excessive drinking can cause extensive necrosis within the liver or even lead to liver failure. Fatty liver is an early sign of alcoholic liver damage; it can then develop into alcoholic hepatitis, alcoholic liver fibrosis, and finally alcoholic liver cirrhosis. Statistics have consistently shown that cirrhosis and its sequelae account for about 2% of all deaths in the USA, and 80% to 90% of these cirrhosis cases are alcohol-related [1]. In China, over 20% of alcoholics have alcoholic liver disease (ALD) [2]. Appropriate treatment could save a significant number of lives if patients were diagnosed promptly in the early stages. Unfortunately, accurate diagnosis of ALD in the early stages has never been easy. Importantly, different pathological stages of ALD require different therapeutic strategies [3]. Therefore, accurate diagnosis and classification of ALD are crucial for clinical management. However, many factors make the diagnosis and classification of ALD difficult, often delaying the correct diagnosis. For example, ALD is currently classified by clinical symptoms, biochemical manifestations, and pathologic manifestations [2,3], but clinical symptoms and biochemical alterations are not disease specific.
Pathological examination is often unacceptable to patients. Therefore, it is important to develop a reliable and powerful medical diagnostic system to classify disease stages, reduce diagnosis time, and improve diagnostic accuracy. However, medical diagnosis is by nature a complex cognitive process. Computational methods such as neural networks have shown great potential in the diagnosis of liver diseases, but neural networks have not yet been applied to diagnose and classify ALD.
Various neural networks have been applied to the diagnosis of liver diseases other than ALD. For instance, Nakano et al. investigated the ability of neural networks to discriminate between mild and severe chronic active hepatitis and demonstrated that neural networks could predict the diagnosis for 78% of the cross-validation group [4]. Kim et al. adopted prediction models including logistic regression, a decision tree, and a neural network to analyze the risk factors of liver disease and obtained an accuracy of 72.09% [5]. Bascil and Temurtas used the Levenberg-Marquardt algorithm for hepatitis diagnosis and obtained a classification accuracy of 91.87% [6]. Lin used an intelligent model integrating classification and regression trees (CART) and case-based reasoning (CBR) to diagnose liver diseases, with a CBR diagnostic accuracy of 90.00% [7]. Lin and Chuang established an intelligent liver diagnosis model that integrates artificial neural networks, the analytic hierarchy process, and case-based reasoning to determine whether patients suffer from liver disease and, if so, which type [8]. Outside liver disease, Chou et al. [9] adopted a prediction model combining multivariate adaptive regression splines (MARS) and an artificial neural network (ANN) for breast cancer diagnosis, with an accuracy as high as 98%. However, we have not found any study of a system to diagnose and classify alcoholic liver diseases.
GRNN is one of the most popular neural networks and was adopted here because of its strengths [6,10,11]: (1) its parallel structure and good generalization give it an advantage over many other neural networks; (2) GRNN is a noniterative network in which data flow from the input layer to the output layer in a single pass, so there is no iterative learning as in multilayer perceptrons (MLP), which makes it faster to train; (3) GRNN works accurately with small datasets and performs well on noisy data compared with backpropagation neural networks (BPNN); (4) GRNN is consistent: as the size of the dataset increases, the estimation error tends toward zero. For these reasons, GRNN is used in medical classification, prediction, and diagnosis problems. In this study, we used GRNN pattern recognition to diagnose and classify ALD.

GRNN Network Structure
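GRNN Theory.
Generalized regression neural network theory is based on nonlinear regression analysis, in which the most probable value of the output, $\hat{Y}$, is computed through regression of the dependent variable $Y$ on the independent variable $\mathbf{X}$. Assume that $f(\mathbf{X}, Y)$ represents the known joint continuous probability density function of a vector random variable, $\mathbf{X}$, and a scalar random variable, $Y$. The regression of $Y$ on $\mathbf{X}$ is given as follows:
$$\hat{Y}(\mathbf{X}) = E[Y \mid \mathbf{X}] = \frac{\int_{-\infty}^{\infty} Y f(\mathbf{X}, Y)\,dY}{\int_{-\infty}^{\infty} f(\mathbf{X}, Y)\,dY}, \tag{1}$$
where $\hat{Y}(\mathbf{X})$ is the predicted value under input $\mathbf{X}$ [11]. The density $f(\mathbf{X}, Y)$ can be estimated from a training set by using the consistent estimator proposed by Parzen [12]:
$$\hat{f}(\mathbf{X}, Y) = \frac{1}{(2\pi)^{(p+1)/2}\sigma^{p+1}} \cdot \frac{1}{n}\sum_{i=1}^{n} \exp\left[-\frac{(\mathbf{X}-\mathbf{X}_i)^{T}(\mathbf{X}-\mathbf{X}_i)}{2\sigma^{2}}\right]\exp\left[-\frac{(Y-Y_i)^{2}}{2\sigma^{2}}\right]. \tag{2}$$
Here, the probability estimator $\hat{f}(\mathbf{X}, Y)$ is based upon sample values $\mathbf{X}_i$ and $Y_i$ of the random variables $\mathbf{X}$ and $Y$, where $n$ is the number of sample observations, $p$ is the dimension of the vector variable $\mathbf{X}$, and $\sigma$ is the Gaussian width factor (here called the smoothing factor) in (2). Substituting $\hat{f}(\mathbf{X}, Y)$ for $f(\mathbf{X}, Y)$ in (1) and interchanging the order of integration and summation yields the desired conditional mean:
$$\hat{Y}(\mathbf{X}) = \frac{\sum_{i=1}^{n}\exp\left(-\frac{D_i^{2}}{2\sigma^{2}}\right)\int_{-\infty}^{\infty} Y\exp\left[-\frac{(Y-Y_i)^{2}}{2\sigma^{2}}\right]dY}{\sum_{i=1}^{n}\exp\left(-\frac{D_i^{2}}{2\sigma^{2}}\right)\int_{-\infty}^{\infty}\exp\left[-\frac{(Y-Y_i)^{2}}{2\sigma^{2}}\right]dY}, \tag{3}$$
where $D_i^{2}$ is defined as
$$D_i^{2} = (\mathbf{X}-\mathbf{X}_i)^{T}(\mathbf{X}-\mathbf{X}_i).$$
Because $\int_{-\infty}^{\infty} z e^{-z^{2}}\,dz = 0$, the network output $\hat{Y}(\mathbf{X})$ is
$$\hat{Y}(\mathbf{X}) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-\frac{D_i^{2}}{2\sigma^{2}}\right)}{\sum_{i=1}^{n}\exp\left(-\frac{D_i^{2}}{2\sigma^{2}}\right)}. \tag{4}$$
The estimate $\hat{Y}(\mathbf{X})$ can be viewed as a weighted average of all the observed values $Y_i$, where each observed value is weighted exponentially according to its Euclidean distance from $\mathbf{X}$. When the smoothing parameter $\sigma$ is made large, the estimated density is forced to be smooth; as $\sigma$ becomes very large, $\hat{Y}(\mathbf{X})$ approaches the sample mean of the observed $Y_i$. Conversely, as $\sigma$ becomes very small, $\hat{Y}(\mathbf{X})$ assumes the value of the $Y_i$ associated with the observation closest to $\mathbf{X}$. For intermediate values of $\sigma$, all values of $Y_i$ are taken into account, but those corresponding to points closer to $\mathbf{X}$ are given heavier weight.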
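The step from (1) to (4) can be checked numerically. The following minimal Python sketch assumes a one-dimensional input ($p = 1$) and a small hypothetical toy dataset (not the patient data). It evaluates (1) by integrating the Parzen estimate (2) over $Y$ with the trapezoid rule, compares the result with the closed form (4), and illustrates how $\hat{Y}(\mathbf{X})$ behaves for very small and very large $\sigma$. The function names and the integration bounds are illustrative assumptions.

```python
import math


def parzen_density(x, y, xs, ys, sigma):
    """Parzen estimate f_hat(x, y) of Eq. (2) for a scalar input (p = 1)."""
    p = 1
    norm = 1.0 / ((2 * math.pi) ** ((p + 1) / 2) * sigma ** (p + 1))
    total = sum(
        math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) * math.exp(-(y - yi) ** 2 / (2 * sigma ** 2))
        for xi, yi in zip(xs, ys)
    )
    return norm * total / len(xs)


def grnn_by_integration(x, xs, ys, sigma, lo=-20.0, hi=20.0, steps=20000):
    """Eq. (1) with f replaced by f_hat, integrated over Y with the trapezoid rule.

    The interval [lo, hi] must extend many multiples of sigma beyond the observed ys.
    """
    h = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        f = parzen_density(x, y, xs, ys, sigma)
        num += w * y * f
        den += w * f
    return num / den  # the step size h cancels in the ratio


def grnn_closed_form(x, xs, ys, sigma):
    """Eq. (4): average of the observed ys weighted by exp(-D_i^2 / (2 sigma^2))."""
    weights = [math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]  # hypothetical toy inputs
    ys = [1.0, 3.0, 2.0, 5.0]  # hypothetical toy outputs
    print(grnn_by_integration(1.4, xs, ys, 0.5), grnn_closed_form(1.4, xs, ys, 0.5))
    # Very small sigma: close to ys[1] = 3.0, the output of the nearest sample (x = 1.0).
    print(grnn_closed_form(1.4, xs, ys, 0.05))
    # Very large sigma: close to the sample mean of ys, 2.75.
    print(grnn_closed_form(1.4, xs, ys, 1000.0))
```

The two values printed on the first line agree to many decimal places, because the Gaussian factors in $Y$ integrate to the same constant in the numerator and denominator, leaving only the $Y_i$-weighted sum.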

GRNN Network Structure.
A GRNN is a four-layer feedforward neural network based on nonlinear regression theory, consisting of an input layer, a pattern layer, a summation layer, and an output layer [13], as shown in Figure 1. The input layer merely passes the input variables (data attributes) to the pattern layer, and its number of neurons equals the number of observed parameters. In this study, the input layer of the GRNN model for diagnosing and classifying alcoholic liver disease contains seven variables. The number of neurons in the pattern layer equals the number of training samples, $n$, with each neuron corresponding to one training sample [10,11]. The summation layer includes two kinds of units: the first sums the pattern-layer outputs weighted by the corresponding observed values $Y_i$ and evaluates the numerator of (4), and the second sums the unweighted pattern-layer outputs and evaluates the denominator of (4); a network with several outputs has one numerator unit per output. Each unit in the pattern layer is connected to every unit in the summation layer. The output layer divides each numerator by the denominator to obtain the predicted value. In this study, the output layer contains three outputs: alcoholic cirrhosis with decompensated liver function, alcoholic cirrhosis with compensated liver function, and alcoholic hepatitis.
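The layer-by-layer computation can be sketched in Python as follows. This is an illustrative sketch, not the authors' MATLAB implementation. It assumes one-hot targets such as (1, 0, 0), (0, 1, 0), and (0, 0, 1), and it assumes that the predicted class is the output with the largest value, a common decision rule that the text does not state. The function name and the toy data are hypothetical.

```python
import math


def grnn_forward(x, train_x, train_t, sigma):
    """One forward pass through the four GRNN layers for a vector-valued target.

    x: input vector (e.g. normalized liver function tests)
    train_x: training input vectors; each becomes one pattern-layer neuron
    train_t: one-hot training targets, e.g. (1, 0, 0), (0, 1, 0), (0, 0, 1)
    Returns the output vector and the index of its largest component.
    """
    # Input layer: x is passed unchanged to every pattern neuron.
    # Pattern layer: Gaussian kernel of the squared Euclidean distance D_i^2.
    pattern = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        pattern.append(math.exp(-d2 / (2 * sigma ** 2)))

    # Summation layer: one target-weighted sum per output (numerators of Eq. (4))
    # and one unweighted sum shared by all outputs (denominator of Eq. (4)).
    n_out = len(train_t[0])
    numerators = [sum(p * t[j] for p, t in zip(pattern, train_t)) for j in range(n_out)]
    denominator = sum(pattern)

    # Output layer: divide each numerator by the denominator.
    outputs = [num / denominator for num in numerators]
    predicted = max(range(n_out), key=lambda j: outputs[j])  # assumed decision rule
    return outputs, predicted


if __name__ == "__main__":
    # Hypothetical two-feature toy data with three classes.
    train_x = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.80], [0.50, 0.10]]
    train_t = [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1)]
    outputs, predicted = grnn_forward([0.12, 0.22], train_x, train_t, sigma=0.1)
    print(outputs, predicted)
```

Because every target is one-hot, the outputs are nonnegative and sum to 1, so they can be read as relative scores for the three classes.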

Patient Data.
The data for this study were collected from 200 patients who showed symptoms of alcoholic liver diseases. All patients were male, aged 20 to 55 years (mean age 43 years), with disease durations of 5 to 30 years. The dataset contained the clinical symptoms, biochemical manifestations, and pathologic manifestations of 121 of these patients. Based on several different diagnostic criteria for alcoholic liver disease, several experts diagnosed the 121 patients: 34 cases were clinically diagnosed as alcoholic hepatitis, 30 as alcoholic liver cirrhosis with compensated liver function, and 57 as alcoholic liver cirrhosis with decompensated liver function. Seven liver function tests, namely gamma-glutamyl transpeptidase (GGT), total bile acid (TBA), alkaline phosphatase (ALP), total bilirubin (TBIL), alanine aminotransferase (ALT), aspartate aminotransferase (AST), and the AST/ALT ratio, were used for computational diagnosis and classification with the GRNN model.
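As a quick check on the class distribution, the expert diagnoses sum to 121 patients, and the largest class (decompensated cirrhosis, 57 cases) makes up about 47.1% of them. That share is the accuracy a trivial classifier that always predicts the majority class would reach on data with these proportions, which gives a rough reference point for the accuracies reported for any model. The short Python sketch below recomputes these figures; the dictionary labels are illustrative.

```python
# Expert clinical diagnoses reported for the 121 patients.
counts = {
    "alcoholic hepatitis": 34,
    "cirrhosis, compensated": 30,
    "cirrhosis, decompensated": 57,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}
majority_class = max(counts, key=counts.get)
majority_baseline = shares[majority_class]  # accuracy of always predicting the majority class

if __name__ == "__main__":
    print(total)
    for label, share in shares.items():
        print(f"{label}: {100 * share:.1f}%")
    print(majority_class, round(100 * majority_baseline, 2))
```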

Method.
In this study, alcoholic liver diseases were diagnosed and classified with the GRNN model. The input layer contained seven variables, GGT, TBA, ALP, TBIL, ALT, AST, and AST/ALT, corresponding to $x_1, x_2, \ldots, x_7$ in the GRNN, respectively. The output layer had three outputs, alcoholic cirrhosis with decompensated liver function, alcoholic liver cirrhosis with compensated liver function, and alcoholic hepatitis, encoded as (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. Of the 121 samples, 110 were selected as training samples and the remaining 11 were used as diagnosing (test) samples to evaluate the model (Table 1). MATLAB R2008a was used to implement the GRNN. The GRNN model for diagnosing and classifying alcoholic liver disease was established in five steps. Firstly, before training, the data were normalized using the minimum ($\min$) and maximum ($\max$) values in a pattern so that the inputs and targets fall within a specified range. Secondly, an error-estimation method was chosen: $k$-fold cross-validation and receiver operating characteristic (ROC) curves are among the most popular error-estimation approaches in machine learning [14,15], and in this paper we used tenfold cross-validation to estimate the error rate of the networks.
Thirdly, the smoothing parameter $\sigma$ was selected as the value that minimizes the average mean square error (MSE) over the $k$ separate tests of $k$-fold cross-validation.
Fourthly, the GRNN model for diagnosing and classifying alcoholic liver disease was established using the selected smoothing parameter $\sigma$. Finally, the established GRNN model was used to diagnose and classify alcoholic liver disease.
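Steps one to four can be sketched as below. This is a minimal pure-Python illustration, not the authors' MATLAB R2008a code, and several details are assumptions: each variable is min-max scaled to [0, 1] (the exact range used is not given), sample $i$ is assigned to fold $i \bmod k$ without shuffling or stratification, the MSE is averaged over samples and over output components, and the candidate grid for $\sigma$ is arbitrary. All function names are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Step 1: scale each column to [0, 1] with its own min and max (assumed range)."""
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def grnn_predict(x, train_x, train_t, sigma):
    """Eq. (4) applied to each component of a one-hot target vector."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]
    d2_min = min(d2)
    # Subtracting d2_min rescales every weight by the same factor, which cancels in
    # the ratio of Eq. (4) but keeps the nearest sample's weight at 1 (no underflow).
    w = [math.exp(-(d - d2_min) / (2 * sigma ** 2)) for d in d2]
    s = sum(w)
    return [sum(wi * t[j] for wi, t in zip(w, train_t)) / s for j in range(len(train_t[0]))]


def kfold_mse(data_x, data_t, sigma, k=10):
    """Step 2: average test MSE over k folds; sample i is placed in fold i % k."""
    fold_mses = []
    for f in range(k):
        test = [i for i in range(len(data_x)) if i % k == f]
        train = [i for i in range(len(data_x)) if i % k != f]
        tr_x = [data_x[i] for i in train]
        tr_t = [data_t[i] for i in train]
        err = 0.0
        for i in test:
            pred = grnn_predict(data_x[i], tr_x, tr_t, sigma)
            err += sum((p - t) ** 2 for p, t in zip(pred, data_t[i])) / len(pred)
        fold_mses.append(err / len(test))
    return sum(fold_mses) / k


def select_sigma(data_x, data_t, candidates, k=10):
    """Step 3: pick the sigma with the lowest average cross-validated MSE."""
    return min(candidates, key=lambda s: kfold_mse(data_x, data_t, s, k))
```

With the normalized training vectors `x` and their one-hot targets `t`, `select_sigma(x, t, candidates)` returns the grid value with the lowest mean fold MSE; step four then uses that $\sigma$ in `grnn_predict` with all training samples as pattern neurons.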

Results
A $k$-fold scheme with $k = 10$ was applied: the dataset is divided into $k$ subsets, $k - 1$ of which are used for training and the remaining one for testing; this is repeated $k$ times so that each subset serves once as the test set, and the results are averaged. Thus each of the 10 runs used 90% of the patterns for training and 10% for testing, and all reported results are averages over the 10 separate tests. The mean square error (MSE) was used to evaluate the performance of the GRNN model.
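The tenfold partition can be sketched as follows. This is an illustrative Python sketch with a hypothetical shuffling seed; the text does not say how patterns were assigned to folds. For illustration it uses $n = 110$, the size of the training set, which gives test folds of 11 patterns (10%) and training sets of 99 patterns (90%).

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def kfold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = kfold_indices(n, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in kfold_splits(110):
        print(len(train), len(test))
```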
In this study, the smoothing parameter $\sigma$ was selected as the value minimizing the average MSE over the 10 separate tests of the tenfold scheme. Figure 2 shows the relationship between $\sigma$ and the average MSE. The average MSE reaches its lowest value, 0.209789, at $\sigma = 0.86$; therefore, $\sigma = 0.86$ was used to establish the GRNN model. With this value, the MSEs of the 10 cross-validation models ranged from 0.16992 to 0.27269.
Using the established GRNN model, back-diagnosis of the 110 training samples gave an accuracy of 80.91% (Table 2), and diagnosis of the 11 test samples gave an accuracy of 81.82% (Table 3). For comparison, Fisher's linear discriminant was also used to diagnose and classify alcoholic liver disease; the coefficients of its discriminant functions are listed in Table 4.
Its classification results are listed in Tables 2 and 3: back-diagnosis of the 110 training samples gave an accuracy of 54.55% (Table 2), and diagnosis of the 11 test samples gave an accuracy of 72.73% (Table 3). The GRNN diagnosis was therefore more consistent with the clinical diagnosis and classification made by experts than Fisher's discriminant, indicating the good modeling ability of the GRNN model for diagnosing and classifying alcoholic liver disease.
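Because the denominators are 110 and 11, each reported percentage corresponds to a unique whole number of correctly classified patients. The short Python check below back-calculates those counts (89/110, 9/11, 60/110, and 8/11; these counts are inferred from the percentages, not stated in the text) and recomputes the percentages rounded to two decimals. Note that 8/11 = 72.727...%, which rounds to 72.73%.

```python
# Correct classifications back-calculated from the reported accuracies (inferred counts).
results = {
    ("GRNN", "back-diagnosis, 110 training samples"): (89, 110),
    ("GRNN", "diagnosis, 11 test samples"): (9, 11),
    ("Fisher", "back-diagnosis, 110 training samples"): (60, 110),
    ("Fisher", "diagnosis, 11 test samples"): (8, 11),
}

accuracy_pct = {key: round(100 * correct / n, 2) for key, (correct, n) in results.items()}

if __name__ == "__main__":
    for (method, setting), pct in accuracy_pct.items():
        print(f"{method:6s} {setting}: {pct:.2f}%")
```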

Discussions
Classifying alcoholic liver diseases is important because different disease stages, such as active hepatitis and cirrhosis, require significantly different therapeutic strategies [3]. This paper presents a study on the diagnosis and classification of alcoholic liver disease by general regression neural network (GRNN) pattern recognition. Results obtained from the GRNN were compared with the clinical diagnosis made by a group of experts. Relative to the experts' clinical diagnosis, the accuracy of back-diagnosis of the 110 training samples was 80.91% and the accuracy of predictive diagnosis of the 11 test samples was 81.82%. These results suggest that GRNN can be used to help diagnose and classify alcoholic liver disease.
The neural network used in this study has a multilayered feedforward architecture with 7 input variables and 3 output indicators. The number of input variables was determined by the 7 liver function tests, and the output indicators represent the disease development stages. Using this GRNN model, alcoholic liver disease could be accurately staged. Notably, changes in these liver function tests are not disease specific: they reflect only the degree of liver injury and serve as surrogates for disease development stages. This is because many factors, such as viruses and toxins, can damage the liver and produce similar changes in liver function tests. Therefore, these variables cannot be used to distinguish hepatitis of different etiologies. Nevertheless, in this study we demonstrated that using these variables as the input layer of a GRNN model successfully classified alcoholic liver disease. To our knowledge, this is the first report using a GRNN model with liver function tests for the classification and diagnosis of alcoholic liver disease. Given the similarity of liver function damage across etiologies, we propose that liver function tests combined with the GRNN model may also classify disease stages for other kinds of hepatitis.

Summary
We developed a general regression neural network model to diagnose and classify alcoholic liver disease. Seven liver function tests were used as input variables, and three disease development stages were used as output indicators. The reference clinical diagnosis was made by a group of experts. Accuracy was evaluated for back-diagnosis of the 110 training samples and for predictive diagnosis of the 11 test samples. Using GRNN, ALD was diagnosed and classified with accuracies of 80.91% (back-diagnosis) and 81.82% (prediction). GRNN may also reduce diagnosis time and improve diagnostic accuracy, which could benefit young physicians. However, it should be noted that there is still no single international standard for classifying alcoholic liver disease, and this study is limited by its small sample size. In addition, we considered only three categories; classification into more categories requires further study.

Figure 1: Generalized regression neural network model diagram for diagnosis of alcoholic liver disease.

Table 1: The seven indicators of liver function tests and clinical diagnosis.

Table 2: Back diagnosis results of alcoholic liver disease by generalized regression neural network and Fisher's discriminant method.

Table 2 (continued). Note: Bold indicates diagnosis; 1, 2, and 3 represent alcoholic cirrhosis with decompensated liver function, alcoholic liver cirrhosis with compensated liver function, and alcoholic hepatitis, respectively; the same coding is used in subsequent tables.

Table 3: Diagnostic results of the tested samples.

Table 4: Coefficients of Fisher's linear discriminant functions. Note: 1st, 2nd, and 3rd denote the first, second, and third functions of Fisher's linear discriminant method, respectively.