Clinical Outcome Prediction in Aneurysmal Subarachnoid Hemorrhage Using Bayesian Neural Networks with Fuzzy Logic Inferences

Objective. The novel clinical prediction approach of Bayesian neural networks with fuzzy logic inferences is created and applied to derive prognostic decision rules in cerebral aneurysmal subarachnoid hemorrhage (aSAH). Methods. The approach of Bayesian neural networks with fuzzy logic inferences was applied to data from five trials of Tirilazad for aneurysmal subarachnoid hemorrhage (3551 patients). Results. Bayesian meta-analyses of observational studies on aSAH prognostic factors gave generalizable posterior distributions of population mean log odds ratios (ORs). Similar trends were noted in Bayesian and linear regression ORs. Significant outcome predictors include normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age, and mean arterial pressure. Heteroscedasticity was present in the nontransformed dataset. Artificial neural networks found nonlinear relationships with 11 hidden variables in 1 layer, using the multilayer perceptron model. Fuzzy logic decision rules (centroid defuzzification technique) denoted cut-off points for poor prognosis at greater than 2.5 clusters. Discussion. This aSAH prognostic system makes use of existing knowledge, recognizes unknown areas, incorporates one's clinical reasoning, and compensates for uncertainty in prognostication.


Introduction
Advances in biostatistics and computing in the past several decades have led to the creation of different types of clinical outcome prediction models. Three of these are artificial neural networks, fuzzy logic, and Bayesian analysis [1][2][3]. These techniques complement classical or frequentist approaches, such as regression analysis.
Artificial neural networks mimic biological neural systems. In biological systems, incoming dendrites collect signals, which are fed to the neuron. A signal summation is then sent as a spike of electrical current along an axon, with resultant discharge at the synapse, connecting it to other neurons. Examples of biological neural networks include the human brain and the human retina. Analogous to the biological system, artificial neural networks are made up of a group of input variables which converge on a number of nodes. Nodes are grouped in layers and are connected with each other via connection links, each of which has an associated weight and activation function. Hidden or latent variables can exist in one or two layers. After processing by the activation functions, output signals are sent on to output nodes in the network. Artificial neural networks assume all-or-none logic; that is, subjects are regarded as having or not having a diagnosis. Neural networks are intelligent systems that can learn and change behaviour by themselves as they gain experience. In addition, they take into account unobservable variables that the researcher is not aware of while designing the neural net. Assume a basic artificial neural network with inputs $x_i$, hidden or latent variables $h_j(x)$ (in 1 layer), and outputs $f_k(x)$, using a multilayer perceptron model as illustrated in Figure 1. Each hidden unit is computed from a weighted sum of the inputs, and each output is computed from a weighted sum of the hidden units.
If the activation function is the nonlinear hyperbolic tangent function, then

$$h_j(x) = \tanh\Bigl(a_j + \sum_i u_{ij}\, x_i\Bigr),$$

where hidden unit $j$ = hyperbolic tangent of (bias term $a_j$ + sum of (weights $u_{ij}$ from input unit $i$ to hidden unit $j$ × input unit $i$)), and

$$f_k(x) = b_k + \sum_j v_{jk}\, h_j(x),$$

where output $k$ = bias term $b_k$ + sum of (weights $v_{jk}$ on the connection from hidden unit $j$ to output unit $k$ × hidden unit $j$).

Bayesian analysis allows the researcher to make use of existing states of knowledge before incorporation of new data. Simply put, it reflects the fact that knowledge is cumulative. Here, existing knowledge is expressed in the form of distributions (such as the normal bell-shaped distribution).
The "prior" distribution is then combined with its likelihood of occurrence, forming a posterior probability. The end result (or posterior probability) represents a revised or updated belief after taking new data into account. If there is a lack of existing knowledge on the subject of interest, the researcher can still rely on Bayesian techniques. In this case, vague or uninformed prior probabilities are used.
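The updating step described above can be made concrete with a small numerical sketch. It assumes a normal prior and a normal likelihood for a single log odds ratio, so that the posterior is available in closed form. The function name `normal_update` and all numbers are hypothetical and are not taken from the Tirilazad data.

```python
import math

def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normal likelihood (conjugate update).

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the new estimate.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / std_error ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Hypothetical log odds ratio: existing knowledge says 0.5, new data say 0.9.
informed = normal_update(prior_mean=0.5, prior_sd=0.2, estimate=0.9, std_error=0.2)

# Vague prior: very wide, so the posterior is driven almost entirely by the data.
vague = normal_update(prior_mean=0.0, prior_sd=100.0, estimate=0.9, std_error=0.2)

print("informed prior -> posterior mean %.3f, sd %.3f" % informed)
print("vague prior    -> posterior mean %.3f, sd %.3f" % vague)
```

With equally precise prior and data, the posterior mean falls halfway between them and the posterior is narrower than either. With the vague prior, the posterior essentially reproduces the new estimate.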
In the Bayesian approach to artificial neural networks [4], the goal is to find the predictive distribution for target values in a new test case, given the inputs for that case and the inputs and targets in the training cases. Here, $p(D)$ represents the probability of the data $D$ according to a particular model. It is an integral, representing the summation over all possible parameter values $\theta$ weighted by the strength of belief (as assigned by the researcher) in these parameter values, or

$$p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta.$$

The probability of a new case given the associated parameters ($\theta$, incorporating both weights and biases) is expressed as $p\bigl(y^{(n+1)} \mid x^{(n+1)}, \theta\bigr)$. In general, Bayesian neural networks can be expressed as

$$p\bigl(y^{(n+1)} \mid x^{(n+1)}, (x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\bigr) = \int p\bigl(y^{(n+1)} \mid x^{(n+1)}, \theta\bigr)\, p\bigl(\theta \mid (x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\bigr)\, d\theta. \tag{5}$$
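As a rough illustration of this predictive integral, the sketch below approximates the predictive mean for a one-hidden-layer tanh network by drawing weights and biases from the prior and weighting each draw by its likelihood on the training cases (self-normalized importance sampling). This is a deliberately simple stand-in for the Markov chain Monte Carlo methods normally used for Bayesian neural networks. The Gaussian prior, the Gaussian noise model, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def forward(x, a, u, b, v):
    """One-hidden-layer perceptron: h = tanh(a + x @ u), output = b + h @ v."""
    h = np.tanh(a + x @ u)
    return b + h @ v

def predictive_mean(x_new, x_train, y_train, n_samples=2000,
                    n_hidden=3, prior_sd=1.0, noise_sd=0.5, seed=0):
    """Approximate E[y_new | x_new, training cases] with likelihood-weighted prior draws."""
    rng = np.random.default_rng(seed)
    n_in = x_train.shape[1]
    log_w = np.empty(n_samples)
    preds = np.empty(n_samples)
    for s in range(n_samples):
        # Draw one parameter set theta = (a, u, b, v) from the assumed Gaussian prior.
        a = rng.normal(0.0, prior_sd, n_hidden)
        u = rng.normal(0.0, prior_sd, (n_in, n_hidden))
        b = rng.normal(0.0, prior_sd)
        v = rng.normal(0.0, prior_sd, n_hidden)
        # Gaussian log-likelihood of the training targets, up to an additive constant.
        resid = y_train - forward(x_train, a, u, b, v)
        log_w[s] = -0.5 * np.sum(resid ** 2) / noise_sd ** 2
        preds[s] = forward(x_new, a, u, b, v)
    # Normalize the weights in a numerically stable way; they then sum to 1.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return float(np.sum(w * preds))

# Toy training cases (hypothetical): one input, three observations.
x_train = np.array([[-1.0], [0.0], [1.0]])
y_train = np.array([-0.8, 0.0, 0.8])
estimate = predictive_mean(np.array([0.5]), x_train, y_train)
print(estimate)
```

Prior sampling becomes very inefficient as the number of weights or training cases grows, which is one reason more elaborate samplers are used in practice.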
Posterior probability density is proportional to the product of the prior probability density and its associated likelihood. Likelihood, as explained above, is the product over the $n$ training cases $(x^{(i)}, y^{(i)})$ of the probabilities of the data given the parameters $\theta$ (weights and biases):

$$L\bigl(\theta \mid (x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\bigr) = \prod_{i=1}^{n} p\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr).$$

Fuzzy logic is an extension of neural nets, with the distinct advantage that its functions can assume any value from 0 to 1, accounting for the entire spectrum of certainty of diagnoses and the spectrum of severity of the diseases studied. It can register, for example, a mild case of a disease, because it recognizes grey zones in diagnoses. Another strength of fuzzy logic lies in its explicit knowledge representation; that is, it allows the clinician to state inputs, control actions, and outputs explicitly. The clinician can also clarify, or defuzzify, the entire process by carrying out crisp control actions, such as adding a cut-off level for prognoses and diagnoses, and trigger thresholds for treatment.
In this defuzzification step, the rule applied (defuzz) can be max-min, centroid, left of mean, right of mean, or another crisp control action rule.
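To make the idea of a crisp control action concrete, the following minimal sketch implements centroid defuzzification on a sampled membership function, together with a maximum-membership rule for comparison. The maximum-membership rule, the function names, and the example values are illustrative assumptions and are not taken from the rule base used in this study.

```python
import numpy as np

def centroid_defuzz(values, membership):
    """Centroid rule: membership-weighted average of the candidate output values."""
    values = np.asarray(values, dtype=float)
    membership = np.asarray(membership, dtype=float)
    total = membership.sum()
    if total == 0:
        raise ValueError("membership is zero everywhere; centroid is undefined")
    return float((values * membership).sum() / total)

def max_membership_defuzz(values, membership):
    """Alternative rule: the output value with the highest membership (first if tied)."""
    values = np.asarray(values, dtype=float)
    return float(values[int(np.argmax(membership))])

# Hypothetical fuzzy output: candidate scores 0..3 with their degrees of membership.
scores = [0, 1, 2, 3]
degrees = [0.0, 0.2, 0.6, 1.0]

crisp_centroid = centroid_defuzz(scores, degrees)
crisp_max = max_membership_defuzz(scores, degrees)
print(crisp_centroid, crisp_max)
```

The centroid rule takes the whole membership function into account, whereas the maximum-membership rule looks only at its peak, so the two can give different crisp values for the same fuzzy output.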

Objectives and Relevance
In this paper, we aim to use advanced biostatistical methods to create clinical prognostic decision rules in aneurysmal subarachnoid hemorrhage derived from a large aneurysmal subarachnoid hemorrhage database, which can be tailored to a specific patient population. We explore novel methods that account for existing states of knowledge (Bayesian meta-analysis and regression), complex nonlinear relationships between independent, latent, and dependent variables (artificial neural networks), and grey zones in prognoses (fuzzy logic decision rules). The combination of such techniques can represent a novel health research method for clinical outcome prediction applicable to many diseases in medicine.

Methods
Intracranial aneurysmal subarachnoid hemorrhage affects about 45,000 individuals in North America annually. Aneurysmal subarachnoid hemorrhage is associated with a mortality rate of at least 45% in the first 30 days following rupture [5,6]. Apart from the primary neurological injury from the aneurysmal rupture itself, other secondary injury processes can further worsen an individual's neurological condition and eventual clinical outcome. These processes include both neurological processes (such as delayed stroke, rebleeding, brain swelling, vasospasm-induced strokes, seizures, and hydrocephalus) and systemic medical complications (such as myocardial infarction, fever, and pulmonary edema). Taken together, these processes can lead to long-term disability.
Types of disability include physical, neurocognitive, and psychological impairment [7].
The Tirilazad aneurysmal subarachnoid hemorrhage database is used to illustrate prognostic decision principles derived from a combination of multiple linear regression, artificial neural networks, fuzzy logic, and Bayesian analysis. Tirilazad is a 21-aminosteroid compound produced by Pharmacia & Upjohn, Kalamazoo, MI, USA, originally investigated by the University of Virginia Health Sciences Center as a free radical scavenger for potential treatment of cerebral vasospasm. This medication was investigated in five randomized clinical trials [8][9][10][11][12] involving patients with aneurysmal subarachnoid hemorrhage between 1990 and 1997 in 162 centers from 21 countries across North America, Europe, Australia, New Zealand, and South Africa. Tirilazad was found to have no effect on clinical outcome in patients with aneurysmal subarachnoid hemorrhage. The resultant database from these five studies contains 3551 patients, with its primary outcomes being Glasgow outcome score at 3 months and death from any cause. The Glasgow outcome score is a 5-point neurological scale with the following designations: 5: good recovery (normal life activities despite minor deficits), 4: moderate disability (disabled but independent), 3: severe disability (conscious but disabled), 2: persistent vegetative state (unresponsive and speechless), and 1: death [13].
Centers followed strict treatment protocols, and variables had fewer than 5% missing entries. Tirilazad was administered intravenously from day 3 to day 10 after subarachnoid hemorrhage onset. Only one percent of patients were lost to follow-up.
Patients in each treatment group were managed in a similar manner. Over 85% underwent surgical clipping, with 50% operated within the first 48 hours. Baseline demographics in both treatment and control arms were balanced in terms of gender, age, number of preexisting medical conditions (including hypertension, myocardial infarction, and angina), mean time to treatment, mean admission systolic blood pressure, admission neurological grade, ruptured aneurysm location, and admission amount of blood. These potential confounders were accounted for in statistical analysis. Proportions of patients experiencing vasospasm were similar in different treatment groups, as were the percentages of patients experiencing both neurological and systemic disabilities, including cerebral hemorrhage, cerebral ischemia, second stroke, rebleeding, hydrocephalus, sepsis, pulmonary embolus, brain herniation, pneumonia, and renal insufficiency.
A Cochrane systematic review [14] of the five Tirilazad trials found no substantial heterogeneity among the trials and no significant differences in adverse events between the treatment group and the placebo group. The Tirilazad database represents the largest currently available aneurysmal subarachnoid hemorrhage clinical trial database worldwide.

Multivariable Linear Regression Analysis.
A frequentist linear regression model was fitted using IBM SPSS. Included in the analysis were predictor variables from the Tirilazad database, without recoding, renaming, reclassification, or data transformation. Treatment variables were not included in this analysis. Predictor variables included were age, gender, neurological grade, intraventricular hemorrhage, subarachnoid hemorrhage thickness, time to treatment, clinical vasospasm, mean arterial pressure, aneurysm location, prior anticoagulation, eye opening, normal motor response, normal speech, admission angiographic vasospasm, intracerebral hemorrhage, hydrocephalus, prior subarachnoid hemorrhage, history of hypertension, history of myocardial infarction, history of angina, history of migraines, history of diabetes mellitus, antiepileptic use, ruptured aneurysm location, cerebral edema, pulmonary edema, vasospasm day, baseline temperature, fever on day 8, and cerebral infarction. The dependent variable is the patient's clinical outcome (Glasgow outcome score) at 3 months. Table 1 lists the statistically significant (P < 0.05) predictor variables for neurologic outcome, which include normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age, and mean arterial pressure.
All significant prognostic variables, with the exception of normal motor response, point to poorer prognosis. Presence of normal motor response at presentation signifies a more favourable outcome. Collinearity diagnostics reveal that normal motor response is closely correlated with another prognostic variable, namely, neurological grade.
Vasospasm day is closely correlated with other prognostic variables, namely, clinical and angiographic vasospasm.
Closer examination of the Tirilazad patient dataset, using nontransformed data points, reveals that heteroscedasticity is present in the model. As heteroscedasticity is present, we cannot be confident that the strength of prediction of the linear regression equation from this multiple linear regression model is equally strong across all levels of the included independent variables. Therefore, artificial neural networks were used to explore the presence of complex nonlinear relationships and latent variables inherent in the database.
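One common way to check for heteroscedasticity is to regress the squared residuals of the fitted model on the predictors and examine how much of their variation is explained. The sketch below computes the resulting statistic (sample size times R-squared, Koenker's studentized form of the Breusch-Pagan test) on simulated data. It is offered only as an illustration of the concept: the specific diagnostic used for the Tirilazad dataset is not stated here, and the function name and simulated values are assumptions.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """LM statistic n * R^2 from regressing squared OLS residuals on the predictors."""
    X = np.column_stack([np.ones(len(y)), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares fit
    resid_sq = (y - X @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)  # auxiliary regression
    fitted = X @ gamma
    ss_res = np.sum((resid_sq - fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return len(y) * r_squared

# Simulated example: the noise standard deviation grows with the predictor.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 2000)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 2000) * x

lm = breusch_pagan_lm(x, y)
# Under homoscedasticity the statistic is approximately chi-square distributed
# with one degree of freedom per predictor; 3.841 is the 5% critical value for 1 df.
print(lm, lm > 3.841)
```

A statistic well above the critical value indicates that residual variance changes with the predictor, which is the situation in which the precision of a single linear regression equation cannot be assumed to be uniform across predictor levels.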

Bayesian Analysis.
Beta coefficients of prognostic variables generated with Bayesian regression (WinBUGS version 1.4.3) using uninformed priors are similar in magnitude to those generated from frequentist multiple regression analysis, as demonstrated in Table 2.
These values are very useful clinically by themselves, as they can be applied to patient prognostication. They also represent the informed priors for Bayesian regression models. Because of this database's very large sample size (n = 3551), beta coefficients generated with Bayesian regression (uninformed priors) are similar in magnitude to those generated from Bayesian regression (informed priors). Variables, in order of magnitude of normalized importance in Table 4, include age, second stroke, myocardial infarction, temperature, mean arterial pressure, neurological grade, ruptured aneurysm size, diabetes mellitus, angina, subarachnoid clot thickness, lung edema, admission angiographic vasospasm, previous subarachnoid hemorrhage, vasospasm day, cerebral edema, vasospasm during treatment, aneurysm location, time to treatment, normal motor response, intracerebral hematoma, normal speech, day 8 temperature, gender, eye opening, migraine history, intraventricular hemorrhage, hypertensive history, anticoagulant use, seizures, and hydrocephalus.
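The agreement between frequentist coefficients, Bayesian coefficients with uninformed priors, and Bayesian coefficients with informed priors at large sample sizes can be illustrated with a conjugate Bayesian linear regression. The sketch below assumes independent normal priors on the coefficients and a known noise standard deviation, which is simpler than a full WinBUGS model. The data are simulated with the same number of rows as the database, and the coefficient values and prior settings are hypothetical.

```python
import numpy as np

def posterior_mean(X, y, prior_mean, prior_sd, noise_sd=1.0):
    """Posterior mean of regression coefficients under normal priors and known noise SD."""
    k = X.shape[1]
    prior_prec = np.eye(k) / prior_sd ** 2             # prior precision matrix
    A = X.T @ X / noise_sd ** 2 + prior_prec           # posterior precision
    b = X.T @ y / noise_sd ** 2 + prior_prec @ prior_mean
    return np.linalg.solve(A, b)

# Simulated data only; these are not Tirilazad variables.
rng = np.random.default_rng(0)
n = 3551
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([3.0, -0.5, 0.25])                # hypothetical coefficients
y = X @ true_beta + rng.normal(size=n)

ols, *_ = np.linalg.lstsq(X, y, rcond=None)            # frequentist estimate
vague = posterior_mean(X, y, prior_mean=np.zeros(3), prior_sd=100.0)
informed = posterior_mean(X, y, prior_mean=np.array([2.5, -0.4, 0.3]), prior_sd=0.5)

print("OLS:           ", ols)
print("vague prior:   ", vague)
print("informed prior:", informed)
```

With several thousand observations, the data precision dominates the prior precision, so all three sets of coefficients nearly coincide. With a small sample, the informed prior would pull the estimates noticeably towards the prior means.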

Artificial Neural Networks.
Interrelationships between input nodes (predictor variables), hidden variables (11 of them in one hidden layer), and output nodes (Glasgow outcome score) are illustrated in Figure 2. Possible latent (unobservable) variables, not measured by investigators, include the following: (1) disrupted cerebral autoregulation contributing to both ischemia and cerebral edema after subarachnoid hemorrhage, (2) biochemical markers of brain injury predisposing to cortical depression, (3) cellular markers demonstrating physiologic dysfunction (such as mitochondrial dysfunction as reflected by imbalance between oxygen supply and consumption), (4) genetic factors affecting outcome (such as inheritance of genes making patients more prone to microthrombotic events in the cerebral microvasculature disrupting cerebral blood flow).

Neural Networks with Bayesian
Prognostic variables are grouped into three linguistic variables, or clusters: (1) demographic risk factor cluster, (2) systemic risk factor cluster, and (3) neurologic risk factor cluster. Fuzzification begins with assigning each linguistic variable a range of membership functions. When all members of a cluster are present, the maximum membership function of 1 is reached for that particular linguistic variable.
Fuzzy inferences then proceed with derivation of "if-then" rules that define system behaviour. In our case (Figure 3), if cluster one (demographic risk factor cluster) is fulfilled, then one has low suspicion for poor neurologic outcome. If members of both cluster one (demographic risk factor cluster) and cluster two (systemic risk factor cluster) are present, then one has raised suspicion for poor neurologic outcome. One has high suspicion for poor neurologic outcome if some members of cluster one (demographic risk factor cluster), cluster two (systemic risk factor cluster), and cluster three (neurologic risk factor cluster) are present.
Next, the defuzzification step translates the linguistic variable results into the crisp control action of denoting high likelihood for poor outcome. The centroid rule is applied to designate poor prognosis, whereby a patient fulfills risk factors from more than 2.5 clusters.
As an example, an elderly patient (demographic cluster) with a number of medical comorbidities (examples from the systemic cluster, such as coronary artery disease, hypertension, and diabetes mellitus), who experiences a number of neurological complications after treatment (examples from the neurologic cluster, such as second stroke, cerebral ischemia, and seizures), is predicted to have a poor long-term neurologic outcome (poor three-month Glasgow outcome score).
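One possible reading of the 2.5-cluster cut-off can be sketched in code. The sketch assumes that the membership of each cluster equals the fraction of its risk factors present in the patient (so that a fully present cluster has membership 1) and that the crisp score is the sum of the three memberships. Both assumptions, the cluster contents, and the function names are illustrative simplifications and do not reproduce the rule base in Figure 3.

```python
# Hypothetical cluster contents, based only on the worked example in the text.
CLUSTERS = {
    "demographic": ["elderly"],
    "systemic": ["coronary artery disease", "hypertension", "diabetes mellitus"],
    "neurologic": ["second stroke", "cerebral ischemia", "seizures"],
}

def cluster_membership(present, members):
    """Fraction of a cluster's risk factors that the patient has (0 to 1)."""
    return sum(1 for m in members if m in present) / len(members)

def poor_outcome_flag(present, cutoff=2.5):
    """Sum the three cluster memberships and compare the score with the cut-off."""
    score = sum(cluster_membership(present, m) for m in CLUSTERS.values())
    return score, score > cutoff

# Patient from the worked example: every listed risk factor is present.
high_risk = poor_outcome_flag({
    "elderly", "coronary artery disease", "hypertension", "diabetes mellitus",
    "second stroke", "cerebral ischemia", "seizures",
})

# Contrasting hypothetical patient: elderly with hypertension only.
lower_risk = poor_outcome_flag({"elderly", "hypertension"})

print(high_risk, lower_risk)
```

Under these assumptions, the first patient scores 3.0 and is flagged for poor outcome, while the second scores about 1.33 and is not.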

Limitations
The techniques of Bayesian neural networks with fuzzy logic inferences were applied to the Tirilazad database. We note that the case mix in this database consisted of patients who underwent surgical clipping of cerebral aneurysms. Since the conduct of the Tirilazad trials, there have been advances in both medical management and surgical treatment of cerebral aneurysms. These include improved neurocritical care of aneurysmal subarachnoid hemorrhage patients and aneurysmal coiling. In addition, the Tirilazad database did not include important prognostic variables such as smoking, alcohol consumption, rebleeding, and infection. In order to overcome these limitations, efforts are now underway to combine a number of aneurysmal subarachnoid hemorrhage databases worldwide in the multinational Subarachnoid Hemorrhage International Trialists (SAHIT) collaboration. Important prognostic variables as well as aneurysmal coiling patients will be included in this database. In addition, Bayesian neural networks with fuzzy logic inferences will be applied to the SAHIT database.

Conclusions
Complex relationships exist among heterogeneous groups of prognostic factors. The accuracy of clinical outcome prediction depends on clarification of these relationships. General linear models have been used frequently for decades. These popular techniques produce interpretable coefficients for explanatory variables and are easily estimated using commercially available statistical programs. In real life, however, data points rarely fit perfectly linear relationships. Greater deviations from linearity point to the need for exploratory analyses with complex nonlinear systems.
Typical artificial neural networks can fit training data with high precision and detect nonlinear relationships among predictor variables, with the overall aim of predicting yet-to-be-seen observations. Neural networks also incorporate latent (unobserved) variables in one or two hidden layers. Interrelationships between independent, latent, and outcome variables are assigned synaptic weights, or connection strengths. A weighted average of these connection strengths gives a variable's normalized importance, or the percentage contribution of each predictor variable to the overall clinical outcome, taking into account the error between predicted and actual values. The sum of all relative importance values of input variables (representing influences of predictor variables on clinical outcome in relation to the rest of the independent variables) equals 100 percent. Small sample sizes can affect model building, making it more difficult to distinguish between true signal and noise. Over- and underfitting can occur in these cases, which, in turn, can affect model generalization. Neural networks with the Bayesian regularization technique have been devised to overcome this problem, whereby weights are assigned probability density distributions, incorporating Bayesian statistics to estimate weight uncertainty, or the relative degree of belief in the different values for synaptic weights.
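A weighted average of connection strengths of the kind described above can be sketched with a Garson-style calculation for a network with one hidden layer and a single output. This is an assumed, simplified scheme chosen for illustration; it is not necessarily the procedure by which the importance values in Table 4 were produced, and the weights in the example are hypothetical.

```python
import numpy as np

def garson_importance(w_in, w_out):
    """Relative importance (%) of each input in a one-hidden-layer, single-output network.

    w_in:  array of shape (n_inputs, n_hidden), input-to-hidden weights
    w_out: array of shape (n_hidden,), hidden-to-output weights
    Assumes every hidden unit has at least one nonzero incoming weight.
    """
    w_in = np.abs(np.asarray(w_in, dtype=float))
    w_out = np.abs(np.asarray(w_out, dtype=float))
    # Share of each input within every hidden unit's incoming weights.
    share = w_in / w_in.sum(axis=0, keepdims=True)
    # Weight each hidden unit's shares by the strength of its link to the output.
    raw = share @ w_out
    # Rescale so that the importance values sum to 100 percent.
    return 100.0 * raw / raw.sum()

# Hypothetical weights: 2 inputs, 2 hidden units.
w_in = [[1.0, 2.0],
        [3.0, 2.0]]
w_out = [1.0, 3.0]

importance = garson_importance(w_in, w_out)
print(importance)  # approximately [43.75, 56.25]
```

Because absolute weights are used, this measure indicates how strongly an input is connected to the output but not the direction of its effect.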
Typical artificial neural networks give fixed structures, whereas Bayesian neural networks give flexible structures. Bayesian neural networks are ones that assign probability distributions to all elements of the network, including inputs, hidden nodes, outputs, as well as their associated weights. Priors for these probability distributions can be generated using Bayesian meta-analysis. Bayesian regularization terms are included, which prevent model over- and underfitting. In addition, Bayesian neural networks give generalizable posterior distributions without compromising nonlinearity properties. Bayesian neural networks are trained by sampling from joint posterior distributions of network structure and weights by Monte Carlo sampling methods. This training avoids the problem of convergence at local minima. Posterior distributions of weights can be used to evaluate uncertainties of predictions of trained networks and can also be used to assess network sensitivities. The larger the sample size, the narrower the Bayesian posterior probability distributions. If data points fall into a linear relationship, typical artificial neural networks and Bayesian neural networks can detect this relationship. Hence, linear regression can be viewed as a special case of neural networks.
Error in typical artificial neural networks and Bayesian neural networks can be due to network weight uncertainty (model uncertainty due to imperfections in data, nonoptimum network structure, and nonoptimum learning algorithms) and to remaining sources, including intrinsic noise (random error due to measurement noise) and error due to the finite resolution of the observation system. If posterior distributions of weights are very narrow in relation to the noise distribution, then the width of the distribution of network outputs is influenced mainly by noise. On the other hand, if the posterior distributions of weights are wider than the noise distribution, then the width of the distribution of network outputs is dominated by the distribution of network weights.
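The two sources of error described above can be written as a variance decomposition. The expression below is a sketch that assumes an additive noise model, $y = f(x; \theta) + \varepsilon$, in which the noise $\varepsilon$ has mean zero and variance $\sigma^2$ and is independent of the weights $\theta$. These symbols are introduced here for illustration only. The result follows from the law of total variance.

```latex
\operatorname{Var}\bigl[y \mid x, D\bigr]
  = \underbrace{\sigma^{2}}_{\text{intrinsic noise}}
  + \underbrace{\operatorname{Var}_{\theta \mid D}\bigl[f(x;\theta)\bigr]}_{\text{weight uncertainty}}
```

When the posterior distribution of the weights is narrow, the second term is small and the spread of predictions is governed mainly by $\sigma^2$. When the posterior is wide, the second term dominates.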
The fuzzy logic approach to Bayesian neural networks allows the clinician to explore where within the Bayesian range the nonlinear relationship for a particular case is most likely to exist. Results generated from Bayesian neural networks with fuzzy logic inferences will then differ slightly from case to case, accounting for the special characteristics of that particular case.
Fuzzy logic inferences should be applied at the end of Bayesian neural network formulation. In other words, one should allow the Bayesian neural network learning machine to do its own learning before applying fuzzy logic rules, so that all probability distributions are explored.
Bayesian neural networks with fuzzy logic inferences can be conceptually interpreted as follows. Based on one's own experience (summation of existing parameters weighted by strength of belief in what happened beforehand), one can predict (based on one's assigned strength of belief) where along a spectrum of probabilities of the unknown quantity a value will end up. If it falls outside the spectrum in real life, then one has to check whether there are still unknown elements influencing the outcome variable in question.