Identification of Alcoholism Based on Wavelet Renyi Entropy and Three-Segment Encoded Jaya Algorithm

1 School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
2 Department of Informatics, University of Leicester, Leicester LE1 7RH, UK
3 School of Computer Science and Technology, Nanjing Normal University, Nanjing, Jiangsu 210023, China
4 Digital


Introduction
Alcohol use disorder (AUD) affected 208 million people worldwide in 2010. It can cause severe adverse effects on the brain, liver, heart, and pancreas. Long-term misuse can lead to increased tolerance to alcohol, making it difficult to control consumption. Short-term misuse raises the blood alcohol concentration (BAC); a BAC from 0.35% to 0.80% can cause fatal respiratory depression and life-threatening alcohol poisoning.
This paper studies the effect of long-term alcohol misuse on the brain. Alcohol misuse can damage brain neurons; hence, patients with long-term AUD have smaller volumes of white matter and gray matter than age-matched controls. Besides, alcohol adversely affects the prefrontal cortex and cerebellum. The current diagnosis of AUD mainly relies on manual observation of brain images. However, because the symptoms are mild, radiologists may miss the slight shrinkage of AUD brains and fail to identify it at an early stage. It is therefore necessary to create an efficient approach that can monitor the patient's brain via magnetic resonance imaging (MRI) and provide automatic, early diagnosis.
Over the last decades, computer-vision-based techniques have been proposed for automatically detecting changes in brain structure for brain-related disease diagnosis based on MRI scans. Nayak et al. (2016) [1] presented a brain image classification algorithm based on random forest. Alweshah and Abdullah (2015) [2] hybridized the firefly algorithm (FA) and a probabilistic neural network (PNN), and used the resulting method to detect changes in the brain. Lv and Hou [3] proposed an improved particle swarm optimization (IPSO) to detect alcoholism in MRI scans. Monnig (2012) [4] suggested detecting white matter atrophy in neuroimaging of AUD. Yang (2017) [5] combined Hu moment invariants (HMI) and a support vector machine (SVM) for pathological brain detection. Jiang and Zhu (2017) [6] explored a method using pseudo Zernike moments (PZM). Lv and Sui (2017) [7] used a data augmentation technique for alcoholism detection.
Although several of the above methods were developed for pathological brain detection, they can be easily transferred and applied to alcoholism detection. Nevertheless, these methods suffer from two common problems. First, approaches that do not account for the complexity of the brain structure do not perform well on AUD. Second, the training algorithms of existing classifiers may fall into local optima, and it is difficult to optimize the hyperparameters of the classifiers (e.g., the number of hidden neurons in a feedforward neural network) [8].
To address the above problems, we propose in this study a novel identification method for alcohol use disorder. Our contributions include the following: (1) a novel feature extraction method, wavelet Renyi entropy, which can describe the complexity of brain structure at multiple scales, and (2) an improved Jaya algorithm to train a feedforward neural network, which can optimize the weights, biases, and the number of hidden neurons simultaneously. Our training algorithm does not require any algorithm-specific parameters to be set.
The rest of this paper is organized as follows: Section 2 describes the subjects, scan protocol, and slice selection method. Section 3 presents the proposed feature extraction method, wavelet Renyi entropy. Section 4 describes the classifier construction and the proposed training algorithm, the three-segment encoded Jaya algorithm. Section 5 provides the implementation procedure and the evaluation method, and also shows how to use grid search to optimize the parameters of wavelet Renyi entropy. The results and discussions are presented in Section 6. Section 7 concludes the work.

Materials
2.1. Subjects.
The subjects went through a medical history interview to guarantee they met the inclusion criteria. Qualified applicants received the Computerized Diagnostic Interview Schedule Version IV, which ascertains the presence or absence of major psychiatric disorders. Applicants were excluded if Mandarin was not their first language, if they were left-handed, or if they had HIV, epilepsy, stroke, Wernicke-Korsakoff syndrome, bipolar disorder, cirrhosis or liver failure, seizures unrelated to alcoholism, head injury with loss of consciousness for more than 15 minutes unrelated to alcoholism, depression, schizophrenia, or other psychotic disorders.
Finally, we enrolled 114 abstinent long-term chronic alcoholic participants (58 men and 56 women) and 121 nonalcoholic control participants (59 men and 62 women). Participants were enrolled through flyers posted in Jiangsu Province Hospital, Nanjing Children's Hospital, and Nanjing Brain Hospital, as well as Internet-based advertisements. The data collection lasted for a total of three years. The research was approved by the Institutional Review Board of the participating hospitals. Informed consent was obtained from each participant.
The 235 participants were tested by the "Alcohol Use Disorder Identification Test (AUDIT)" [9]. The unit "ounce" was converted to "gram," since the former is not widely used in China. Their demographic characteristics are shown in Table 1. In this study, we only focus on the structural imaging data.

Scan Protocol.
All 235 subjects lay as still as possible, with their eyes closed while remaining conscious. Scanning was performed on a Siemens Verio Tim 3.0T MR scanner (Siemens Medical Solutions, Erlangen, Germany). In total, 216 sagittal slices covering the whole brain were acquired, using an MP-RAGE sequence. The imaging parameters were as follows: slice thickness = 0.8 mm, TE = 2.50 ms, TR = 2000 ms, TI = 900 ms, FA = 9°, matrix = 256 × 256, and FOV = 256 mm × 256 mm. The acquired images had a 16-bit gray level depth, which we reduced to 8 bits, since alcoholism alters the structure of the brain but does not change the gray levels of brain images. Besides, 8-bit gray levels provide enough information, so it is unnecessary to use 16-bit gray level images.

Slice Selection.
We used the FMRIB Software Library (FSL) v5.0 [16,17] to extract the brain and remove the skull for each scanned 3D image. All the volumetric images were normalized to a standard MNI template. Afterwards, we resampled each image to 2 mm isotropic voxels. The slice at Z = 80 (8 mm) in MNI 152 coordinates, a template that averages 152 T1-weighted MRI scans linearly transformed to Talairach space, was chosen for each patient. The reason for selecting the 80th slice is that it contains the two distinguishing features of alcoholic patients: (i) the enlarged ventricle and (ii) the shrunk gray matter, for example, the precentral gyrus [18], inferior frontal gyrus [19], and middle temporal gyrus [20].
Figure 1 shows the clear difference between the alcoholic and healthy samples. Afterwards, the background was cropped, leaving a matrix of size 176 × 176 for the subsequent classifier training. The datasets used in this study are available upon request.

The Proposed Feature Extraction Method
To extract distinctive features, this study proposed a new wavelet Renyi entropy (WRE), which combines the discrete wavelet transform and Renyi entropy in order to describe the complexity of the brain structure. The wavelet decomposition provides multiresolution and multiscale analysis, while the Renyi entropy provides the complexity description of the wavelet subbands of brain structure.

Wavelet Decomposition.
For a specific signal/image, the discrete wavelet transform (DWT) transforms the signal/image to the wavelet domain. It performs the transformation at multiple levels, by delivering the previous approximation subband to the quadrature mirror filters (abbreviated as QMF) [21]. Compared to the traditional Fourier transform, DWT has the key advantage of temporal/spatial resolution.
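Let $x(t)$ be a given one-dimensional signal. The continuous wavelet transform of $x(t)$ is

$$W(a_\psi, b_\psi) = \int_{-\infty}^{\infty} x(t)\, \psi(t \mid a_\psi, b_\psi)\, dt, \tag{1}$$

where $W$ represents the wavelet coefficients and $\psi(t)$ the mother wavelet. $\psi(t \mid a_\psi, b_\psi)$ is defined as

$$\psi(t \mid a_\psi, b_\psi) = \frac{1}{\sqrt{a_\psi}}\, \psi\!\left(\frac{t - b_\psi}{a_\psi}\right).$$

Here, $a_\psi$ represents the scale factor and $b_\psi$ the translation factor (both $a_\psi$ and $b_\psi > 0$). Formula (1) can be discretized by replacing $a_\psi$ and $b_\psi$ with the discrete variables $c$ and $d$:

$$a_\psi = 2^{c}, \qquad b_\psi = 2^{c} d,$$

where the parameters $c$ and $d$ represent the values of the scale and translation factors, respectively. By this means, we can produce the DWT as

$$A(c, d) = \downarrow\!\Big[\sum_{n} x(n)\, l_c^{*}(n - 2^{c} d)\Big], \qquad D(c, d) = \downarrow\!\Big[\sum_{n} x(n)\, h_c^{*}(n - 2^{c} d)\Big].$$

Here, $n$ is the discrete version of the variable $t$, and $\downarrow$ denotes downsampling. The functions $l(n)$ and $h(n)$ represent the low-pass filter and the high-pass filter, respectively. $A$ and $D$ represent the approximation subband and the detail subband, respectively.
For a two-dimensional DWT (abbreviated as 2D-DWT) [22], suppose the image is symbolized as $I(p, q)$. There are four subbands in all after each decomposition (LL, LH, HL, and HH), as shown in Figure 2. The subband LL is the approximation component of the original image. Subbands LH, HL, and HH represent the horizontal, vertical, and diagonal details, respectively. LL is decomposed into four new subbands at the next level, to produce the corresponding higher-level subbands.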
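The filter-and-downsample step described above can be illustrated with a minimal sketch. It assumes the Haar wavelet (the text does not name the wavelet family), for which the low-pass branch is a scaled pairwise sum and the high-pass branch a scaled pairwise difference; the function names are hypothetical.

```python
import math

def haar_dwt_1d(signal):
    """One level of a 1D Haar DWT: filter, then keep every second sample."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Low-pass (pairwise sum) branch gives the approximation subband.
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    # High-pass (pairwise difference) branch gives the detail subband.
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt_multilevel(signal, levels):
    """Repeatedly decompose the approximation subband of the previous level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_1d(approx)
        details.append(detail)
    return approx, details

# Two-level decomposition of a short toy signal.
toy_signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
toy_approx, toy_details = haar_dwt_multilevel(toy_signal, levels=2)
```

Because the Haar filters used here are orthonormal, the total energy of the coefficients equals the energy of the input signal, which is a convenient sanity check.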
Pseudocode 1: Wavelet Renyi entropy.
Step 1 Input a given brain image I.
Step 2 Perform an m-level 2D-DWT on I to obtain 3m + 1 subbands.
Step 3 For s = 1 : 3m + 1
    Get the 256-bin histogram of the sth subband wavelet coefficients;
    Obtain the Renyi entropy of order α over the histogram.
End
Step 4 Output the concatenation of the Renyi entropy values of all subbands.

Renyi Entropy.
Each subband of the wavelet decomposition can be regarded as a discrete random variable $X$. Suppose $X$ has the possible outcomes

$$X \in \{x_1, x_2, \ldots, x_N\},$$

and the corresponding probabilities are defined as

$$p_i = \Pr(X = x_i), \qquad i = 1, 2, \ldots, N.$$

The $\alpha$-order Renyi entropy, for $\alpha \ge 0$ and $\alpha \ne 1$, is defined as [23]

$$H_\alpha(X) = \frac{1}{1 - \alpha} \log\left(\sum_{i=1}^{N} p_i^{\alpha}\right).$$

The Renyi entropy is Schur-concave and is a nonincreasing function of $\alpha$. In some special cases, the Renyi entropy reduces to other types of entropy. For instance, $H_0(X)$ is called the Hartley entropy, $H_1(X)$ (taken as the limit $\alpha \to 1$) is the Shannon entropy, and $H_\infty(X)$ is the min-entropy [24]. Suppose we have a binary random variable $X$ with $P = [p_1, p_2]$, where $p_2 = 1 - p_1$. The Renyi entropies for different $\alpha$-values are plotted against $p_1$ in Figure 3. The concavity and the nonincreasing behavior in $\alpha$ are evident from this figure. Zero values are included, since they do not affect the calculation of the Renyi entropy.
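As an illustration of the definition and its special cases, the following sketch evaluates the Renyi entropy of a discrete distribution. The natural logarithm is an assumption (the base only rescales the values), and the function name is hypothetical.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (natural log) for a discrete distribution."""
    # Outcomes with zero probability are skipped; they contribute nothing.
    p = [q for q in probs if q > 0.0]
    if alpha == 1.0:
        # Shannon entropy, the limit of the Renyi entropy as alpha -> 1.
        return -sum(q * math.log(q) for q in p)
    if math.isinf(alpha):
        # Min-entropy, the limit as alpha -> infinity.
        return -math.log(max(p))
    # General case; alpha = 0 yields the Hartley entropy log(#nonzero outcomes).
    return math.log(sum(q ** alpha for q in p)) / (1.0 - alpha)

# Binary variable with p1 = 0.2: the entropy does not increase with alpha.
orders = [0.0, 0.5, 1.0, 2.0, math.inf]
binary_entropies = [renyi_entropy([0.2, 0.8], a) for a in orders]
```

For a uniform distribution every order gives the same value, log N, which is why the curves in a plot such as Figure 3 meet at $p_1 = 0.5$.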

Wavelet Renyi Entropy.
In the past, scholars have proposed the so-called "wavelet Renyi entropy." Nevertheless, our proposed WREs are different from traditional WREs. First, traditional WREs are mainly for one-dimensional signals, while ours are for two-dimensional images. Second, traditional WREs calculate entropies over the approximation subbands, while our method calculates entropies over both the approximation and detail subbands of wavelet coefficients. The pseudocode of WRE is depicted in Pseudocode 1. The bin number of the wavelet coefficient histogram is set to 256 in this study.
For a given image, our proposed WRE produces a (3m + 1)-element feature vector, where m is the wavelet decomposition level. Here, we choose the optimal values of m and the Renyi order α by a grid searching approach. The detailed implementation is explained in Section 5.3.
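The procedure can be sketched end to end as follows. This is a minimal illustration, not the authors' MATLAB implementation: it assumes the Haar wavelet, equal-width histogram bins spanning each subband's own range, and the natural logarithm; odd trailing rows or columns are simply dropped. All function names are hypothetical, and `order` must differ from 1 in this simplified version.

```python
import math

def haar_2d(img):
    """One level of an orthonormal 2D Haar DWT; returns (LL, LH, HL, HH)."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img) - 1, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[0]) - 1, 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((p + q + s + t) / 2.0)  # approximation
            row_lh.append((p + q - s - t) / 2.0)  # detail (labelling conventions vary)
            row_hl.append((p - q + s - t) / 2.0)  # detail (labelling conventions vary)
            row_hh.append((p - q - s + t) / 2.0)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh

def histogram_probs(values, bins=256):
    """Normalized equal-width histogram over [min, max] of the values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = 0 if width == 0.0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def renyi_entropy(probs, order):
    """Renyi entropy for order != 1, skipping empty histogram bins."""
    return math.log(sum(q ** order for q in probs if q > 0.0)) / (1.0 - order)

def wre_features(img, levels, order, bins=256):
    """Entropy of every detail subband plus the final approximation subband."""
    feats = []
    approx = img
    for _ in range(levels):
        approx, lh, hl, hh = haar_2d(approx)
        for band in (lh, hl, hh):
            flat = [v for row in band for v in row]
            feats.append(renyi_entropy(histogram_probs(flat, bins), order))
    flat = [v for row in approx for v in row]
    feats.append(renyi_entropy(histogram_probs(flat, bins), order))
    return feats

# Synthetic 16 x 16 test pattern (not real MRI data).
toy_img = [[float((7 * r + 13 * c) % 256) for c in range(16)] for r in range(16)]
toy_feats = wre_features(toy_img, levels=2, order=1.2)
```

With `levels=2` the sketch returns 3 × 2 + 1 = 7 values, matching the (3m + 1)-element feature vector described above.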

The Classifier Construction Based on a Feedforward Neural Network (FNN) and Three-Segment Jaya
To train the classifier, we propose using a feedforward neural network (FNN) and the three-segment Jaya algorithm. Scholars have used various classifiers in medical brain image analysis, such as the decision tree, support vector machine [25], and naive Bayesian classifier. Nevertheless, the feedforward neural network (FNN) has achieved remarkable success, because of the universal approximation theorem [26], which states the following. Suppose $\varphi$ is a bounded and nonconstant continuous function. Given any continuous function $f$ on a compact set and any small number $\varepsilon > 0$, there exist an integer $N$, real vectors $w_i$, and real constants $v_i$ and $b_i$, such that

$$F(x) = \sum_{i=1}^{N} v_i\, \varphi\!\left(w_i^{T} x + b_i\right),$$

where $F(x)$ can be used as an approximate realization of the function $f$, which satisfies $\left| F(x) - f(x) \right| < \varepsilon$.
However, the traditional FNN training algorithm is the backpropagation (BP) gradient descent algorithm. BP and its variants often converge to local optima. The three-segment encoded Jaya is introduced to address this problem. In the following sections, we detail each of the methods.

Structure of FNN.
Structurally, the FNN includes three layers: (i) an input layer that accepts the features; (ii) a hidden layer that contains the hidden neurons; and (iii) an output layer that outputs the score of each class. Finally, the "argmax" function predicts the class associated with the largest score. Figure 4 presents the diagram of the FNN. The number of input neurons is the same as the number of features extracted from brain images, the number of output neurons is the same as the number of classes, and the number of hidden neurons is commonly obtained by hyperparameter optimization.
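A forward pass through such a three-layer network can be sketched as below. The tanh hidden activation and the linear output layer are assumptions, since the text does not specify the activation functions, and the function name is hypothetical.

```python
import math

def fnn_predict(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer FNN followed by argmax."""
    # Hidden layer: one tanh neuron per row of w_hidden (activation is assumed).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: one linear score per class.
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    # Predicted class is the index of the largest score.
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, scores

# Two features, two hidden neurons, two classes, with hand-picked weights.
toy_label, toy_scores = fnn_predict(
    [1.0, -1.0],
    w_hidden=[[1.0, 0.0], [0.0, 1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[[1.0, 0.0], [0.0, 1.0]],
    b_out=[0.0, 0.0],
)
```

In the setting of this paper the input would be the WRE feature vector and the two output neurons would correspond to the alcoholic and control classes.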

Jaya Algorithm.
As mentioned earlier, traditional FNN training has difficulty reaching the global optimum. To address this problem, a large number of global optimization algorithms have been proposed and employed to train FNNs, particularly in the field of brain image classification. For example, Hajimani et al. (2017) [11] designed a multiobjective genetic algorithm (MOGA) to detect cerebral vascular accidents. Chen et al. (2017) [12] used particle swarm optimization (PSO) to classify MRI brain tissues. Subramaniam and Radhakrishnan (2016) [13] used bee colony optimization (BCO) to classify brain cancer images. Raghtate and Salankar (2015) [14] proposed a modified ant colony system (MACS) to realize automatic brain MRI classification. Chen and Du (2017) [15] proposed a real-coded biogeography-based optimization (RCBBO) method for pathological brain detection.
Those algorithms make the classifier more robust than BP; nevertheless, their own parameters need to be fine-tuned, which itself becomes a hyperparameter optimization problem. To overcome this limitation of existing optimization approaches, Jaya was introduced by Rao (2016) [27] as a powerful global optimization approach for constrained and unconstrained problems. It is free of algorithm-specific parameters, has been reported to outperform state-of-the-art optimization algorithms, and has been successfully applied in thermal performance optimization [28], photovoltaic model identification [29], cooling tower design [30], sensing period adaptation [31], heat change optimization [32], and so forth.
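Let $X(i, j, k)$ denote the $j$th variable of the $k$th candidate at iteration $i$, and $Y(i, j, k)$ the corresponding modified candidate. The candidate carried to the next iteration is

$$X(i+1, j, k) = \begin{cases} Y(i, j, k), & \text{if } h(Y(i, j, k)) \text{ is better than } h(X(i, j, k)), \\ X(i, j, k), & \text{otherwise,} \end{cases} \tag{12}$$

where $h$ represents the fitness function. Equation (12) indicates that $X(i+1, j, k)$ is assigned $Y(i, j, k)$ if the modified candidate $Y(i, j, k)$ is better in terms of fitness than $X(i, j, k)$; otherwise it is assigned $X(i, j, k)$. The algorithm iterates until the termination criterion is satisfied. We set the termination criterion as follows: either the algorithm reaches the maximum iteration epoch or the error does not reduce for five epochs.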
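For orientation, the sketch below implements the standard Jaya update as published by Rao, in which each candidate moves toward the current best solution and away from the current worst one, followed by the greedy selection and the two termination criteria described above. The search bounds, the sphere test function, and all names are illustrative assumptions, not the authors' code.

```python
import random

def jaya_minimize(fitness, dim, pop_size=20, max_epochs=1000, patience=5,
                  lo=-5.0, hi=5.0, seed=0):
    """Minimize `fitness` with the standard Jaya update (bounds are assumed)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(c) for c in pop]
    best_so_far, stall = min(scores), 0
    for _ in range(max_epochs):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for k in range(pop_size):
            cand = pop[k]
            trial = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best candidate and away from the worst one.
                v = (cand[j] + r1 * (best[j] - abs(cand[j]))
                     - r2 * (worst[j] - abs(cand[j])))
                trial.append(min(max(v, lo), hi))
            trial_score = fitness(trial)
            # Greedy selection: keep the modified candidate only if it is better.
            if trial_score < scores[k]:
                pop[k], scores[k] = trial, trial_score
        # Stop when the best error has not improved for `patience` epochs.
        if min(scores) < best_so_far:
            best_so_far, stall = min(scores), 0
        else:
            stall += 1
            if stall >= patience:
                break
    winner = scores.index(min(scores))
    return pop[winner], scores[winner]

def sphere(v):
    """Toy objective with its minimum at the origin."""
    return sum(c * c for c in v)

toy_solution, toy_value = jaya_minimize(sphere, dim=3)
```

Note that the only settings are the population size and the stopping rule, which are common to all population-based methods; there are no algorithm-specific parameters such as crossover or inertia coefficients.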

Three-Segment Encoded Jaya.
The existing Jaya algorithm is mainly used to train the weights and biases of an FNN, as described in Phillips (2017) [10]. However, we believe the number of hidden neurons is also an important hyperparameter that influences the classification performance of the FNN. Hence, we proposed a three-segment encoded Jaya algorithm (TSE-Jaya), which aims to optimize the weights, biases, and number of hidden neurons simultaneously. The candidate $X(i, j, k)$ now contains three parts:

$$X(i, j, k) = \left[ X_1(i, j, k) \;\; X_2(i, j, k) \;\; X_3(i, j, k) \right], \tag{13}$$

where $(\cdot)_1$, $(\cdot)_2$, and $(\cdot)_3$ denote extracting the first, second, and third parts of the solution candidate representation. The first part encodes the weights, the second part encodes the biases, and the third part encodes the number of hidden neurons (NHN). The modified candidate is defined correspondingly as

$$Y(i, j, k) = \left[ Y_1(i, j, k) \;\; Y_2(i, j, k) \;\; Y_3(i, j, k) \right]. \tag{14}$$

The modification rule no longer follows (10). Instead, a three-fold rule (15) modifies each of the three segments separately, using two additional random positive numbers that play the same role as the random variable in (10). Other procedures are the same as those in the Jaya algorithm.
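One way such a three-part candidate could be decoded into a network is sketched below. The layout is purely an assumption made for illustration: storage is allocated for a maximum number of hidden neurons (`max_hidden`, a hypothetical parameter), the last element holds the NHN, and only the first NHN hidden neurons are kept. The source does not specify how the segments are laid out.

```python
def decode_candidate(candidate, n_inputs, n_outputs, max_hidden):
    """Split a flat candidate into weights, biases, and the NHN (assumed layout)."""
    n_weights = max_hidden * (n_inputs + n_outputs)
    n_biases = max_hidden + n_outputs
    if len(candidate) != n_weights + n_biases + 1:
        raise ValueError("candidate length does not match the assumed layout")
    weights = candidate[:n_weights]                       # segment 1
    biases = candidate[n_weights:n_weights + n_biases]    # segment 2
    # Segment 3: round to an integer and clip to a valid neuron count.
    nhn = min(max(int(round(candidate[-1])), 1), max_hidden)
    # Keep only the parameters of the first `nhn` hidden neurons.
    w_hidden = [weights[h * n_inputs:(h + 1) * n_inputs] for h in range(nhn)]
    offset = max_hidden * n_inputs
    w_out = [
        [weights[offset + o * max_hidden + h] for h in range(nhn)]
        for o in range(n_outputs)
    ]
    b_hidden = biases[:nhn]
    b_out = biases[max_hidden:max_hidden + n_outputs]
    return w_hidden, b_hidden, w_out, b_out, nhn

# 3 inputs, 2 outputs, up to 4 hidden neurons: 20 weights + 6 biases + 1 NHN.
toy_candidate = [float(v) for v in range(26)] + [2.4]
toy_network = decode_candidate(toy_candidate, n_inputs=3, n_outputs=2, max_hidden=4)
```

Under this layout every candidate has the same length regardless of its NHN, so the population can be updated by the same vector operations as in plain Jaya.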

Cross Validation Based Implementation.
Figure 6 presents the flowchart of our method. Here, the K-fold cross validation method [34] was used in order to avoid overfitting and report out-of-sample errors. We divide the whole dataset into 10 folds. In the kth trial, the (k − 1)th fold is used for validation, the kth fold is used for testing, and the other folds are used for training. Training iterates until the error over the validation set increases for five consecutive epochs. For a clear understanding, we plotted a toy example in Figure 7. Here, at epoch 6, the validation error reaches its minimum. Then, from the 6th to the 11th epoch, the validation error increases although the training error decreases, which indicates that overfitting occurs. Hence, we should select the weights corresponding to the 6th epoch. The goal of K-fold cross validation in this study is to avoid overfitting.
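Let E denote the ideal confusion matrix, K the number of folds, and R the number of repetitions. Each repetition tests every sample exactly once, so the ideal confusion matrix depends only on R and the class sizes. In this study, we run the 10-fold cross validation 10 times (K = 10, R = 10); with 114 alcoholic and 121 control participants, the ideal confusion matrix is

$$E = \begin{bmatrix} 1140 & 0 \\ 0 & 1210 \end{bmatrix}.$$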
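The fold assignment and the validation-based stopping rule can be sketched as follows. The random, unstratified shuffle and the wrap-around of the validation fold for the first trial are assumptions, and the function names are hypothetical.

```python
import random

def make_folds(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def split_for_trial(folds, trial):
    """Trial `trial` tests on that fold and validates on the previous one."""
    k = len(folds)
    val_id = (trial - 1) % k  # wrap-around for the first trial is an assumption
    test = folds[trial]
    val = folds[val_id]
    train = [i for f, fold in enumerate(folds) if f not in (trial, val_id) for i in fold]
    return train, val, test

def select_epoch(val_errors, patience=5):
    """Return the 1-based epoch with the lowest validation error, scanning
    until the error has failed to improve for `patience` epochs."""
    best_err, best_idx = float("inf"), -1
    for i, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_idx = err, i
        elif i - best_idx >= patience:
            break
    return best_idx + 1

toy_folds = make_folds(235, k=10)
toy_errors = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
toy_epoch = select_epoch(toy_errors)
```

The toy error curve mirrors the example in Figure 7: the validation error bottoms out at epoch 6 and rises for the following five epochs, so the weights from epoch 6 are kept.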

Evaluation.
The evaluation was performed on the realistic confusion matrix of the 10 × 10-fold cross validation. Suppose the positive class is alcoholism, and the negative class is the control. We can define true positive (TP) as alcoholism correctly identified, true negative (TN) as control correctly identified, false positive (FP) as control mispredicted as alcoholism, and false negative (FN) as alcoholism mispredicted as control. Finally, we define three measures: sensitivity, $\mathrm{Sen} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$; specificity, $\mathrm{Spc} = \mathrm{TN}/(\mathrm{TN} + \mathrm{FP})$; and accuracy, $\mathrm{Acc} = (\mathrm{TP} + \mathrm{TN})/(\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN})$.
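A short sketch of these measures is given below, using the standard definitions of sensitivity, specificity, and accuracy. The confusion matrix layout (rows are true classes, columns are predicted classes, alcoholism first) is an assumption, and the toy counts are invented for illustration only; they are not results from this study.

```python
def ideal_confusion(n_positive, n_negative, repetitions):
    """Confusion matrix of a perfect classifier over repeated cross validation."""
    return [[repetitions * n_positive, 0], [0, repetitions * n_negative]]

def evaluate(confusion):
    """Sensitivity, specificity, and accuracy from [[TP, FN], [FP, TN]]."""
    (tp, fn), (fp, tn) = confusion
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# Perfect classifier on 114 alcoholic and 121 control subjects, 10 repetitions.
ideal = ideal_confusion(114, 121, repetitions=10)
# Invented toy counts, not data from the paper.
toy_measures = evaluate([[90, 10], [20, 80]])
```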
5.3. Grid Searching.
In the grid searching, the criterion uses the "accuracy (Acc)" measure defined above. The implementation is explained in Table 2. For the wavelet decomposition level m, a simple grid search from 1 to 5 with an increase of 1 was used, since m should be an integer.
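For the order $\alpha$ of the Renyi entropy, a coarse-to-fine searching strategy was used. First, a coarse grid was set from 0 to 6 with an increase of 1, and we obtained the coarse candidate

$$\alpha_c = \arg\max_{\alpha \in \{0, 1, \ldots, 6\}} \mathrm{Acc}(\alpha).$$

Then, a fine grid was set from $\alpha_c - 0.5$ to $\alpha_c + 0.5$ with an increase of 0.1, and the optimal order $\alpha^{*}$ is obtained as

$$\alpha^{*} = \arg\max_{\alpha \in \{\alpha_c - 0.5,\, \alpha_c - 0.4,\, \ldots,\, \alpha_c + 0.5\}} \mathrm{Acc}(\alpha).$$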
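The coarse-to-fine strategy can be sketched as below. The `score` callable is a stand-in for the cross-validated accuracy obtained with a given Renyi order; the quadratic toy score used in the example is invented so that the sketch runs on its own, and the function name is hypothetical.

```python
def coarse_to_fine_search(score, coarse_grid=range(0, 7), half_width=0.5, step=0.1):
    """Pick the best coarse value, then refine on a fine grid around it."""
    coarse_best = max(coarse_grid, key=score)
    n_steps = int(round(half_width / step))
    # Rounding keeps grid points such as 1.2 free of floating-point drift.
    fine_grid = [round(coarse_best + i * step, 10) for i in range(-n_steps, n_steps + 1)]
    fine_best = max(fine_grid, key=score)
    return fine_best, coarse_best

def toy_score(order):
    """Invented score peaking at 1.2; a real run would return the accuracy."""
    return -(order - 1.2) ** 2

best_order, coarse_order = coarse_to_fine_search(toy_score)
```

The coarse pass costs 7 evaluations and the fine pass 11, compared with 61 evaluations for a single 0.1-step grid over the whole range from 0 to 6.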

Results and Discussions
Our programs were developed in-house. The experiments ran on a Dell laptop with a 2.20 GHz Intel Core i7-4702HQ CPU and 16 GB RAM. The operating system was Windows 10, and MATLAB 2017a was the programming development environment.

Statistical Analysis.
The cross validation divides the dataset into 10 sets. In each run, the dataset is partitioned anew, so the composition of the sets differs between runs. We set the wavelet decomposition level to 4 and the Renyi order α to 1.2. The maximum iterative epoch is set to 1000, and the population in the Jaya algorithm is set to 20. The sensitivity, specificity, and accuracy of our method are listed in Tables 3, 4, and 5, respectively. Our proposed method achieved a sensitivity of 93.60 ± 1.55%, a specificity of 93.72 ± 1.42%, and an accuracy of 93.66 ± 1.23%.

Comparison to State-of-the-Art Approaches.
We compared the proposed method "WRE + FNN + TSE-Jaya" with four state-of-the-art approaches: FA + PNN [2], the IPSO method [3], HMI + SVM [5], and PZM [6]. All the algorithms were run with a 10 × 10-fold cross validation over our dataset. The results of the 10 × 10-fold cross validation of the four state-of-the-art methods are shown in Table 6. Finally, the comparison result is presented in Table 7.

Optimal Wavelet Decomposition Level.
In this experiment, we fixed the Renyi order α to 1.2 and let the wavelet decomposition level vary from 1 to 5 with an increase of 1. The corresponding accuracy varied as shown in Figure 8. Here, the accuracy is 85.45%, 91.45%, 93.11%, 93.66%, and 87.96% when the decomposition level is 1, 2, 3, 4, and 5, respectively. The 4th-level decomposition yields the greatest accuracy; hence, we chose the optimal decomposition level as 4. The Renyi entropy was then calculated over all the subbands of this 4-level wavelet decomposition.
Note that a 3D-DWT over the whole volume is more straightforward than a 2D-DWT over one particular slice. Nevertheless, our aim in this study is to select a distinguishing slice, which is related to brain regions affected by alcoholism, in order to reduce the computation burden. In the future, we shall test the results of 3D-DWT.

Optimal Renyi Order.
In this experiment, we illustrate why we set the Renyi order α to 1.2 by a coarse-to-fine grid search. First, we searched the coarse grid from 0 to 6 with an increase of 1. The result is shown in Figure 9(a), and the value of 1 was selected as the initial point for the fine grid search. Second, the fine grid from 0.5 to 1.5 with an increase of 0.1 was established, and the result is shown in Figure 9(b). We can observe that α = 1.2 yields the greatest accuracy.

6.5. Effectiveness of Three-Segment Encoding.
Our proposed TSE-Jaya can train the weights, biases, and number of hidden neurons (NHN) simultaneously. In this experiment, we compare TSE-Jaya to the plain Jaya algorithm [10], which can only train the weights and biases of a feedforward neural network; hence, the number of hidden neurons has to be fixed by experience. Here, we set the NHN to 10 for the plain Jaya algorithm. The settings of the other parameters were the same as in the previous experiments. The results of the 10 × 10-fold cross validation of plain Jaya [10] are shown in Table 8, and the comparison between Jaya [10] and our proposed TSE-Jaya is shown in Table 9.
The superiority of the proposed TSE-Jaya over plain Jaya [10] is clear. This demonstrates the importance of choosing the optimal number of hidden neurons; that is, a variable number of hidden neurons gives better performance than a fixed number, which is also validated by Carleo and Troyer (2017) [35].

Training Algorithm Comparison.
To demonstrate the efficiency of the proposed algorithm, we compared TSE-Jaya with several global optimization algorithms, including MOGA [11], PSO [12], BCO [13], MACS [14], and RCBBO [15]. All the settings of the common controlling parameters are the same: the maximum iterative epoch is set to 1000, and the population in all algorithms is set to 20. The algorithm-specific parameters of those five comparison algorithms are assigned by experience. The results of those training algorithms over the 10 × 10-fold cross validation are shown in Table 10, and the final comparison with our proposed TSE-Jaya is shown in Table 11.
Table 11 shows that the proposed TSE-Jaya performed the best among all six algorithms. PSO [12] and RCBBO [15] ranked second and third, with accuracies above 90%. BCO [13] ranked fourth, MACS [14] ranked fifth, and MOGA [11] performed the worst. The efficiency of the proposed approach can be explained from two aspects: (i) Jaya does not need algorithm-specific parameters to be set, making it more reliable than other algorithms. (ii) The three-segment encoding allows the number of hidden neurons to vary at each run.

Validation of the Selected Slice.
In this experiment, we validated the selection of the 80th slice in terms of classification performance. We let Z increase from 30 to 150 with an increment of 10, as shown in Figure 10.
Other settings were the same as in the previous experiments. Again, 10 repetitions of 10-fold cross validation were utilized. The curve of accuracy is drawn in Figure 11. It is observed that the 80th slice gives the highest accuracy among all candidate slices. The reason is that this slice contains the enlarged ventricle and the shrunk gray matter caused by alcoholism. The hippocampus [36] and striatum [37] are also related to alcoholism. Nevertheless, their altered volumes are relatively small and hence do not provide an excellent performance in this task.
In principle, the optimal slice could be perpendicular to the X- or Y-axis, or it could even be an oblique plane with respect to all three axes. Here, we choose a slice perpendicular to the Z-axis, which is for the convenience of radiologists, since they usually read the axial slices. In the future, we shall develop techniques to handle multiple slices, and we may develop surface analysis techniques [38].

Conclusions
In this study, we proposed a novel method for distinguishing alcoholism from healthy controls based on a computer-vision approach. Our method was based on three components: the proposed wavelet Renyi entropy, a feedforward neural network, and the proposed three-segment encoded Jaya algorithm. The experiments showed that our method achieved a sensitivity of 93.60 ± 1.55%, a specificity of 93.72 ± 1.42%, and an accuracy of 93.66 ± 1.23% over a 10 × 10-fold cross validation. The performance is superior to four state-of-the-art alcoholism detection algorithms. We validated that the optimal wavelet decomposition level is 4 and the optimal Renyi order is 1.2. Besides, compared with existing global optimization approaches, the proposed three-segment encoded Jaya was shown to provide better performance than plain Jaya and five other training algorithms.
Finally, we validated the reason why we chose the 80th slice.
The shortcomings of our method lie in two aspects. First, our method needs to scan the whole brain and select the 80th slice along the Z-axis. Second, the wavelet Renyi entropy was

Figure 1: Slice examples between (a) a nonalcoholic brain and (b) an alcoholic brain.

Figure 3: Renyi entropy with different α-values for a binary random variable.

Figure 7: A toy example explaining the effect of validation set.

Figure 8: Choosing the optimal wavelet decomposition level.

Figure 9: Choosing the optimal Renyi order by coarse-to-fine grid search.
(DDE, daily drinks of ethanol; DHD, duration of heavy drinking; AUDIT, alcohol use disorders identification test; LOS, length of sobriety).

Table 6: Results of 10 × 10-fold cross validation of four state-of-the-art methods.

Table 9: Plain Jaya algorithm versus the proposed TSE-Jaya.

Table 10: Results of 10 × 10-fold cross validation of five training algorithms.