Validation of the Revised Stressful Life Event Questionnaire Using a Hybrid Model of Genetic Algorithm and Artificial Neural Networks

Objectives. Stressors play a serious role in precipitating mental and somatic disorders and are a subject of interest in many clinical and community-based studies. Hence, measuring them properly and accurately is very important. We revised the stressful life event (SLE) questionnaire by adding weights to the events in order to measure stress and determine a cut point. Methods. A total of 4569 adults aged between 18 and 85 years completed the SLE questionnaire and the general health questionnaire-12 (GHQ-12). A hybrid model of genetic algorithm (GA) and artificial neural networks (ANNs) was applied to extract the relation between the stressful life events (evaluated on a 6-point Likert scale) and the GHQ score as the response variable. In this model, the GA is used to set some parameters of the ANN to achieve more accurate results. Results. A number was defined as the weight of each stressful life event. Among all stressful life events, death of parents, spouse, or siblings is the most important and impactful stressor in the studied population. A sensitivity of 83% and a specificity of 81% were obtained for a cut point of 100. Conclusion. Despite its simplicity, the SLE-revised (SLE-R) questionnaire is a high-performance screening tool for investigating the stress level of life events and its management in both community and primary care settings. The SLE-R questionnaire is user-friendly and easy to self-administer, and it allows individuals to become aware of their own health status.


Introduction
The importance of stressors and their impact on human life have attracted the interest of many researchers in recent years. When investigating stressors, their frequency and intensity are the most important characteristics to consider [1]. Not all stressors have the same impact: some affect an individual's life more intensely than others. The importance of the same stressor can also differ across societies and cultures. Evaluating the importance and impact of stressors is a very attractive subject in psychological research, and many studies have been performed in this field [2].
Various scales and tools for measuring stressors have been developed in developed countries [2][3][4]. Recently, the development and use of stress measurement tools has also been the subject of many studies in developing countries [5].
Self-administered questionnaires have been shown to be a valuable tool for obtaining useful information about health status in epidemiologic studies and health surveys [6,7]. Meaningful estimates of disease status can be obtained by cross-checking questionnaire data against a standard criterion, such as medical records, and measuring their agreement. Many studies have examined the agreement between questionnaire data and a criterion standard [7][8][9][10][11][12][13]. In these studies, researchers used user-friendly questionnaires that are easy to self-administer.
Computational and Mathematical Methods in Medicine
User-friendliness is a key characteristic of a questionnaire and gives questionnaire-based research a better chance of success. One of the advantages of a user-friendly questionnaire is that individuals can become aware of their health status while completing it.
Questionnaires should be designed to serve as indicators of health status or of other conditions under study. In other words, a user-friendly questionnaire can easily provide information about an individual's health status by comparing the calculated score with its cut point.
Using machine learning algorithms such as artificial neural networks (ANNs), we aimed to assign a weight to each stressful life event introduced in the stressful life event (SLE) questionnaire and to determine a cut point. In this way, a revised SLE questionnaire could be used as a screening tool to differentiate healthy individuals from those who are at risk of a disease.

Dataset.
The data used in this study are part of the data collected for the Isfahan Healthy Heart Program (IHHP). IHHP is a community-based intervention program to prevent and control noncommunicable diseases. Details of the IHHP methodology, including sampling strategies, survey instruments, data entry, and analysis, as well as the evaluation and follow-up of the subjects, have been described elsewhere by Sarraf-Zadegan et al. [14,15].
In summary, IHHP includes a baseline survey, four follow-ups, and a final phase. The baseline survey was conducted in 2001 in two intervention regions and one reference region. The intervention regions were Isfahan and Najafabad, and Arak was selected as the reference region. Four phases of annual follow-up and evaluation were performed on independent samples from 2002 to 2005. The final phase was conducted in 2007. Based on the regional population distribution and following the CINDI protocol, multistage random cluster sampling was applied, differentiating urban from rural areas [14].
Data used for the current study include information from 4569 adults aged over 18 years who completed the final phase and had all related data available. The related data include demographic information such as age, sex, and years of education. All participants were asked to complete the SLE questionnaire and the general health questionnaire-12 (GHQ-12). Informed consent was obtained from all subjects.
We evaluated the frequency and intensity of the participants' stressful life events using the SLE questionnaire, which was developed and validated by Roohafza et al. [1]. This questionnaire consists of eleven domains identified by factor analysis: home life, financial problems, social relations, personal conflict, job conflict, educational concern, job security, loss and separation, sexual life, daily life, and health concern. Each domain contains a few questions about related stressful events. Each stressful life event is scored on a 6-point Likert scale (0 = never, 1 = very mild, 2 = mild, 3 = moderate, 4 = severe, and 5 = very severe). The standardized Cronbach's alpha of the SLE questionnaire is 0.92.
The GHQ-12 was used to validate and evaluate the SLE questionnaire. The GHQ-12 is a well-established screening tool for detecting psychological morbidity in community and clinical settings, such as primary care or general practice [16,17]. It includes 12 items covering two main areas: the ability to carry out normal functions and the appearance of new and distressing experiences. An individual with a GHQ score of 4 or more is considered to have a high stress level.
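Standardized Cronbach's alpha is derived from the average correlation between items. The sketch below applies the standard formula to a respondents × items matrix of Likert scores; the function name and the toy data are hypothetical, and the paper does not say which software computed its value.

```python
import numpy as np


def standardized_cronbach_alpha(scores: np.ndarray) -> float:
    """Standardized Cronbach's alpha for an (n_respondents, k_items) score matrix.

    alpha_std = k * r_bar / (1 + (k - 1) * r_bar), where r_bar is the mean of the
    off-diagonal Pearson correlations between items.
    """
    scores = np.asarray(scores, dtype=float)
    n_respondents, k = scores.shape
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least two items and two respondents")
    corr = np.corrcoef(scores, rowvar=False)      # k x k inter-item correlations
    off_diagonal = corr[~np.eye(k, dtype=bool)]   # drop the 1.0 self-correlations
    r_bar = off_diagonal.mean()
    return float(k * r_bar / (1 + (k - 1) * r_bar))


# Hypothetical 0-5 Likert scores: 6 respondents x 4 stressful life events
likert = np.array([
    [0, 1, 0, 1],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [1, 0, 1, 2],
    [3, 3, 2, 3],
    [4, 5, 4, 5],
])
print(round(standardized_cronbach_alpha(likert), 3))
```

Items that always move together push the mean correlation, and therefore alpha, toward 1.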
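A minimal sketch of the classification rule follows. The text only specifies the threshold (a score of 4 or more means high stress). The item coding is an assumption: this sketch uses the common binary "GHQ method," in which each item's four response options are coded 0 to 3 (higher meaning more distress) and the two most symptomatic options each earn 1 point. The function names are hypothetical.

```python
from typing import Sequence

GHQ_CUTOFF = 4  # from the text: a GHQ score of 4 or more indicates a high stress level


def ghq12_score(responses: Sequence[int]) -> int:
    """Total GHQ-12 score (0-12) under the assumed binary 0-0-1-1 scoring.

    Each response is an option index 0..3, oriented so that higher means more
    distress; options 2 and 3 each contribute one point.
    """
    if len(responses) != 12:
        raise ValueError("the GHQ-12 has exactly 12 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2, or 3")
    return sum(1 for r in responses if r >= 2)


def is_high_stress(responses: Sequence[int]) -> bool:
    """Apply the cut point: True when the GHQ-12 score is at least 4."""
    return ghq12_score(responses) >= GHQ_CUTOFF


example = [0, 1, 2, 3, 1, 0, 2, 3, 1, 1, 0, 0]  # hypothetical answers
print(ghq12_score(example), is_high_stress(example))  # 4 True
```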

The Basic Concept of Artificial Neural Networks.
Inspired by the biology of the human brain, an artificial neural network (ANN) is a parallel information-processing system [18]. These systems can learn rules embedded in data, and because of this ability, they are commonly used to model complex relationships between inputs and outputs or to find patterns in data. Pattern recognition is achieved by adjusting the ANN parameters during learning with an error-reduction method. An ANN consists of one input layer, one output layer, and one or more hidden layers. Figure 1 shows an example of an ANN with an input layer, two hidden layers, and an output layer.
In Figure 1, the circles in each layer represent neurons. The input layer receives the input signals and passes them to the hidden layers. The output layer receives the signals leaving the last hidden layer and presents the result.
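Each layer is characterized by its weight matrix. Multiplying the signals by the weight matrices of successive layers and applying each layer's transfer function forms the network structure and maps the input signals to the output data. If the network consists of $M$ layers, the estimated mapping function is

$$ \hat{F}(\mathbf{x}) = f_M\Big(W_{M,M-1}\, f_{M-1}\big(W_{M-1,M-2} \cdots f_2\big(W_{2,1}\, f_1(W_{1,1}\,\mathbf{x})\big) \cdots \big)\Big) $$

where $\mathbf{x}$ is the input signal vector, $M$ is the total number of layers, $f_i(\cdot)$ is the transfer function of layer $i$, $W_{i,i-1}$ is the weight matrix incoming to layer $i$ from layer $i-1$, and $W_{1,1}$ is the weight matrix incoming to the first hidden layer from the input layer.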
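The nested mapping can be evaluated layer by layer. In the sketch below, `weights[0]` plays the role of $W_{1,1}$ and `weights[i]` the role of $W_{i+1,i}$. Bias terms are omitted to match the equation. The layer sizes, random weights, and the choice of `tanh` for hidden layers with an identity output are illustrative assumptions, not the paper's configuration.

```python
from typing import Callable, Sequence

import numpy as np


def forward(x: np.ndarray,
            weights: Sequence[np.ndarray],
            transfers: Sequence[Callable[[np.ndarray], np.ndarray]]) -> np.ndarray:
    """Evaluate f_M(W_M f_{M-1}(... f_1(W_1 x))) for an M-layer network (no biases)."""
    if len(weights) != len(transfers):
        raise ValueError("need one transfer function per weight matrix")
    signal = x
    for W, f in zip(weights, transfers):
        signal = f(W @ signal)  # weight the incoming signals, then apply the layer's transfer function
    return signal


def identity(z: np.ndarray) -> np.ndarray:
    return z


rng = np.random.default_rng(0)
sizes = [3, 4, 2, 1]  # hypothetical: 3 inputs, hidden layers of 4 and 2 neurons, 1 output
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
transfers = [np.tanh, np.tanh, identity]
y = forward(np.array([0.5, -1.0, 2.0]), weights, transfers)
print(y.shape)  # (1,)
```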
The number of neurons in the input and output layers depends on the structure of the problem under study.
No general method for determining the number of hidden layers and neurons exists in the literature. Applying an evolutionary algorithm such as the genetic algorithm (GA) or particle swarm optimization (PSO) is one of the most widely used approaches to this problem.

The Basic Concept of Genetic Algorithm.
GA is a branch of evolutionary computation, a rapidly growing field. The main idea of the algorithm comes from Darwin's theory of evolution. At each step, a population of chromosomes (candidate solutions) produces a new population of chromosomes (offspring solutions). Repeating this process, generating each new population from the previous one, progressively improves the population and moves it toward an optimal solution. Four operators are used to produce a new population from the previous one:
Selection: This operator selects the chromosomes that produce the next generation; the selected chromosomes are called parents. Chromosomes with greater potential to reach better results have a higher probability of being selected.
Crossover: This operator takes more than one parent solution and produces one or more child solutions from them.
Mutation: This operator alters one or more gene values in a chromosome, producing a new chromosome. Mutation helps prevent the population from stagnating at a local optimum.
Elitism: Some relatively good chromosomes may be lost during selection, so this operator transfers a number of elite chromosomes directly to the next population.
Figure 2 gives a general overview of the GA method.
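The four operators can be sketched as a small real-valued GA. The specific choices here (binary tournament selection, one-point crossover, Gaussian mutation, two elite chromosomes) are common defaults assumed for illustration. The toy fitness function also stands in for the paper's actual objective, in which the GA tunes ANN parameters such as the number of hidden neurons.

```python
import random
from typing import Callable, List

Chromosome = List[float]


def genetic_algorithm(fitness: Callable[[Chromosome], float], n_genes: int,
                      pop_size: int = 40, generations: int = 100, n_elite: int = 2,
                      mutation_rate: float = 0.1, mutation_scale: float = 0.3,
                      seed: int = 0) -> Chromosome:
    """Maximize `fitness` over real-valued chromosomes with a basic GA."""
    rng = random.Random(seed)
    # Initial population: random candidate solutions with genes in [-5, 5]
    population = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]

    def select(pop: List[Chromosome]) -> Chromosome:
        # Selection (binary tournament): the fitter of two random chromosomes
        # becomes a parent, so better chromosomes are chosen more often
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Elitism: the best chromosomes pass unchanged to the next population
        next_pop = [list(c) for c in ranked[:n_elite]]
        while len(next_pop) < pop_size:
            p1, p2 = select(population), select(population)
            # Crossover (one point): genes before the cut come from p1, the rest from p2
            cut = rng.randint(1, n_genes - 1) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: each gene is perturbed with probability mutation_rate
            child = [g + rng.gauss(0.0, mutation_scale) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness)


# Toy problem (hypothetical): recover a target vector
target = [1.0, -2.0, 0.5]


def toy_fitness(c: Chromosome) -> float:
    return -sum((g - t) ** 2 for g, t in zip(c, target))


best = genetic_algorithm(toy_fitness, n_genes=3)
print([round(g, 2) for g in best])
```

Because of elitism, the best fitness never decreases from one generation to the next.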

Problem Definition.
In this paper, the objective is to obtain a linear relation between the stress level and the stressful life events introduced in the SLE questionnaire, together with the person's demographic characteristics. The stressful life events and demographic characteristics are considered as inputs, and the stress level, measured by the GHQ-12, as the output variable. In other words, we aim to determine the coefficients $\alpha_i$ ($i = 1, 2, \ldots, 46$) and $\beta_j$ ($j = 1, 2, 3$) in the following equation:

$$ y = \sum_{i=1}^{46} \alpha_i x_i + \sum_{j=1}^{3} \beta_j d_j \tag{2} $$

In the above equation, $x_i$ is the intensity of the $i$th stressful life event introduced in the SLE questionnaire. If the stressor actually exists in the person's life, $x_i$ is set to the mean value of this stressful life event in the population; otherwise, $x_i$ is set to zero. Parameters $d_1$, $d_2$, and $d_3$ represent gender, age, and education level, respectively. Gender is a binary variable, and education level equals the number of educational years. $y$ is the response variable, obtained by transforming the GHQ score. If the GHQ score is less than 4, the individual is classified in the low stress level group, and otherwise in the high stress level group. On the scale of $y$, the corresponding cut point of 100 is well-defined: if $y$ is less than 100, the subject is considered healthy; otherwise, the subject is at risk of disease.
The purpose of this study is to measure the agreement between the SLE questionnaire data and the response variable $y$ as the GHQ score changes. The SLE questionnaire data for a person are represented by the vector

$$ \mathbf{v} = (x_1, x_2, \ldots, x_i, \ldots, x_{46}, d_1, d_2, d_3). $$
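The input coding and the linear model in (2) can be sketched as follows. Only the coding rule (the population mean if the stressor is reported, zero otherwise), the cut point of 100, and the demographic coefficients reported in the Results (0, 0.2, 0) come from the text. The function names, the three-stressor example, its means and $\alpha$ values, and the gender coding are all hypothetical.

```python
from typing import List, Sequence


def stressor_inputs(likert: Sequence[int], population_means: Sequence[float]) -> List[float]:
    """x_i: the population mean of stressor i if reported (score 1-5), otherwise 0."""
    if len(likert) != len(population_means):
        raise ValueError("one Likert score is needed per stressor")
    xs = []
    for score, mean in zip(likert, population_means):
        if score not in range(6):
            raise ValueError("Likert scores must be integers from 0 to 5")
        xs.append(mean if score > 0 else 0.0)
    return xs


def linear_response(xs: Sequence[float], alphas: Sequence[float],
                    demographics: Sequence[float], betas: Sequence[float]) -> float:
    """y = sum_i alpha_i * x_i + sum_j beta_j * d_j, as in equation (2)."""
    return (sum(a * x for a, x in zip(alphas, xs))
            + sum(b * d for b, d in zip(betas, demographics)))


# Hypothetical example with 3 stressors instead of 46
means = [2.1, 3.4, 1.2]     # invented population means
alphas = [5.0, 10.0, 20.0]  # invented stressor coefficients
betas = [0.0, 0.2, 0.0]     # gender, age, education (values reported in the Results)
d = [1, 40, 12]             # gender code (assumed), age in years, years of education
x = stressor_inputs([0, 4, 1], means)  # [0.0, 3.4, 1.2]
y = linear_response(x, alphas, d, betas)
print(y, "at risk" if y >= 100 else "healthy")
```

Here $y = 10 \times 3.4 + 20 \times 1.2 + 0.2 \times 40 = 66$, which is below the cut point of 100.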
Once we determined the coefficients, (2) was used as a screening tool.

Model Development.
In this study, we used a hybrid model of GA and ANN to obtain the coefficients ($\alpha_i$, $\beta_j$) of the variables in (2). An ANN was used to extract the linear relationship between the input data and the response variable. To increase the efficiency of the ANN in this model, some of its effective parameters, such as the number of hidden neurons, were tuned by the GA. The GA searches the space of candidate values for these parameters, including the number of hidden neurons, so that the relation extracted by the ANN achieves maximum discrimination, that is, it best separates healthy individuals from those at risk. After determining $\alpha_i$, to simplify the use of (2) as a screening tool, we rewrote the equation as follows:

$$ y = w_1 b_1 + w_2 b_2 + \cdots + w_i b_i + \cdots + w_{46} b_{46} $$

In the above equation, $w_i$ is the product of $\alpha_i$ and the mean value of the $i$th stressor, and $b_i$ is a binary variable defined as follows:

$$ b_i = \begin{cases} 1, & \text{if the } i\text{th stressful life event is present (scored 1 to 5)},\\ 0, & \text{otherwise (scored 0)}. \end{cases} $$

We established an expert panel consisting of three psychiatrists, two psychologists, two epidemiologists, and one statistician to evaluate the coefficients. In some cases, the expert panel changed some of the coefficients ($w_i$).
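The simplification works because $\alpha_i x_i = w_i b_i$ for every stressor: both terms equal $\alpha_i$ times the stressor's mean when it is reported and zero otherwise. The sketch below checks this on hypothetical numbers. It covers only the stressor terms and does not model the expert panel's adjustments; the names and values are illustrative.

```python
from typing import List, Sequence


def simplified_weights(alphas: Sequence[float], population_means: Sequence[float]) -> List[float]:
    """w_i = alpha_i * (population mean of stressor i)."""
    return [a * m for a, m in zip(alphas, population_means)]


def sle_r_score(likert: Sequence[int], weights: Sequence[float]) -> float:
    """Sum of w_i * b_i, where b_i = 1 if stressor i was scored 1-5 and 0 if scored 0."""
    if len(likert) != len(weights):
        raise ValueError("one Likert score is needed per weight")
    return sum(w * (1 if score > 0 else 0) for score, w in zip(likert, weights))


# Hypothetical coefficients and means for 3 stressors
alphas = [5.0, 10.0, 20.0]
means = [2.1, 3.4, 1.2]
weights = simplified_weights(alphas, means)
answers = [0, 4, 1]

# Same stressor total as sum(alpha_i * x_i) with x_i = mean_i if reported, else 0
full_form = sum(a * (m if s > 0 else 0.0) for a, m, s in zip(alphas, means, answers))
print(sle_r_score(answers, weights), full_form)
```

A respondent therefore only needs to tick which events occurred and add up the printed weights. The Likert intensity affects the score only through whether it is zero.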

Sensitivity and Specificity.
Once the coefficients in the equation were defined, the stress level of each individual was calculated using (5), and the resulting number was compared with the GHQ score. Sensitivity and specificity are calculated from the following quantities: (a) true positive (TP), a sick individual correctly identified as sick; (b) true negative (TN), a healthy individual correctly identified as healthy; (c) false positive (FP), a healthy individual incorrectly identified as sick; and (d) false negative (FN), a sick individual incorrectly identified as healthy. In our study, a TP is an individual with a calculated value of at least 100 and a GHQ score of at least 4; a TN has a calculated value below 100 and a GHQ score below 4; an FN has a calculated value below 100 and a GHQ score of at least 4; and an FP has a calculated value of at least 100 and a GHQ score below 4. Sensitivity and specificity are calculated as follows:

$$ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}. $$
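The counting rule and the two ratios can be sketched directly. The GHQ-12 serves as the criterion standard, and the boundaries (positive at 100 or more, high stress at GHQ 4 or more) follow the cut points stated earlier. The eight individuals are invented, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

SLE_R_CUTOFF = 100  # calculated value at or above this -> screened positive
GHQ_CUTOFF = 4      # GHQ score at or above this -> high stress (criterion standard)


def confusion_counts(sle_r: Sequence[float], ghq: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating the GHQ-12 as the criterion standard."""
    if len(sle_r) != len(ghq):
        raise ValueError("need one GHQ score per SLE-R score")
    tp = tn = fp = fn = 0
    for score, g in zip(sle_r, ghq):
        predicted = score >= SLE_R_CUTOFF
        actual = g >= GHQ_CUTOFF
        if predicted and actual:
            tp += 1
        elif not predicted and not actual:
            tn += 1
        elif predicted:  # screened positive but GHQ below the cut point
            fp += 1
        else:            # screened negative but GHQ at or above the cut point
            fn += 1
    return tp, tn, fp, fn


def sensitivity_specificity(sle_r: Sequence[float], ghq: Sequence[int]) -> Tuple[float, float]:
    tp, tn, fp, fn = confusion_counts(sle_r, ghq)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both positive and negative cases are required")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for 8 individuals
sle_r = [120, 130, 90, 150, 80, 60, 110, 95]
ghq = [5, 4, 6, 7, 1, 0, 2, 3]
print(sensitivity_specificity(sle_r, ghq))  # (0.75, 0.75)
```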

Result
We studied 4569 individuals, including 2252 women (49.2%). The mean age of female and male participants was 38.6 ± 15.1 and 38.5 ± 15.4 years, respectively (mean ± standard deviation). The mean number of years of education was 8.9 ± 4.8 in women and 7.2 ± 4.9 in men. The SLE questionnaire included 46 stressors, each scored on a 6-point Likert scale (0 = never, 1 = very mild, 2 = mild, 3 = moderate, 4 = severe, and 5 = very severe). Mean scores for the stressful life events in the study population are shown in Table 1.
As shown in Table 1, financial stressors such as inflation and low income had the highest mean scores in the study population. A higher mean score indicates a more intense stressor.
If zero was selected for a stressor in the SLE questionnaire, its value was set to zero, leaving the total score unchanged; if any value from 1 to 5 was selected, the value was replaced by the population mean of that stressor. For each individual, these stressor values, together with the demographic characteristics and the transformed GHQ score, formed a vector. This set of vectors was used to train and test the neural networks.
After the coefficients $\alpha_i$ ($i = 1, 2, \ldots, 46$) in (2) were determined, the calculated coefficient of each stressor was multiplied by the stressor's mean value. Finally, the resulting values were reviewed and revised by the expert panel. Table 2 lists the final weight of each stressor, sorted by importance and impact.
Among all stressful life events, death of parents, spouse, or siblings was the most important and influential in our study population. In contrast, increased working hours and concern about a family member's addiction received low weights and were less important.
The estimated coefficients of the demographic characteristics ($\beta_1 = 0$, $\beta_2 = 0.2$, $\beta_3 = 0$) show that, of gender, age, and education level, only age affects the stress level: the GHQ-based response increases by 0.2 for each additional year of age.
In this study, the sensitivity was 83% and the specificity was 81%. These results show that the SLE-R questionnaire is a screening tool with acceptable accuracy.
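As a worked illustration of these coefficients, use $\alpha_i$ and $x_i$ for the stressor coefficients and inputs and $d_2$ for age in years (an assumed unit). With $\beta_1 = \beta_3 = 0$, the demographic part of the model reduces to the age term. Consider two hypothetical individuals with identical stressor profiles whose ages differ by 10 years:

$$ y = \sum_{i=1}^{46} \alpha_i x_i + 0 \cdot d_1 + 0.2\, d_2 + 0 \cdot d_3 = \sum_{i=1}^{46} \alpha_i x_i + 0.2\, d_2, $$

$$ \Delta y = 0.2\, \Delta d_2 = 0.2 \times 10 = 2. $$

On a scale with a cut point of 100, age alone therefore shifts the response only modestly compared with the stressor terms.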

Discussion
In this paper, a revised SLE questionnaire is presented. We added weights to the stressful life events of the SLE questionnaire using a hybrid model of GA and ANN. The main contribution of this revised questionnaire is these weights, which were calculated from the responses of the study population. The SLE-R questionnaire is a tool that differentiates individuals who are at risk of a health problem or disease from those who are not.
In this study, a sensitivity of 83% and a specificity of 81% were obtained for a cut point of 100. Agreement between questionnaire data and criterion standards is the subject of many questionnaire-based studies, whose objective is to develop a diagnostic tool by identifying a pattern that links questionnaire data to a criterion standard. Many of these studies deal with multiple variables that are likely to interact. ANNs are powerful tools for modeling correlations between known inputs and outputs, and they can take interactions between inputs into account in the identified patterns.
Many studies have shown the high efficiency of ANNs in various applications. DeGroff et al. used ANNs as a diagnostic tool to classify heart sound data into innocent and pathological classes [19], reporting a sensitivity and specificity of 100%. Wan et al. introduced a questionnaire-based prediction approach that uses ANN models to predict skin attributes [20]. The results of applying ANNs in the current study and in the studies mentioned above show that the ANN is a powerful and valuable method.
The Holmes and Rahe stress scale (HRSS) is one of the most widely used quantitative measures of psychosocial stress. It is a self-reported list of 43 common events associated with some degree of disruption of an individual's life. The HRSS helps people discern their internal stress and understand their cumulative stress over a one-year period. It assigns a number to the amount of stress felt by a person, with no allowance for differences in how a person actually internalizes the stress. An individual's stress level is determined from the value calculated with the HRSS. User-friendliness is one of the prominent characteristics that make the HRSS so commonly used. Another widely used stress scale, the Life Experiences Survey (LES) developed by Sarason et al., is a self-reported structured interview that assesses major life events in the past year. This user-friendly questionnaire includes 57 items divided into two sections. For each event, participants record whether it had a positive, negative, or no effect on their life.
The SLE-R questionnaire proposed in this paper is a screening tool similar to the HRSS and the LES. Despite their simplicity, these questionnaires are high-performance screening tools for investigating stress levels. In many cases, clinical tests are costly, time-consuming, and associated with uncertainty; in such situations, screening tools can help experts reach a diagnosis. Because these instruments are self-reported, individuals can gain some information about their own health status while completing them.
In the SLE-R questionnaire, a global number is determined for each stressful life event, and differences between individuals are not considered. The self-reported nature of the questionnaire is another limitation of our study, because participants are prompted to recall only the stressors covered by the questionnaire's questions. The SLE-R questionnaire has not been tested for normalization. It should be compared with other questionnaires and tested in different communities.
In conclusion, the SLE-R questionnaire is introduced as a screening tool with high sensitivity and specificity. Its features make it a useful research tool that could be used in clinical and primary care settings. Offering clinical diagnostics to entire communities is very costly and impractical, so screening tools such as the SLE-R questionnaire are needed to narrow down the target group. The SLE-R questionnaire can be completed by individuals with low literacy.