Joint Modeling of Exam Results and Attrition Status of Students at Hawassa College of Education, Ethiopia

Student attrition is a challenge for higher education institutions across the world. The purpose of this study was to examine the application of a joint model to students' cumulative grade point average (CGPA) and attrition status. A sample of 258 college students was used. A linear mixed effects model for students' CGPA and a Cox proportional hazards model for students' attrition status were first fitted separately, and the two submodels were then fitted jointly by linking them through shared random effects (a shared parameter model). The study focused on gender, academic background, peer support, and residence. Of the students sampled, 26.4% left the program (attrition) and 73.6% were retained. The estimated CGPA trajectory was negatively associated with the risk of attrition. The major factors contributing to student attrition were academic background and institutional factors.


Introduction
Education enables individuals and society to participate fully in the development process by acquiring knowledge, ability, skills, and attitudes [1]. Education is the development of the physical, mental, moral (spiritual), and social values of individuals for a life of dedicated service [2]. A significant challenge in education is to provide equitable instruction and assessment to all students, so that each student's performance is judged as objectively as possible for the field and in relation to their peers [3,4].
Student attrition is one of the greatest areas of interest in higher education. It is usually defined as the number of noncompleting students (i.e., students who have not yet finished their study program) who are enrolled in a specific university, college, school, discipline, or program in a given year but are not enrolled in that same program the following year [5]. In Ethiopia, student attrition is of particular importance within the education sector because of its critical role in meeting national goals and institutional objectives. Ethiopia's national policies identify the development of human capital as a key strategic tool for meeting the country's aspiration of becoming a middle-income country by 2025 [6], so student attrition has damaging and costly effects on the country's economy. It has drawn considerable attention globally and nationally in colleges and universities because of its negative consequences for individual students' lives, their families, and national economies [2,7].
Some have argued that student completion rates are a fundamental measure of student success and that completion often depends upon the prerequisite skills, knowledge, and commitment of students. Low levels of basic skills, an inadequate knowledge base, and low self-confidence have been reported to contribute to the failure and attrition of students in undergraduate programs. Confidence, self-efficacy, and self-esteem also seem paramount to new student success [8].
Numerous studies have sought to identify models and sets of variables that explain what forces students to leave the tertiary education system. For most students, the decision to leave tertiary education is not the result of one factor. Rather, it is the result of a combination of complex, interconnected factors that develop over time. Many studies have identified previous academic performance, mismatches between student expectations and experiences, student disorientation or socialization, and other factors as key predictors of attrition. Many variables are likely to affect the academic success of students who enter university through enabling programs. The study conducted by Alter and Haydon found that age, gender, educational and family circumstances, ability, self-confidence, achievement goals, and approaches to self-regulation of academic behavior were among the main variables that strongly affect students' performance [9].
The time constraints of college terms and the academic rigor required in college courses can lead to student stress and dissatisfaction [10]. If educators had an accurate estimate of how long each student is likely to remain in the program, they could concentrate on those students whose studies are going to be prolonged and help them remove the obstacles to timely completion. When both longitudinal and time-to-event outcomes are observed on the same subject, separate modeling does not take into account the dependence between the two types of responses. A joint modeling approach addresses this: it enables researchers to examine the factors influencing the longitudinal measure of students' results, to assess the risk factors for student attrition, and to test the association between repeated measurements of cumulative grade point average (CGPA) and time-to-event data.

Data and Methods
The study was carried out at Hawassa College of Teacher Education, Sidama Regional State, Ethiopia. The sample size was determined using the formula given in [11]. A sample of 258 students was included in the study. Data for both the longitudinal and time-to-event outcomes were collected using a questionnaire prepared by the researcher on the basis of the literature on student attrition and the factors affecting it. The study included the 2009 E.C. batch of regular natural science students, who were followed over their first five semesters; all other students were excluded. Systematic random sampling was employed, and students were selected using their ID numbers.
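The selection step described above can be illustrated with a minimal sketch of systematic random sampling over an ordered list of student IDs. The frame size (700), the ID format, and the function name `systematic_sample` are assumptions made for illustration only; the sample size of 258 is the only value taken from the study.

```python
import random


def systematic_sample(ids, n, seed=None):
    """Select n units from an ordered list by systematic random sampling."""
    if not 0 < n <= len(ids):
        raise ValueError("n must be between 1 and len(ids)")
    rng = random.Random(seed)
    step = len(ids) / n           # sampling interval k = N / n (may be fractional)
    start = rng.uniform(0, step)  # random start inside the first interval
    last = len(ids) - 1
    # Take every k-th position after the random start; guard the upper bound.
    return [ids[min(int(start + i * step), last)] for i in range(n)]


# Hypothetical sampling frame: 700 IDs in an invented format.
frame = [f"NS/{i:04d}/09" for i in range(1, 701)]
sample = systematic_sample(frame, 258, seed=1)
print(len(sample), sample[:3])
```

Because the interval is at least 1, each selected position is distinct, so no student can be drawn twice.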
The analysis consisted of both exploratory and inferential parts. In the exploratory analysis, we used mean profile plots for the longitudinal data and Kaplan-Meier survival plots for the time-to-event data. In the inferential analysis, a linear mixed effects model was used for the longitudinal data, a Cox proportional hazards model for the time-to-event data, and a joint model for the two outcomes together. As a first step, the data were explored in different ways to obtain details that inform decisions in the subsequent steps of the analysis. The mean and correlation structures were also explored through graphical techniques. In parallel with defining the fixed effects, a random effects model was chosen to define the covariance model. After the fixed effects were decided, a set of random effects was selected for inclusion in the model.
In the longitudinal data analysis, CGPA was used as the outcome measure, and the covariates were sex, age, distance of residence, high school result, family size, level of peer support, entrance result, marital status, and house head education. In the survival model, time to attrition was the response variable, and the covariates considered were sex, entrance result, peer support, marital status, interest in the field of study, income, study time, and place of residence.
In the longitudinal data analysis, mean profile plots, correlation structure plots, and variance structure plots were obtained to gain some insight into the data [12]. The individual profile plots and the variance structure were used to assess the variability in the data and to determine whether random effects (random intercepts and slopes) should be considered in the analysis [13].
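The quantity behind a mean profile plot is simply the average outcome per group at each measurement occasion. The sketch below computes it from long-format records; the record layout `(group, semester, cgpa)` and the four example rows are hypothetical and are not the study's data.

```python
from collections import defaultdict
from statistics import mean


def mean_profile(records):
    """Average CGPA for each (group, semester) cell of long-format records."""
    cells = defaultdict(list)
    for group, semester, cgpa in records:
        cells[(group, semester)].append(cgpa)
    return {key: mean(values) for key, values in sorted(cells.items())}


# Hypothetical records: (sex, semester, CGPA).
records = [
    ("F", 1, 2.0), ("F", 1, 3.0),
    ("M", 1, 2.5), ("M", 2, 2.7),
]
for (group, semester), value in mean_profile(records).items():
    print(group, semester, value)
```

Plotting these cell means against semester, one line per group, gives the mean profile plot used in the exploratory analysis.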
For longitudinal data, two sources of variation are considered. Modeling within-subject variation helps us to study changes over time, while modeling between-subject variation helps us to understand differences between subjects [14].
A mixed model is one that contains both fixed and random effects. For a continuous outcome, the linear mixed effects model (LMM) provides a general and flexible modeling framework in which subject-specific random effects, assumed to follow a normal distribution, are included to account for the correlation among repeated measurements [15].
Let $\beta$ denote a $P \times 1$ vector of unknown population coefficients for the fixed effects, and let $X_i$ be the known $n_i \times P$ design matrix of the fixed predictors linking $\beta$ to the vector of longitudinal measurements $Y_i$.
Let $b_i$ denote a $K \times 1$ vector of unobservable subject-specific random effects, let $Z_i$ be the known $n_i \times K$ design matrix of the random factors linking $b_i$ to $Y_i$, and let $\varepsilon_i$ be an $n_i \times 1$ vector of unknown random errors.
Then, the general LMM for the longitudinal data is given by

$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad (1)$$

where $\varepsilon_i \sim N(0, \sigma^2 I)$ is a vector of residual components, combining measurement error and serial correlation.
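One way to see how the random effects in an LMM induce correlation among a student's repeated CGPA measurements is to compute the implied marginal covariance, $V_i = Z_i D Z_i^{\top} + \sigma^2 I$. The sketch below does this for a random intercept and slope over five semesters. The symbol $D$ (the covariance matrix of $b_i$) is not used in the text, and the numerical values of $D$ and $\sigma^2$ are hypothetical.

```python
import math


def marginal_covariance(times, D, sigma2):
    """V_i = Z_i D Z_i^T + sigma2 * I, where row j of Z_i is (1, t_j)."""
    Z = [[1.0, float(t)] for t in times]
    n = len(Z)
    V = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            V[r][c] = sum(Z[r][a] * D[a][b] * Z[c][b]
                          for a in range(2) for b in range(2))
            if r == c:
                V[r][c] += sigma2  # residual variance on the diagonal only
    return V


# Hypothetical variance components (not estimates from the study).
D = [[0.25, 0.0],   # random-intercept variance, intercept-slope covariance
     [0.0, 0.01]]   # intercept-slope covariance, random-slope variance
V = marginal_covariance([1, 2, 3, 4, 5], D, sigma2=0.09)

# Implied correlation between the semester 1 and semester 5 measurements.
corr_1_5 = V[0][4] / math.sqrt(V[0][0] * V[4][4])
print(round(corr_1_5, 3))
```

With a random slope, both the variances and the correlations change with time, which is why the choice of random effects also determines the covariance model.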
Survival analysis is an area of statistics that studies the time until a prespecified event of interest occurs.
Let $T$ be a nonnegative random variable representing the time to attrition of a student. The survival function at time $t$, $S(t) = P(T > t)$, is defined as the probability that a randomly selected individual will survive beyond time $t$. Let $T_i$ denote the time to attrition for the $i$th respondent and $C_i$ the corresponding censoring time. Let $\delta_i = I(T_i \le C_i)$, where $I(\cdot)$ is an indicator function that takes the value 1 if $T_i \le C_i$ and 0 otherwise.
The Cox proportional hazard (CPH) model is the most widely used semiparametric survival regression model. For a set of covariates $X_i$ for the $i$th subject and a $p \times 1$ vector of coefficients $\beta$, the hazard at time $t$ is expressed as follows:

$$\lambda_i(t) = \lambda_0(t) \exp(X_i^{\top} \beta), \qquad (2)$$

where $\lambda_i(t)$ represents the hazard of attrition for subject $i$ at time $t$ and $\lambda_0(t)$ is a baseline hazard function that describes the risk for individuals with $X_i = 0$.
Joint modeling, which has gained increasing interest in recent years, refers to the statistical analysis of such data while accounting for any association between the repeated measurements and the time-to-event outcome [16]. A joint model of longitudinal and survival data can be formed in which the association between the two endpoints is due to shared random effects; that is, the random effects account for both the association between the longitudinal and time-to-event outcomes and the correlation between the repeated measurements in the longitudinal process. This type of joint model is also called a shared parameter model, as both processes share these random effects [17].
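The longitudinal and survival components of the joint model are typically linked through the trajectory function. Specifically, the shared random-effects model at time $t$ can be written as follows:

$$y_i(t) = m_i(t) + \varepsilon_i(t), \qquad m_i(t) = x_i^{\top}(t)\beta + z_i^{\top}(t) b_i, \qquad (3)$$

$$\lambda_i(t \mid M_i(t), \Psi_i) = \lambda_0(t) \exp\{\gamma^{\top} \Psi_i + \alpha\, m_i(t)\}, \qquad (4)$$

where $M_i(t) = \{m_i(s), 0 \le s < t\}$ represents the history of the unobserved longitudinal response up to time $t$, $m_i(t)$ is its true value at time $t$, $x_i(t)$ and $z_i(t)$ are the fixed and random effects design vectors at time $t$, $\Psi_i$ represents the vector of baseline covariates with corresponding parameters $\gamma$, and $\alpha$ measures the effect of the longitudinal outcome on the risk of an event; the risk of an event at time $t$ depends on the true value of the longitudinal endpoint at that time.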
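To make the role of the association parameter concrete, the sketch below evaluates a shared-parameter hazard of the form $\lambda_0(t)\exp\{\gamma^{\top}\Psi_i + \alpha\, m_i(t)\}$ for two hypothetical students whose random intercepts differ. It assumes a linear subject-specific trajectory and a constant baseline hazard; every numerical value and function name is an illustrative assumption, not an estimate or routine from the study.

```python
import math


def true_cgpa(t, beta, b):
    """m_i(t) = (beta0 + b0) + (beta1 + b1) * t for a linear trajectory."""
    return (beta[0] + b[0]) + (beta[1] + b[1]) * t


def joint_hazard(t, baseline_hazard, gamma, psi, alpha, m_t):
    """lambda_i(t) = lambda_0(t) * exp(gamma' psi_i + alpha * m_i(t))."""
    linear_part = sum(g * p for g, p in zip(gamma, psi))
    return baseline_hazard(t) * math.exp(linear_part + alpha * m_t)


def baseline(t):
    """Hypothetical constant baseline hazard per semester."""
    return 0.05


# Hypothetical parameter values, chosen only for illustration.
beta = (2.0, 0.05)          # fixed intercept and slope of CGPA
gamma, psi = (0.4,), (1,)   # one baseline covariate and its coefficient
alpha = -0.8                # negative: higher CGPA lowers the hazard

for label, b in (("lower random intercept", (-0.5, 0.0)),
                 ("higher random intercept", (0.5, 0.0))):
    m3 = true_cgpa(3, beta, b)
    print(label, round(m3, 2), joint_hazard(3, baseline, gamma, psi, alpha, m3))
```

Setting `alpha` to zero removes the link between the two processes and reduces the hazard to an ordinary Cox form in the baseline covariates.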
Parameters in the three models were estimated mainly by maximum likelihood (ML), a very general approach to statistical estimation that is widely used to handle many difficult estimation problems. Models were compared on the basis of Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), and the likelihood ratio test (LRT) was used to compare nested models [16].
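The comparison criteria named above are simple functions of the maximized log-likelihood. The following sketch computes AIC, BIC, and a likelihood ratio test for two nested models that differ by one parameter. The log-likelihood values and parameter counts are hypothetical, using the number of observations (1290) as the BIC sample size is an assumption, and the chi-square reference distribution with 1 degree of freedom applies to an ordinary (non-boundary) parameter.

```python
import math


def aic(loglik, n_params):
    """Akaike's information criterion: -2 logL + 2k."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 logL + k log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def lrt_one_df(loglik_reduced, loglik_full):
    """LRT statistic and p-value against a chi-square with 1 df."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical maximized log-likelihoods for two nested models.
ll_reduced, ll_full = -812.4, -808.9
print(aic(ll_reduced, 8), aic(ll_full, 9))
print(bic(ll_reduced, 8, 1290), bic(ll_full, 9, 1290))
print(lrt_one_df(ll_reduced, ll_full))
```

Smaller AIC and BIC values indicate the preferred model, while a small LRT p-value favors the larger of the two nested models.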

Result
In this study, 1290 CGPA observations were collected from 258 students at fixed measurement times. CGPA was recorded in each of the first five consecutive semesters, which were equally spaced in time.
Among the study subjects, 183 (70.9%) were male. In terms of marital status, 204 (79.1%) were single, and the remaining 54 (20.9%) were married. By place of residence, 168 (65.1%) were from rural areas, and 90 (34.9%) were from urban areas. By field of study, 34 (13.2%) were in the mathematics department, 50 (19.4%) in chemistry, 49 (19.0%) in physics, 49 (19.0%) in biology, 16 (6.2%) in integrated science, and 60 (23.3%) in MNS. Not all students were enrolled in their field of interest: 150 (58.1%) students were enrolled in a study program they had not chosen, and 108 (41.9%) were placed in the program of their interest. With regard to the education level of the household head, 114 (44.2%) were illiterate, 86 (33.3%) had primary education, 29 (11.2%) had secondary education, and 29 (11.2%) held a certificate or above. The mean age of students enrolled at Hawassa College of Teacher Education was 19.91 years with a standard deviation of 1.99 (19.91 ± 1.99).
The longitudinal response variable, CGPA, was measured consecutively from semester one to semester five. As can be seen from Table 1, all respondents were measured at the same five semesters: in semester I, there were 258 (100%) students; in semester II, 250 (96.9%); in semester III, 208 (80.62%); in semester IV, 195 (75.58%); and in semester V, 190 (73.6%). These figures show a clear increase in attrition over the five consecutive semesters. There are many reasons for attrition, such as academic dismissal, readmission, withdrawal from the program due to health or financial problems, and dropping out of the batch. Among the students included in this study, 190 (73.6%) were retained to the end of the study period. The mean CGPA over the five semesters was about 2.28 out of 4.00, with a standard deviation of 0.57. This is a relatively low score compared with 4.00, the highest score in the Ethiopian grading system.
The survival response variable was the length of time from the enrollment semester until the semester of attrition or the end of follow-up for retained students. Among the students involved in this study, 73.6% were retained until the 5th semester, and 26.4% left the program because of academic delay or other individual reasons (Table 1).
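The retention percentages reported above can be reproduced with the Kaplan-Meier product-limit formula, $\hat{S}(t) = \prod_{j}(1 - d_j / n_j)$. The sketch below uses the semester enrollment counts from the text and assumes that every student absent from the next semester counts as an attrition event, with no censoring before semester V. That assumption is made only for this illustration.

```python
def kaplan_meier(at_risk, events):
    """Product-limit estimate of S(t) after each event time."""
    survival, s = [], 1.0
    for n, d in zip(at_risk, events):
        s *= 1.0 - d / n  # conditional probability of staying past this time
        survival.append(s)
    return survival


# Students enrolled in semesters I to V, as reported in the text.
enrolled = [258, 250, 208, 195, 190]
# Assumed events: students missing from the following semester.
events = [a - b for a, b in zip(enrolled, enrolled[1:])]
at_risk = enrolled[:-1]

surv = kaplan_meier(at_risk, events)
for semester, s in zip(("II", "III", "IV", "V"), surv):
    print(f"retained into semester {semester}: {100 * s:.1f}%")
```

Under this no-censoring assumption the estimate reduces to the simple proportion still enrolled, so the final value matches the 73.6% retention (26.4% attrition) reported in the study.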
As we observe from Figure 1, the average CGPA of male students was higher than that of female students over the five semesters (Figure 1(b)), and the survival probability of male students was greater than that of female students (Figure 1(a)). These two exploratory analyses revealed that male students were more likely to remain in the program, and with better CGPA, than female students, which implies higher attrition rates among female students.
Models that best describe the observed average trends and also reflect the observed correlation structures were sought for the data. To identify the appropriate covariance structure, we tested three commonly used covariance structures: compound symmetry (CS), unstructured (UN), and first-order autoregressive (AR(1)).
Table 2 shows the fitted linear mixed effects model containing the significant main effects and possible interaction effects. Sex, high school result, family size, peer support, college entrance result, and house head education level were statistically significant (p value < 0.05), whereas the distance from the college to the student's residence did not significantly affect CGPA.
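The compound symmetry and AR(1) structures considered in the covariance comparison can be written out explicitly for five equally spaced semesters. The sketch below builds both matrices and counts the free parameters in each structure; the variance and correlation values are hypothetical.

```python
def compound_symmetry(n, sigma2, rho):
    """CS: constant variance, same correlation for every pair of occasions."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(n)]
            for i in range(n)]


def ar1(n, sigma2, rho):
    """AR(1): correlation rho**lag decays with the distance between occasions."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def n_params(structure, n):
    """Number of free covariance parameters for n occasions."""
    return {"CS": 2, "AR(1)": 2, "UN": n * (n + 1) // 2}[structure]


# Hypothetical values for five semesters.
cs_matrix = compound_symmetry(5, sigma2=1.0, rho=0.5)
ar1_matrix = ar1(5, sigma2=1.0, rho=0.5)
print(cs_matrix[0])   # first row under CS
print(ar1_matrix[0])  # first row under AR(1)
print({s: n_params(s, 5) for s in ("CS", "AR(1)", "UN")})
```

The unstructured form is the most flexible but needs 15 parameters for five occasions, which is why information criteria are useful for choosing among the three.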

Education Research International
To explore the survival process, we assessed each factor through a Cox regression model and found that sex, college entrance result, peer support, interest in the field of study, and marital status were statistically significant in the separate survival analysis.
From the results displayed in Table 3, it can be seen that sex, entrance result, level of peer support, interest in the field of study, and marital status are statistically significant at the 5% level of significance.
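A single-covariate Cox regression of the kind described above maximizes the partial log-likelihood, $\ell(\beta) = \sum_{i:\,\delta_i = 1}\big[x_i\beta - \log\sum_{j:\,T_j \ge T_i} e^{x_j\beta}\big]$. The sketch below evaluates this function (with Breslow-type handling of ties) on a small invented data set and locates its maximum by a coarse grid search. The data, the binary covariate, and the grid search are illustrative assumptions; a real analysis would use a dedicated survival routine.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Breslow-type Cox partial log-likelihood for one covariate."""
    loglik = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i == 1:
            # Risk set: everyone still under observation at time t_i.
            risk = sum(math.exp(beta * x_j)
                       for t_j, x_j in zip(times, x) if t_j >= t_i)
            loglik += beta * x_i - math.log(risk)
    return loglik


# Invented toy data: semester of exit, event indicator, binary covariate.
times = [1, 2, 2, 3, 5, 5, 5, 5]
events = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = attrition, 0 = censored (retained)
female = [1, 1, 0, 1, 0, 0, 1, 0]

grid = [i / 100 for i in range(-300, 301)]
best_beta = max(grid, key=lambda b: cox_partial_loglik(b, times, events, female))
print(best_beta, math.exp(best_beta))  # coefficient and hazard ratio
```

In this toy data the group coded 1 experiences more events, so the maximizing coefficient is positive and the corresponding hazard ratio exceeds 1.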
The parameter estimates in the separate and joint models are quite similar to each other but not identical. The "Association" term is the parameter $\alpha$ in equation (4), which measures the effect of $m_i(t)$, where $m_i(t)$ represents the true (unobserved) value of the longitudinal variable CGPA at time $t$. The estimated CGPA trajectory was negatively associated with attrition. This shows the effect of the longitudinal measure of CGPA on the attrition status of the student. Quantifying the effect of these repeated measurements of CGPA is fundamental to understanding the trend and taking appropriate intervention measures.
From the same results, a one-unit increase in entrance result was associated with a 9.751-unit increase in the average change in students' CGPA. Higher entrance results therefore reduce attrition: better entrance results are mostly scored by better-performing students, who have a higher probability of remaining in the study program from enrollment to graduation (Table 4).
Table 5 shows that the random intercepts accounted for the largest share of variability in both the separate linear mixed effects and survival models. It also shows that the variance of the random intercepts was higher than that of the random slopes. The residual variability was smaller in the joint analysis (15.3876) than in the separate linear mixed effects analysis (18.943). The lower variability in the joint model is probably because the standard errors were adjusted for the correlation between the responses.
3.1. Implication. This study assessed the effect of the longitudinal measure of students' cumulative grade point average (CGPA) in each semester on their attrition status. Quantifying the effect of these repeated measurements of CGPA is fundamental to understanding the trend and taking appropriate intervention measures. Understanding the effect of each semester's result is fundamental for timely intervention in particular, and it gives important baseline information for policy makers in general. Furthermore, identifying particular risk factors is important for student-specific intervention. This study can serve as a baseline for further related studies in education.

Discussion
This study focused on student attrition, based on data obtained from Hawassa College of Teacher Education for students who enrolled in 2009 E.C. and attended the first five semesters. In the longitudinal analysis, we used the square root transformation of the CGPA measurements to satisfy one of the basic assumptions of the analysis, as supported by [18]. The trend of CGPA was examined, and its average level remained stable until the end of the study period. CGPA was found to evolve differently for female and male respondents in both the separate and joint models. The level of progress was higher for males than for females. This result agrees with the findings of [19].
In this study, the linear mixed effects model revealed that sex, peer support, high school result, entrance result, family size, and house head education level were significant predictors of the CGPA of students at Hawassa College of Teacher Education.

This finding agrees with the results of [20]. Baseline CGPA was shown to be a significant determinant of students' retention. This implies that higher CGPA is associated with better retention of students in the study program, which is in line with the findings of [21]. In the survival plots in Figure 1, male students had a slightly higher survival rate than female students. Thus, the survival times were found to differ significantly between male and female students. We assessed each factor through a univariate Cox regression model and found that sex, entrance result, peer support, interest in the field of study, and marital status were statistically significant, which implies that these variables had either a positive or a negative impact on the retention of students, while income, study time, and place of residence were not found to be significant. This finding is consistent with the findings of [22].
Furthermore, this study also assessed the effect of the longitudinal measure of students' CGPA in each semester on their attrition status. Quantifying the effect of these repeated measurements of CGPA is fundamental to understanding the trend and taking appropriate intervention measures.
Moreover, the parameter estimates for the separate and joint models are quite similar to each other but not identical. The estimate of the association parameter due to the slope of CGPA is negative (-0.00907), indicating that CGPA is negatively associated with the risk of attrition. This indicates that an increasing trend in students' CGPA significantly reduces the risk of attrition. This finding is also in agreement with the studies of [22,23], which show the significance of the shared parameter that links the two processes and the reduction in the standard errors of the parameter estimates compared with independent model estimates. This suggests the need for a joint analysis of these data rather than the use of separate models. The estimated association parameter ($\alpha$) in the joint model is -0.00907 and is statistically significant at the 5% level of significance. This indicates strong evidence of an association between the longitudinal outcome and the risk of attrition, implying that higher initial values of CGPA are associated with better retention of students, in agreement with the results of [24]. The residual variability was smaller in the joint analysis (15.3876) than in the separate linear mixed effects analysis (18.943), because the joint analysis adjusts for the correlation between the responses.
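Under a proportional hazards specification, an association estimate such as the one reported above can be read on the hazard ratio scale by exponentiating it. The short sketch below does this for the reported value of -0.00907. It assumes that the estimate acts on the log-hazard per one-unit increase in the longitudinal term, on whatever scale that term entered the model (the text notes a square root transformation of CGPA), so the units of the interpretation are an assumption.

```python
import math

alpha = -0.00907  # association estimate reported in the text

hazard_ratio = math.exp(alpha)
percent_reduction = 100.0 * (1.0 - hazard_ratio)

print(f"hazard ratio per unit increase: {hazard_ratio:.6f}")
print(f"reduction in hazard: {percent_reduction:.2f}%")
```

A hazard ratio just below 1 means the estimated reduction in attrition risk per unit is small in magnitude, even though the estimate is reported as statistically significant.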

Conclusion
Several studies have examined students' CGPA and attrition status separately. However, only a few studies in education have analyzed the two together using a joint model. The mixed effects model was found to be adequate for predicting students' CGPA from the available covariates. The pattern of mean change in CGPA showed a linear trend that decreased over the five semesters.
Among the predictor variables included in the longitudinal model, sex, high school result, family size, peer support, entrance exam result, and house head educational level were found to be statistically significant. Likewise, the covariates sex, entrance exam result, peer support, marital status, and interest in the field of study were found to be significantly associated with time to attrition. Among the subjects of the study, 73.6% were retained and 26.4% left the program because of academic delays or other individual reasons. This study also confirmed strong evidence of an association between the longitudinal outcome and the risk of attrition. The results of the separate and joint models are consistent. However, because the joint model adjusts for the correlation between the responses, more adequate and efficient inferences can be made from the joint model estimates than from independent models. Joint modeling can therefore benefit the analysis of longitudinal CGPA measurements and time-to-event outcomes.

Limitations of the Study.
The findings of this study have limited generalizability regarding the use of joint modeling to study students' CGPA and attrition status in college education. First, the study was conducted in only one college, in Hawassa, Ethiopia. Second, it included only the batch of students enrolled in 2009 E.C. Third, it was restricted to regular students and did not include other programs. Lastly, the sample was drawn only from natural science students, who were relatively homogeneous, and not from other colleges, so reliable generalizations about attrition in college education as a whole may not be possible.