Using Machine Learning Techniques to Predict Learner Drop-out Rate in Higher Educational Institutions

Recently, students dropping out of school at the tertiary level without prior notice or permission has raised deep concern among academic authorities, instructors, and counsellors. It has therefore become necessary to understand the factors that lead to high attrition rates among learners and to identify at-risk students for urgent academic counselling. To provide a proactive response to learner attrition, the study deployed a machine learning algorithm with high model accuracy to predict students’ drop-out rates and identify the dominant attributes that affect learner attrition and retention. An attrition model was built and validated among support vector machine, decision tree, multilayer perceptron, and random forest algorithms. The machine learning algorithms were tested for accuracy, precision, recall, F-measure, and ROC using the 10-fold and the 5-fold comparative cross-validation techniques. In addition to the cross-validation technique, the chi-square feature selection mechanism was implemented to understand its effect on the algorithms’ training time and accuracy. The random forest emerged as the best-performing algorithm, with an accuracy of 70.98% and 69.74% for the 10-fold and the 5-fold cross-validation implementations, respectively.


Introduction
Student retention and attrition have become problematic for most higher educational institutions. Globally, the reputation of institutions is tied to enrollment management, with a primary focus on reducing the attrition rate [1]. A higher drop-out rate in such institutions is a cause for concern and ultimately undermines their global reputation and rankings. According to a 2009 survey by the Organisation for Economic Co-operation and Development (OECD) [2], 31% of students in 19 OECD countries fail to complete their tertiary education. In the OECD report, countries including Hungary, the United States, and New Zealand recorded an attrition rate of more than 40%, while an attrition rate lower than 24% was recorded in Belgium, Denmark, France, Germany, and Japan. The high attrition rate globally shows the inability of tertiary institutions to keep learners in school until they graduate. This global attrition phenomenon is exacerbated further by government policies, institutional culture, and unsavoury student traits [3,4].
Following the global trend, the attrition rate of students in Africa's numerous educational institutions is significant. A 2022 report by the Department of Higher Education and Training (DHET) shows that 50-60% of first-year students in South Africa drop out across tertiary institutions [5]. Research by Nyoroge et al. [6] on student attrition among thirteen private universities in Kenya shows a drop-out rate of 37%. A five-year survey of medical students at the Ebonyi State University in Nigeria shows a 7.8% attrition rate during preclinical classes [7]. In Uganda, almost 30% of students who enrol in tertiary institutions never finish their courses [8]. A similar study by Mwenje and Kasowe [9] in open and distance learning at Zimbabwe Open University shows an attrition rate that exceeds 50%. Even though studies of attrition rates in Africa are limited by the unavailability of data, the available research findings indicate a more complicated educational issue.
Improving learner retention in tertiary institutions requires proactive, predictive analytics instead of a reactive solution after the problem has occurred. Modelling a learner's behaviour and thought processes to understand the high-risk factors that lead to attrition is a priority in the solution domain. Students face varying issues with family, finances, relationships, and studies that can easily result in adverse decisions even as they strive to complete their education with good future employment prospects. A university graduate's skill set includes critical thinking, creativity, collaboration, information literacy, leadership, technology, and knowledge, which are part of 21st-century skills [10]. These learner skills are acquired in the classroom to meet the demands of Industry 4.0 [11]. The consequences of dropping out of school are severe and include reduced employment chances, social stigma, poorer pay, increased crime, and extreme suicidal thoughts [12].
The advent of machine learning (ML) has become a catalyst for analytics and growth in varying domains. In healthcare, ML is currently used to predict the life expectancy of patients with personalised treatment based on health records and family medical history [13]. The boost in e-commerce comes with transactional fraud. The e-commerce sector has seen the application of ML in detecting suspicious transactions with real-time analytics that trigger automatic rejection when unusual patterns are discovered [14,15]. Loan approval at banks has seen the application of ML in detecting high-risk applicants and fraudulent paper documents [16]. In the retail industry, ML is presently employed in chatbots that perform scripted functions and leverage natural language processing for customised conversations [17]. Intelligent transportation systems (ITS) have seen the integration of ML for price calculation, ridesharing, ride surge demand locations, and traffic pattern detection [18]. ITS has gained significant traction recently and remains a key enabler of tomorrow's smart cities. The social media environment has seen a tremendous deployment of ML in engaging billions of users. The application domain of ML in social media ranges from suggesting new friends to personalised news feeds and targeted adverts [19].
The educational sector has recently taken a positive trajectory in the application of ML for data-driven decision-making. Educational Data Mining (EDM) is an ML niche that identifies hidden patterns in educational data [20]. The Covid-19 pandemic has expanded the academic usage of the Internet. In addition to the traditional face-to-face classroom, massive databases of student-generated educational data have been produced. The most prevalent research in EDM involves students' academic performance prediction, learner assessment modelling, smart tutoring systems, learner attrition modelling, and behaviour modelling [20]. Even though attrition rates in Ghanaian universities are on the rise [21,22], the implementation of machine learning models for student drop-out detection is understudied. This study was conducted to address that gap.
The study's objective is to identify dominant factors that can increase attrition thoughts among learners and to predict future drop-out thoughts. In line with the objectives, the following research questions (RQs) guide the study: RQ1. What are the dominant attributes likely to cause drop-out thoughts among students? RQ2. Which classification algorithm has the highest accuracy in predicting learner attrition thoughts? RQ3. To what extent has the chi-square feature selection technique improved the algorithm's accuracy and training time in research question 2?
The main contributions of the study are as follows: (1) Comparison between the 10-fold and the 5-fold cross-validation techniques in building a classification model for students' drop-out prediction. (2) Implementing the chi-square feature selection mechanism to examine the model's accuracy and training time.
The rest of the paper is organised as follows: Section 2 discusses related literature. Section 3 examines the methodological procedure, data, and algorithms. Section 4 analyses the results and findings from the classification. Section 5 discusses the findings and compares them to the literature. In Section 6, we conclude by summarising the study.

Review of Literature
The review section discusses research on student attrition and retention using machine learning algorithms. The specifics of the review include the data sample, machine learning algorithms, accuracy enhancement, and the findings.
The first aspect of the literature relates to students' attrition modelling when engaging in online courses. Massive open online courses (MOOCs) are open and available online courses with a diverse variety of topics that expose the learner to relevant educational content. Since the classes are online, the drop-out rates are extremely high [23,24].
Al-Shabandar et al. [25] deployed a machine learning model to detect at-risk students in danger of early withdrawal from an online course. The study focused on learner engagement levels and motivational attributes that cause students to withdraw from MOOCs. Five machine learning algorithms, namely Random Forest (RF), generalised linear model (GLM), gradient boosting machine (GBM), MNET1, and MNET2, were applied to online data from Harvard University, Massachusetts University, and the Open University in building the ideal classifier. The training results show that the MNET1 algorithm has the highest accuracy of 91.57% for full and reduced set features.
Xing and Du [26] built a machine learning model using a deep learning algorithm to predict the retention probability of learners at risk in MOOCs. Data from 3,617 students under varying MOOC activities, including access to courses, forums, quizzes, module pages, announcements, assignments, and grade books, were used as the main attributes. The drop-out week, which indicates the week learners abandoned the course, is used as the class label in building the classifier. In creating the model, the deep learning algorithm was compared to the k-nearest neighbour (KNN), support vector machine (SVM), and decision tree (DT) algorithms. The data set was divided 70/30 for training and testing, and the 10-fold cross-validation method was used to avoid model overfitting. The deep learning algorithm has an average classification accuracy of 95.8% compared to 94.6% for KNN, 93.7% for SVM, and 96.7% for DT. Even though DT has the highest classification accuracy, the deep learning algorithm has a stable test data accuracy of 93.0% compared to 91.5% for DT. These results made the deep learning algorithm the more stable choice in building the classifier for future prediction of drop-out students.
Figueroa-Canas and Sancho-Vinuesa [27] implemented tree-based classification models on 197 learners enrolled in an online course in Computer Engineering at the Universitat Oberta de Catalunya. The study aims to identify students at risk of failing or dropping out halfway through the semester. The class label for the data set is based on the availability of the learner during the compulsory final exams, and the definition includes two classes, a drop-out and a completer. The conditional tree model with random undersampling is applied to the data set to eliminate bias toward the majority class and increase the classifier's accuracy. Even though only the DT classification algorithm was implemented, the F-measure performance compared with existing literature using similar attributes shows a 76.3% score after implementing the 5-fold cross-validation technique on the model.
Sun et al. [28] compared the recurrent neural network with GRU units (GRU-RNN) algorithm to XGBoost, Gradient Boosting Decision Tree (GBDT), and the RF base algorithms to predict the attrition rate of learners in a MOOC course. The training data for the study consists of 10,278 learners, while the test data has 2,568 students. The study utilised the maximum input sequence feature in the RNN model and tested the max-length of 500 and 1000 RNN against the base algorithms. As more weekly data samples are used to train the proposed GRU-RNN algorithm and the base algorithms, the RNN classification accuracy increases. The 1000 GRU-RNN outperforms the 500 GRU-RNN and the other base algorithms in accuracy.
The second aspect of the literature involves the traditional tertiary institution, where teaching and learning occur in person. The discussion of the literature on the conventional campus involves factors that affect learner attrition in the classroom and on campus.
Solis et al. [29] analysed the accuracy of RF, NN, SVM, and Logistic Regression (LOGIC) on 80,527 records of students from the Instituto Tecnológico de Costa Rica (ITCR). The first class comprises drop-outs who have not graduated and have not enrolled in school for two years. The second class comprises students who have graduated successfully. After implementing the 5-fold cross-validation technique across all the algorithms, the sensitivity, kappa, and true positive results were analysed. The RF emerged as the best classifier with a true positive percentage of 94% and a sensitivity of 93%. The kappa statistic of RF has a value of 0.85 compared to 0.84 for SVM, 0.84 for NN, and 0.84 for LOGIC.
Lee and Chung [30] trained the RF, boosted decision tree (BDT), RF with synthetic minority oversampling technique (SMOTE), and BDT with SMOTE classifiers on 165,715 data samples obtained from the National Educational Information System (NEIS) in South Korea. The study aims to compare the sensitivity results of the classification algorithms, since sensitivity represents the fraction of actual drop-out learners correctly predicted. From the 165,715 data instances, 1,348 students were identified as drop-outs based on primary negative reasons, including poor academic scores, school rule violation, strict rules in school, committee requests for expulsion, and relationships with teachers and friends. A split ratio of 80% to 20% for training and testing datasets was implemented to evaluate the classifiers during preprocessing. The classification results show that the BDT algorithm, with an area under the ROC curve (AUC) value of 0.898, outperformed other algorithms and was utilised as the model for detecting early attrition among students.
Kemper et al. [31] compared the logistic regression (LR) and the DT algorithms to predict drop-out using 3,176 data samples from the Karlsruhe Institute of Technology (KIT). The underrepresented minority class of 620 drop-outs against 2,556 successful graduations created a biased classification scenario that was solved using the SMOTE technique. The 10-fold cross-validation, stopping, and pruning techniques were implemented to avoid overfitting and underfitting of the classification model. The DT algorithm has a higher performance accuracy than the logistic regression algorithm.
Palacios et al. [32] separately predicted student retention among first-, second-, and third-year students using DT, LR, RF, SVM, naïve Bayes (NB), and KNN classification algorithms on 6,656 data samples from the Catholic University of Maule. The features for modelling the classifier were grouped under dominant attributes, including university performance, high school performance, financial indicators, socioeconomic index, geographic origin, and demographic background. After implementing the 10-fold cross-validation and SMOTE, a predictive model for first-, second-, and third-year students was analysed. For the first-year model results, the RF algorithm ranked as the highest-performing algorithm with an F-measure score of 0.947 compared to a 0.910 score for the DT algorithm. For the second-year students, RF has an increased F-measure value of 0.975 compared to 0.966 for KNN. RF increased again in F-measure score to 0.984 but levelled in performance with KNN for third-year students.
Perez et al. [33] compared DT, LR, and NB classification algorithms using 802 instances of data in modelling the retention rate of learners at a private university in Bogotá, Colombia. The attributes for data collection included minimum demographic data, expected graduation date, accessible financial aid, and official transcript records. A drop-out class type is determined by the failure to complete an undergraduate degree within six years after the start day of enrollment. The experimental results show a higher score for DT with an AUC value of 0.94 compared to 0.92 for LR and 0.87 for NB.
Hegde and Prageeth [34] implemented the NB classification algorithm in the R language to predict the retention of students using 24 attributes after feature selection. The attributes were divided into demographic, psychological, academic performance, social media usage, and social integration on campus. The class label records a survey response on whether the student would prefer to continue the course or drop out based on personal developments in school over the 24 attributes. Even though the result of the NB classifier was not compared with other algorithms, the model has an accuracy of 72%.
Table 1 summarises the reviewed literature and highlights the limitations of previous studies in which feature selection mechanisms were not used. Furthermore, the 10-fold cross-validation was not compared to the 5-fold but was implemented in isolation.

Methodology
As illustrated in Figure 1, the study modified the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology [35] by replacing the business understanding stage with problem definition and incorporating the classification algorithms into the diagram. CRISP-DM is a cyclical process for data mining projects that consists of business understanding, data understanding, data preparation, modelling, evaluation, and deployment stages. The CRISP-DM approach is primarily centred around a big data engine with attributes and tuples.

Students' Data.
The research data were collected from students at the south campus of the University of Education, Winneba. Data were collected specifically from students in the ICT Education, Biology Education, Integrated Science Education, and Math Education departments. The study utilised the convenience sampling approach, a nonprobability method for data collection. The convenience sampling method was adopted due to the respondents' immediate availability and accessibility at the university's south campus. A Google Form was administered to students from year one to year three in the mentioned departments, and the research objectives were clearly stated. In addition, the respondents were required to agree to an ethics consent form before filling out the questionnaire. Throughout the CRISP-DM process, the nondisclosure and privacy of respondents' data were adhered to strictly. To preserve confidentiality, no information in the questionnaire could be traced back to the respondents. A total of 1,239 responses were received under the personal and family biodata, the senior high school (SHS) tracker, the university tracker, and the decision tracker.

Attribute Selection Based on the Student Attrition Problem.
The attribute selection for responses was linked closely to student attrition. The attributes utilised were based on factors that could influence learners' thoughts of dropping out of a programme. As shown in Table 2, the attributes were grouped based on similarity in the family background and academic paradigms of the respondent. The data in Table 2 were collected under four major sections with twenty-three attributes. Under the personal and family biodata, 65.1% of the respondents are males, while 34.9% are females. Precisely 93.9% of the respondents have siblings, with 77.3% of parents being accommodating. Under the SHS tracker, 84.6% of the respondents attended mixed schools, with 70.8% having boarding status. According to the university tracker, 78.8% of lecturers are accommodating and mostly encourage their students during lesson periods. About 83.9% of learners strongly suggest that most lecturers should adopt new teaching strategies and expect more advanced facilities on campus for progressive teaching and learning. The responses also show that getting accommodation on campus is difficult, and paying fees every academic year has become financially strenuous. The data reveal that 21.8% of the students have an excellent cumulative grade point average (CGPA), 71.4% have a good CGPA, and 6.8% have a poor CGPA. The data also show that 40.4% of the respondents are in level 100, 40% in level 200, and 19.5% in level 300.

Data Preparation.
In the data preparation and cleaning phase, unrelated or inconsistent data and records with missing values are removed prior to classification. In building the classifier, a total of 1,239 responses were utilised for modelling. The data set was composed of 100% valid data, making it suitable for the classification model. The decision tracker, which represents the class label, has two values, "Yes, I want to quit" and "No, I will never quit." The "Yes, I want to quit" class category comprises learners who have thought about dropping out of school because of difficulties. On the other hand, the "No, I will never quit" class category comprises students who have never thought about stopping school, no matter the problem.

Classification Algorithms.
The support vector machine (SVM) algorithm identifies a hyperplane that separates the data points into classes in an N-dimensional space. The hyperplane in SVM is a decision boundary that segregates the data set into classes using vectors. The SVM algorithm is one of the best-performing classification algorithms compared to other algorithms across multiple applications [36,37].
The Random Forest (RF) algorithm [38] is an ensemble of decision trees (DT) and is trained using the bagging method to increase classification accuracy. In the RF algorithm, the right and wrong class for classification is determined using the margin function. In addition to classification, the RF algorithm has seen immense deployment in solving regression-related problems.
Multilayer Perceptron (MLP) [39] is a neural network with input, hidden, and output layers. MLP learns both linear and nonlinear relationships in datasets as part of the feed-forward neural network functionality. The MLP also uses the backpropagation technique to minimise the cost function by iteratively adjusting the weights in the network.
The decision tree (DT) algorithm [40] is a tree-structured classifier in which internal nodes represent the dataset's features and branches represent decision rules. In a DT algorithm, the leaf node represents the class label outcome with no additional segregation. Using the attribute selection measure, the DT algorithm recursively generates tree nodes until a leaf node is reached.
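To illustrate what an attribute selection measure does, the following minimal sketch computes information gain for a nominal attribute. Information gain is only one common choice and is an assumption here, since the text does not state which measure was used; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels inside each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, not taken from the study.
labels = ["quit", "quit", "stay", "stay"]
perfect = ["low", "low", "high", "high"]  # separates the two classes completely
useless = ["a", "b", "a", "b"]            # carries no class information

gain_perfect = information_gain(perfect, labels)
gain_useless = information_gain(useless, labels)
```

A tree builder would pick the attribute with the largest gain at each node, here `perfect`, and recurse on the resulting subsets until the labels in a subset are pure.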

Results and Analysis
In the simulation of the results, the Waikato Environment for Knowledge Analysis (WEKA) software, which offers a range of machine learning algorithms, is utilised in building a predictive model for the data set. The WEKA stable version 3.8.6 with classification and regression extensions was installed via the package manager.

Dominant Attributes.
To respond to RQ1, the SMOTE supervised instance filter in WEKA was first applied to the minority class label to increase the instances by 50%, as shown in Figure 2. The SMOTE oversampling technique addresses the concern of data imbalance and reduces bias toward the majority class. Applying the SMOTE technique increased the minority class significantly and created a more balanced data set for classification.
The feature selection mechanism in machine learning orders attributes based on a correlation score between the attributes and the class label. The data set instances for the study have nominal inputs with a nominal class label. The chi-square [41] feature selection method has seen outstanding results for nominal input and output attribute instances. Table 3 depicts the top ten dominant attributes that affect learner drop-out thoughts using the chi-square attribute evaluator and the ranker search method.
The results indicate a strong influence of lecturers' encouragement, SHS counselling, the flexibility of lecturers, CGPA, accommodation difficulty, adoption of new learning strategies, fee payment difficulty, guardian education, and participation during the lesson as dominant attributes. The top three attributes, as shown in Table 3, are lecturers' encouragement during lesson time (correlation score of 33.18), SHS guidance on the program to select at the university (correlation score of 32.94), and lecturers' strictness or flexibility (correlation score of 32.82).
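The interpolation idea behind SMOTE can be sketched as follows. This is a minimal illustration for numeric features and is not WEKA's filter; nominal attributes such as those in this study need a different neighbour and interpolation rule. The function name `smote`, its parameters, and the toy points are assumptions.

```python
import random


def smote(minority, percentage=50, k=3, seed=0):
    """Return synthetic points numbering `percentage` percent of len(minority)."""
    rng = random.Random(seed)
    n_new = round(len(minority) * percentage / 100)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class points, not taken from the study.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, percentage=50)
```

With four minority points and `percentage=50`, two synthetic points are produced, each lying on a line segment between two existing minority points.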
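The ranking step can be sketched as follows: compute the Pearson chi-square statistic between each nominal attribute and the class label, then sort the attributes by score. This is a minimal sketch, not WEKA's evaluator; the attribute names and counts are hypothetical and are not the survey data.

```python
from collections import Counter


def chi_square(values, labels):
    """Pearson chi-square statistic for a nominal attribute against a nominal class."""
    n = len(labels)
    observed = Counter(zip(values, labels))
    value_totals = Counter(values)
    label_totals = Counter(labels)
    stat = 0.0
    for v, v_count in value_totals.items():
        for c, c_count in label_totals.items():
            expected = v_count * c_count / n  # count expected under independence
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 80 respondents, two attributes.
labels = ["stay"] * 30 + ["quit"] * 10 + ["stay"] * 10 + ["quit"] * 30
attributes = {
    "lecturer_encouragement": ["yes"] * 40 + ["no"] * 40,  # associated with the class
    "unrelated_attribute": ["a", "b"] * 40,                # independent of the class
}

scores = {name: chi_square(vals, labels) for name, vals in attributes.items()}
# Rank attributes from most to least associated with the class label.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Keeping only the first ten entries of such a ranking corresponds to the top-ten selection described above.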

Classification Accuracy.
To determine the best classification algorithm, which answers RQ2, the SVM, RF, DT, and MLP supervised learning algorithms were used to model the dataset in WEKA. The 10-fold and 5-fold cross-validation techniques were implemented and compared to ascertain the performance of the classification algorithms. The k-fold cross-validation technique divides a dataset into k folds and iterates so that each fold serves once as the test data while the remaining folds form the training data. As depicted in Table 4 and Figure 3, the 10-fold cross-validation performs better than the 5-fold cross-validation technique. The Random Forest algorithm has the highest accuracy of 70.98% under the 10-fold cross-validation.
In analysing the confusion matrix to validate the performance of the classification algorithms, the precision, recall, F-measure, and receiver operating characteristic (ROC) curve results of the supervised learning algorithms were analysed. As depicted in Table 5, the RF algorithm maintained dominance with a precision of 0.708 and a ROC value of 0.771 compared to the DT algorithm, which has a precision value of 0.649 and a ROC value of 0.653. The ROC curve indicates the true positive (TP) and false positive (FP) measures of the classification model based on the actual and the predicted class of the confusion matrix. A ROC curve value closer to 1 shows a good measure of separability between the positive and negative classes. As shown in Table 5, the RF ROC value of 0.771 is the highest and indicates that the model built with the RF algorithm has a 77.1% chance of correctly distinguishing between the "Yes, I want to quit" and "No, I will never quit" class labels among learners.
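The fold partitioning and the confusion-matrix measures can be sketched as follows. This is a minimal sketch under stated assumptions: the folds are contiguous and neither shuffled nor stratified, which may differ from what WEKA does; 1,239 is the number of survey responses before any oversampling; and the confusion counts are hypothetical, not the values behind Table 5.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


folds_10 = list(k_fold_indices(1239, 10))
folds_5 = list(k_fold_indices(1239, 5))

# Hypothetical counts for the positive class.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

With 10 folds each model is trained on roughly 90% of the data, against roughly 80% with 5 folds, which is one plausible reason for the small accuracy difference between the two settings.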

Classification Accuracy with Chi-Square Feature Selection.
In response to RQ3, the feature selection mechanism is primarily implemented to remove weaker attributes and retain the stronger features in order to improve classification accuracy. As already depicted in Table 3, the chi-square feature selection mechanism is adopted, and the top ten relevant attributes are retained for classification. After implementing the 10-fold cross-validation on the chi-square data, accuracy decreased across all the classification algorithms, as indicated in Table 6, but the models' training time slightly improved, as shown in Table 7.

Discussion of Findings
Student retention modelling in higher educational institutions using a machine learning approach is primarily determined by the attributes for prediction and the algorithms implemented. The study's findings using the chi-square feature selection mechanism listed lecturers' encouragement, SHS counselling, and lecturers' accessibility and flexibility as the most relevant attributes. The feature selection mechanism helps identify attributes that have a high correlation with learner drop-out thoughts. The features that rank high after applying the appropriate feature selection techniques affect the class label significantly. Applying feature selection to data attributes is suitable for the academic counselling of students, since counsellors can identify significant features likely to cause attrition or retention among learners. Developing a highly accurate predictive model for student attrition depends on the supervised learning algorithms deployed. The performance of classification algorithms is linked closely to the type of dataset. The studies by Solis et al. [29] and Palacios et al. [32] compared the RF method to various classification algorithms, and the classification accuracy results show RF as the best-performing algorithm. The RF in this study also emerged as the best-performing algorithm for the dataset with a 70.98% accuracy using the 10-fold cross-validation technique. The DT algorithm ranked as the second best-performing algorithm with an accuracy of 65.03%, while the SVM was the worst-performing algorithm with 62.60%. The results from research question 3 also indicate that feature selection implementation decreased the classification algorithms' accuracy but with a better model training time. In this research, the feature selection mechanism restricted to the top ten performing attributes did not increase the algorithm's accuracy. For the dataset, classification accuracy increased with larger sample sizes and attributes.

Conclusion
This research focused on three aspects: (i) the use of a feature selection mechanism to list significant attributes for learner attrition modelling; (ii) building a learner attrition predictive model using the classification algorithm with the highest accuracy; and (iii) understanding the impact of the chi-square feature selection method on the accuracy of the algorithm. Since learner attrition has become problematic to academic authorities, implementing the model for future prediction will help identify learners with attrition thoughts for immediate academic counselling. The model primarily forms the basis for future prediction of learner attrition thoughts among students at the University of Education, Winneba, Ghana. Given test data, the RF-based model has a 77.1% chance of separability between the two class labels, "Yes, I want to quit" and "No, I will never quit." In building the model, the RF, SVM, MLP, and DT algorithms were compared using the 10-fold and the 5-fold cross-validation techniques. Before comparing the algorithms, the SMOTE technique was utilised during the data preprocessing stage to increase the minority class by 50% for a more balanced dataset. The chi-square feature selection mechanism was also utilised to rank relevant attributes with a high correlation value to the class label. Among the supervised algorithms compared, the RF algorithm performed best with an accuracy of 70.98% and 69.74% for the 10-fold and the 5-fold cross-validation implementations, respectively. The precision, recall, F-measure, and ROC results also indicate the RF algorithm's dominance compared to other tested algorithms.

Data Availability
Data available on request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.