Assessment and Evaluation of Different Machine Learning Algorithms for Predicting Student Performance

Student performance is crucial to the success of tertiary institutions. In particular, academic achievement is one of the metrics used in rating top-quality universities. Despite the large volume of educational data available, accurately predicting student performance remains challenging. The main reason for this is the limited research comparing different machine learning (ML) approaches. Accordingly, educators need to explore effective tools for modelling and assessing student performance while recognizing weaknesses to improve educational outcomes. This work investigated the existing ML approaches and key features for predicting student performance. Related studies published between 2015 and 2021 were identified through a systematic search of various online databases. Thirty-nine studies were selected and evaluated. The results showed that six ML models were mainly used: decision tree (DT), artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), linear regression (LinR), and Naive Bayes (NB). Our results also indicated that ANN outperformed the other models and achieved higher accuracy levels. Furthermore, academic, demographic, internal assessment, and family/personal attributes were the most predominant input variables (i.e., predictive features) used for predicting student performance. Our analysis revealed a growing number of studies in this domain and a broad range of ML algorithms applied. At the same time, the extant body of evidence suggested that ML can be beneficial in identifying and improving various areas of academic performance.


Introduction
Student academic performance is one of the most critical indicators of educational advancement in any country. Essentially, students' academic achievement is influenced by gender, age, teaching staff, and students' learning. Predicting student academic success has gained a great deal of interest in education. Student performance refers to the extent to which students achieve both immediate and long-term learning objectives [1]. An excellent academic record is an essential factor for a high-quality university based on its rankings. As a result, an institution's ranking improves when it has a strong track record of academic achievement. From the student's perspective, maintaining outstanding academic performance increases the possibility of securing employment, as excellent academic achievement is one of the primary aspects evaluated by employers [2]. The use of information technology (IT) in education can support institutions in achieving improved educational outcomes. For instance, artificial intelligence (AI) has a wide range of applications in learning. AI-based technologies in education have grown in popularity and attracted attention while improving quality and enhancing traditional teaching methods. For example, AI facilitates gathering vast amounts of student data from multiple sources such as web-based education systems (WBS) and intelligent tutorial systems (ITS). Besides, these technological systems can provide data regarding students' grades, academic progress, online activities, and class attendance. Despite this, it is still challenging for educators to effectively apply these techniques to their specific academic problems due to the high volumes of data and rising complexity. As a result, it becomes difficult to accurately assess students' performance [3]. Therefore, the data obtained should be examined appropriately to identify factors that predict student success in the future.
Predicting and analyzing student performance are critical to assisting educators in recognizing students' weaknesses while helping them improve their grades. Likewise, students can improve their learning activities, and administrators can improve their operations [3,4]. The timely prediction of student performance allows educators to identify low-performing individuals and intervene early in the learning process. ML is a novel approach with numerous applications that can make predictions on data [5]. ML techniques in educational data mining aim to model and detect meaningful hidden patterns and usable information from educational contexts [6]. Moreover, in the academic field, ML approaches are applied to large datasets to represent a wide range of student characteristics as data points. These strategies can benefit various fields by achieving various goals, including extracting patterns, predicting behavior, or identifying trends [7], which allow educators to deliver the most effective methods for learning and to track and monitor the students' progress.
Our study was mainly motivated by the lack of systematic and comprehensive surveys assessing the prediction of student academic performance using different ML models. Therefore, the main purpose of this work was to survey and summarize the key predictive features and the ML algorithms used to predict students' academic performance. The study's findings support mapping and assessing existing knowledge, research gaps, and suggestions for further research in this context. The next section focuses on the methodology used in the systematic survey. Section 3 provides a detailed summary of the results, while Section 4 discusses them. Lastly, the conclusion and future work are outlined in Section 5.

Methods and Materials
This work is conducted to assess the main ML algorithms and key attributes in student performance prediction. Several approaches [8-13] were followed, along with various strategies and steps proposed by references [10,11] in performing this survey work. These include (a) formulation of research questions, (b) eligibility criteria, (c) information source/search strategy, and finally (d) study selection.

Research Questions.
Forming the right research questions is important for identifying the key studies related to the prediction of student performance. The steps proposed in reference [13] were followed to formulate the research questions using the PICO framework, which stands for population, intervention, context, and outcome. Table 1 summarizes the criteria of the research questions.
Accordingly, this work is conducted to answer the following research questions: (i) Q1: What are the key predictive features used in assessing the student performance? (ii) Q2: What are the key ML algorithms used in the prediction of student performance? (iii) Q3: What are the outcomes and accuracies of those ML algorithms?

Eligibility Criteria.
We included studies that were (a) written in English, (b) published between 2015 and 2021, (c) from either conference proceedings or academic journals, (d) directly related to the prediction of student performance with a focus on ML, and (e) at any educational level (Table 1). Furthermore, we excluded studies that were (a) not written in English, (b) in the form of traditional, conceptual, or systematic reviews, (c) based on other artificial intelligence (AI) methods such as deep learning (DL), and finally (d) lacking empirical or experimental data.

Information Source and Search Strategy.
A systematic and comprehensive search was performed to address the formulated research questions. For this objective, six online databases were searched in August 2021: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, Web of Science, and Google Scholar. A follow-up search was conducted at the beginning of October 2021 to identify any recently published works. We used several keywords, developed by Kitchenham et al. [14], and combined them as follows: "prediction" OR "forecasting" OR "estimation" AND "student performance" OR "student academic performance" OR "academic achievement" OR "academic outcome" AND "machine learning" OR "ML" OR "data mining" OR "educational data mining."
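The keyword combination above can be assembled programmatically, which helps keep the string identical across databases. The following is a minimal sketch, not the authors' actual tooling: the function names are hypothetical, and the parentheses around each OR-group are an assumption, since the text does not state how the operators were grouped and each database has its own query syntax.

```python
# Keyword groups taken from the search string quoted in the text.
TASK_TERMS = ["prediction", "forecasting", "estimation"]
OUTCOME_TERMS = [
    "student performance",
    "student academic performance",
    "academic achievement",
    "academic outcome",
]
METHOD_TERMS = ["machine learning", "ML", "data mining", "educational data mining"]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Join the OR-groups with AND so that every group must match."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(TASK_TERMS, OUTCOME_TERMS, METHOD_TERMS)
print(query)
```

Grouping the alternatives explicitly avoids relying on a database's default operator precedence, which may differ between search engines.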

Study Selection.
The screening and selection of the studies were performed in two stages. First, studies were selected based on title and abstract screening against the eligibility criteria. Second, studies were selected based on a full-text assessment (see Figure 1).
We considered studies for full-text evaluation whenever there were any doubts. Disagreements between co-authors were resolved by consensus. Furthermore, EndNote X20 software was utilized to remove duplicates and manage all citations.
Our search yielded 1128 papers. After eliminating duplicates, 767 papers remained. Six hundred of them were excluded based on title and abstract screening. The full text of the remaining 102 articles was considered and evaluated. Of these, 58 failed to meet the inclusion and exclusion criteria. The remaining thirty-nine relevant studies were evaluated for this review. Figure 1 illustrates the screening and selection procedures.

Characteristics of the Included Studies.
A total of twenty-six articles (66.7%) were published in academic journals, and thirteen articles (33.3%) were published in conference proceedings.
The number of articles has significantly increased in recent years; this indicates that predicting students' performance through ML methods is attracting the attention of various scholars. As shown in Figure 2, most of the included articles were published in 2018 (n = 9, 23%) and 2019 (n = 14, 35%).
According to the authors' affiliation countries, most published research was from India (n = 13, 33.3%), Saudi Arabia (n = 5, 12.8%), and Pakistan (n = 4, 10.3%), while each of the other countries contributed one or two articles (see Figure 3). Notably, over half of the studies (n = 36, 58%) on academic achievement in higher education analyzed data from an individual university.
Artificial neural networks accounted for thirty-one percent (n = 14) of the ML methods used in predicting student performance, followed by support vector machine (n = 7, 15%). Decision tree, Naive Bayes, and K-nearest neighbor were each used in six articles (13%). Figure 4 represents the distribution of ML approaches used in the prediction. Regarding the classifiers used, most of the selected studies applied only one classifier and did not compare it with other methods. Besides, six studies each tested four, three, and two classifiers. The highest number of classifiers used in a study was ten (n = 3). The majority of studies involving ANN used only one classifier.
Furthermore, the size of the datasets used in the studies ranged from 22 [15] to 20,000 [16]. Notably, five studies [17-21] did not report the size of the datasets used in their experiments. In most studies (n = 34), the datasets were divided and applied in both training and testing phases. However, five studies did not report the stages employed in their experiments.
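Dividing a dataset into training and testing phases, as most of the included studies did, can be sketched as follows. This is an illustrative example only: the 80/20 ratio, the fixed seed, and the function name are assumptions, and the reviewed studies may have used other ratios or cross-validation.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into training and test parts."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 22 student records, the smallest size reported above.
students = [{"id": i} for i in range(22)]
train, test = train_test_split(students)
print(len(train), len(test))
```

With very small datasets such as this one, the test part contains only a handful of records, so a single accuracy figure computed on it should be read with caution.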

Key Attributes Used in Predicting Student Performance.
We grouped the attributes into seven categories: demographic, academic, internal assessment, communication, behavioral, psychological, and family/personal attributes (see Table 2). The most frequently used attributes were attendance and CGPA, which fall under the academic group. Twenty out of thirty-nine articles utilized the academic group to predict the performance of the students. This is because CGPA has significant academic potential. The second most used attributes were gender, age, and nationality, which fall under the demographic group. Eighteen out of thirty-nine articles used demographic attributes such as gender. The rationale behind this is that male and female students have different learning styles [53]. Various studies have found that female students possess a more optimistic style of learning, more positive attitudes, and more discipline, and are more self-motivated [54,55]. Therefore, gender appears to have a significant influence on academic performance prediction.

Computational Intelligence and Neuroscience
Parents' status, survey, satisfaction, education, and income, on the other hand, were the third most frequent attributes used in the prediction. These attributes fall under the family/personal group, which was used in eleven articles. Table 2 shows the remaining attributes by category, name, and frequency.

ML Models Used in Predicting Student Performance.
Accurate predictive modelling can be achieved by several techniques such as regression, classification, and clustering. However, we observed that classification is one of the most popular techniques used in predicting academic performance. Several classification methods have been used, as listed in Table 3. Among these were artificial neural network (ANN), decision tree (DT), support vector machine (SVM), K-nearest neighbor (KNN), Naive Bayes (NB), and linear regression (LinR). The algorithms are highlighted in the following subsections.

Decision Tree (DT).
DT is often used due to its clarity and simplicity in discovering and predicting data. Many researchers noted that decision trees are easy to comprehend because they are built on IF-THEN rules [16,61]. DT was used in six studies. The highest accuracy was 98.2% [41], while the lowest accuracy was 66% [31]. The accuracy results of DT models are listed in Table 4.
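The IF-THEN structure that makes decision trees easy to read can be illustrated with a hand-written toy tree. This sketch is purely illustrative: the attributes echo those named in this survey, but the thresholds and the function name are hypothetical and come from no reviewed study. In practice the rules are learned from data.

```python
def predict_outcome(cgpa, attendance_rate):
    """Toy decision tree: each root-to-leaf path is one IF-THEN rule.

    The thresholds below are made up for illustration only.
    """
    if cgpa >= 3.0:
        return "pass"      # IF cgpa >= 3.0 THEN pass
    if attendance_rate >= 0.8:
        return "pass"      # IF cgpa < 3.0 AND attendance >= 0.8 THEN pass
    return "at risk"       # IF cgpa < 3.0 AND attendance < 0.8 THEN at risk


print(predict_outcome(2.5, 0.5))
```

Because every prediction can be traced to a single rule, an educator can see why a given student was flagged.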

Linear Regression (LinR).
Linear regression models the relationship between two variables by fitting a regression line to the data. As listed in Table 5, all seven articles reported an average level of accuracy in predicting student performance. The highest accuracy level was 76.2% [51], and the lowest was 50% [48] in using LinR models.
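For a single predictor, the least-squares regression line has the closed form slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below applies these formulas to hypothetical data; the variable names and values are assumptions chosen for illustration and are not drawn from the reviewed studies.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: weekly study hours versus final score.
hours = [2, 4, 6, 8]
scores = [55, 65, 75, 85]
slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```

Here the fitted line is score = 5 × hours + 45, so each additional study hour is associated with five more points in this made-up data.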

Artificial Neural Networks (ANNs).
The nonlinear and complex interactions between different input and output variables can be modelled by using ANNs [62]. Our search yielded fourteen articles that used the ANN approach to predict academic performance, as shown in Table 6. All ANN models in this work gave good results, with a maximum accuracy of 98.3% [18] and a lowest accuracy of 64.4%.
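How a neural network captures a nonlinear interaction can be seen in a very small example. The sketch below is an assumption-laden illustration rather than a model from any reviewed study: the weights are set by hand instead of being learned, and the exclusive-or pattern is a textbook case chosen because no single straight line can separate it.

```python
def relu(z):
    """Rectified linear unit, the nonlinearity applied by each hidden neuron."""
    return max(0.0, z)


def tiny_network(x1, x2):
    """Two hidden ReLU neurons and one linear output, with hand-set weights."""
    h1 = relu(x1 + x2)          # positive when at least one input is on
    h2 = relu(x1 + x2 - 1.0)    # positive only when both inputs are on
    return h1 - 2.0 * h2        # 1 for exactly one active input, else 0


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, tiny_network(a, b))
```

Removing the hidden layer's nonlinearity would reduce the network to a linear model, which cannot reproduce this pattern; trained ANNs find such weights automatically from data.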

Naive Bayes (NB).
Naive Bayes is highly scalable and requires a number of parameters that is linear in the number of attributes of the learning problem. We found six articles that applied the NB method in predicting academic performance. The highest accuracy was 96.9% [49] and the lowest was 65.1% [42]. Table 7 shows the accuracy results of NB methods.

K-Nearest Neighbor (KNN).
KNN stores the training cases and classifies new cases based on a certain measure of similarity, such as a distance function. As listed in Table 8, all six articles produced a high level of accuracy in predicting student performance. Notably, the highest accuracy was 95.8% [50], and the lowest was 69% [42].
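The similarity-based classification performed by KNN can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance is only one possible distance function, k = 3 is an arbitrary choice, and the records are hypothetical rather than taken from any reviewed dataset.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; similarity is measured
    with Euclidean distance.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (CGPA, attendance rate) -> outcome.
records = [
    ((3.8, 0.95), "pass"),
    ((3.4, 0.90), "pass"),
    ((3.1, 0.85), "pass"),
    ((2.0, 0.50), "fail"),
    ((1.8, 0.60), "fail"),
    ((2.2, 0.40), "fail"),
]
print(knn_predict(records, (3.3, 0.88)))
```

Because the attributes sit on different scales, real applications usually normalize them first so that no single attribute dominates the distance.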

Support Vector Machine (SVM).
SVM is suitable for handling small datasets and has a greater generalization ability compared with other methods. Our search yielded seven articles that used the SVM approach. The maximum accuracy of the seven studies was 91.3% [40], and the lowest accuracy was 66% [20]. The accuracy of SVM is presented in Table 9.
Figure 5 illustrates the level of accuracy achieved by each approach in predicting student performance from 2015 to 2021. The maximum level of accuracy was achieved by using ANN models (98.3%). DT produced the second-highest accuracy (98.2%), followed by NB (97%) and KNN (95.8%). Furthermore, SVM produced an accuracy of 91.3%, while LinR had the lowest prediction accuracy compared to the other methods (76%).

Discussions
This systematic survey focused on the existing ML techniques and critical variables used in predicting the academic performance of students, as well as the most accurate prediction algorithms. Table 3 shows the prediction accuracy using classification methods grouped by algorithm for all selected studies from 2015 to 2021. Based on the data gathered in this work, supervised learning was the most extensively employed technique for predicting student performance, as it produces accurate and consistent findings. The ANN model, for instance, was the most widely applied, appearing in fourteen studies, and delivered the most reliable predictions. Furthermore, SVM, DT, LR, NB, and RF were well-studied algorithmic methods that produced good results. Similar to reference [64], unsupervised learning remains an unappealing approach for researchers, given its low accuracy in predicting students' performance in the current literature.
ANN demonstrated a remarkable accuracy (98.3%) in predicting student performance when combined with critical variables such as CGPA, gender, age, parent status, parent income, and family size. This suggests that family status, parents' income, and family size can significantly affect student achievement. DT is rated second, with a maximum accuracy of 98.2%. GPA, grades, and demographics are the factors that led to the highest accuracy in predicting students' success in most of the studies that used DT. It can be concluded that DT can handle both forms of data and perform well on massive datasets, and the relationship between variables is simple to understand [65,66].
NB has a performance accuracy of about 97%. According to these findings, demographic and academic characteristics are the best predictors of students' academic achievement when using this approach, and the attributes used with Naive Bayes were significant. As a result, when using NB to predict student academic success, attributes such as gender, grades, results, and attendance should be considered. KNN had an accuracy of about 95%, and the relevant variables included assignment, course/subject, and grades. The grade variable appears in ANN and DT as well.
Furthermore, SVM has a performance accuracy of around 91%. From our analysis, the most appropriate attributes for predicting students' academic achievement using SVM are motivation, personality, learning tactics, and results. These criteria are considered significant in determining student academic success.
Finally, the method with the lowest prediction accuracy, at about 76%, was linear regression. Even though multiple factors were used in several studies, no significant variables were identified. Gender, age, and final grades used in LinR studies were also employed in KNN, DT, ANN, and NB. We presume that age and final grades were significant predictors of student performance.
In sum, prediction accuracy is determined by the traits or features employed throughout the prediction process [2]. As a result, we assume that the ANN and DT approaches provided the best prediction accuracy due to the influence of primary attributes. According to earlier research [2], the CGPA factor increased accuracy in forecasting students' performance using the DT approach. Although the work of [15] has demonstrated that additional factors can influence a student's CGPA, more research is needed to identify the factors that substantially impact the CGPA. Academic features were the most commonly used variables, obtaining a score of 81% accuracy. This demonstrates that summative performance criteria such as CGPA, final grades, program, attendance, and topic are essential in forecasting student performance. This differs from a recent review by [64], which revealed that GPA scores or ranges were employed less frequently in studies predicting student performance despite their importance.

Conclusions
Student performance prediction can assist educators in identifying student deficiencies in order to improve their scores and enhance learning. This study aimed to examine the latest ML algorithms and variables used to predict student academic performance. In our analysis, we identified 39 studies from 2015 to 2021. The study findings showed a considerable recent rise in studies in this context. Furthermore, academic variables (e.g., CGPA and attendance), internal evaluations (e.g., quiz and assignment), demographics (e.g., gender), and family/personal characteristics significantly affect the prediction of students' performance.
Based on performance metrics, we conclude that the ANN classifier is an outstanding predictor of student achievement, followed by the DT technique. Predicting student academic achievement with high accuracy, on the other hand, demands a thorough grasp of the aspects and characteristics influencing student achievement. Given this, it is demonstrated that there are numerous potential areas for improvement in the design of the measurement devices used in instructor performance evaluation. Overall, this is still a developing subject, and future studies are expected to include more algorithms for greater accuracy.
Our analysis suggests, first, that a new set of inputs and a more robust and extensive dataset are necessary for greater accuracy. Second, data should be gathered from multiple institutions so that environment-dependent qualities, which are not addressed in the extant literature, can be incorporated. Third, for a more efficient classification technique, the selection of attributes needs to be improved based on their correlation. Finally, to thoroughly assess a model's performance, precision and recall need to be measured.
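The recommendation to report precision and recall alongside accuracy can be made concrete with a short calculation. The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN); the cohort, the labels, and the choice of "fail" as the positive class are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="fail"):
    """Return accuracy, precision, and recall for one class treated as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical cohort: 8 of 10 students pass, and the model predicts "pass" for all.
actual = ["pass"] * 8 + ["fail"] * 2
predicted = ["pass"] * 10
print(classification_metrics(actual, predicted))
```

In this made-up cohort the model reaches 80% accuracy while identifying none of the failing students, so its recall for the "fail" class is zero. This is why accuracy alone can overstate a model's usefulness for early intervention.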

Data Availability
The data supporting this review are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.