Grade Prediction in Blended Learning Using Multisource Data



Introduction
In recent years, online education platforms have multiplied with the rapid development of Internet technology, and a mixed teaching mode that combines offline face-to-face teaching with online learning has emerged. Mixed teaching, aided by computers, mobile phones, and other electronic equipment, guides students to carry out personalized and adaptive learning according to teachers' preset learning goals and driving tasks. Under the mixed teaching mode, students self-regulate the learning process, and teachers supervise students' learning progress and efficiency in real time. By integrating online evaluation with offline evaluation, students' learning autonomy increases while learning efficiency is maintained. At the same time, by combining the analysis of students' online learning behavior trajectories with their offline classroom performance, we can infer students' phased learning situation, adjust the teaching pace in time, and further improve teaching quality. Therefore, it is worth studying how to transform students' learning data, such as their online learning behavior tracks and offline classroom performance, into an estimate of their phased learning situation and how to take corresponding measures to adjust and guide their personalized learning to achieve a better teaching effect.
The structure of this paper is as follows. Section 2 reviews related literature. Section 3 describes the setting of the curriculum and the data from the platforms. Section 4 provides a brief introduction to error-correcting output codes (ECOC) and genetic algorithms (GA) and then illustrates the design and implementation of the TBCGA-ECOC (Ternary Bitwise Calculator based GA for ECOC) algorithm. Section 5 describes the experimental parameters and analyzes the experimental results of various algorithms. Section 6 summarizes the work.

Literature Review
With the upsurge of mixed teaching, many scholars have conducted research in related fields. Yousef and Sumner conducted a detailed analysis and discussion of more than 200 articles on MOOC. They proposed possible future research directions according to the main achievements and MOOC research directions of recent years [1]. Regarding the emerging field of education big data, Ang et al. introduced and discussed six aspects including overview and classification of education big data, different data analysis methods, social challenges and technical challenges, technical problems, systematic data sources, data mining, and database [2]. To explore the underlying mechanisms behind the features, Xie conducted a survival analysis of the viewing duration of large-scale open online courses, in which two stochastic differential equations were used to describe the growth of two classes. The research helps in understanding the role of memory in the complexity of learning behavior [3]. Based on SPOC and flipped classes, Xu et al. also studied online platforms. They conducted experiments on a mixed computer network course, showing that students' online behavior data can be used to predict their results and that the predictions become more stable and reliable as the course advances [4]. Guided by Moore's theory of transactional distance, Hew et al. defined MOOC success as student satisfaction with the course [5]. They proposed a methodological framework (supervised machine learning and sentiment analysis) to examine datasets of user-generated reviews, identify specific learner-level and course-level factors that can predict MOOC learners' satisfaction, and estimate their relative impact. The research extended the theoretical understanding of the factors affecting learner satisfaction with MOOC and offered specific suggestions for the design of MOOC. To better adjust the video curriculum and arrangement of MOOC, Hu et al. put forward an analysis method for students' video viewing behavior based on the Spark platform and verified it with the data of the cauX platform. The experimental results show that the method can analyze the characteristics of viewing behavior quickly and accurately [6]. Yu et al. have noted that SPL-based methods can be used to establish a practical MOOC learning analysis framework, which increased the complexity and reusability of learning analysis [7]. Zhang et al. constructed a SPOC-based flipped classroom teaching model based on the OBE concept and verified its effectiveness [8]. In order to build a better MOOC teaching platform, Wang et al. combined the technology acceptance model (TAM) and the theory of planned behavior (TPB) to construct a theoretical model for analyzing the mechanism behind MOOC learning performance [9]. It forms a scientific basis for the MOOC teaching setting and provides theoretical guidance and policy advice for college managers. Meanwhile, Gardner and Brooks classified MOOC studies by predictors, predictions, and underlying theoretical models [10]. They then critically surveyed the work in each category and provided data on the raw data sources, feature engineering, statistical models, evaluation methods, prediction architectures, and other aspects of these experiments, revealing several critical methodological gaps and unexplored spaces of features and modeling techniques. The results contribute to constructing accurate, actionable, and theoretically grounded student success models. A document clustering model based on weighted word embeddings was developed by Onan and Toçoglu to identify question topics in MOOC forum posts [11]. Additionally, Toçoglu and Onan designed a long short-term memory network (LSTM) which can classify around 70,000 MOOC reviews more accurately [12]. They also investigated the sentiment analysis of Twitter data and student evaluation data [13][14][15].
In addition to the research on MOOC data and teaching modes, many scholars have focused on students' behavior models, studied student performance prediction, and explored the factors affecting student learning. Xu et al. developed a two-layer structure model consisting of multiple underlying predictors and cascading integrated predictors [16]. They also proposed a data-driven approach based on latent factor models and probabilistic matrix factorization to discover curriculum relevance and thereby construct more efficient underlying predictors. The proposed method was shown to outperform the benchmark methods. In addition, Ulloa-Cazarez et al. proposed a genetic programming (GP) algorithm to predict whether students can pass the final exam, providing early warning of their learning status [17]. Xu and Yang designed classification algorithms based on the support vector machine (SVM) to summarize students' learning motivation from their MOOC activity logs and predict whether they can obtain a certificate [18]. Similar work was done by Meier et al. [19]. They predicted students' likely performance in subsequent learning from historical teaching data of the course and applied instructional intervention when necessary. Qiu et al. proposed the latent dynamic factor graph (LadFG) to predict students' homework completion and whether students can obtain a certificate [20]. Based on the MOOC data established by Peking University on Coursera, Zhang et al. for the first time classified and analyzed student behavior characteristics in Chinese MOOC and predicted learning results with three classification models: linear discriminant analysis (LDA), logistic regression (LR), and linear-kernel support vector machine (l-SVM) [21]. Yu et al. identified seven cognitive participation models based on students' video clickstream logs. They also designed classification algorithms based on K-nearest neighbors (KNN), SVM, and artificial neural networks (ANN) to predict whether students can pass the course exam [22]. Mubarak et al. employed a dual analysis method combining visual and predictive analysis to visualize the data and model the behavior with an RNN-LSTM model, which improved the prediction accuracy of learning performance [23]. In addition, Wen et al. proposed a new simple feature matrix that maintains local correlation information of learning behavior and a novel Convolutional Neural Network (CNN) model to improve the completion rate of MOOC [24]. Hussain et al. analyzed data recorded in the Digital Electronics Education and Design Suite (DEEDS) using five machine learning algorithms: artificial neural network (ANN), support vector machine (SVM), logistic regression, naive Bayes classifier, and decision tree [25]. ViSeq, an interactive visual analysis system proposed by Chen et al. to visualize the learning sequences of different groups of learners, helps users explore the learning sequences of MOOC at multiple levels of granularity [26]. It includes four linked views: a projection view for identifying learner groups, a pattern view showing the overall sequence patterns within the selected group, a sequence view describing transitions between consecutive events, and an individual view with an extended sequence chain. Cobo and Ruiz-Garcia presented the edX-LIS Learning Intervention System, which provides intervention strategies to improve learning performance and positively affects learning motivation, persistence, and participation [27]. García-Molina et al. presented an automatic scoring algorithm based on the data dimension and correlation of learners' contributions [28]. It provides teachers with a visual and numerical representation of each learner, offering guidance for adjusting the expected behavior in the MOOC forum. In response to the rapid change of knowledge in the IT domain, Chen et al. proposed a framework for in-based expression through decision tree expression to evaluate the relationship between the curriculum and employment [29]. They designed a decision-based course group recommendation algorithm to provide a basis for adjusting the course configuration. The results indicate that the designed algorithms and the recommended curriculum groups play a significant role in promoting student employment.
In essence, these studies address combinations of multiple classification problems. They analyzed the collected data (behavioral trajectory data) with different models or algorithm frameworks and then produced predicted outcomes. Many studies have also demonstrated the feasibility and effectiveness of the relevant algorithms and models for result prediction, which provides a solid basis for applying and improving these algorithms. This paper collects students' learning behavior track data and divides their grades into three classes: excellent, qualified, and unqualified. We add GA-based feature selection and base classifier selection to the ECOC algorithm. We also optimize the ECOC coding matrix, the feature subset, and the choice of binary classifiers, taking accuracy as the evaluation index. Screening the learning behavior features of the raw data in the data preprocessing stage improves the multiclass accuracy. Next, we identify the learning behavior features that influence student performance through the fine-grained results of feature selection and correlation analysis. Finally, based on these fine-grained results, the teaching methods can be improved and the teaching rhythm can be adjusted, which helps to guide students' personalized learning and improve their learning performance.

Course Settings.
The datasets for this analysis were collected from a blended course on Java Language Programming taught at a college over four consecutive years. To help students understand the basics of Java and master Java programming, the course draws on the characteristics of the language and introduces programming foundations, object-oriented programming, GUI programming, and so forth. It is scheduled for 68 class hours. Unlike traditional classes, it adopts popular online learning platforms such as Rain Classroom and MOOC for auxiliary teaching. In addition to attending conventional offline classes, students also need to use the online platforms to preview, review, complete tests, and carry out other online learning activities, and they take the final exam offline at the end of the course.

Data Description.
The course data are the learning tracks and data records collected from a blended course on Java Language Programming at a college over four consecutive years (the four different grades, one per year, are denoted grade-1, grade-2, grade-3, and grade-4). The data come from four popular online education platforms: MOOC Platform, Rain Classroom, Programming Teaching Assistant (PTA), and Cnblogs.

Data of MOOC.
The MOOC Platform data mainly include the viewing records of 102 teaching videos, the scores of 6 peer-evaluated homework assignments and 8 chapter tests, the cumulative number of discussions in the current semester, the number of times each chapter was studied, and the online test results together with their submission times. The data fields are summarized in Table 1.

Data of Rain Classroom.
Rain Classroom is an intelligent online teaching platform covering a variety of preclass, in-class, and after-class teaching scenes. The collected data mainly include preclass preview situations and classroom performance. Tables 2 and 3, respectively, list the data composition and field description.

Data of Cnblogs.
Cnblogs is a knowledge-sharing community for developers where students can consolidate knowledge and regularly summarize it by writing blogs. At the same time, teachers and teaching assistants can review students' blogs to understand their phased learning results and current situations. The main components of the scores include chapter knowledge summary, PTA exercise score situation, PTA exercise algorithm analysis, code reading, and learning reflections. Figure 2 portrays the composition of the Cnblogs score.

Data Integration.
In addition to the three categories of data sources above, PTA online test results were added during data integration. When processing the data, we align the data from the different sources using the student ID as the key and then combine them to obtain four complete datasets. At the same time, the students' grades are divided into three classes, excellent, qualified, and unqualified, for subsequent grade prediction.
After integration, the data were preprocessed (filtering, feature conversion) to obtain the final datasets. Table 4 lists the four datasets.
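The alignment step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the feature names, the choice to drop students who are missing from any source, and the score cut-offs (85 and 60) are all assumptions made for the example.

```python
from typing import Dict, List

Record = Dict[str, float]


def integrate_sources(sources: List[Dict[str, Record]]) -> Dict[str, Record]:
    """Inner-join per-platform feature records on student ID."""
    common_ids = set(sources[0])
    for source in sources[1:]:
        common_ids &= set(source)  # assumed policy: keep students present in every source
    merged: Dict[str, Record] = {}
    for student_id in sorted(common_ids):
        row: Record = {}
        for source in sources:
            row.update(source[student_id])
        merged[student_id] = row
    return merged


def grade_label(score: float, excellent: float = 85.0, passing: float = 60.0) -> str:
    """Map a final score to one of the three grade classes (cut-offs are assumed)."""
    if score >= excellent:
        return "excellent"
    if score >= passing:
        return "qualified"
    return "unqualified"


# Toy records keyed by a hypothetical student ID; field names are illustrative.
mooc = {"s01": {"mooc_video_ratio": 0.9}, "s02": {"mooc_video_ratio": 0.4}}
rain = {"s01": {"rain_preview_rate": 1.0}, "s03": {"rain_preview_rate": 0.5}}
dataset = integrate_sources([mooc, rain])
```

Only student `s01` appears in both toy sources, so it is the only row that survives the join.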
Scientific Programming 3

The ECOC Framework Based on GA and TBC
In machine learning, a multiclass classification problem is often divided into multiple binary classification problems, and the multiclass prediction is then obtained by integrating the results of all the binary problems. This paper focuses on an ECOC-based classification algorithm that splits the multiclass problem and combines multiple binary classifiers.

Introduction to ECOC.
The ECOC algorithm, which originated in the field of digital communication, is an ensemble learning framework for multiclass classification. It is now widely used in face recognition, traffic signal recognition, microarray data analysis, and other fields. In a multiclass problem, by designing a proper code for each category, the ECOC ensemble can correct some errors of the individual classifiers and still assign samples to the correct category [30]. The ECOC algorithm mainly includes three basic steps [31]: encoding, training, and decoding. The process of constructing a coding matrix is called encoding. Encoding methods are roughly divided into two categories: data-dependent and data-independent. The former takes the model or distribution of the datasets into account during the coding process; typical examples are ECOC-ONE, DECOC (Discriminant ECOC), and Data-Driven ECOC. The data-independent coding process only considers the number of categories and does not analyze the model or distribution of the data; typical examples are OVO (One Versus One) [32], OVR (One Versus Rest), OVA (One Versus All), DR (Dense Random) [33], SR (Sparse Random) [34], and other coding methods [35]. The coding matrix consists of the symbols "+1," "−1," and "0." Each row of the coding matrix represents a category, and each column represents a binary classifier. Each binary classifier is trained only on the data of the categories with a nonzero code in its column, taking the data of the categories coded "+1" as the positive class and the data of the categories coded "−1" as the negative class. Figure 3 portrays the encoding matrix of the classical one-to-many encoding (One vs All, OVA). The ECOC algorithm is essentially an integration framework that improves model performance by combining different classifiers so that they complement each other [36,37]. The process opposite to encoding is called decoding. The key to decoding is to determine the final category of an unknown sample based on the decisions of all base classifiers. There are mainly three types of decoding: strategies based on the distance between the output code and the target code, such as Hamming distance decoding; probability-based decoding, such as the Bayesian criterion; and decoding strategies based on the pattern space.
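To make the encoding and decoding steps concrete, the following sketch builds the OVA and OVO coding matrices for a small problem and decodes a vector of binary classifier outputs by Hamming distance. The function names are illustrative, and the distance used here, in which a "0" code contributes 0.5, is one common convention for ternary codes rather than necessarily the one used in the experiments.

```python
from itertools import combinations
from typing import List

Matrix = List[List[int]]


def ova_matrix(n_classes: int) -> Matrix:
    """One-vs-all: column j separates class j (+1) from every other class (-1)."""
    return [[1 if row == col else -1 for col in range(n_classes)]
            for row in range(n_classes)]


def ovo_matrix(n_classes: int) -> Matrix:
    """One-vs-one: one column per class pair; uninvolved classes are coded 0."""
    pairs = list(combinations(range(n_classes), 2))
    return [[1 if row == pos else -1 if row == neg else 0 for pos, neg in pairs]
            for row in range(n_classes)]


def hamming_decode(outputs: List[int], matrix: Matrix) -> int:
    """Return the class whose codeword (row) is closest to the classifier outputs."""
    def distance(codeword: List[int]) -> float:
        # (1 - code * out) / 2 is 0 on agreement, 1 on disagreement, 0.5 for a 0 code.
        return sum((1 - code * out) / 2 for code, out in zip(codeword, outputs))

    return min(range(len(matrix)), key=lambda row: distance(matrix[row]))


# Three grade classes: 0 = excellent, 1 = qualified, 2 = unqualified (labels assumed).
ova = ova_matrix(3)   # [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
ovo = ovo_matrix(3)   # [[1, 1, 0], [-1, 0, 1], [0, -1, -1]]
predicted = hamming_decode([1, 1, -1], ovo)  # both classifiers involving class 0 vote for it
```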

Feature Selection Method.
Feature selection is a primary data processing method in machine learning [38,39]. By selecting a subset of all features for model training, it improves the accuracy of prediction models, yields faster and less resource-consuming models, and makes the models easier to understand and interpret. It also enhances the generalization ability of the model, highlights the essence of the problem, and simplifies the mathematical model.
The origins of GA date back to the early 1960s. GA searches for the optimal solution by simulating a natural evolutionary process. Through computer simulation, the problem-solving process is converted into the crossover and mutation of chromosome genes, as in biological evolution. The algorithm has three basic operators: selection, crossover, and mutation. Compared with traditional algorithms, GA is based on biological evolution and offers good convergence, short computation time, and high robustness under given accuracy requirements. Meanwhile, the algorithm is scalable and easy to combine with other algorithms [40].

The ECOC Framework Based on GA and TBC.
The classical ECOC algorithm takes all the features as input variables and uses a collection of base classifiers to train the model and make predictions. In this paper, we design a multiclass classification framework based on ECOC and GA for performance grade prediction. During optimization, the algorithm applies operators to the code words of a pair of encoding columns so that the columns exchange information, and it uses GA to select a collection of heterogeneous classifiers in search of better solutions.

Gene.
A gene is a basic component of a solution to the problem. In this paper, a gene specifies how encoding columns are combined by computation to generate a new encoding column. The gene design mainly includes the index of the two selected matrix columns, the feature selection sequence, and two columns for calculation. The gene structure is illustrated in Figure 4.

Individual.
An individual consists of a series of genes. It is the unit that evolves and represents a solution to the original problem. In this algorithm, an individual generates a new ECOC coding matrix by assembling the coding columns newly generated by its genes, and the number of binary classifiers determines the number of genes in the individual. The design of the individual is shown in Figure 5.

Operator.
The algorithm has five types of operators. The chosen operator is applied to the selected characteristic matrix columns to generate a new characteristic matrix.

Legality Check.
The ECOC encoding matrix has its own legality constraints. Illegal cases mainly include duplicate rows, reverse (mutually inverse) columns, and rows containing only 0 symbols. At the column level, a column is illegal if it contains only 0, or if +1 or −1 is missing from it.
It is necessary to conduct a legality check before performing the evaluation operation, in order to correct illegal results and discard illegal columns.
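A legality check along these lines might look like the sketch below. It assumes the matrix is stored as a list of rows over {+1, −1, 0}; rejecting exact duplicate columns as well as inverse columns is an added assumption, since a duplicate column would train a redundant classifier.

```python
from typing import List

Matrix = List[List[int]]


def is_legal(matrix: Matrix) -> bool:
    """Check the row and column constraints of a ternary ECOC coding matrix."""
    rows = [tuple(row) for row in matrix]
    if len(set(rows)) != len(rows):
        return False  # duplicate rows: two classes would share a codeword
    if any(all(value == 0 for value in row) for row in rows):
        return False  # an all-zero row: the class takes part in no classifier
    seen = set()
    for column in zip(*matrix):
        if 1 not in column or -1 not in column:
            return False  # the binary problem needs both a positive and a negative class
        inverse = tuple(-value for value in column)
        if column in seen or inverse in seen:
            return False  # duplicate or inverse column (duplicate check is an assumption)
        seen.add(column)
    return True


legal_ova = is_legal([[1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
duplicate_rows = is_legal([[1, -1], [1, -1], [-1, 1]])
inverse_columns = is_legal([[1, -1, 1], [-1, 1, 1], [1, -1, -1]])
missing_negative = is_legal([[1, 1], [-1, 0], [-1, 1]])
```

In the GA loop, a column or individual failing this check would be repaired or discarded before its fitness is evaluated.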

Crossover.
Crossover and mutation are basic steps in genetic algorithms. Their primary purpose is to increase the diversity of the population and thus avoid local optima caused by the genes of superior individuals monopolizing the population. The algorithm exchanges genes, or parts of genes, between individuals with a certain crossover probability. The process of gene crossover is shown in Figures 6 and 7: the parent individuals to be crossed are selected first, and then bits of the same type are exchanged between the two selected genes.
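The exchange of same-type parts between two parents can be sketched as below. The gene layout (a dictionary with `matrix_idx`, `col_idx`, `feature_mask`, and `operator`) is a hypothetical simplification of Figure 4, and swapping exactly one randomly chosen field per gene pair is an assumption made to keep the example short.

```python
import copy
import random

# Hypothetical gene fields; the paper's exact gene layout is given in Figure 4.
FIELDS = ("matrix_idx", "col_idx", "feature_mask", "operator")


def crossover(parent_a, parent_b, rate, rng):
    """Swap one same-type field between genes at the same position with probability `rate`."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    for gene_a, gene_b in zip(child_a, child_b):
        if rng.random() < rate:
            field = rng.choice(FIELDS)  # only fields of the same type are exchanged
            gene_a[field], gene_b[field] = gene_b[field], gene_a[field]
    return child_a, child_b


parent_a = [
    {"matrix_idx": 0, "col_idx": 1, "feature_mask": [1, 0, 1], "operator": 2},
    {"matrix_idx": 1, "col_idx": 0, "feature_mask": [0, 1, 1], "operator": 4},
]
parent_b = [
    {"matrix_idx": 2, "col_idx": 2, "feature_mask": [0, 1, 0], "operator": 0},
    {"matrix_idx": 3, "col_idx": 1, "feature_mask": [1, 1, 0], "operator": 1},
]
child_a, child_b = crossover(parent_a, parent_b, rate=1.0, rng=random.Random(0))
```

Because the parents are deep-copied, they remain available unchanged for elite retention in the same generation.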

Mutation.
Mutation is used to maintain and introduce diversity in the population, and the mutation probability determines the number of genes in an individual that are selected for mutation. Each gene has a low mutation probability. Each chosen gene is changed randomly in one of its components. If the matrix index is selected, it is replaced by another index from the encoding matrix pool. If the column index within the selected matrix is selected, it is changed within the number of columns of that matrix. If the feature selection sequence is selected, some of its bits are mutated, which changes the features selected for the classification model. If the operator is selected, it is changed within the range of the five operator indexes. Figure 8 gives an example of mutation.
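A rough sketch of this mutation rule follows. The gene layout is the same hypothetical dictionary form (the real layout is in Figure 4), flipping a single bit of the feature mask stands in for "some of the bits," and a redrawn index may coincide with the old one; all of these are simplifying assumptions.

```python
import random

FIELDS = ("matrix_idx", "col_idx", "feature_mask", "operator")


def mutate_gene(gene, n_matrices, n_columns, n_operators, rng):
    """Return a copy of `gene` with one randomly chosen component changed."""
    mutated = dict(gene, feature_mask=list(gene["feature_mask"]))
    field = rng.choice(FIELDS)
    if field == "matrix_idx":
        mutated["matrix_idx"] = rng.randrange(n_matrices)   # any matrix in the pool
    elif field == "col_idx":
        mutated["col_idx"] = rng.randrange(n_columns)       # any column of that matrix
    elif field == "operator":
        mutated["operator"] = rng.randrange(n_operators)    # one of the operator indexes
    else:
        bit = rng.randrange(len(mutated["feature_mask"]))
        mutated["feature_mask"][bit] ^= 1                   # toggle one feature on or off
    return mutated


def mutate_individual(individual, rate, n_matrices, n_columns, rng, n_operators=5):
    """Mutate each gene independently with probability `rate`."""
    result = []
    for gene in individual:
        if rng.random() < rate:
            result.append(mutate_gene(gene, n_matrices, n_columns, n_operators, rng))
        else:
            result.append(dict(gene, feature_mask=list(gene["feature_mask"])))
    return result


individual = [
    {"matrix_idx": 0, "col_idx": 1, "feature_mask": [1, 0, 1, 1], "operator": 2},
    {"matrix_idx": 1, "col_idx": 0, "feature_mask": [0, 1, 1, 0], "operator": 4},
]
mutant = mutate_individual(individual, rate=1.0, n_matrices=4, n_columns=3,
                           rng=random.Random(7))
```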

Elite Retention.
The algorithm adopts an elite retention strategy to ensure that the classification performance does not degrade during the iteration process and to prevent the optimal individual from being destroyed by crossover and mutation. If the individual with the worst fitness in the new generation is worse than the optimal individual of the previous generation, the worst individual in the new generation is replaced by the optimal individual of the previous generation to preserve the upward trend of evolution. In this way, the best individual found so far is never lost or destroyed by the selection, crossover, and mutation operations. Elite retention strategies play a significant role in improving the global convergence of standard genetic algorithms.
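The replacement rule can be written in a few lines. This sketch assumes that fitness values are accuracies (higher is better) and that populations are plain lists; the function name is illustrative.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def retain_elite(prev_pop: Sequence[T], prev_fit: Sequence[float],
                 new_pop: Sequence[T], new_fit: Sequence[float]) -> Tuple[List[T], List[float]]:
    """Replace the worst new individual with the previous best if the latter is fitter."""
    best_prev = max(range(len(prev_fit)), key=lambda i: prev_fit[i])
    worst_new = min(range(len(new_fit)), key=lambda i: new_fit[i])
    population, fitness = list(new_pop), list(new_fit)
    if fitness[worst_new] < prev_fit[best_prev]:
        population[worst_new] = prev_pop[best_prev]
        fitness[worst_new] = prev_fit[best_prev]
    return population, fitness


# The previous best ("b", 0.9) replaces the worst new individual ("d", 0.6).
population, fitness = retain_elite(["a", "b"], [0.7, 0.9], ["c", "d"], [0.8, 0.6])
```

With this rule the best fitness in the population never decreases from one generation to the next.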

Optimization Method of the Binary Classifiers.
The algorithm uses a hybrid classifier scheme that places multiple candidate classifiers into the chromosome and iterates them jointly with the population to select the optimal base classifier for each column of each matrix. Each column is trained with its best base classifier, and the corresponding matrix columns are combined into a new matrix to obtain an optimal matrix. The algorithm is validated by 5-fold cross-validation, with ECOC used for model training and prediction. The average accuracy is used as the fitness function to measure the performance improvement on the multiclass problem. In each iteration, we select the fitter feature matrices and base classifier sequences for subsequent iterations, and the final individual is the optimal individual. The algorithm's flowchart is shown in Figure 9. All the experiments employ 5-fold cross-validation, and the average accuracy serves both as the evaluation index of classification performance and as the fitness in the GA.
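The fitness evaluation, mean accuracy over 5 folds for a given feature selection, can be sketched as follows. A toy 1-nearest-neighbor classifier stands in for the ECOC ensemble, and the interleaved fold assignment, the function names, and the data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def nearest_neighbor(train_x: List[Vector], train_y: List[int],
                     test_x: List[Vector]) -> List[int]:
    """Toy stand-in for the ECOC ensemble: predict the label of the closest training point."""
    predictions = []
    for x in test_x:
        closest = min(range(len(train_x)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
        predictions.append(train_y[closest])
    return predictions


def fitness(mask: Sequence[int], features: List[Vector], labels: List[int],
            fit_predict: Callable = nearest_neighbor, k: int = 5) -> float:
    """Mean k-fold accuracy using only the features switched on in `mask`."""
    selected = [[v for v, keep in zip(row, mask) if keep] for row in features]
    n = len(selected)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))          # interleaved folds (an assumption)
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        predictions = fit_predict([selected[i] for i in train_idx],
                                  [labels[i] for i in train_idx],
                                  [selected[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Feature 0 separates the two classes; feature 1 is unrelated noise.
X = [[0, 50], [1, 0], [2, 40], [3, 10], [4, 30],
     [10, 45], [11, 5], [12, 35], [13, 15], [14, 25]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
score_informative = fitness([1, 0], X, y)
score_all = fitness([1, 1], X, y)
```

In the GA, each individual's feature selection sequence would play the role of `mask`, and the returned score would be its fitness.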

Comparison with the Classical Ensemble Learning Algorithms.
The TBCGA-ECOC algorithm proposed in this paper is compared with Random Forest and XGBoost, two classical ensemble learning algorithms in machine learning. As shown in Figure 10, TBCGA-ECOC significantly improves the accuracy of predicting students' performance compared with the two classical algorithms. Therefore, we conclude that TBCGA-ECOC offers better performance than the traditional ensemble learning algorithms.

Comparison with the ECOC Algorithms.
Building on the ECOC algorithm, the TBCGA-ECOC algorithm uses the GA-based feature selection method and base classifier selection to further improve prediction performance. This subsection compares the experimental results of the two classes of algorithms. Table 5 reports the experimental results on the different datasets when the OVO coding method is employed. On the grade-1 dataset, the accuracy of the TBCGA-ECOC algorithm is about 3.53% and 3.58% higher than that of the ECOC (LR) and ECOC (Bayes) algorithms, respectively, and it is very close to that of the ECOC (SVM) algorithm. On the other three datasets, the TBCGA-ECOC results are better than those of the ECOC (SVM), ECOC (LR), and ECOC (Bayes) algorithms. On the grade-2 dataset, the accuracy increases by about 2.01%, 1.43%, and 7.11% over ECOC (SVM), ECOC (LR), and ECOC (Bayes), respectively. On the grade-3 dataset, the accuracy is higher by 0.29%, 3.82%, and 13.15%, and on the grade-4 dataset by 1.42%, 2.53%, and 9.85%, respectively. These results show that the proposed algorithm matches or outperforms the classical ECOC algorithms on all four datasets, with a clear improvement in most cases.
Similar results are observed in the experiments using the DECOC encoding method. Table 6 reports the experimental results on the different datasets.
Compared with the results of ECOC (SVM), ECOC (LR), and ECOC (Bayes), the overall result of the TBCGA-ECOC algorithm is improved. For example, the results on the grade-1 dataset improve by 1.97%, 4.42%, and 1.23%, and the results on the grade-2 dataset improve by about 3.17%, 3.17%, and 7.66%, respectively. On the grade-3 and grade-4 datasets, the algorithm's results rise slightly over ECOC (SVM) and ECOC (LR), by 0.29% and 2.64% on grade-3 and by 0.84% and 2.56% on grade-4. Compared with the ECOC (Bayes) results, there is a significant enhancement, with increases of 11.97% and 10.43%, respectively. Furthermore, the accuracy of the algorithm on the grade-1 dataset is observed to be slightly low. A possible reason is that the grade-1 dataset has fewer feature dimensions than the other three datasets, so under the same algorithm and parameters the evolution is excessive, overfitting appears, and the accuracy is reduced to some extent. Overall, however, TBCGA-ECOC improves on the accuracy of ECOC to varying degrees on all four datasets. These results indicate that the accuracy obtained by TBCGA-ECOC across different datasets and coding methods is higher than that of the ECOC algorithms. Thus, the TBCGA-ECOC algorithm, based on GA feature selection and base classifier selection, performs well.

Comparison of Different Methods considering Prerequisite Courses.
We added the results of the prerequisite courses to the dataset as a new feature. After data alignment, which is achieved by eliminating the records that lack prerequisite course results, we generated three new datasets with prerequisite courses and three without prerequisite courses, corresponding to grade-2, grade-3, and grade-4.
To begin with, we use Random Forest and XGBoost, two traditional ensemble algorithms, to analyze and predict the six datasets with and without prerequisite courses. Figure 11 plots the prediction results.
The two algorithms applied to the three grades give six cases in total. In five of the six cases, the predicted results on the dataset with prerequisite courses are higher than or equal to those on the dataset without prerequisite courses. In addition, the classical ECOC algorithm and the TBCGA-ECOC algorithm are tested on the two kinds of datasets, and the results are shown in Figures 12-14.
When the ECOC algorithm is used with the OVO coding method, three classifiers (SVM, LR, and Bayes) are applied to three different datasets, for 9 cases in total. The results on the datasets incorporating the prerequisite course features are better than or equal to those without them in 8 of the 9 cases, as shown in Figure 12. A similar phenomenon is found when using the ECOC algorithm with the DECOC encoding method, where the datasets with prerequisite courses win in 8 of the 9 cases, as shown in Figure 13. Furthermore, when using the TBCGA-ECOC algorithm, the datasets with prerequisite courses give better results in all cases, as shown in Figure 14.
In general, adding the prerequisite courses as a new feature does not always achieve better results than the original algorithm. Still, the performance is usually better than the average performance of the original algorithm, and it is robust. Therefore, the prerequisite course features are helpful for grade prediction and lead to a better classification effect. Moreover, as shown in Figure 15, the prediction accuracy of the TBCGA-ECOC algorithm with the features of prerequisite [...] dimensions and sample sizes. So the running time is less than that of the grade-4 dataset but more than that of the grade-1 dataset.

Conclusion
In this paper, we propose a novel ECOC multiclass classification framework to predict students' grades. We also add the prerequisite courses as new features to improve the performance of the algorithm. The findings can be used to evaluate students' performance and provide valuable information for improving teaching: optimizing individualized teaching, improving student performance, implementing instructional interventions, and adjusting the pace of teaching when necessary. The main conclusions are as follows: (1) Experimental results show that the proposed TBCGA-ECOC algorithm obtains higher accuracy than several classical ECOC algorithms and the traditional ensemble learning algorithms, i.e., XGBoost and Random Forest. (2) The prediction performance of the various methods is slightly or significantly improved by adding the grades of the prerequisite courses, i.e., C language and Data Structure, as new features. In turn, the proposed method can provide valuable advice for curriculum and teaching planning. According to the predicted results, we can recommend suitable prerequisites to students.
Future work can extend this study in two directions. As for the dataset, we can collect other multisource learning track data or add other prerequisite courses. As for the method, the ECOC algorithm can be improved, especially its adaptability to high-dimensional small-sample data, or the selection scheme of the hybrid binary classifiers can be optimized so that better binary classifiers are selected to enhance accuracy.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.