MOOCs offer educational data on a new scale, and many educators see great potential in this big data, which includes detailed activity records for every learner. Learner behavior, such as whether a learner will drop out of a course, can be predicted. Providing an effective, economical, and scalable method to detect cheating on tests, such as the use of a surrogate exam-taker, remains a challenging problem. In this paper, we present a grade prediction method that uses student activity features to predict whether a learner is likely to earn a certification if he/she takes a test. The method consists of two classification steps: motivation classification (MC) and grade classification (GC). MC divides all learners into three groups: certification earning, video watching, and course sampling. GC then predicts whether a certification-earning learner will obtain a certification. Our experiment shows that the proposed method can fit the classification model at a fine scale and that it makes it possible to identify a surrogate exam-taker.
Over the past three years, benefiting from innovative cloud computing technologies, Massive Open Online Courses (MOOCs) have brought us many top courses provided by top academic institutions. Tens of millions of students from all over the world have been attracted to these courses. This has also led to changes in higher education [
MOOCs allow anyone in the world to study a course by accessing course resources or watching videos for free, and this attraction has brought together a large number of learners in a short time. On Coursera, the largest MOOC platform, for example, the total number of registered users has exceeded 14 million and is still increasing.
Unlike traditional online courses, which merely provide curriculum materials for download, MOOCs integrate the teaching process into learning. For example, instructors release teaching videos according to a planned schedule, and learners must submit their homework by a deadline. After the final exams, the course is closed; registered learners can then only watch archived videos or discuss them in the course forum and cannot submit assignments any more. Learners who pass the final exams may receive a certification from the course institution. Compared with traditional university courses, most of these courses are completely free and open to everyone, even a middle school student. Grading every learner is difficult because of the huge number of learners involved and because many homework or exam problems cannot be graded automatically by a program. Peer assessment is one option for dealing with this problem. However, guaranteeing the fairness of every assessment and identifying surrogate exam-takers are essential for all learners.
Fortunately, in addition to providing excellent learning resources, MOOC platforms also save detailed activity records of massive numbers of learners during the learning process. As the saying goes, “no pain, no gain”: a learner's score is related to his/her learning engagement. On this basis, the score can be predicted by analyzing his/her learning activities, and an abnormal assessment can be detected by comparing the predicted score with the final score. However, dealing with the massive data is still a challenge. The learning behaviors of large numbers of learners, including the clickstream of video watching, are fully tracked in the data. The record of one course alone exceeds 10 GB [
Some researchers [
This paper aims to identify differences in activity features among learners by analyzing learning activity data from MOOCs. The Person-Course Dataset AY2013 covers 16 courses and includes course information, such as course ID, open date, and launch date, and learner activities, such as video play activity and course forum activity. Six of the 16 courses are excluded because of insufficient activity data. Table
Course information.
Course | Semester | Registration open date | Launch date | Wrap date |
---|---|---|---|---|
HealthStat | Fall 2012 | 2012/7/24 | 2012/10/15 | 2013/1/30 |
Circuits-1 | Fall 2012 | 2012/7/24 | 2012/9/5 | 2012/12/25 |
Circuits-2 | Spring 2013 | 2012/12/20 | 2013/3/3 | 2013/7/1 |
Poverty | Spring 2013 | 2012/12/19 | 2013/2/12 | 2013/5/21 |
SSChem-1 | Fall 2012 | 2012/7/24 | 2012/10/9 | 2013/1/15 |
SSChem-2 | Spring 2013 | 2012/12/20 | 2013/2/5 | 2013/6/21 |
CS-1 | Fall 2012 | 2012/7/24 | 2012/9/26 | 2013/1/15 |
CS-2 | Spring 2013 | 2012/12/19 | 2013/2/4 | 2013/6/4 |
Biology | Spring 2013 | 2013/1/30 | 2013/3/5 | 2013/6/6 |
E&M | Spring 2013 | 2013/1/17 | 2013/2/18 | 2013/6/18 |
Although the number of enrolled learners is enormous, only a small fraction of them complete the whole course from beginning to end and take the final exam. As reported, the average completion rate is less than 10% [
The number of exam-takers and the number who actually get certified.
Due to the great diversity of learners in age, educational background, region, motivation, and learning habits, grade prediction is a big challenge in MOOCs. Thirteen different kinds of motivation are listed in paper [
Certification earning (category A): aiming to earn a certification, these learners usually complete the course to the end and take the final exams. Some learners do not take the final exams even though they complete the course, which indicates that their target is not a certification.
Video watching (category B): aiming to acquire course knowledge, these learners have high video playing activity. They may select the videos they are interested in to study and, if possible, submit some homework. Some of them take the course as a supplement to their college courses, and some will take the final exams if possible.
Course sampling (category C): these learners just come to check the content of the course or to see whether the course meets their needs. Some of them may watch one or two videos, but they may not submit any homework.
Video watching is the most important way of learning in MOOCs. Therefore, we focus on the video playback activity of learners. In the given dataset, every activity is collected as an event: when a learner logs in to a course, watches a video, or posts a message in the course forum, a new event is generated. For comparison, Figure
The number of learners with the same activity. Panels: total events; video watching activity.
Both total event activity and video watching activity show a consistent decline in the number of learners. We adopt an exponential function
Curve fitting parameters for total event activity.
Course | | | | | RMSE | |
---|---|---|---|---|---|---|
HealthStat | 11650 | 0.7143 | 400 | 0.028 | 6.976 | 0.9959 |
Circuits-1 | 8109 | 0.5785 | 402 | 0.024 | 6.821 | 0.9956 |
Circuits-2 | 6441 | 0.6945 | 215 | 0.029 | 5.115 | 0.9966 |
Poverty | 7396 | 0.8315 | 287 | 0.029 | 4.306 | 0.9963 |
SSChem-1 | 1864 | 0.7852 | 58 | 0.023 | 2.747 | 0.9981 |
SSChem-2 | 2098 | 0.9321 | 62 | 0.024 | 2.829 | 0.9925 |
CS-1 | 21170 | 0.7453 | 447 | 0.018 | 10.52 | 0.9959 |
CS-2 | 8144 | 0.6076 | 701 | 0.031 | 10.9 | 0.9892 |
Biology | 4932 | 0.8031 | 212 | 0.028 | 4.514 | 0.9931 |
E&M | 4257 | 0.7996 | 263 | 0.019 | 5.922 | 0.9860 |
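The fitted function itself is not reproduced here, so the following is only a minimal sketch of how a decay curve of this kind and its RMSE could be obtained. It assumes a simplified single-term model $y = a e^{bx}$ fitted by least squares on $\ln y$, which is not necessarily the function behind the table above; the function names and the sample data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).

    Assumption: every y is positive, so the logarithm is defined.
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxl / sxx                        # slope of the line in log space
    a = math.exp(mean_log - b * mean_x)  # intercept mapped back to y space
    return a, b


def rmse(xs, ys, a, b):
    """Root-mean-square error of the fitted curve in the original y space."""
    squared = [(y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: number of learners (y) sharing a given activity count (x).
activity = list(range(1, 21))
learners = [1000.0 * math.exp(-0.05 * x) for x in activity]

a, b = fit_exponential(activity, learners)
error = rmse(activity, learners, a, b)
```

Fitting in log space weights small counts more heavily than a direct nonlinear least-squares fit would, so on real data the parameters and RMSE from this shortcut may differ from those of a nonlinear fit.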
For all 10 courses,
Figure
Learner’s activities versus grade for course HealthStat. Panels: total event activities versus grade; videos played activities versus grade; course forum activities versus grade.
Figure
Average activities of exam-takers and non-exam-takers.
Based on the above analysis, learners can be divided into different categories according to their activities. Learners in category A (certification earning) have the highest activity, while learners in category C (course sampling) have the lowest. An activity index is proposed to measure the engagement of a learner. According to the above statistics, a learner who spends more time (days) in a course or has higher event activity, especially video playing activity, should obtain a higher activity value, while a learner enrolled in too many courses will be less engaged in any one course. After repeated attempts, the activity index
Figure
Number of students of different activity index (Course: CS-1).
Eventually, all learners in a course are divided into three categories. For example, in course CS-1 as shown in Figure
Comparison of the number of learners in the three categories.
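As a rough illustration of the motivation classification step, the sketch below bins learners into the three categories by comparing an activity index with two cut-offs. The index values, the thresholds `low` and `high`, and the function name are hypothetical placeholders, not the values used in this paper.

```python
from collections import Counter


def classify_motivation(index, low, high):
    """Map an activity index to a category label.

    Assumption: a higher index means higher engagement, and low < high
    are illustrative cut-offs chosen by the analyst.
    """
    if index >= high:
        return "A"  # certification earning
    if index >= low:
        return "B"  # video watching
    return "C"      # course sampling


# Hypothetical activity index per learner ID.
activity_index = {"u1": 0.92, "u2": 0.40, "u3": 0.05, "u4": 0.75, "u5": 0.10}

categories = {
    uid: classify_motivation(value, low=0.2, high=0.7)
    for uid, value in activity_index.items()
}
counts = Counter(categories.values())

# Only category A learners are passed on to grade classification.
grade_candidates = sorted(uid for uid, c in categories.items() if c == "A")
```

Keeping the cut-offs as explicit arguments makes it easy to tune them per course, since the distribution of the index may differ from one course to another.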
For MOOCs, we hope to predict the final grade of learners. This may help to save instructors' time or to find surrogate exam-takers. From the activities of learners, their final grades can be predicted before the final exams. According to the previous classification, learners with different motivations show different activity features. For example, the video playing activities of a certification-earning learner always occur at fixed intervals, while a video-watching learner may play videos at random times and not throughout the whole course period. Therefore, different groups should be predicted with different models. Group C (
Support Vector Machine (SVM) is a widely accepted supervised learning model, with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM-based model is proposed to classify all certification-earning learners into two classes: learners who may obtain a certification and learners who may not. The prediction problem can be represented as follows.
For
For comparison, different SVM kernels are selected for prediction, including the linear kernel, the polynomial kernel, and the sigmoid kernel. Accuracy is calculated from the predicted grade and the true grade to measure the performance of grade prediction:
Prediction accuracy of different SVM kernels.
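The kernels being compared and the accuracy measure can be sketched in a few lines. The sketch assumes the standard textbook kernel definitions and the usual definition of accuracy (the fraction of learners whose predicted class equals the true class); the parameter names `gamma`, `coef0`, and `degree`, the helper names, and the sample labels are illustrative assumptions rather than the settings used in the experiment.

```python
import math


def dot(x, z):
    """Inner product of two equal-length feature vectors."""
    return sum(xi * zi for xi, zi in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, z) + coef0) ** degree


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


def accuracy(predicted, actual):
    """Fraction of learners whose predicted class matches the true class."""
    correct = sum(1 for p, t in zip(predicted, actual) if p == t)
    return correct / len(actual)


def flag_unexpected_certificates(learner_ids, predicted, actual):
    """IDs predicted not to certify (0) that nevertheless certified (1)."""
    return [
        uid
        for uid, p, t in zip(learner_ids, predicted, actual)
        if p == 0 and t == 1
    ]


# Hypothetical labels: 1 = obtains a certification, 0 = does not.
ids = ["u1", "u2", "u3", "u4"]
predicted = [1, 0, 0, 1]
actual = [1, 0, 1, 1]

acc = accuracy(predicted, actual)
suspects = flag_unexpected_certificates(ids, predicted, actual)
```

A learner flagged this way is only a candidate for closer review: a mismatch between predicted and actual outcome may also come from a prediction error, so it is not by itself evidence of a surrogate exam-taker.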
Based on their activities in MOOCs, learners are classified into different groups by their motivation. After that, grade prediction is applied to the certification-earning learners. Prediction accuracy improves because the parameters of the classification model can be tuned at a finer scale to fit more learners. However, to predict a specific grade value rather than only whether a learner will earn a certification, the classifier should be redesigned for multiclass classification, for example, predicting a learner’s grade at five levels such as A, B, C, D, and E. On the other hand, learners may join a course with the motivation to persist for some or all of the course, but various factors, such as attrition or lack of satisfaction, can lead them to disengage or drop out entirely. How to deal with such motivation transitions is also a problem to be solved in the future.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by the Fundamental Research Funds for the Central Universities of China under Grant no. N130404004 and the Liaoning Province Science and Technique Foundation under Grant no. 2013217004-1, and the Ministry of Education-Intel Special Research Foundation under Grant no. MOE-INTEL-2012-06.