Abstract:
The proliferation of digital learning platforms has generated vast amounts of student
interaction data, creating opportunities to leverage machine learning for early prediction
of student performance and timely intervention for at-risk learners. This thesis
compares traditional machine learning algorithms (Random Forest, Logistic Regression,
Decision Tree, k-Nearest Neighbors (KNN)) with sequential deep learning models (LSTM,
GRU, RNN) in forecasting student outcomes, with a particular focus on capturing temporal
patterns in learning behavior.
The models were trained and evaluated on two real-world datasets: the Open University
Learning Analytics Dataset (OULAD) from the UK and the Chinese TsinghuaX MOOC Dataset.
Performance was measured with accuracy, precision, recall, F1-score, and AUC-ROC at
different stages of the course timeline.
Results show that sequential models, especially GRU, outperform traditional methods,
achieving 93.09% accuracy on OULAD and 85.84% on the TsinghuaX dataset, and
perform particularly well in mid-course predictions. Temporal analysis shows
that predictive accuracy improves as more sequential data accumulates, underscoring the
value of temporal modeling in educational data mining.
These findings demonstrate the effectiveness of deep learning methods in modeling
sequential educational data. They also support the development of more accurate early
warning systems and adaptive learning interventions, ultimately enhancing student
retention and success in online education environments.