TY - JOUR
T1 - ClickTree
T2 - a tree-based method for predicting math students’ performance based on clickstream data
AU - Rohani, Narges
AU - Rohani, Behnam
AU - Manataki, Areti
N1 - This work was supported by the Medical Research Council [grant number MR/N013166/1].
PY - 2024/10/17
Y1 - 2024/10/17
N2 - The prediction of student performance and the analysis of students’ learning behaviour play an important role in enhancing online courses. By analysing a massive amount of clickstream data that captures student behaviour, educators can gain valuable insights into the factors that influence students’ academic outcomes and identify areas of improvement in courses. In this study, we developed ClickTree, a tree-based methodology, to predict student performance in mathematical problems in end-unit assignments based on students’ clickstream data. Utilising extensive clickstream data, we extracted a novel set of features at three levels, namely problem-level, assignment-level and student-level, and we trained a CatBoost tree to predict whether a student will successfully answer a problem in an end-unit assignment or not. The developed method achieved an Area under the ROC Curve (AUC) of approximately 79% in the Educational Data Mining Cup 2023 and ranked second in the competition. Our results indicate that students who performed well in end-unit assignment problems engaged more with in-unit assignments and answered more problems correctly, while those who struggled had higher tutoring request rates. We also found that students face more difficulties with “check all that apply” types of problems. Moreover, Algebra II was the most difficult subject for students. The proposed method can be utilised to improve students’ learning experiences, and the insights from this study can be integrated into mathematics courses to enhance students’ learning outcomes. The code and implementation are available at https://www.kaggle.com/code/nargesrohani/clicktree/notebook.
AB - The prediction of student performance and the analysis of students’ learning behaviour play an important role in enhancing online courses. By analysing a massive amount of clickstream data that captures student behaviour, educators can gain valuable insights into the factors that influence students’ academic outcomes and identify areas of improvement in courses. In this study, we developed ClickTree, a tree-based methodology, to predict student performance in mathematical problems in end-unit assignments based on students’ clickstream data. Utilising extensive clickstream data, we extracted a novel set of features at three levels, namely problem-level, assignment-level and student-level, and we trained a CatBoost tree to predict whether a student will successfully answer a problem in an end-unit assignment or not. The developed method achieved an Area under the ROC Curve (AUC) of approximately 79% in the Educational Data Mining Cup 2023 and ranked second in the competition. Our results indicate that students who performed well in end-unit assignment problems engaged more with in-unit assignments and answered more problems correctly, while those who struggled had higher tutoring request rates. We also found that students face more difficulties with “check all that apply” types of problems. Moreover, Algebra II was the most difficult subject for students. The proposed method can be utilised to improve students’ learning experiences, and the insights from this study can be integrated into mathematics courses to enhance students’ learning outcomes. The code and implementation are available at https://www.kaggle.com/code/nargesrohani/clicktree/notebook.
KW - Student performance prediction
KW - Educational data mining
KW - Mathematics
KW - Learning behaviour
KW - Learning analytics
U2 - 10.5281/zenodo.13627655
DO - 10.5281/zenodo.13627655
M3 - Article
SN - 2157-2100
VL - 16
SP - 32
EP - 57
JO - Journal of Educational Data Mining
JF - Journal of Educational Data Mining
IS - 2
ER -