Abstract

Dropout rates in European countries are one of the major issues to be faced in the near future, as stated in the Europe 2020 strategy. In 2017, an average of 10.6% of young people (aged 18-24) in the EU-28 were early leavers from education and training, according to Eurostat's statistics. The main aim of this review is to identify studies that use educational data mining techniques to predict university dropout in traditional courses. In the Scopus and Web of Science (WoS) catalogues, we identified 241 studies related to this topic, from which we selected 73, focusing on which data mining techniques are used for predicting university dropout. We identified 6 data mining classification techniques, 53 data mining algorithms and 14 data mining tools.

Keywords

Learning Analytics, Data Mining

Article Details

How to Cite
Agrusti, F., Bonavolontà, G., & Mezzini, M. (2019). University Dropout Prediction through Educational Data Mining Techniques: A Systematic Review. Journal of E-Learning and Knowledge Society, 15(3), 161-182. https://doi.org/10.20368/1971-8829/1135017
