

The role of Deep Learning in virtual student dropout in Colombia


Julio Cesar Martínez Zarate


July 03, 2020

During this period of confinement, universities in the country have had to implement e-learning systems to teach their classes, but they may not have considered the dropout rate in virtual higher education. In 2016, the dropout rate in university virtual education in the country was approximately 22%, for a variety of reasons, even though this modality has been growing over time.

Student dropout is a reality in the country and a complex phenomenon. A predictive model can help mitigate and prevent future dropouts by taking historical data and producing expected results that can be analyzed to support decision making. This proposal takes historical records with different kinds of variables (social, academic, personal, employment, access to e-learning platforms, etc.) and applies Deep Learning algorithms to them. The expected output is a dropout probability for each student; with this information, early warnings can be raised and preventive measures applied to the student population.

In previous research, predictive models have been created and implemented in some higher education institutions using techniques and algorithms such as Data Mining, decision trees, and the ID3 algorithm (Iterative Dichotomiser 3, a decision tree induction algorithm). Amaya and Barrientos (2014) proposed a predictive model of student dropout using Data Mining techniques, characterizing the students of a university in order to predict the probability of dropout with different algorithms for classifying variables.

Quintana and Trinidad (2013), for their part, compared classification algorithms for predicting school dropout at a university center in order to determine the most appropriate algorithm based on classification accuracy. Patiño and Cardona (2012) studied dropout levels in Colombia and Latin America to identify the factors that drive the increase in university dropout, but they did not establish a model to predict it, and they examined the traditional rather than the virtual modality. Similarly, Cuji and Gavilanes (2017) proposed a model to predict the probability of dropout among students of their university academic program; however, these were not students in the virtual modality, and the model uses classification techniques based on decision trees. Vila and Granda (2018) carried out similar work at a university in Ecuador, but using Data Mining techniques. Finally, Forero and Piñeros (2019) presented a case study on identifying industrial engineering students at risk of dropping out in certain academic semesters using Machine Learning.

To build this predictive model, we propose using supervised Deep Learning, a technique based on Artificial Neural Networks that has emerged in recent years as a powerful tool for machine learning and promises to reshape the future of Artificial Intelligence. Supervised learning emulates human learning in order to obtain knowledge from a set of example data: during the training phase, a training set is given to the system as input, and each input is labeled with the desired output value, so the system learns what output to produce when a given input arrives.
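As a rough illustration of supervised training, the sketch below fits a single sigmoid neuron (the simplest building block of a neural network, not a deep model) by gradient descent. The data are made up for the example and are not real student records; the single feature, the function names, and the hyperparameters are all assumptions.

```python
import math

# Made-up toy data (NOT real student records): each input is one feature
# scaled to [0, 1], and each label is the desired output (1 = dropped out).
inputs = [0.05, 0.10, 0.20, 0.30, 0.70, 0.80, 0.90, 0.95]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(inputs, labels, epochs=2000, learning_rate=0.5):
    """Fit one sigmoid neuron (weight w, bias b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(inputs, labels):
            # Difference between the prediction and the labeled output;
            # this is the gradient of the log loss with respect to w*x + b.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


w, b = train(inputs, labels)


def predict(x):
    """Estimated dropout probability for a new, unlabeled input."""
    return sigmoid(w * x + b)
```

After training, `predict` returns a higher probability for inputs that resemble the examples labeled 1 than for those labeled 0. A deep network stacks many such units in layers, but the idea of learning from labeled examples is the same.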

The available data include categorical variables, for example gender, marital status, employment status, the tutor’s attention to students, and the student’s perception of the virtual tutor, as well as numerical variables (continuous and discrete), for example age, weight, number of children, number of siblings, number of weekly accesses to the virtual campus (student and tutor), average number of weekly page visits in the virtual campus, and the tutor’s average response time to the student (Díaz, 2008).

The variables or attributes to be extracted and taken into account for this model are numerical:

  • Individual factors: age
  • Academic factors: student average, ICFES average for admission, student’s weekly access to the virtual campus
  • Factors internal to the University: semester cost
  • Socioeconomic factors: stratum, economic income
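One possible way to hold the seven variables listed above is a small record type that can be flattened into a feature vector. This is only a sketch: the field names, units, and example values are hypothetical, since the proposal names the factors but not how they are stored.

```python
from dataclasses import dataclass, astuple


@dataclass
class StudentRecord:
    # Field names and units are hypothetical; the article only names the factors.
    age: float                     # individual factor
    student_average: float         # academic factor
    icfes_average: float           # academic factor (admission exam)
    weekly_campus_accesses: float  # academic factor
    semester_cost: float           # factor internal to the university
    stratum: float                 # socioeconomic factor
    economic_income: float         # socioeconomic factor

    def to_feature_vector(self):
        """Return the seven numerical features as a list of floats."""
        return [float(value) for value in astuple(self)]


# Made-up example values, for illustration only.
record = StudentRecord(
    age=21,
    student_average=3.8,
    icfes_average=62.5,
    weekly_campus_accesses=5,
    semester_cost=3_500_000,
    stratum=3,
    economic_income=1_800_000,
)
vector = record.to_feature_vector()
```

Keeping the fields in a fixed order matters because a model sees only positions in the vector, not names.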

Numerical factors are used for the initial proposal of the prediction model: the input variables (features) are numerical, and a supervised Deep Learning regression model produces a numerical value as output. This does not rule out transforming non-numeric variables into numerical values for processing, which can be done with TensorFlow, an open-source library developed by Google for numerical computation using data flow graphs, among other things. As an initial proposal, however, only the types of values mentioned above will be used.
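One common way to turn a non-numeric variable into numbers is one-hot encoding. The sketch below does it in plain Python rather than with TensorFlow, to stay dependency-free; the category list and the function name are assumptions made for the example.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


# Hypothetical category list; the article does not enumerate the values.
EMPLOYMENT_STATUS = ["employed", "unemployed", "student_only"]

encoded = one_hot("unemployed", EMPLOYMENT_STATUS)
```

Here `encoded` is `[0.0, 1.0, 0.0]`. The resulting columns can be appended to the numerical features so that a regression model can consume them.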

Technological factors such as Internet access and familiarity with ICTs are very important in the present context, since we are dealing with virtual education, but they will not be taken into account as variables in the model. Other important factors that will not be included in this initial proposal are marital status and the employment status of the parents.

The following figure illustrates the standard Deep Learning process used in this proposal. First, the relationship between the “features or attributes” and the “labels” is defined and the objective is set; the available data are then normalized and split into training and test sets; finally, the model is created and trained, and predictions are made from the results of the previous steps.
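The data preparation steps of this process (normalization, then the training and test split) might look roughly like the sketch below. It uses only the Python standard library; the rows, labels, split fraction, and function names are made up for illustration, and model creation and training are left out.

```python
import random


def min_max_normalize(rows):
    """Scale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(value - low) / (high - low) if high > low else 0.0
         for value, low, high in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into training and test sets."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(labels, train_idx),
            pick(features, test_idx), pick(labels, test_idx))


# Made-up rows: [age, student average, weekly campus accesses]
features = [[18, 3.2, 1], [25, 4.5, 9], [31, 3.9, 4], [22, 2.8, 0], [40, 4.1, 6]]
labels = [0.8, 0.1, 0.3, 0.9, 0.2]  # hypothetical dropout probabilities

normalized = min_max_normalize(features)
x_train, y_train, x_test, y_test = train_test_split(normalized, labels)
```

The sketch follows the order described in the text (normalize, then split). In practice, the scaling bounds are often computed on the training set only, so that no information from the test set leaks into training.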


Forero, L. & Piñeros, Y. (2019). Propuesta de Arquitectura y Aplicación del Aprendizaje Automático (Ml) como Estrategia para la Reducción de los Niveles de Deserción Universitaria por Factores Académicos

Díaz, C. (2008). Modelo conceptual para la deserción estudiantil universitaria chilena. (U. C. Concepción, Ed.) Estudios Pedagógicos, XXXIV. http://dx.doi.org/10.4067/S0718-07052008000200004

TensorFlow (2020). TensorFlow. Retrieved from: https://www.tensorflow.org/?hl=es (April 2020)

Ravì, D., et al. (2017). Deep Learning for Health Informatics. IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4-21. https://doi.org/10.1109/JBHI.2016.2636665
