Explaining Individual and Collective Programming Students’ Behavior by Interpreting a Black-Box Predictive Model

Bibliographic Details
Main Authors: Filipe Dwan Pereira, Samuel C. Fonseca, Elaine H. T. Oliveira, Alexandra I. Cristea, Henrik Bellhauser, Luiz Rodrigues, David B. F. Oliveira, Seiji Isotani, Leandro S. G. Carvalho
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: CS1
Online Access: https://ieeexplore.ieee.org/document/9517104/
Description
Summary: Predicting student performance as early as possible, and analysing to what extent initial student behaviour leads to failure or success, is critical in introductory programming (CS1) courses, as it allows prompt intervention aimed at alleviating their high failure rate. However, in CS1 performance prediction there is a serious lack of studies that interpret the predictive model's decisions. In this sense, we designed a long-term study using very fine-grained log data of 2056 students, collected from the first two weeks of CS1 courses. We extract features that measure how students deal with deadlines, how they fix errors, how much time they spend programming, and so forth. Subsequently, we construct a predictive model that achieved cutting-edge results, with an area under the curve (AUC) of .89 and an average accuracy of 81.3%. To allow effective intervention and to facilitate human-AI collaboration towards prescriptive analytics, we go a step further than the prediction itself and, for the first time to the best of our knowledge, propose an approach to explaining our predictive model's decisions individually and collectively using a game-theory-based framework (SHAP) (Lundberg et al., 2020) that allows interpreting our black-box non-linear model linearly. In other words, we clearly explain the feature effects by visualising and analysing individual predictions, the overall importance of features, and typical prediction paths. This method can be further applied to other emerging competitive models as the CS1 prediction field progresses, ensuring transparency of the process for key stakeholders: administrators, teachers, and students.
ISSN: 2169-3536
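
The summary above describes training a black-box classifier on early behavioural features and then explaining its individual and collective predictions with SHAP. The following is a minimal, illustrative sketch of that kind of workflow in Python; the feature names, file name, and label column are assumptions made for illustration, not the authors' actual pipeline.

import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical log-derived features from the first two weeks of a CS1 course.
features = ["days_before_deadline", "time_spent_programming",
            "error_fix_rate", "submission_count"]
df = pd.read_csv("cs1_early_logs.csv")             # assumed file name
X, y = df[features], df["passed"]                  # assumed binary label column

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)   # black-box non-linear model

# Shapley-value explanations via the SHAP library (Lundberg et al.).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Collective view: overall feature importance across all students.
shap.summary_plot(shap_values, X_test)

# Individual view: feature contributions to a single student's prediction.
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])

The summary plot gives the collective explanation (which behaviours most influence predicted failure or success), while the force plot decomposes one student's prediction into additive feature contributions, supporting the kind of individual intervention the abstract describes.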