Summary: | Intelligent tutors are becoming more popular with the increased use of computers and handheld devices in education. One area of research investigates how machine learning can improve the precision of such tutors and the feedback they give. This thesis compares machine learning clustering algorithms, combined with various distance functions, in an attempt to cluster code snapshots of students solving a programming task. It investigates whether a general, non-problem-specific distance function can be used to identify when a student is stuck on an assignment. The machine learning algorithms compared are k-medoids, a randomly initialized algorithm that produces a predefined number of clusters, and affinity propagation, a two-phase algorithm whose number of clusters is determined dynamically. The distance functions evaluated are based on a Bag of Words approach, on lower-level API calls, and on a problem-specific measure. This thesis could not find an algorithm that achieved the stated goal, and it lists a number of possible error sources related to the data, the preprocessing, and the algorithms. The methodology is promising but requires a controlled environment at every level to ensure that data quality does not detract from the analysis in later stages.
|
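To make the pipeline described above more concrete, here is a minimal sketch of one combination: a Bag of Words distance between code snapshots fed into k-medoids. It rests on assumptions that are not stated in the summary. Snapshots are tokenized with a simple hypothetical regex, token counts are compared with cosine distance, and k-medoids uses the simple alternating (Voronoi-iteration) variant rather than full PAM. The function names and sample snapshots are illustrative only. Affinity propagation is not shown; it would work from the same matrix, for example with negated distances used as similarities, instead of a fixed `k`.

```python
import random
import re
from collections import Counter
from math import sqrt


def bag_of_words(snapshot: str) -> Counter:
    """Hypothetical tokenizer: identifiers/keywords, integers, and single symbols."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", snapshot))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two token-count vectors (assumed distance measure)."""
    if not a and not b:
        return 0.0
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    if norm == 0:  # exactly one snapshot has no tokens
        return 1.0
    return max(0.0, 1.0 - dot / norm)


def distance_matrix(snapshots):
    """Pairwise Bag of Words distances between all snapshots."""
    bags = [bag_of_words(s) for s in snapshots]
    return [[cosine_distance(x, y) for y in bags] for x in bags]


def k_medoids(dist, k, max_iter=100, seed=0):
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # random initialization, k fixed in advance

    def assign(meds):
        # Each snapshot joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if its cluster became empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(medoids)


if __name__ == "__main__":
    snapshots = [
        "for i in range(10): print(i)",
        "for j in range(10): print(j)",
        "while x < 10: x += 1",
        "while y < 10: y += 1",
    ]
    medoids, labels = k_medoids(distance_matrix(snapshots), k=2)
    print(medoids, labels)
```

With these toy snapshots, the two `for` loops and the two `while` loops end up in separate clusters. Real student snapshots are far noisier, which is where the data and preprocessing issues mentioned in the summary would surface.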