A university map of course knowledge.

Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and ot...

Bibliographic Details
Main Authors: Zachary A Pardos, Andrew Joo Hun Nam
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0233207
id doaj-d9d37fcd52c8445fbb9d78df4ab1d94d
record_format Article
spelling doaj-d9d37fcd52c8445fbb9d78df4ab1d94d 2021-03-03T22:04:01Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 9 e0233207 10.1371/journal.pone.0233207 A university map of course knowledge. Zachary A Pardos Andrew Joo Hun Nam
collection DOAJ
language English
format Article
sources DOAJ
author Zachary A Pardos
Andrew Joo Hun Nam
title A university map of course knowledge.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2020-01-01
description Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the text descriptions of elements onto this behaviorally informed representation. In this study, we use the course enrollment histories of 124,000 students at a public university to learn vector representations of its courses. From these course selection informed representations, a notable 88% of course attribute information was recovered, as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Honors Math 1B as Physics 7B is to Honors Physics 7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions via regression. We find that representations learned from enrollment histories resolved courses to a level of semantic fidelity exceeding that of their catalog descriptions, revealing nuanced content differences between similar courses, as well as accurately describing departments the dataset had no course descriptions for. We end with a discussion of the possible mechanisms by which this semantic structure may be informed and implications for the nascent research and practice of data science.
url https://doi.org/10.1371/journal.pone.0233207
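
The description says course relationships were "evaluated by analogy" (Math 1B is to Honors Math 1B as Physics 7B is to Honors Physics 7B). The record does not say how the authors scored these analogies. One common approach for embedding vectors is the vector-offset method, sketched below as an assumption rather than as the paper's procedure. The three-dimensional vectors, the extra course "History 7B", and the function names are all invented for illustration; they are not learned representations or data from the study.

```python
import math

# Hypothetical toy course vectors (NOT from the paper; invented for illustration).
# The third dimension loosely plays the role of an "honors" direction.
vectors = {
    "Math 1B": [1.0, 0.0, 0.0],
    "Honors Math 1B": [1.0, 0.0, 1.0],
    "Physics 7B": [0.0, 1.0, 0.0],
    "Honors Physics 7B": [0.0, 1.0, 1.0],
    "History 7B": [-1.0, -1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vecs):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    The target point is vec(b) - vec(a) + vec(c); the answer is the
    remaining course whose vector is most cosine-similar to that target.
    """
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    # Exclude the three query courses so they cannot be returned as the answer.
    candidates = [name for name in vecs if name not in (a, b, c)]
    return max(candidates, key=lambda name: cosine(target, vecs[name]))


answer = solve_analogy("Math 1B", "Honors Math 1B", "Physics 7B", vectors)
print(answer)
```

Under this scheme, an analogy counts as recovered when the returned course matches the expected one; the share of analogies recovered would then be the fraction of such matches over a validation set.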