Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling.

Classification schemes for scientific activity and publications underpin a large swath of research evaluation practices at the organizational, governmental, and national levels. Several research classifications are currently in use, and they require continuous work as new classification techniques b...

Bibliographic Details
Main Authors: Maxime Rivest, Etienne Vignola-Gagné, Éric Archambault
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0251493
id doaj-ef5e9785bac64205b0cce35ac24ff034
record_format Article
spelling doaj-ef5e9785bac64205b0cce35ac24ff034 2021-05-29T04:32:33Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2021-01-01 16 5 e0251493 10.1371/journal.pone.0251493 Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling. Maxime Rivest Etienne Vignola-Gagné Éric Archambault https://doi.org/10.1371/journal.pone.0251493
collection DOAJ
language English
format Article
sources DOAJ
author Maxime Rivest
Etienne Vignola-Gagné
Éric Archambault
title Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2021-01-01
description Classification schemes for scientific activity and publications underpin a large swath of research evaluation practices at the organizational, governmental, and national levels. Several research classifications are currently in use, and they require continuous work as new classification techniques become available and as new research topics emerge. Convolutional neural networks, a subset of "deep learning" approaches, have recently offered novel and highly performant methods for classifying voluminous corpora of text. This article benchmarks a deep learning classification technique on more than 40 million scientific articles and on tens of thousands of scholarly journals. The comparison is performed against classifications based on bibliographic coupling, direct citation, and manual assignment, which are the established and most widely used approaches in the field of bibliometrics and, by extension, in many science and innovation policy activities such as grant competition management. The results reveal that the performance of this first iteration of a deep learning approach is equivalent to that of the graph-based bibliometric approaches. All methods presented are also on par with manual classification. Somewhat surprisingly, no machine learning approach was found to clearly outperform direct citation, which is a simple label propagation approach. In conclusion, deep learning is promising because it performed just as well as the other approaches but has more flexibility to be further improved. For example, a deep neural network incorporating information from the citation network is likely to hold the key to an even better classification algorithm.
url https://doi.org/10.1371/journal.pone.0251493