Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews

A Systematic Literature Review (SLR) is a means of synthesizing relevant, high-quality studies related to a specific topic or research question. In the Primary Selection stage of an SLR, studies are usually selected manually by reading the title, abstract, and keywords of each study. In...

Bibliographic Details
Main Authors: Ramon Abilio, Flávio Morais, Gustavo Vale, Claudiane Oliveira, Denilson Pereira, Heitor Costa
Format: Article
Language: English
Published: Centro Latinoamericano de Estudios en Informática 2015-08-01
Series: CLEI Electronic Journal
Subjects: Systematic Literature Review; Information Retrieval; Vector Model; Primary Selection
Online Access: http://www.clei.org/cleiej/papers/v18i2p2.pdf
id doaj-36b6c5b02d7b4051ac9d1e24379d09b7
record_format Article
spelling doaj-36b6c5b02d7b4051ac9d1e24379d09b7
2020-11-24T21:44:54Z
eng
Centro Latinoamericano de Estudios en Informática
CLEI Electronic Journal
0717-5000
2015-08-01
Volume 18, issue 2, pages 2:1-2:24
Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews
Ramon Abilio: IT Department - Federal University of Lavras, Lavras, MG, Brazil, 37200-000
Flávio Morais: IT Department - Federal University of Lavras, Lavras, MG, Brazil, 37200-000
Gustavo Vale: Department of Computer Science - Federal University of Minas Gerais, Belo Horizonte, MG, Brazil, 31270-010
Claudiane Oliveira: Department of Computer Science - Federal University of Lavras, Lavras, MG, Brazil, 37200-000
Denilson Pereira: Department of Computer Science - Federal University of Lavras, Lavras, MG, Brazil, 37200-000
Heitor Costa: Department of Computer Science - Federal University of Lavras, Lavras, MG, Brazil, 37200-000
collection DOAJ
language English
format Article
sources DOAJ
author Ramon Abilio
Flávio Morais
Gustavo Vale
Claudiane Oliveira
Denilson Pereira
Heitor Costa
title Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews
publisher Centro Latinoamericano de Estudios en Informática
series CLEI Electronic Journal
issn 0717-5000
publishDate 2015-08-01
description A Systematic Literature Review (SLR) is a means of synthesizing relevant, high-quality studies related to a specific topic or research question. In the Primary Selection stage of an SLR, studies are usually selected manually by reading the title, abstract, and keywords of each study. In recent years, the number of published scientific studies has grown, increasing the effort required to perform this kind of review. In this paper, we propose strategies to detect non-papers and duplicate references in results exported by search engines, and strategies to rank the references in decreasing order of importance for an SLR with respect to the terms in the search string. These strategies are based on Information Retrieval techniques. We implemented the strategies and carried out an experimental evaluation of their applicability using two real datasets. The strategy to detect non-papers achieved 100% precision and 50% recall; the strategy to detect duplicates found more duplicates than manual inspection did; and one of the strategies to rank relevant references achieved 50% precision and 80% recall. These results indicate that the proposed strategies can reduce the effort in the Primary Selection stage of an SLR.
topic Systematic Literature Review
Information Retrieval
Vector Model
Primary Selection
url http://www.clei.org/cleiej/papers/v18i2p2.pdf
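
The description in this record mentions ranking references against the terms of the search string with a vector model, and reports precision and recall. The record does not give the authors' weighting scheme or implementation, so the following is only a minimal sketch of a generic vector-space ranking, assuming TF-IDF weights with a smoothed IDF and cosine similarity. The function names (`tokenize`, `rank_references`, `precision_recall`) and the sample references are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_references(search_terms, references):
    """Return (score, index) pairs sorted by decreasing cosine similarity.

    Each reference is one string (for example title + abstract + keywords).
    The smoothed TF-IDF weighting is an assumption, not the paper's scheme.
    """
    docs = [Counter(tokenize(ref)) for ref in references]
    query = Counter(tokenize(" ".join(search_terms)))
    n_docs = len(docs)

    # Document frequency: in how many references each term occurs.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    def weights(counts):
        # Smoothed IDF keeps every weight positive, even for common terms.
        return {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t, tf in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = weights(query)
    scored = [(cosine(q_vec, weights(doc)), i) for i, doc in enumerate(docs)]
    # Highest score first; ties keep the original export order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def precision_recall(selected, relevant):
    """Precision and recall of a selected set against a relevant set."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical references, not taken from the paper's datasets.
    refs = [
        "Software product lines and feature models",
        "Cooking pasta at home",
        "Feature-oriented software development survey",
    ]
    for score, idx in rank_references(["software", "feature"], refs):
        print(f"{score:.3f}  {refs[idx]}")
```

A reviewer could read the ranked list from the top and stop at a chosen cut-off; `precision_recall` then compares the references above the cut-off with a manually validated set, which is the kind of measurement the description reports.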