Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews
Systematic Literature Review (SLR) is a means to synthesize relevant and high-quality studies related to a specific topic or research questions. In the Primary Selection stage of an SLR, the selection of studies is usually performed manually by reading the title, abstract, and keywords of each study. In...
Main Authors: | Ramon Abilio, Flávio Morais, Gustavo Vale, Claudiane Oliveira, Denilson Pereira, Heitor Costa |
---|---|
Format: | Article |
Language: | English |
Published: | Centro Latinoamericano de Estudios en Informática, 2015-08-01 |
Series: | CLEI Electronic Journal |
Subjects: | Systematic Literature Review; Information Retrieval; Vector Model; Primary Selection |
Online Access: | http://www.clei.org/cleiej/papers/v18i2p2.pdf |
id |
doaj-36b6c5b02d7b4051ac9d1e24379d09b7 |
---|---|
record_format |
Article |
spelling |
doaj-36b6c5b02d7b4051ac9d1e24379d09b7 2020-11-24T21:44:54Z eng Centro Latinoamericano de Estudios en Informática CLEI Electronic Journal 0717-5000 0717-5000 2015-08-01 18 2 2:1 2:24 Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews Ramon Abilio 0; Flávio Morais 1; Gustavo Vale 2; Claudiane Oliveira 3; Denilson Pereira 4; Heitor Costa 5; IT Department - Federal University of Lavras, Lavras, MG, Brazil, 37200-000; IT Department - Federal University of Lavras, Lavras, MG, Brazil, 37200-000; Department of Computer Science - Federal University of Minas Gerais, Belo Horizonte, MG, Brazil, 31270-010; Department of Computer Science - Federal University of Lavras, Lavras, MG, Brazil, 37200-000; Department of Computer Science - Federal University of Lavras, Lavras, MG, Brazil, 37200-000; Department of Computer Science - Federal University of Lavras, Lavras, MG, Brazil, 37200-000 Systematic Literature Review (SLR) is a means to synthesize relevant and high-quality studies related to a specific topic or research questions. In the Primary Selection stage of an SLR, the selection of studies is usually performed manually by reading the title, abstract, and keywords of each study. In recent years, the number of published scientific studies has grown, increasing the effort required to perform this sort of review. In this paper, we propose strategies to detect non-papers and duplicated references in results exported by search engines, and strategies to rank the references in decreasing order of importance for an SLR with respect to the terms in the search string. These strategies are based on Information Retrieval techniques. We implemented the strategies and carried out an experimental evaluation of their applicability using two real datasets. As a result, the strategy to detect non-papers achieved 100% precision and 50% recall; the strategy to detect duplicates detected more duplicates than manual inspection; and one of the strategies to rank relevant references achieved 50% precision and 80% recall. Therefore, the results show that the proposed strategies can minimize the effort in the Primary Selection stage of an SLR. http://www.clei.org/cleiej/papers/v18i2p2.pdf Systematic Literature Review; Information Retrieval; Vector Model; Primary Selection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ramon Abilio, Flávio Morais, Gustavo Vale, Claudiane Oliveira, Denilson Pereira, Heitor Costa |
author_facet |
Ramon Abilio, Flávio Morais, Gustavo Vale, Claudiane Oliveira, Denilson Pereira, Heitor Costa |
author_sort |
Ramon Abilio |
title |
Applying Information Retrieval Techniques to Detect Duplicates and to Rank References in the Preliminary Phases of Systematic Literature Reviews |
title_sort |
applying information retrieval techniques to detect duplicates and to rank references in the preliminary phases of systematic literature reviews |
publisher |
Centro Latinoamericano de Estudios en Informática |
series |
CLEI Electronic Journal |
issn |
0717-5000 |
publishDate |
2015-08-01 |
description |
Systematic Literature Review (SLR) is a means to synthesize relevant and high-quality studies related to a specific topic or research questions. In the Primary Selection stage of an SLR, the selection of studies is usually performed manually by reading the title, abstract, and keywords of each study. In recent years, the number of published scientific studies has grown, increasing the effort required to perform this sort of review. In this paper, we propose strategies to detect non-papers and duplicated references in results exported by search engines, and strategies to rank the references in decreasing order of importance for an SLR with respect to the terms in the search string. These strategies are based on Information Retrieval techniques. We implemented the strategies and carried out an experimental evaluation of their applicability using two real datasets. As a result, the strategy to detect non-papers achieved 100% precision and 50% recall; the strategy to detect duplicates detected more duplicates than manual inspection; and one of the strategies to rank relevant references achieved 50% precision and 80% recall. Therefore, the results show that the proposed strategies can minimize the effort in the Primary Selection stage of an SLR. |
topic |
Systematic Literature Review; Information Retrieval; Vector Model; Primary Selection |
url |
http://www.clei.org/cleiej/papers/v18i2p2.pdf |
_version_ |
1725907999673810944 |
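
The description in this record mentions ranking references with the Vector Model against the terms of the search string, and reports precision and recall. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes TF-IDF weighting with a smoothed IDF and cosine similarity, since the record does not state the exact weighting scheme. The function names (`tokenize`, `rank_references`, `precision_recall`), the sample references, and the sample counts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_references(query, references):
    """Return (index, score) pairs sorted by decreasing cosine similarity.

    Assumption: TF-IDF weights with a smoothed IDF; the weighting used
    in the paper is not given in this record.
    """
    docs = [Counter(tokenize(ref)) for ref in references]
    n_docs = len(docs)
    # Document frequency: number of references containing each term.
    df = Counter(term for doc in docs for term in doc)

    def weights(counts):
        return {
            t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weights(Counter(tokenize(query)))
    scored = [(i, cosine(query_vec, weights(doc))) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_recall(selected, relevant):
    """Precision and recall of a selected set against a relevant set."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical references, used only to exercise the functions.
refs = [
    "Vector model ranking for systematic literature review",
    "A cookbook of pasta recipes",
    "Duplicate detection in bibliographic references",
]
ranking = rank_references("systematic literature review ranking", refs)

# Illustrative counts (not the paper's data): 8 selected, 5 relevant, 4 hits.
precision, recall = precision_recall(range(8), {0, 1, 2, 3, 99})
```

With these illustrative counts, 4 hits out of 8 selected references give a precision of 0.5, and 4 hits out of 5 relevant references give a recall of 0.8, which is how figures such as "50% precision and 80% recall" are computed.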