LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN
Main Author: | Oscar Karnalim, Maranatha Christian University |
---|---|
Format: | Article |
Language: | English |
Published: | UMP Publisher, 2018-02-01 |
Series: | International Journal of Software Engineering and Computer Systems, Vol. 4, No. 1, pp. 29-47 |
ISSN: | 2289-8522, 2180-0650 |
DOI: | 10.15282/ijsecs.4.1.2018.3.0036 |
Subjects: | source code retrieval; language-agnostic approach; lexical pattern; domain-specific ranking |
Online Access: | http://ijsecs.ump.edu.my/images/archive/vol4-1/ijsecs.4.1.2018.1.0036.pdf |

Although source code retrieval is a promising mechanism for supporting software reuse, it faces an emerging issue as new programming languages are developed. Most existing retrieval systems rely on programming-language-dependent features to extract source code lexicons. Thus, each time a new programming language is developed, such a system must be updated manually to handle that language. This may take a considerable amount of time, especially when the language's parsing mechanism is uncommon (e.g., Python's). To address this issue, this paper proposes a source code retrieval approach that does not rely on programming-language-dependent features. Instead, it relies on the Keyword & Identifier lexical pattern, which is typically similar across programming languages. The pattern is applied to four components: tokenization, retrieval model, query expansion, and document enrichment. According to our evaluation, these components are effective for retrieving relevant source code in a language-agnostic manner, although the improvement contributed by each component varies.
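The abstract does not spell out the exact lexical pattern or retrieval model, so the Python sketch below is only an illustration under stated assumptions. It treats any run of letters, digits, and underscores that starts with a letter or underscore as a keyword or identifier, splits identifiers on underscores and camelCase boundaries, and ranks documents with plain TF-IDF cosine similarity. The names `tokenize`, `rank`, and `EXAMPLE_CORPUS`, the sub-token rule, and the TF-IDF weighting are hypothetical choices, not the author's method; query expansion and document enrichment are omitted.

```python
import math
import re
from collections import Counter

# Assumed Keyword & Identifier lexical pattern: a run of letters, digits and
# underscores that starts with a letter or underscore. C, Java, Python and many
# other languages share this shape, so no language-specific grammar is used.
# Words inside comments and string literals are matched too.
WORD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Assumed sub-token rule: split camelCase / PascalCase words (keeping acronyms
# such as "HTTP" together) and separate digit runs.
SUBTOKEN_PATTERN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def tokenize(source: str) -> list[str]:
    """Extract lowercase keyword/identifier sub-tokens from code in any language."""
    tokens = []
    for word in WORD_PATTERN.findall(source):
        for part in word.split("_"):  # snake_case boundaries
            tokens.extend(sub.lower() for sub in SUBTOKEN_PATTERN.findall(part))
    return tokens


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by TF-IDF cosine similarity to the query.

    A generic stand-in retrieval model (hypothetical), not the paper's own.
    """
    doc_counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(doc_counts)
    doc_freq = Counter()
    for counts in doc_counts.values():
        doc_freq.update(counts.keys())

    def weigh(counts: Counter) -> dict[str, float]:
        # Smoothed idf; terms that occur in no document are dropped.
        return {
            term: tf * math.log(1 + n_docs / doc_freq[term])
            for term, tf in counts.items()
            if doc_freq[term] > 0
        }

    def norm(vector: dict[str, float]) -> float:
        return math.sqrt(sum(w * w for w in vector.values()))

    query_vec = weigh(Counter(tokenize(query)))
    results = []
    for name, counts in doc_counts.items():
        doc_vec = weigh(counts)
        dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
        denom = norm(query_vec) * norm(doc_vec)
        results.append((name, dot / denom if denom else 0.0))
    # Highest similarity first; ties keep corpus order.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical corpus mixing three languages; one tokenizer handles all of them.
EXAMPLE_CORPUS = {
    "max.c": "int findMax(int *arr, int n) { int best = arr[0]; "
             "for (int i = 1; i < n; i++) if (arr[i] > best) best = arr[i]; return best; }",
    "max.py": "def find_max(values):\n    return max(values)",
    "sum.java": "int sumArray(int[] arr) { int total = 0; "
                "for (int v : arr) total += v; return total; }",
}

if __name__ == "__main__":
    for name, score in rank("find max", EXAMPLE_CORPUS):
        print(f"{name}: {score:.3f}")
```

Because the tokenizer never consults a grammar, the same code indexes the C, Python, and Java snippets alike, and for the query "find max" it should rank `max.py` first and `sum.java`, which shares no query term, last. Supporting a further language would need no parser update, which is the property the abstract argues for.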