LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN

Although source code retrieval is a promising mechanism for supporting software reuse, it faces an emerging issue as new programming languages are developed. Most retrieval systems rely on programming-language-dependent features to extract source code lexicons. Thus, each time a new programming la...


Bibliographic Details
Main Author: Oscar Karnalim
Format: Article
Language: English
Published: UMP Publisher 2018-02-01
Series: International Journal of Software Engineering and Computer Systems
Subjects: source code retrieval; language-agnostic approach; lexical pattern; domain-specific ranking
Online Access: http://ijsecs.ump.edu.my/images/archive/vol4-1/ijsecs.4.1.2018.1.0036.pdf
id doaj-bab93291f68f485e8dfe85df72d5d476
record_format Article
spelling doaj-bab93291f68f485e8dfe85df72d5d476; 2020-11-25T00:40:06Z; eng; UMP Publisher; International Journal of Software Engineering and Computer Systems; ISSN 2289-8522, 2180-0650; 2018-02-01; vol. 4, no. 1, pp. 29-47; DOI 10.15282/ijsecs.4.1.2018.3.0036; Oscar Karnalim (Maranatha Christian University)
collection DOAJ
language English
format Article
sources DOAJ
author Oscar Karnalim
title LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN
publisher UMP Publisher
series International Journal of Software Engineering and Computer Systems
issn 2289-8522
2180-0650
publishDate 2018-02-01
description Although source code retrieval is a promising mechanism for supporting software reuse, it faces an emerging issue as new programming languages are developed. Most retrieval systems rely on programming-language-dependent features to extract source code lexicons. Thus, each time a new programming language is developed, such a retrieval system must be updated manually to handle that language. This update may take a considerable amount of time, especially when the parsing mechanism of the language is uncommon (e.g., the Python parsing mechanism). To address this issue, this paper proposes a source code retrieval approach that does not rely on programming-language-dependent features. Instead, it relies on the Keyword & Identifier lexical pattern, which is typically similar across programming languages. This pattern is adapted to four components: tokenization, retrieval model, query expansion, and document enrichment. According to our evaluation, these components are effective at retrieving relevant source code language-agnostically, although the improvement contributed by each component varies.
topic source code retrieval
language-agnostic approach
lexical pattern
domain-specific ranking
url http://ijsecs.ump.edu.my/images/archive/vol4-1/ijsecs.4.1.2018.1.0036.pdf