Source Code Similarity Detection

The objective of this thesis is to design and implement a tool usable for detecting similar code in different projects. The tool should be able to locate code pasted from one project to another and should be able to cope with average attempts to thwart the detection such as symbol renaming, changing...

Bibliographic Details
Main Author: Lano, Radek
Other Authors: Parízek, Pavel
Format: Dissertation
Language: English
Published: 2009
Online Access: http://www.nusl.cz/ntk/nusl-295455
id ndltd-nusl.cz-oai-invenio.nusl.cz-295455
record_format oai_dc
spelling ndltd-nusl.cz-oai-invenio.nusl.cz-295455 2017-06-27T04:42:13Z Source Code Similarity Detection Parízek, Pavel Lano, Radek Tůma, Petr 2009 info:eu-repo/semantics/masterThesis http://www.nusl.cz/ntk/nusl-295455 eng info:eu-repo/semantics/restrictedAccess
collection NDLTD
language English
format Dissertation
sources NDLTD
description The objective of this thesis is to design and implement a tool for detecting similar code in different projects. The tool should locate code pasted from one project into another and should cope with typical attempts to thwart detection, such as renaming symbols, changing the order of unrelated entities, moving entities to different files, and adding or removing comments. The tool is implemented in C++ and can compare source files written in C and C++. It also enables the comparison of source code written in different languages that can be compiled by the GNU C Compiler; to obtain good results in these cases, new modules should be added, because the compiler's internal representation differs between languages. The first part of the thesis describes the problem domain, the architecture design, and the tools usable for implementation. The second part covers the implemented solution, its data structures, and the possibilities for extending the application with additional modules. The last part summarizes the results and outlines possibilities for future implementation.
author2 Parízek, Pavel
author_facet Parízek, Pavel
Lano, Radek
author Lano, Radek
spellingShingle Lano, Radek
Source Code Similarity Detection
author_sort Lano, Radek
title Source Code Similarity Detection
title_sort source code similarity detection
publishDate 2009
url http://www.nusl.cz/ntk/nusl-295455
work_keys_str_mv AT lanoradek sourcecodesimilaritydetection
_version_ 1718470823791034368