Source Code Similarity Detection
The objective of this thesis is to design and implement a tool usable for detecting similar code in different projects. The tool should be able to locate code pasted from one project to another and should be able to cope with average attempts to thwart the detection such as symbol renaming, changing...
Main Author: | Lano, Radek |
---|---|
Other Authors: | Parízek, Pavel |
Format: | Dissertation |
Language: | English |
Published: | 2009 |
Online Access: | http://www.nusl.cz/ntk/nusl-295455 |
id |
ndltd-nusl.cz-oai-invenio.nusl.cz-295455 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-nusl.cz-oai-invenio.nusl.cz-295455 2017-06-27T04:42:13Z Source Code Similarity Detection Parízek, Pavel Lano, Radek Tůma, Petr 2009 info:eu-repo/semantics/masterThesis http://www.nusl.cz/ntk/nusl-295455 eng info:eu-repo/semantics/restrictedAccess |
collection |
NDLTD |
language |
English |
format |
Dissertation |
sources |
NDLTD |
description |
The objective of this thesis is to design and implement a tool for detecting similar code in different projects. The tool should be able to locate code pasted from one project into another and should cope with common attempts to thwart detection, such as renaming symbols, changing the order of unrelated entities, moving entities to different files, and adding or removing comments. The tool is implemented in C++ and can compare source files written in C and C++. It also allows comparison of source code written in other languages that can be compiled by the GNU C Compiler. To obtain good results in these cases, new modules should be added, because the GNU C Compiler's internal representation differs between languages. The first part of this thesis describes the problem domain, the architecture design, and the tools usable for implementation. The second part centers on the implemented solution, its data structures, and the possibilities for extending the application with additional modules. The last part of the thesis sums up the results and outlines possible directions for future implementation work. |
author2 |
Parízek, Pavel |
author_facet |
Parízek, Pavel Lano, Radek |
author |
Lano, Radek |
spellingShingle |
Lano, Radek Source Code Similarity Detection |
author_sort |
Lano, Radek |
title |
Source Code Similarity Detection |
title_short |
Source Code Similarity Detection |
title_full |
Source Code Similarity Detection |
title_fullStr |
Source Code Similarity Detection |
title_full_unstemmed |
Source Code Similarity Detection |
title_sort |
source code similarity detection |
publishDate |
2009 |
url |
http://www.nusl.cz/ntk/nusl-295455 |
work_keys_str_mv |
AT lanoradek sourcecodesimilaritydetection |
_version_ |
1718470823791034368 |