Summary: | Master's === Chung Yuan Christian University (中原大學) === Graduate Institute of Information Engineering === 106 === From source code plagiarism in large enterprise software to duplicated programming assignments among students, code plagiarism detection has always been an important issue. Methods for detecting code plagiarism fall roughly into two categories: textual analysis and structural analysis. Most textual analysis methods use a single algorithm to extract a portion of the strings from the source code, compute the similarity between every pair of programs, and assess the likelihood of plagiarism accordingly. Structural analysis methods mainly record the syntactic structure of a program as a tree, find the similar parts between every pair of trees, and estimate the similarity between programs accordingly. Every algorithm has its own strengths and weaknesses, so detection that relies on a single algorithm is not comprehensive. This thesis therefore proposes an approach that integrates methods from both categories in order to detect code plagiarism from different aspects. To verify its effectiveness, our experiments use source code from actual student assignments and evaluate the accuracy of the results against a manually confirmed list of plagiarism cases. Compared with existing tools, our approach performs better on every accuracy measure.
|
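The idea of scoring every pair of programs from both a textual and a structural view can be illustrated with a minimal sketch. It is not the thesis's actual method: the abstract does not name its algorithms, so this sketch assumes Jaccard similarity over token k-grams as the textual measure and overlap of AST node-type counts as the structural measure. All function names and the `weight` parameter are hypothetical.

```python
import ast
import itertools
import re
from collections import Counter


def token_kgrams(source, k=3):
    """Textual view: the set of k-grams over a crude token stream."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def textual_similarity(a, b, k=3):
    """Jaccard similarity of the two programs' token k-gram sets."""
    ga, gb = token_kgrams(a, k), token_kgrams(b, k)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def node_type_counts(source):
    """Structural view: multiset of AST node types (identifier names are ignored)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def structural_similarity(a, b):
    """Multiset Jaccard similarity of AST node-type counts."""
    ca, cb = node_type_counts(a), node_type_counts(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0


def combined_similarity(a, b, weight=0.5):
    """Assumed combination rule: a weighted average of the two views."""
    return (weight * textual_similarity(a, b)
            + (1 - weight) * structural_similarity(a, b))


def rank_pairs(programs, weight=0.5):
    """Score every pair of programs and list the most suspicious pairs first."""
    scores = [((x, y), combined_similarity(programs[x], programs[y], weight))
              for x, y in itertools.combinations(sorted(programs), 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

In this sketch, two submissions that differ only in identifier names get a lower textual score but an identical structural score, which is one way a combined measure can catch cases that a single textual measure would rank lower.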