Plagiarism detection based on word semantic clustering

Master's === National Sun Yat-sen University === Department of Electrical Engineering === 106 === Plagiarism has become a common problem in recent years. With the growth of the Internet, it is increasingly easy to obtain other people's writing. Using that content without citation constitutes plagiarism. Plagiarisms wi...

Bibliographic Details
Main Authors: Chia-Yang Chang, 張家揚
Other Authors: Shie-jue Lee
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/3w54sj
id ndltd-TW-106NSYS5442130
record_format oai_dc
spelling ndltd-TW-106NSYS5442130 2019-10-31T05:22:28Z http://ndltd.ncl.edu.tw/handle/3w54sj Plagiarism detection based on word semantic clustering 基於文字語意分群之文章抄襲偵測 Chia-Yang Chang 張家揚 Master's National Sun Yat-sen University Department of Electrical Engineering 106 Shie-jue Lee 李錫智 2018 學位論文 ; thesis 44 zh-TW
collection NDLTD
sources NDLTD
description Master's === National Sun Yat-sen University === Department of Electrical Engineering === 106 === Plagiarism has become a common problem in recent years. With the growth of the Internet, it is increasingly easy to obtain other people's writing. Using that content without citation constitutes plagiarism, which infringes intellectual property rights, so plagiarism detection is an important problem today. Current plagiarism detection methods resemble near-duplicate detection methods such as the vector space model (VSM) or bag-of-words. These methods do not handle more sophisticated plagiarism techniques well, e.g. word substitution and sentence rewriting. We therefore focus on word semantics: in this work, we propose a new plagiarism detection method based on analyzing the semantics of words. Word2vec is a word embedding model proposed by a group at Google that represents each word as a vector. We use Word2vec to obtain word vectors and PCA for dimensionality reduction, and then use spherical K-means to cluster the words into concepts. Taking word semantics into account and clustering words into concepts in this way allows the method to deal with the more sophisticated plagiarism techniques. Finally, we present our experimental results and compare them with other methods. The results show that our method performs well.
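The clustering step named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the 2-D vectors in `TOY_VECTORS` are hypothetical stand-ins for Word2vec embeddings, the PCA step is omitted because the toy vectors are already low-dimensional, and comparing documents by the cosine similarity of their concept histograms is an assumption, since the abstract does not say how the concepts are used for detection.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(vectors, k, iters=20, seed=0):
    """Cluster vectors by cosine similarity; returns one label per vector."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: each point goes to the centroid with the highest cosine.
        labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        # Update: each centroid becomes the re-normalized sum of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels

def concept_histogram(tokens, word_to_concept, k):
    """Count how often each concept occurs in a list of tokens."""
    hist = [0] * k
    for t in tokens:
        if t in word_to_concept:
            hist[word_to_concept[t]] += 1
    return hist

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

# Hypothetical 2-D "embeddings"; real Word2vec vectors have far more dimensions.
TOY_VECTORS = {
    "car": [0.9, 0.1], "automobile": [0.85, 0.15], "vehicle": [0.8, 0.2],
    "quick": [0.1, 0.9], "fast": [0.15, 0.85], "rapid": [0.2, 0.8],
}

words = list(TOY_VECTORS)
labels = spherical_kmeans([TOY_VECTORS[w] for w in words], k=2)
word_to_concept = dict(zip(words, labels))

original = ["quick", "car"]
rewritten = ["fast", "automobile"]  # every word replaced by a synonym
sim = cosine(concept_histogram(original, word_to_concept, 2),
             concept_histogram(rewritten, word_to_concept, 2))
print(word_to_concept, sim)
```

In this toy setup the two token lists share no words, so a plain bag-of-words comparison would score them as unrelated, while the concept histograms coincide because each synonym falls into the same cluster as the word it replaces.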