Plagiarism detection based on word semantic clustering
Master's === National Sun Yat-sen University === Department of Electrical Engineering === 106 (ROC academic year) === Plagiarism has become a common problem in recent years. As the Internet grows, it becomes ever easier to obtain other people's writing, and using such content without citation constitutes plagiarism. Plagiarism wi...
Main Authors: | Chia-Yang Chang, 張家揚 |
---|---|
Other Authors: | Shie-jue Lee |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/3w54sj |
id |
ndltd-TW-106NSYS5442130 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NSYS5442130 2019-10-31T05:22:28Z http://ndltd.ncl.edu.tw/handle/3w54sj Plagiarism detection based on word semantic clustering 基於文字語意分群之文章抄襲偵測 Chia-Yang Chang 張家揚 Master's National Sun Yat-sen University Department of Electrical Engineering 106 Shie-jue Lee 李錫智 2018 Thesis 44 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Sun Yat-sen University === Department of Electrical Engineering === 106 (ROC academic year) === Plagiarism has become a common problem in recent years. As the Internet grows, it becomes ever easier to obtain other people's writing, and using such content without citation constitutes plagiarism, which infringes intellectual property rights. Plagiarism detection is therefore an important problem today. Current plagiarism detection methods resemble near-duplicate detection methods such as the VSM (vector space model) or bag-of-words, and they do not handle sophisticated plagiarism techniques such as word substitution and sentence rewriting well. We therefore focus on the semantics of words and propose a new plagiarism detection method based on analyzing word semantics. Word2vec, a word embedding model proposed by a research team at Google, represents each word as a vector. We use Word2vec to obtain word vectors, reduce their dimensionality with PCA, and then cluster the words into concepts with spherical K-means. Because Word2vec captures word semantics, clustering words into concepts allows the method to cope with these sophisticated plagiarism techniques. Finally, we present experimental results and compare our method with other methods; the results show that our method performs well. |
author2 |
Shie-jue Lee |
author_facet |
Shie-jue Lee, Chia-Yang Chang 張家揚 |
author |
Chia-Yang Chang 張家揚 |
title |
Plagiarism detection based on word semantic clustering |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/3w54sj |
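The abstract argues that VSM and bag-of-words methods handle word substitution poorly. The following minimal Python sketch shows why. It assumes plain whitespace tokenization and raw term-frequency vectors, and the helper names `bow_vector` and `cosine` and the example sentences are hypothetical illustrations rather than anything taken from the thesis. An exact copy scores 1.0, while a copy reworded with synonyms shares only the word "the" and scores far lower, even though its meaning is unchanged.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector: lowercase, whitespace-split, edge punctuation stripped."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (the VSM comparison)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example sentences: an exact copy and a synonym-substituted rewrite.
original = "The doctor examined the patient quickly."
copied = "The doctor examined the patient quickly."
reworded = "A physician inspected the sick person rapidly."

print(round(cosine(bow_vector(original), bow_vector(copied)), 3))    # prints 1.0
print(round(cosine(bow_vector(original), bow_vector(reworded)), 3))  # prints 0.267
```

Because the representation matches surface tokens only, any synonym swap removes overlap. This is the weakness the thesis addresses by moving from words to semantic concepts.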
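The pipeline described in the abstract takes word vectors from Word2vec, reduces their dimensionality with PCA, and clusters the words into concepts with spherical K-means. It can be illustrated with NumPy. This is a sketch under stated assumptions, not the thesis's implementation. The embeddings are hypothetical toy vectors standing in for trained Word2vec output. The farthest-point initialization is a choice made here for reproducibility. The concept-count cosine used to compare two texts is likewise an illustrative choice. All function names are hypothetical. Spherical K-means differs from ordinary K-means in two ways: vectors and centroids are kept at unit length, and assignment uses cosine similarity.

```python
import numpy as np

# Hypothetical toy embeddings standing in for trained Word2vec vectors:
# four "concepts" with two near-synonyms each, built from orthogonal base
# directions plus small bounded noise.
WORDS = ["doctor", "physician", "examined", "inspected",
         "patient", "person", "quickly", "rapidly"]
rng = np.random.default_rng(0)
base = np.repeat(np.eye(4, 6), 2, axis=0)  # 8 x 6; each word pair shares a direction
EMBEDDINGS = base + rng.uniform(-0.05, 0.05, size=base.shape)


def pca(x: np.ndarray, n_components: int) -> np.ndarray:
    """Center the rows of x and project them onto the top principal components."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors (along the last axis) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def spherical_kmeans(x: np.ndarray, k: int, n_iter: int = 20):
    """K-means on the unit sphere: cosine assignment, normalized-mean centroids."""
    unit = normalize(x)
    # Deterministic farthest-point initialization (an assumption for reproducibility):
    # start from row 0, then repeatedly add the row least similar to any chosen centroid.
    chosen = [0]
    for _ in range(1, k):
        sims = unit @ unit[chosen].T
        chosen.append(int(np.argmin(sims.max(axis=1))))
    centroids = unit[chosen]
    for _ in range(n_iter):
        labels = np.argmax(unit @ centroids.T, axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = unit[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                updated[j] = normalize(members.sum(axis=0))
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = np.argmax(unit @ centroids.T, axis=1)  # labels consistent with final centroids
    return labels, centroids


def concept_vector(tokens, labels, k):
    """Count how many tokens fall into each concept; out-of-vocabulary tokens are skipped."""
    index = {w: i for i, w in enumerate(WORDS)}
    vec = np.zeros(k)
    for t in tokens:
        if t in index:
            vec[labels[index[t]]] += 1
    return vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


reduced = pca(EMBEDDINGS, n_components=3)
labels, _ = spherical_kmeans(reduced, k=4)
original = concept_vector("doctor examined patient quickly".split(), labels, 4)
reworded = concept_vector("physician inspected person rapidly".split(), labels, 4)
print(dict(zip(WORDS, labels.tolist())))
print(cosine(original, reworded))
```

Each synonym pair lands in the same concept, so the reworded sentence produces the same concept counts as the original. A word-level comparison would miss this match, which is the effect the abstract aims for. With real Word2vec vectors, the number of PCA components and the number of concepts k would have to be chosen empirically.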