SMURF: A Cross-lingual Co-derivative Detection System
Main Authors: | |
---|---|
Other Authors: | |
Format: | Others |
Language: | en_US |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/51824082648164983618 |
Summary: | Master's thesis, National Tsing Hua University (國立清華大學), Institute of Technology Management (科技管理研究所), ROC academic year 95. An automatic approach to detecting overlapping content would reduce the repetitive, tedious work of manually checking the originality of a large pool of documents. The objective of this research is to design and evaluate a novel algorithm, SMURF (Semantic MUltilingual Related-Document Finder), which finds pairs of documents in different languages that share a common source (co-derivatives), a capability that may help protect intellectual property. We demonstrate SMURF by identifying English co-derivatives of Spanish documents on the Web across several textual domains, achieving a sentence-level precision of 88.75%. Although SMURF's design focused on English and Spanish, the same concepts could readily be applied to other languages for which the constituent technologies have been studied. |