SMURF: A Cross-lingual Co-derivative Detection System

Bibliographic Details
Main Authors: Jose P. Gonzalez-Brenes, 鞏和平
Other Authors: Fu-Ren Lin
Format: Others
Language: en_US
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/51824082648164983618
Description
Summary: Master's === National Tsing Hua University === Institute of Technology Management === 95 === An automatic approach to detecting content overlap can reduce the repetitive and tedious work of manually checking the originality of a large pool of documents. The objective of this research is to design and evaluate a novel algorithm, SMURF (Semantic MUltilingual Related-Document Finder), which aims to find pairs of documents in different languages that share a common source (co-derivatives) and may be used to facilitate the protection of intellectual property. We demonstrate SMURF by identifying English co-derivatives on the Web of Spanish documents across several textual domains, achieving a sentence-level precision of 88.75%. Although SMURF's design focused on English and Spanish, the concepts applied could readily be implemented for other languages for which the constituent technologies have been studied.
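
As an illustration of the metric quoted in the summary, sentence-level precision can be read as the share of sentence pairs flagged as co-derivative that are in fact co-derivative. The sketch below is a minimal example under that assumption and is not taken from the thesis: the function name `sentence_precision`, the pair representation, and the toy data are all hypothetical.

```python
def sentence_precision(flagged, true_pairs):
    """Fraction of flagged sentence pairs that are genuine co-derivatives.

    Assumption: each pair is a (Spanish sentence id, English sentence id)
    tuple; this representation is illustrative, not the thesis's own.
    """
    flagged = set(flagged)
    if not flagged:
        return 0.0  # no pairs flagged, so precision is reported as 0
    return len(flagged & set(true_pairs)) / len(flagged)


# Hypothetical toy data, not results from the thesis.
flagged = [("es1", "en1"), ("es2", "en5"), ("es3", "en3"), ("es4", "en9")]
true_pairs = [("es1", "en1"), ("es3", "en3"), ("es4", "en9"), ("es5", "en2")]

precision = sentence_precision(flagged, true_pairs)
print(f"{precision:.2%}")  # 3 of the 4 flagged pairs are correct
```

Precision only measures how many flagged pairs are correct; it says nothing about co-derivative pairs the system failed to flag, such as `("es5", "en2")` in the toy data.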