SMURF: A Cross-lingual Co-derivative Detection System


Bibliographic Details
Main Authors:	Jose P. Gonzalez-Brenes, 鞏和平
Other Authors:	Fu-Ren Lin
Format:	Others
Language:	en_US
Published:	2007
Online Access:	http://ndltd.ncl.edu.tw/handle/51824082648164983618

Description
Summary:	Master's === National Tsing Hua University === Institute of Technology Management === 95 === An automatic approach to detecting content overlap would reduce the repetitive and tedious work of manually checking the originality of a large pool of documents. The objective of this research is to design and evaluate a novel algorithm, SMURF (Semantic MUltilingual Related-Document Finder), which aims to find pairs of documents in different languages that share a common source (co-derivatives); such detection may help protect intellectual property. We demonstrate SMURF by identifying English co-derivatives on the Web of Spanish documents in several textual domains, with a sentence-level precision of 88.75%. Although SMURF's design focused on English and Spanish, the concepts applied could be readily implemented for other languages in which the constituent technologies have been studied.


Similar Items