Deep Learning for Printed Document Source Identification

Bibliographic Details
Main Authors: Chen, Heng-Yi, 陳恒顗
Other Authors: Tsai, Min-Jen
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/b7h3e9
Description
Summary: Master's === National Chiao Tung University === Institute of Information Management === 105 === With the rapid development of information technology and the wide use of the Internet, information is easy to obtain in digital form. Because printers are convenient and accessible, digital content can be freely printed into documents. However, printed documents are also misused in crimes such as document forgery, currency counterfeiting, and copyright infringement. Developing an efficient and appropriate forensic tool to identify the source of printed documents is therefore an important task. Existing forensic systems based on statistical methods and support vector machines (SVMs) can already identify the source printer of text and image documents. That approach is a form of shallow machine learning and requires human intervention during feature extraction, feature selection, and data pre-processing. This study develops a novel forensic system that treats the task as a complex image classification problem and uses Convolutional Neural Networks (CNNs), a deep learning technique that learns features automatically. The aim is to apply CNNs to printer source identification with high accuracy and little human involvement. Through an experimental comparison of the two systems, the study discusses whether deep learning outperforms shallow machine learning, whether printer aging affects identification, and whether the region of the sample used affects identification capability.