Confidential Data Identification Using Text Summarization Technique in Data Leakage Prevention System


Bibliographic Details
Main Authors: WU, WEI-ZI, 吳威志
Other Authors: ZHENG, BO-ZHAO
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/772atp
Description
Summary: Master's thesis === National Chung Cheng University === Graduate Institute of Communications Engineering === 106 === Data Leakage Prevention (DLP), a key element of intellectual property protection, is expected to provide privacy-preservation benefits for multiple stakeholders. One of the key factors that will determine the success of DLP is confidential document classification, which identifies sensitive information that is critical to stakeholders. However, with traditional DLP (based on either features or statistics), a manager cannot accurately identify confidential documents because of variant attacks (such as rephrasing or embedding content in confidential documents). In this study, we propose Gemini, an automatic and intelligent DLP system. Gemini is the first such system that uses summarization techniques to remove irrelevant document content and preserve key points, measures the remaining key features, and evaluates each document's category. The applicability of our approach was demonstrated on two different datasets and three plausible real-world scenarios. Extensive experiments have shown that Gemini is superior to other methods in classifying confidential documents, in particular when the confidential documents differ from the original texts used in the training phase.
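
The summarize-then-classify idea described in the summary can be illustrated with a minimal Python sketch. This is not the Gemini implementation, whose details are not given in this record. It assumes a simple word-frequency extractive summarizer and a nearest-neighbour cosine-similarity classifier as stand-ins, and every name in it (`STOPWORDS`, `tokenize`, `summarize`, `cosine`, `classify`, and the sample texts) is hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on", "at"}


def tokenize(text):
    """Return lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(text, keep=2):
    """Stand-in extractive summary: keep the `keep` sentences whose words
    are, on average, the most frequent in the whole document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:keep]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, labelled_docs, keep=2):
    """Label `text` with the label of the most similar training summary."""
    query = Counter(tokenize(summarize(text, keep)))
    best_label, best_sim = None, -1.0
    for label, doc in labelled_docs:
        sim = cosine(query, Counter(tokenize(summarize(doc, keep))))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical training data and a rephrased, padded suspect document.
training = [
    ("confidential", "The merger budget is 40 million dollars. "
                     "The merger closes in March. Lunch is at noon."),
    ("public", "The cafeteria menu changes weekly. "
               "The cafeteria opens at eight. Parking is free."),
]
suspect = ("Weather is nice today. The merger budget was set near 40 million dollars. "
           "The merger should close by March. See you at the cafeteria.")

label = classify(suspect, training)
```

In this toy example the summarizer drops the padding sentences about the weather and the cafeteria before features are compared, so the rephrased suspect text is matched against the confidential training document rather than the public one.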