Topic Chains for Determining Risk of Unauthorized Information Transfer

Bibliographic Details
Other Authors: Wright, Jeremy Lee (Author)
Format: Dissertation
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/2286/R.I.27506
Description
Summary: Corporations invest considerable resources to create, preserve, and analyze their data. Yet while organizations want to protect against unauthorized data transfer, there is no comprehensive metric for discriminating which data are at risk of leaking. This thesis motivates the need for a quantitative leakage risk metric and provides a risk assessment system, called Whispers, for computing it. Using unsupervised machine learning techniques, Whispers uncovers themes in an organization's document corpus, including previously unknown or unclassified data. Then, by correlating each document with its authors, Whispers identifies which data are easier to contain and, conversely, which are at risk. Using the Enron email database, Whispers constructs a social network segmented by topic themes. This graph uncovers communication channels within the organization. Using this social network, Whispers determines the risk of each topic by measuring the rate at which simulated leaks are not detected. For the Enron set, Whispers identified 18 separate topic themes between January 1999 and December 2000. The highest-risk data emanated from the legal department, with a leakage risk as high as 60%.
Dissertation/Thesis: Masters Thesis, Computer Science, 2014
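
The idea of scoring a topic by the rate at which simulated leaks go undetected can be illustrated with a small sketch. This is not the Whispers implementation: the abstract does not specify the detection rule, so the code below assumes, purely for illustration, that a leak is detected only when the sender and recipient have never exchanged a message on that topic. The function names, the toy messages, and the people in them are all hypothetical.

```python
import random
from collections import defaultdict


def build_topic_graph(messages):
    """Map each topic to the set of directed (sender, recipient) channels seen for it."""
    channels = defaultdict(set)
    for sender, recipient, topic in messages:
        channels[topic].add((sender, recipient))
    return channels


def leakage_risk(channels, topic, people, trials=10_000, seed=0):
    """Fraction of simulated leaks on `topic` that go undetected.

    Assumption (not taken from the thesis): a leak is a message from a
    random author of the topic to a random other person, and it counts as
    detected only when that sender-recipient pair never carried the topic.
    """
    rng = random.Random(seed)
    topic_channels = channels.get(topic, set())
    authors = sorted({sender for sender, _ in topic_channels})
    if not authors:
        return 0.0
    undetected = 0
    for _ in range(trials):
        sender = rng.choice(authors)
        recipient = rng.choice([p for p in people if p != sender])
        # A leak along an already-established topic channel looks normal.
        if (sender, recipient) in topic_channels:
            undetected += 1
    return undetected / trials


# Hypothetical toy corpus: (sender, recipient, topic) triples.
messages = [
    ("alice", "bob", "legal"),
    ("alice", "carol", "legal"),
    ("bob", "carol", "trading"),
]
people = ["alice", "bob", "carol"]
channels = build_topic_graph(messages)
for topic in sorted(channels):
    print(topic, leakage_risk(channels, topic, people))
```

Under this assumed rule, a topic whose authors already write to most of the organization scores high, because few leak paths look unusual, while a topic confined to a few channels scores low.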