Assessing operational impact in enterprise systems with dependency discovery and usage mining

Bibliographic Details
Main Author: Moss, Mark Bomi
Published: Georgia Institute of Technology 2010
Online Access: http://hdl.handle.net/1853/31795
Description
Summary: A framework was proposed for monitoring the dependencies between users, applications, and other system components, combined with their actual access times and frequencies. Operating system commands were used to extract event information from end-user workstations about the dependencies between system, application, and infrastructure components. Access times of system components were recorded, and data mining tools were used to detect usage patterns. This information was integrated and used to predict whether the failure of a component would cause an operational impact during a given time period. The framework was designed to minimize installation and management overhead, to consume minimal system resources (e.g., network bandwidth), and to be deployable on a variety of enterprise systems, including those with low-bandwidth and partial-connectivity characteristics. The framework was implemented in a test environment to demonstrate the feasibility of the approach. The system was tested on a small-scale data set (6 computers in the GT CERCS Laboratory over 35 days) and a large-scale data set (76 CPR nodes across the entire GT campus over 4 months). The average size of the impact topology was shown to be approximately 4% of the complete topology. This reduction helps system administrators better identify the users and resources most likely to be affected by a designated set of component failures during a designated time period.
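
The idea of an impact topology can be illustrated with a minimal Python sketch. It is not the author's implementation: the function name `impact_topology`, the data layout (a dependency map plus `(user, component, hour)` access records), and the use of raw historical accesses in place of mined usage patterns are all assumptions made for illustration. The sketch propagates a set of failed components to everything that depends on them, then keeps only the users who accessed an affected component during the time window of interest.

```python
from collections import defaultdict


def impact_topology(depends_on, accesses, failed, start_hour, end_hour):
    """Estimate which components and users a set of failures would affect.

    All names and data shapes here are hypothetical:
      depends_on: dict mapping a component to the components it depends on
      accesses:   iterable of (user, component, hour_of_day) records
      failed:     set of components assumed to have failed
      start_hour, end_hour: half-open window [start_hour, end_hour)
    """
    # Invert the dependency map so failures can be followed "upward"
    # from a failed component to everything that relies on it.
    dependents = defaultdict(set)
    for component, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(component)

    # Depth-first propagation of the failure through the dependents.
    impacted = set(failed)
    stack = list(failed)
    while stack:
        node = stack.pop()
        for parent in dependents[node]:
            if parent not in impacted:
                impacted.add(parent)
                stack.append(parent)

    # A user counts as affected only if they used an impacted component
    # inside the window (a crude stand-in for a mined usage pattern).
    users = {
        user
        for user, component, hour in accesses
        if component in impacted and start_hour <= hour < end_hour
    }
    return impacted, users


# Hypothetical example data.
depends_on = {
    "mail_client": ["mail_server"],
    "mail_server": ["dns"],
    "editor": [],
}
accesses = [
    ("alice", "mail_client", 9),
    ("bob", "editor", 9),
    ("carol", "mail_client", 22),
]

components, users = impact_topology(depends_on, accesses, {"dns"}, 8, 17)
```

In this made-up example a DNS failure between 08:00 and 17:00 reaches the mail server and the mail client, but only one of the three users is affected, since the second never touches an impacted component and the third uses the mail client only outside the window. Filtering by usage in this way is what would make an impact topology much smaller than the complete topology.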