Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems

Bibliographic Details
Main Author: Grayson, Marisa Rose
Language: English
Published: The Ohio State University / OhioLINK 2018
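Subjects: Cognitive Psychology; Engineering; Systems Design; anomaly response; saturation; software systems; resilience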
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=osu1543495231467142
id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1543495231467142
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu1543495231467142 2021-08-03T07:09:05Z Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems Grayson, Marisa Rose Cognitive Psychology Engineering Systems Design anomaly response saturation software systems resilience
This thesis captures patterns and challenges in anomaly response in the domain of web engineering and operations by analyzing a corpus of four actual cases. Web production software systems operate at an unprecedented scale today, requiring extensive automation to develop and maintain services. The systems are designed to adapt regularly to dynamic load to avoid the consequences of overloading portions of the network. As the scale and complexity of these software systems grow, it becomes more difficult to observe, model, and track how the systems function and malfunction. Anomalies inevitably arise, challenging groups of responsible engineers to recognize and understand anomalous behaviors as they plan and execute interventions to mitigate or resolve the threat of service outages.
The thesis applies process tracing techniques to a corpus of four cases and extends them to capture the interplay between the human and machine agents when anomalies arise in web operations. The cases were elicited from expert practitioners dealing with anomalies that risked saturating the capacity of the system to handle continuing load across multiple elements in the interconnected network. The analysis is based on the Above the Line / Below the Line (ABL) framework, which distinguishes the parts of the system above the line of representation (the human engineers and operators) from the automated processes and components below the computer interfaces. The analysis of the incidents directly links the cascade of disturbances below the line with the cognitive work of anomaly response above the line.
Recorded digital text-based communications served as the artifacts used to construct several new representations of the cases from two perspectives: 1) tracing the evolving hypotheses from anomalous signs and interventions, and 2) charting the four basic coping strategies used to mitigate overload. The hypothesis generation timelines for the cases supported two findings: updating mental models is important during incident response, and this updating happened explicitly and frequently among the distributed engineers. Diverse perspectives expanded the hypotheses considered and beneficially broadened the scope of investigation. Strange loops and weak representations focusing on primitive event changes hindered observability for the responders. Effects at a distance from their driving source, a natural consequence of the network's complexity, also complicated hypothesis exploration. Furthermore, the response timelines demonstrated the tendency of the automation to respond with tactical, local strategies, whereas the humans used a variety of mostly strategic responses to fill in the gaps left by the automation. New forms of tooling and monitoring could be designed to support diagnostic search across functional relationships, as well as to broaden awareness of the hypothesis exploration space. The Above the Line / Below the Line (ABL) framework provided a beneficial frame of reference for analyzing the relevant parts of the system in each incident and could be the basis for future work in decision support tool design. The case study research demonstrated specific and general patterns in how the autonomy in complex web operation systems complicates the cognitive work of anomaly response.
2018 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1543495231467142 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection NDLTD
language English
sources NDLTD
topic Cognitive Psychology
Engineering
Systems Design
anomaly response
saturation
software systems
resilience