Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems
Main Author: | Grayson, Marisa Rose |
---|---|
Language: | English |
Published: | The Ohio State University / OhioLINK, 2018 |
Subjects: | Cognitive Psychology; Engineering; Systems Design |
Keywords: | anomaly response; saturation; software systems; resilience |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=osu1543495231467142 |
Rights: | This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |

Abstract:

This thesis captures patterns and challenges in anomaly response in web engineering and operations by analyzing a corpus of four actual cases. Web production software systems operate at an unprecedented scale today, and they require extensive automation to develop and maintain services. These systems are designed to adapt regularly to dynamic load so that portions of the network do not become overloaded. As the systems grow in scale and complexity, it becomes more difficult to observe, model, and track how they function and malfunction. Anomalies inevitably arise. Groups of responsible engineers must then recognize and understand the anomalous behaviors while they plan and execute interventions to mitigate or resolve the threat of service outages.

The thesis applies process tracing techniques to the four cases and extends them to capture the interplay between human and machine agents when anomalies arise in web operations. The cases were elicited from expert practitioners. Each involved anomalies that risked saturating the system's capacity to handle continuing load across multiple elements of the interconnected network. The analysis is based on the Above the Line / Below the Line (ABL) framework. This framework distinguishes the parts of the system above the line of representation (the human engineers and operators) from the automated processes and components below the computer interfaces. The analysis of the incidents directly links the cascade of disturbances below the line with the cognitive work of anomaly response above the line.

Recorded text-based digital communications served as the artifacts for constructing several new representations of the cases from two perspectives: 1) tracing the evolving hypotheses from anomalous signs and interventions, and 2) charting the four basic coping strategies used to mitigate overload. The hypothesis generation timelines supported two findings. First, updating mental models is important during incident response. Second, this updating happened explicitly and frequently among the distributed engineers. Diverse perspectives expanded the set of hypotheses considered and usefully broadened the scope of investigation. Strange loops, and weak representations that focused on primitive event changes, hindered observability for the responders. Effects that appeared at a distance from their driving source also complicated hypothesis exploration, which is a natural consequence of network complexity. The response timelines also showed that the automation tended to respond with tactical, local strategies. The humans, by contrast, used a variety of mostly strategic responses to fill the gaps left by the automation. New forms of tooling and monitoring could be designed to support diagnostic search across functional relationships and to broaden awareness of the hypothesis exploration space. The ABL framework provided a useful frame of reference for analyzing the relevant parts of the system in each incident, and it could serve as the basis for future work on decision support tool design. The case study research identified specific and general patterns by which the autonomy in complex web operations systems complicates the cognitive work of anomaly response.