Summary: | Every year, large numbers of patients in National Health Service (NHS) care suffer as a result of patient safety incidents. The National Patient Safety Agency (NPSA) collects large amounts of data describing individual incidents. Each incident is described by categorical and numerical variables and also by free text. The aim of the work was to find relatively small groups of similar incidents of types previously unknown to the NPSA. A model of the text was produced in which the position of each incident reflected its meaning as closely as possible. The basic model was the vector space model. Dimensionality reduction was carried out in two stages: unsupervised reduction using principal component analysis, followed by supervised reduction using linear discriminant analysis. It was then possible to look for groups of incidents that were more tightly packed than would be expected given the overall distribution of the incidents. The process for assessing these groups had three stages. First, a quantitative measure was used, allowing a large number of parameter combinations to be examined. The groups found for an ‘optimum’ parameter combination were then divided into categories using a qualitative filtering method. Finally, clinical experts assessed the groups qualitatively. The transition probabilities model was also examined: this model was based on the empirical probabilities with which two-word sequences were seen in the text. An alternative method for dimensionality reduction used information about the subjective meaning of a small sample of incidents, elicited from experts, to produce a mapping between high- and low-dimensional models of the text. The analysis also included the direct use of the categorical variables to model the incidents, and an empirical analysis of the behaviour of high-dimensional spaces.
|
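The vector space model named in the summary can be illustrated with a minimal sketch. It assumes plain term counts and cosine similarity. The summary does not state the weighting scheme or similarity measure actually used, and the incident texts and function names below are invented for illustration. The dimensionality reduction stages (principal component analysis and linear discriminant analysis) are not shown.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a free-text description to a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical incident descriptions, not taken from NPSA data.
incidents = [
    "patient fell from bed during night",
    "patient fell from chair in ward",
    "wrong dose of medication given",
]

vectors = [term_vector(text) for text in incidents]

# Incidents sharing vocabulary lie closer together in the vector space.
sim_falls = cosine_similarity(vectors[0], vectors[1])
sim_mixed = cosine_similarity(vectors[0], vectors[2])
print(sim_falls, sim_mixed)
```

In this toy example the two fall-related descriptions share three of their six terms, so they score higher than the fall and medication pair, which share none. Groups of unusually close incidents are the kind of structure the summary describes searching for.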