Identifying diagnosis evidence of cardiogenic stroke from Chinese echocardiograph reports

Abstract Background Cardiogenic stroke has increasing morbidity in China and brought economic burden to patient families. In cardiogenic stroke diagnosis, echocardiograph examination is one of the most important examinations. Sonographers will investigate patients’ heart via echocardiograph, and des...

Full description

Bibliographic Details
Main Authors: Lu Qin, Xiaowei Xu, Lingling Ding, Zixiao Li, Jiao Li
Format: Article
Language:English
Published: BMC 2020-07-01
Series:BMC Medical Informatics and Decision Making
Subjects:
Online Access:http://link.springer.com/article/10.1186/s12911-020-1106-3
Description
Summary:Abstract Background Cardiogenic stroke has increasing morbidity in China and brought economic burden to patient families. In cardiogenic stroke diagnosis, echocardiograph examination is one of the most important examinations. Sonographers will investigate patients’ heart via echocardiograph, and describe them in the echocardiograph reports. In this study, we developed a machine learning model to automatically identify diagnosis evidences of cardiogenic stroke providing to neurologist for clinical decision making. Methods We collected 4188 Chinese echocardiograph reports of 4018 patients, with average length 177 Chinese characters in free-text style. Collaborating with neurologists and sonographers, we summarized 149 phrases on diagnosis evidence of cardiogenic stroke such as “二尖瓣重度狭窄” (severe mitral stenosis), “主动脉瓣退行性变” (aortic valve degeneration) and so on. Furthermore, we developed an annotated corpus via mapping 149 phrases to the 4188 reports. We selected 11 most frequent diagnosis evidence types such as “二尖瓣狭窄” (mitral stenosis) for further identifying. The generated corpus is divided into training set and testing set in the ratio of 8:2, which is used to train and validate a machine learning model to identify the evidence of cardiogenic stroke using BiLSTM-CRF algorithm. Results Our machine learning method achieved the average performance on the diagnosis evidence identification is 98.03, 90.17 and 93.94% respectively. In addition, our method is capable to identify the novel diagnosis evidence of cardiogenic stroke description such as “二尖瓣中-重度狭窄” (mitral stenosis), “主动脉瓣退行性病变” (aortic valve calcification) et al. Conclusions In this study, we analyze the structure of the echocardiograph reports and summarized 149 phrases on diagnosis evidence of cardiogenic stroke. We use the phrases to generate an annotated corpus automatically, which greatly reduces the cost of manual annotation. The model trained based on the corpus also has a good performance on the testing set. The method of automatically identifying diagnosis evidence of cardiogenic stroke proposed in this study will be further refined in the practice.
ISSN:1472-6947