Evaluating global and local sequence alignment methods for comparing patient medical records

Abstract Background Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment help...


Bibliographic Details
Main Authors:	Ming Huang, Nilay D. Shah, Lixia Yao
Format:	Article
Language:	English
Published:	BMC 2019-12-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Patient similarity; Electronic health record; Sequence alignment; Temporal sequence; Dynamic time warping; Needleman-Wunsch algorithm
Online Access:	https://doi.org/10.1186/s12911-019-0965-y

id	doaj-52f7baf310644082a1b9d9b7d06f45e1
record_format	Article
spelling	doaj-52f7baf310644082a1b9d9b7d06f45e1 2020-12-20T12:35:12Z eng BMC BMC Medical Informatics and Decision Making 1472-6947 2019-12-01 19 S6 1 13 10.1186/s12911-019-0965-y Evaluating global and local sequence alignment methods for comparing patient medical records Ming Huang 0 Nilay D. Shah 1 Lixia Yao 2 Department of Health Sciences Research, Mayo Clinic Department of Health Sciences Research, Mayo Clinic Department of Health Sciences Research, Mayo Clinic Abstract Background: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis and treatment of patients. Methods: We tested two cutting-edge global sequence alignment methods, namely dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data to objectively evaluate these sequence alignment algorithms. Results: For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments received the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA alignments. DTWL and SWA received equal coverage and similarity scores in the remaining 44 cases. Conclusions: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies. https://doi.org/10.1186/s12911-019-0965-y Patient similarity; Electronic health record; Sequence alignment; Temporal sequence; Dynamic time warping; Needleman-Wunsch algorithm
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ming Huang, Nilay D. Shah, Lixia Yao
spellingShingle	Ming Huang Nilay D. Shah Lixia Yao Evaluating global and local sequence alignment methods for comparing patient medical records BMC Medical Informatics and Decision Making Patient similarity Electronic health record Sequence alignment Temporal sequence Dynamic time warping Needleman-Wunsch algorithm
author_facet	Ming Huang, Nilay D. Shah, Lixia Yao
author_sort	Ming Huang
title	Evaluating global and local sequence alignment methods for comparing patient medical records
title_short	Evaluating global and local sequence alignment methods for comparing patient medical records
title_full	Evaluating global and local sequence alignment methods for comparing patient medical records
title_fullStr	Evaluating global and local sequence alignment methods for comparing patient medical records
title_full_unstemmed	Evaluating global and local sequence alignment methods for comparing patient medical records
title_sort	evaluating global and local sequence alignment methods for comparing patient medical records
publisher	BMC
series	BMC Medical Informatics and Decision Making
issn	1472-6947
publishDate	2019-12-01
description	Abstract Background: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis and treatment of patients. Methods: We tested two cutting-edge global sequence alignment methods, namely dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold standard data to objectively evaluate these sequence alignment algorithms. Results: For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had higher similarity scores than the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments received the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA alignments. DTWL and SWA received equal coverage and similarity scores in the remaining 44 cases. Conclusions: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) seems to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies.
topic	Patient similarity; Electronic health record; Sequence alignment; Temporal sequence; Dynamic time warping; Needleman-Wunsch algorithm
url	https://doi.org/10.1186/s12911-019-0965-y
work_keys_str_mv	AT minghuang evaluatingglobalandlocalsequencealignmentmethodsforcomparingpatientmedicalrecords AT nilaydshah evaluatingglobalandlocalsequencealignmentmethodsforcomparingpatientmedicalrecords AT lixiayao evaluatingglobalandlocalsequencealignmentmethodsforcomparingpatientmedicalrecords
_version_	1724376368097001472
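
The abstract in this record contrasts global alignment (Needleman-Wunsch) with local alignment (Smith-Waterman). The difference can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names `nw_score` and `sw_score`, the scoring values (match +1, mismatch -1, gap -1), and the toy event sequences. It computes alignment scores only, not the aligned sequences, and it is not the authors' implementation or scoring scheme.

```python
from typing import Sequence


def nw_score(a: Sequence[str], b: Sequence[str],
             match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch) alignment score; scoring values are illustrative."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]  # global: score of aligning both full sequences


def sw_score(a: Sequence[str], b: Sequence[str],
             match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Local (Smith-Waterman) alignment score; scoring values are illustrative."""
    best = 0
    prev = [0] * (len(b) + 1)  # local: borders start at 0, no penalty to skip a prefix
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,                    # restart the alignment here
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)           # local: best cell anywhere in the table
        prev = curr
    return best


# Hypothetical event sequences for two patients (not data from the study).
patient_a = ["fever", "cough", "xray", "antibiotic"]
patient_b = ["cough", "xray", "antibiotic", "discharge"]

print(nw_score(patient_a, patient_b))  # global score over the full records
print(sw_score(patient_a, patient_b))  # local score of the best shared stretch
```

With these toy inputs the local score exceeds the global score, because the global alignment has to pay gap penalties for the unmatched first and last events, while the local alignment scores only the shared cough, xray, antibiotic stretch.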


Similar Items