Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records

Abstract Background: A new learning-based patient similarity measurement was proposed to measure patient similarity from heterogeneous electronic medical record (EMR) data. Methods: We first calculated feature-level similarities according to the features’ attributes. A domain expert provided patien...

Bibliographic Details
Main Authors:	Ni Wang, Yanqun Huang, Honglei Liu, Zhiqiang Zhang, Lan Wei, Xiaolu Fei, Hui Chen
Format:	Article
Language:	English
Published:	BMC 2021-07-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Patient similarity, Electronic medical records, Semi-supervised learning, k-nearest neighbors, Liver diseases
Online Access:	https://doi.org/10.1186/s12911-021-01432-x

id	doaj-2769dfc18b074091a157d9a294265bff
record_format	Article
spelling	doaj-2769dfc18b074091a157d9a294265bff 2021-08-01T11:32:13Z eng BMC BMC Medical Informatics and Decision Making 1472-6947 2021-07-01 21 S2 1 13 10.1186/s12911-021-01432-x Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records Ni Wang 0; Yanqun Huang 1; Honglei Liu 2; Zhiqiang Zhang 3; Lan Wei 4; Xiaolu Fei 5; Hui Chen 6 School of Biomedical Engineering, Capital Medical University; School of Biomedical Engineering, Capital Medical University; School of Biomedical Engineering, Capital Medical University; School of Biomedical Engineering, Capital Medical University; Information Center, Xuanwu Hospital, Capital Medical University; Information Center, Xuanwu Hospital, Capital Medical University; School of Biomedical Engineering, Capital Medical University Abstract Background: A new learning-based patient similarity measurement was proposed to measure patient similarity from heterogeneous electronic medical record (EMR) data. Methods: We first calculated feature-level similarities according to the features’ attributes. A domain expert provided patient similarity scores for 30 randomly selected patients. These similarity scores and the feature-level similarities for the 30 patients comprised the labeled sample set, which was used by the semi-supervised learning algorithm to learn the patient-level similarities for all patients. We then used the k-nearest neighbor (kNN) classifier to predict four liver conditions. The predictive performances were compared in four different situations. We also compared the performances of personalized kNN models and other machine learning models. We assessed predictive performance by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss. Results: As the size of the random training samples increased, the kNN models using the learned patient similarity to select near neighbors consistently outperformed those using the Euclidean distance (all P values < 0.001). The kNN models using the learned patient similarity to identify the top k nearest neighbors from the random training samples also had a better best performance (AUC: 0.95 vs. 0.89, F1-score: 0.84 vs. 0.67, and CE loss: 1.22 vs. 1.82) than those using the Euclidean distance. As the size of the similar training samples, which comprised the most similar samples determined by the learned patient similarity, increased, the performance of kNN models using the simple Euclidean distance to select the near neighbors degraded gradually. When the roles of the Euclidean distance and the learned patient similarity in selecting the near neighbors and similar training samples were exchanged, the performance of the kNN models gradually increased. These two kinds of kNN models had the same best performance: AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, both lower than those of the simple and similarity-based kNN models. Conclusions: This learning-based method opened an opportunity for similarity measurement based on heterogeneous EMR data and supported the secondary use of EMR data. https://doi.org/10.1186/s12911-021-01432-x Patient similarity; Electronic medical records; Semi-supervised learning; k-nearest neighbors; Liver diseases
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ni Wang Yanqun Huang Honglei Liu Zhiqiang Zhang Lan Wei Xiaolu Fei Hui Chen
spellingShingle	Ni Wang Yanqun Huang Honglei Liu Zhiqiang Zhang Lan Wei Xiaolu Fei Hui Chen Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records BMC Medical Informatics and Decision Making Patient similarity Electronic medical records Semi-supervised learning k-nearest neighbors Liver diseases
author_facet	Ni Wang Yanqun Huang Honglei Liu Zhiqiang Zhang Lan Wei Xiaolu Fei Hui Chen
author_sort	Ni Wang
title	Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
title_short	Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
title_full	Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
title_fullStr	Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
title_full_unstemmed	Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
title_sort	study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records
publisher	BMC
series	BMC Medical Informatics and Decision Making
issn	1472-6947
publishDate	2021-07-01
description	Abstract Background: A new learning-based patient similarity measurement was proposed to measure patient similarity from heterogeneous electronic medical record (EMR) data. Methods: We first calculated feature-level similarities according to the features’ attributes. A domain expert provided patient similarity scores for 30 randomly selected patients. These similarity scores and the feature-level similarities for the 30 patients comprised the labeled sample set, which was used by the semi-supervised learning algorithm to learn the patient-level similarities for all patients. We then used the k-nearest neighbor (kNN) classifier to predict four liver conditions. The predictive performances were compared in four different situations. We also compared the performances of personalized kNN models and other machine learning models. We assessed predictive performance by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss. Results: As the size of the random training samples increased, the kNN models using the learned patient similarity to select near neighbors consistently outperformed those using the Euclidean distance (all P values < 0.001). The kNN models using the learned patient similarity to identify the top k nearest neighbors from the random training samples also had a better best performance (AUC: 0.95 vs. 0.89, F1-score: 0.84 vs. 0.67, and CE loss: 1.22 vs. 1.82) than those using the Euclidean distance. As the size of the similar training samples, which comprised the most similar samples determined by the learned patient similarity, increased, the performance of kNN models using the simple Euclidean distance to select the near neighbors degraded gradually. When the roles of the Euclidean distance and the learned patient similarity in selecting the near neighbors and similar training samples were exchanged, the performance of the kNN models gradually increased. These two kinds of kNN models had the same best performance: AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, both lower than those of the simple and similarity-based kNN models. Conclusions: This learning-based method opened an opportunity for similarity measurement based on heterogeneous EMR data and supported the secondary use of EMR data.
topic	Patient similarity Electronic medical records Semi-supervised learning k-nearest neighbors Liver diseases
url	https://doi.org/10.1186/s12911-021-01432-x
work_keys_str_mv	AT niwang studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords AT yanqunhuang studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords AT hongleiliu studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords AT zhiqiangzhang studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords AT lanwei studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords AT xiaolufei studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords AT huichen studyonthesemisupervisedlearningbasedpatientsimilarityfromheterogeneouselectronicmedicalrecords
_version_	1721245753298386944

Similar Items