Comparison of Word Embeddings for Extraction from Medical Records

This paper extends work originally presented at the 16th International Conference on Wearable, Micro and Nano Technologies for Personalized Health. Despite the use of electronic medical records, free narrative text is still widely used in medical records. To make data from texts available...


Bibliographic Details
Main Authors:	Aleksei Dudchenko, Georgy Kopanitsa
Format:	Article
Language:	English
Published:	MDPI AG 2019-11-01
Series:	International Journal of Environmental Research and Public Health
Subjects:	word embedding; data extraction; machine learning; medical records
Online Access:	https://www.mdpi.com/1660-4601/16/22/4360

id	doaj-67bd78dfb28e4a8f981c6506fbdbfeb7
record_format	Article
spelling	doaj-67bd78dfb28e4a8f981c6506fbdbfeb7; 2020-11-25T01:55:55Z; eng; MDPI AG; International Journal of Environmental Research and Public Health; 1660-4601; 2019-11-01; 16(22), 4360; 10.3390/ijerph16224360; Comparison of Word Embeddings for Extraction from Medical Records; Aleksei Dudchenko, Georgy Kopanitsa (both: National Center for Cognitive Technologies, ITMO University, 197101 Saint-Petersburg, Russia)
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Aleksei Dudchenko Georgy Kopanitsa
author_facet	Aleksei Dudchenko Georgy Kopanitsa
author_sort	Aleksei Dudchenko
title	Comparison of Word Embeddings for Extraction from Medical Records
publisher	MDPI AG
series	International Journal of Environmental Research and Public Health
issn	1660-4601
publishDate	2019-11-01
description	This paper extends work originally presented at the 16th International Conference on Wearable, Micro and Nano Technologies for Personalized Health. Despite the use of electronic medical records, free narrative text is still widely used in medical records. To make data from these texts available to decision support systems, supervised machine learning algorithms may be successfully applied. In this work, we developed a prototype medical data extraction system and compared variants of it based on different artificial neural network architectures for processing free medical texts in Russian. Three classifiers were applied to extract entities from snippets of text. The multi-layer perceptron (MLP) and convolutional neural network (CNN) classifiers showed similar results with all three embedding models. The MLP outperformed the CNN on pipelines that used the embedding model trained on medical records with preliminary lemmatization. Nevertheless, the highest F-score (0.9763) was achieved by the CNN, which slightly outperformed the MLP when the largest word2vec model was used.
topic	word embedding; data extraction; machine learning; medical records
url	https://www.mdpi.com/1660-4601/16/22/4360
