DiMeX: A Text Mining System for Mutation-Disease Association Extraction.
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this...
Main Authors: | A S M Ashique Mahmood, Tsung-Jung Wu, Raja Mazumder, K Vijay-Shanker |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2016-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0152725 |
id |
doaj-c1e0e9aa3bbb4d628d1e50dbe6e2326c |
---|---|
record_format |
Article |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
A S M Ashique Mahmood Tsung-Jung Wu Raja Mazumder K Vijay-Shanker |
title |
DiMeX: A Text Mining System for Mutation-Disease Association Extraction. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2016-01-01 |
description |
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low-precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases. |
url |
https://doi.org/10.1371/journal.pone.0152725 |