GOLD standard dataset for Alzheimer genes
Alzheimer disease is a genetically complex, multigenic neurodegenerative disorder resulting from the interaction between multiple genes. Most earlier studies reported only a few specific genes involved in Alzheimer disease. However, hundreds of susceptibility genes have been observed...
Main Authors: | Sushrutha Raj, Anchal Vishnoi, Alok Srivastava |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2020-06-01 |
Series: | Data in Brief |
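Subjects: | Alzheimer genes; Cross validation; GOLD standard; Meta analysis; System modeling; Text classification |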
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340920303334 |
id |
doaj-f0be1e603bc546bcadcf2bd0ab44832e |
---|---|
record_format |
Article |
spelling |
doaj-f0be1e603bc546bcadcf2bd0ab44832e
2020-11-25T02:58:06Z
eng
Elsevier
Data in Brief
2352-3409
2020-06-01
30
105439
GOLD standard dataset for Alzheimer genes
Sushrutha Raj (0), Anchal Vishnoi (1), Alok Srivastava (2)
(0) Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon 122413, India
(1) Institute of Bioinformatics and Computational Biology, Visakhapatnam, Andhra Pradesh 530017, India
(2) Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon 122413, India; Institute of Bioinformatics and Computational Biology, Visakhapatnam, Andhra Pradesh 530017, India; Corresponding author at: Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon 122413, India.
Alzheimer disease is a genetically complex, multigenic neurodegenerative disorder resulting from the interaction between multiple genes. Most earlier studies reported only a few specific genes involved in Alzheimer disease. However, hundreds of susceptibility genes with a significant role in the development and progression of Alzheimer disease have been observed. Among the existing data resources, the Genetic Association Database is the most popular; it contains information about genes, the class of each association (positive, negative, or neutral), and a supporting reference. However, it contains many false-positive and false-negative associations. We took this data as a reference and performed double-fold cross-validation to compile a comprehensive list of Alzheimer genes, their association class with the disease (positive, negative, or ambiguous), and a reference sentence confirming each association. The resulting data will be used as a GOLD standard reference dataset for training machine learning classifiers to predict the classification of published literature, not only for Alzheimer disease but for other diseases as well. In addition, the data on positively associated genes can be used for system-level modelling or meta-analysis of Alzheimer disease.
http://www.sciencedirect.com/science/article/pii/S2352340920303334
Alzheimer genes; Cross validation; GOLD standard; Meta analysis; System modeling; Text classification |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sushrutha Raj, Anchal Vishnoi, Alok Srivastava |
spellingShingle |
Sushrutha Raj; Anchal Vishnoi; Alok Srivastava; GOLD standard dataset for Alzheimer genes; Data in Brief; Alzheimer genes; Cross validation; GOLD standard; Meta analysis; System modeling; Text classification |
author_facet |
Sushrutha Raj, Anchal Vishnoi, Alok Srivastava |
author_sort |
Sushrutha Raj |
title |
GOLD standard dataset for Alzheimer genes |
title_short |
GOLD standard dataset for Alzheimer genes |
title_full |
GOLD standard dataset for Alzheimer genes |
title_fullStr |
GOLD standard dataset for Alzheimer genes |
title_full_unstemmed |
GOLD standard dataset for Alzheimer genes |
title_sort |
gold standard dataset for alzheimer genes |
publisher |
Elsevier |
series |
Data in Brief |
issn |
2352-3409 |
publishDate |
2020-06-01 |
description |
Alzheimer disease is a genetically complex, multigenic neurodegenerative disorder resulting from the interaction between multiple genes. Most earlier studies reported only a few specific genes involved in Alzheimer disease. However, hundreds of susceptibility genes with a significant role in the development and progression of Alzheimer disease have been observed. Among the existing data resources, the Genetic Association Database is the most popular; it contains information about genes, the class of each association (positive, negative, or neutral), and a supporting reference. However, it contains many false-positive and false-negative associations. We took this data as a reference and performed double-fold cross-validation to compile a comprehensive list of Alzheimer genes, their association class with the disease (positive, negative, or ambiguous), and a reference sentence confirming each association. The resulting data will be used as a GOLD standard reference dataset for training machine learning classifiers to predict the classification of published literature, not only for Alzheimer disease but for other diseases as well. In addition, the data on positively associated genes can be used for system-level modelling or meta-analysis of Alzheimer disease. |
topic |
Alzheimer genes; Cross validation; GOLD standard; Meta analysis; System modeling; Text classification |
url |
http://www.sciencedirect.com/science/article/pii/S2352340920303334 |
work_keys_str_mv |
AT sushrutharaj goldstandarddatasetforalzheimergenes AT anchalvishnoi goldstandarddatasetforalzheimergenes AT aloksrivastava goldstandarddatasetforalzheimergenes |
_version_ |
1724708516151689216 |