GOLD standard dataset for Alzheimer genes

Alzheimer disease is a genetically complex, multigenic neurodegenerative disorder resulting from the interaction of multiple genes. Most earlier studies reported only a few specific genes involved in Alzheimer disease. However, hundreds of susceptibility genes have been observed...


Bibliographic Details
Main Authors: Sushrutha Raj, Anchal Vishnoi, Alok Srivastava
Format: Article
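Language: English
Published: Elsevier 2020-06-01
Series: Data in Brief
Subjects: Alzheimer genes; Cross validation; GOLD standard; Meta analysis; System modeling; Text classification
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340920303334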
id doaj-f0be1e603bc546bcadcf2bd0ab44832e
record_format Article
spelling doaj-f0be1e603bc546bcadcf2bd0ab44832e
2020-11-25T02:58:06Z
eng
Elsevier
Data in Brief 30, 105439 (2020-06-01), ISSN 2352-3409
GOLD standard dataset for Alzheimer genes
Sushrutha Raj (Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon 122413, India)
Anchal Vishnoi (Institute of Bioinformatics and Computational Biology, Visakhapatnam, Andhra Pradesh 530017, India)
Alok Srivastava (Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon 122413, India; Institute of Bioinformatics and Computational Biology, Visakhapatnam, Andhra Pradesh 530017, India; corresponding author at: Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon 122413, India)
collection DOAJ
sources DOAJ
author Sushrutha Raj
Anchal Vishnoi
Alok Srivastava
title GOLD standard dataset for Alzheimer genes
publisher Elsevier
series Data in Brief
issn 2352-3409
publishDate 2020-06-01
description Alzheimer disease is a genetically complex, multigenic neurodegenerative disorder resulting from the interaction of multiple genes. Most earlier studies reported only a few specific genes involved in Alzheimer disease. However, hundreds of susceptibility genes that play a significant role in the development and progression of the disease have been observed. Among existing data resources, the Genetic Association Database is the most popular source; it records genes, the class of each association (positive, negative, or neutral), and a supporting reference. However, it contains many false-positive and false-negative associations. We have taken this data as a reference and performed double fold cross validation to compile a comprehensive list of Alzheimer genes, the class of each gene's association with the disease (positive, negative, or ambiguous), and a reference sentence confirming the association. The data generated will be used as a GOLD standard reference dataset for training machine learning classifiers that predict the classification of published literature, not only for Alzheimer disease but for other diseases as well. In addition, the data on positively associated genes can be used for system-level modelling or meta-analysis of Alzheimer disease.
topic Alzheimer genes
Cross validation
GOLD standard
Meta analysis
System modeling
Text classification
url http://www.sciencedirect.com/science/article/pii/S2352340920303334