Size distribution of function-based human gene sets and the split–merge model
Main Authors: | Wentian Li, Oscar Fontanelli, Pedro Miramontes |
---|---|
Format: | Article |
Language: | English |
Published: | The Royal Society, 2016-01-01 |
Series: | Royal Society Open Science |
Subjects: | gene family sizes; gene set sizes; power-law; beta rank function |
Online Access: | https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.160275 |
id | doaj-bfe3e9569cfa442d93692ee857c8ca08 |
---|---|
record_format | Article |
citation | Royal Society Open Science, vol. 3, no. 8, article 160275 (2016); ISSN 2054-5703; DOI 10.1098/rsos.160275 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Wentian Li, Oscar Fontanelli, Pedro Miramontes |
title | Size distribution of function-based human gene sets and the split–merge model |
publisher | The Royal Society |
series | Royal Society Open Science |
issn | 2054-5703 |
publishDate | 2016-01-01 |
description | The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families in which genes are grouped by similar function or by a shared property. The size distribution of HUGO Gene Nomenclature Committee (HGNC) gene sets deviates from a power law and is fitted much better by a beta rank function. We propose a simple mechanism that breaks a power-law size distribution through a combination of splitting and merging operations. The largest gene sets are split in two to account for subfunctional categories, and a small proportion of the other gene sets are merged into larger sets as new common themes are recognized. Such operations are not uncommon in the curation of gene sets. A simulation shows that iterating these operations changes the size distribution of Ensembl paralogues and can lead to a distribution well fitted by a beta rank function. We further illustrate the application of the beta rank function with the distribution of transcription factors and drug-target genes among HGNC gene families. |
topic | gene family sizes; gene set sizes; power-law; beta rank function |
url | https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.160275 |
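The abstract rests on two technical ingredients: fitting ranked set sizes with a beta rank function, and iterating split and merge operations on a power-law size list. The following is a minimal Python sketch of both, written under stated assumptions rather than taken from the paper. It assumes the beta rank function has the form y(r) = A (N + 1 - r)^b / r^a, with rank r = 1 for the largest set and N sets in total, fitted by ordinary least squares on ln y = ln A - a ln r + b ln(N + 1 - r). The split rule (halve the largest set), the merge rule (each other set, with a small probability `merge_frac`, is absorbed by a randomly chosen higher-ranked set), and the synthetic Zipf-like starting sizes are illustrative assumptions standing in for the authors' procedure and the Ensembl/HGNC data. All function names are hypothetical.

```python
import math
import random


def beta_rank(r, n, log_a, a, b):
    """Assumed beta rank function y(r) = A * (n + 1 - r)**b / r**a, with log_a = ln(A)."""
    return math.exp(log_a + b * math.log(n + 1 - r) - a * math.log(r))


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, 3):
            f = aug[i][col] / aug[col][col]
            for j in range(col, 4):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * 3
    for i in range(2, -1, -1):
        x[i] = (aug[i][3] - sum(aug[i][j] * x[j] for j in range(i + 1, 3))) / aug[i][i]
    return tuple(x)


def fit_beta_rank(sizes):
    """Least-squares fit of ln y = ln A - a ln r + b ln(n + 1 - r); returns (ln A, a, b)."""
    ys = sorted(sizes, reverse=True)  # rank 1 = largest set
    n = len(ys)
    rows = [(1.0, -math.log(r), math.log(n + 1 - r)) for r in range(1, n + 1)]
    resp = [math.log(y) for y in ys]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * val for row, val in zip(rows, resp)) for i in range(3)]
    return solve3(xtx, xty)


def split_merge_step(sizes, merge_frac, rng):
    """One iteration of a hypothetical split-merge rule; total gene count is conserved."""
    s = sorted(sizes, reverse=True)
    largest = s.pop(0)
    # Split: the largest set is divided into two halves (if it can be split).
    halves = [largest // 2, largest - largest // 2] if largest >= 2 else [largest]
    s = sorted(s + halves, reverse=True)
    # Merge: walk from the lowest rank upward; each set is, with probability
    # merge_frac, absorbed by a random set that ranked above it after the split.
    for i in range(len(s) - 1, 0, -1):
        if rng.random() < merge_frac:
            j = rng.randrange(i)
            s[j] += s[i]
            s[i] = 0
    return sorted((x for x in s if x > 0), reverse=True)


def power_law_sizes(n, c, alpha):
    """Synthetic Zipf-like sizes round(c / r**alpha), an assumed stand-in for paralogue data."""
    return [max(1, round(c / r ** alpha)) for r in range(1, n + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = power_law_sizes(n=500, c=2000, alpha=1.0)
    total = sum(sizes)
    print("initial fit (ln A, a, b):", fit_beta_rank(sizes))
    for _ in range(200):
        sizes = split_merge_step(sizes, merge_frac=0.002, rng=rng)
    assert sum(sizes) == total  # genes are neither created nor lost
    print("after split-merge (ln A, a, b):", fit_beta_rank(sizes))
```

Comparing the fitted exponent b before and after the iterations gives a rough sense of how far the operations move the sizes away from a pure power law (for which b is near 0). Fitting in log space keeps the estimation linear; a nonlinear fit on the raw sizes would weight the largest sets differently and is an equally reasonable choice.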