DISCOVERING DRIVER MUTATIONS IN BIOLOGICAL DATA


Bibliographic Details
Main Author: Bokhari, Yahya
Format: Others
Published: VCU Scholars Compass 2018
Subjects:
Online Access: https://scholarscompass.vcu.edu/etd/5637
https://scholarscompass.vcu.edu/cgi/viewcontent.cgi?article=6724&context=etd
Description
Summary: Background Somatic mutations accumulate in human cells throughout life. Some have no adverse consequences, but others may lead to cancer. A cancer genome is typically unstable, so further mutations can accumulate in the DNA of cancer cells. An ongoing problem is to determine which mutations are drivers (they play a role in oncogenesis) and which are passengers (they do not). One way of addressing this question is to inspect somatic mutations in the DNA of cancer samples from a cohort of patients and detect patterns that differentiate driver from passenger mutations.
Results We propose two methods, QuaDMutEx and QuaDMutNetEx. QuaDMutEx incorporates three novel elements: a new gene set penalty that includes non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty, through a combination of heuristic Monte Carlo optimization and exact binary quadratic programming. QuaDMutNetEx combines these elements with protein-protein interaction networks: it adds a new quadratic reward term that favors solution gene sets that are connected in the protein-protein interaction network. Compared to existing methods, the proposed algorithms find sets of putative driver genes that show higher coverage and lower excess coverage in eight sets of cancer samples coming from brain, ovarian, lung, and breast tumors.
Conclusions The ability to improve both coverage and excess coverage across different types of cancer shows that QuaDMutEx and QuaDMutNetEx should be part of a state-of-the-art toolbox in the driver gene discovery pipeline. They can detect genes harboring rare driver mutations that may be missed by existing methods.
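
The summary evaluates gene sets by coverage and excess coverage without defining them. The sketch below illustrates one common reading, which is an assumption and is not taken from the thesis: coverage is the number of patients with at least one mutation in the gene set, and excess coverage is the number of mutations beyond the first, summed over patients. The function name `coverage_stats` and the toy matrix are hypothetical. This is not the QuaDMutEx penalty itself.

```python
def coverage_stats(mutations, gene_set):
    """Return (coverage, excess_coverage) of a gene set.

    mutations: list of rows, one per patient; each row is a list of 0/1
               flags, one per gene (1 = gene is mutated in that patient).
    gene_set:  list of gene (column) indices forming the candidate set.

    Assumed definitions (not from the thesis):
      coverage        = patients with at least one mutated gene in the set
      excess coverage = mutations beyond the first, summed over patients
    """
    hits_per_patient = [sum(row[g] for g in gene_set) for row in mutations]
    coverage = sum(1 for h in hits_per_patient if h > 0)
    excess = sum(h - 1 for h in hits_per_patient if h > 1)
    return coverage, excess


# Hypothetical toy cohort: 5 patients, 4 genes.
toy = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

cov, exc = coverage_stats(toy, [0, 1, 3])
print(cov, exc)
```

Under these assumed definitions, a set with high coverage and low excess coverage explains many patients while its genes are rarely mutated together in the same patient.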