Classify hyperdiploidy status of multiple myeloma patients using gene expression profiles.

Bibliographic Details
Main Authors: Yingxiang Li, Xujun Wang, Haiyang Zheng, Chengyang Wang, Stéphane Minvielle, Florence Magrangeas, Hervé Avet-Loiseau, Parantu K Shah, Yong Zhang, Nikhil C Munshi, Cheng Li
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3598955?pdf=render
Description
Summary: Multiple myeloma (MM) is a cancer of antibody-making plasma cells. It frequently harbors alterations in DNA and chromosome copy numbers, and can be divided into two major subtypes, hyperdiploid (HMM) and non-hyperdiploid multiple myeloma (NHMM). The two subtypes have different survival prognoses, possibly due to different but converging paths to oncogenesis. The existing methods for identifying the two subtypes, fluorescence in situ hybridization (FISH) and copy number microarrays, add cost and require additional sample material. We hypothesize that chromosome alterations leave an imprint on gene expression through a dosage effect. Using five MM expression datasets that have HMM status measured by FISH and copy number microarrays, we have developed and validated a K-nearest-neighbor method to classify MM into HMM and NHMM based on gene expression profiles. Classification accuracy for test datasets ranges from 0.83 to 0.88. This classification will enable researchers to study differences and commonalities of the two MM subtypes in disease biology and prognosis using expression datasets, without the need for additional subtype measurements. Our study also supports the advantages of using cancer-specific characteristics in feature design and of pooling multiple rounds of classification results to improve accuracy. We provide R source code and processed datasets at www.ChengLiLab.org/software.
ISSN: 1932-6203
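
Illustrative sketch: the summary describes a K-nearest-neighbor classifier whose results are pooled over multiple rounds. The Python snippet below is a minimal, hypothetical illustration of that general idea and is not the authors' R code. The two-value feature vectors (imagined here as average expression over two chromosome groups), the random-subset pooling scheme, the function names, and the parameters `k`, `rounds`, and `frac` are all assumptions made for the example; the record does not specify how features or rounds were actually defined.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def pooled_knn_predict(train_x, train_y, query, k=3, rounds=25, frac=0.8, seed=0):
    """Run k-NN on several random training subsets and pool the votes.

    Assumption: each "round" uses a random subset of the training samples.
    The published method may define its rounds differently.
    """
    rng = random.Random(seed)
    n = len(train_x)
    size = max(k, int(frac * n))
    pooled = Counter()
    for _ in range(rounds):
        idx = rng.sample(range(n), size)
        sub_x = [train_x[i] for i in idx]
        sub_y = [train_y[i] for i in idx]
        pooled[knn_predict(sub_x, sub_y, query, k)] += 1
    return pooled.most_common(1)[0][0]


# Toy data (invented): each sample is (feature_a, feature_b), standing in for
# mean expression over two hypothetical chromosome groups.
train_x = [
    (1.40, 1.00), (1.50, 1.10), (1.30, 0.90), (1.45, 1.05),  # labeled HMM
    (1.00, 1.00), (0.95, 1.10), (1.05, 0.90), (1.00, 0.95),  # labeled NHMM
]
train_y = ["HMM"] * 4 + ["NHMM"] * 4

single_call = knn_predict(train_x, train_y, (1.42, 1.00))
pooled_call = pooled_knn_predict(train_x, train_y, (0.98, 1.00))
print(single_call, pooled_call)
```

Pooling by majority vote over rounds is one simple way to reduce the influence of any single training subset; with two classes, an odd `k` and an odd number of rounds avoid tied votes.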