Summary: | Master's === National Taiwan University === Graduate Institute of Biomedical Electronics and Bioinformatics === 102 === Gas chromatography/time-of-flight mass spectrometry (GC/TOF-MS) has become an important technique for metabolomics. In this study, we developed IDMass, a novel algorithm that accurately and sensitively extracts and identifies the individual components in GC/TOF-MS samples. IDMass comprises five main steps: noise reduction, deconvolution window determination, chemical rank determination, component extraction, and identification. First, by subtracting detector noise in the mass dimension, the noise reduction step yields peaks with better shapes and improves the identification results. Second, IDMass detects peak regions by thresholding the baseline-corrected total ion chromatogram (TIC) and refining the region boundaries to nearby local minima; the threshold is computed without manually specified parameters. Third, IDMass determines the chemical rank with a two-layer local maximum method, using continuous wavelet transform peak picking to better separate peaks from different components. This method is sensitive enough to detect different components with similar spectra. Fourth, IDMass fits an optimal exponentially modified Gaussian (EMG) model using particle swarm optimization (PSO) to extract individual components, without requiring manually specified initial values for the elution profile. Because IDMass uses peak shape information as a major constraint, it can extract purer components than multivariate curve resolution (MCR) approaches, especially when co-eluted compounds have similar spectra. However, poorly shaped elution peaks caused by saturation of the mass spectrometer detector limit the performance of IDMass; this can be resolved by sample dilution. Last, by identifying compounds sequentially, IDMass can automatically integrate the results into a peak table for further statistical analysis.
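As a rough illustration of the EMG model named in the fourth step, the sketch below evaluates one common textbook parameterization of the exponentially modified Gaussian (a Gaussian with center `mu` and width `sigma`, convolved with an exponential decay of time constant `tau`). The abstract does not state which parameterization IDMass uses, so the function name `emg`, its parameters, and the example values are assumptions for illustration only; the PSO fitting step is not shown.

```python
import math


def emg(t, area, mu, sigma, tau):
    """Exponentially modified Gaussian elution profile (assumed textbook form).

    area  : integrated peak area
    mu    : center of the underlying Gaussian
    sigma : standard deviation of the underlying Gaussian
    tau   : time constant of the exponential tail (tau > 0)
    """
    z = (sigma / tau + (mu - t) / sigma) / math.sqrt(2.0)
    expo = sigma ** 2 / (2.0 * tau ** 2) + (mu - t) / tau
    # Note: math.exp can overflow far to the left of the peak when tau is
    # very small; this sketch does not guard against that case.
    return area / (2.0 * tau) * math.exp(expo) * math.erfc(z)


# Hypothetical example values, not taken from the thesis.
dt = 0.01
times = [i * dt for i in range(4001)]  # 0 to 40 time units
profile = [emg(t, area=1.0, mu=10.0, sigma=0.5, tau=1.0) for t in times]

total_area = sum(profile) * dt  # simple rectangle-rule integration
mean_rt = sum(t * y for t, y in zip(times, profile)) * dt / total_area
apex_rt = times[profile.index(max(profile))]
```

In this form the tail pulls the mean retention time to `mu + tau`, while the apex stays between `mu` and `mu + tau`, which is the asymmetry that makes the EMG a common choice for tailing chromatographic peaks.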
The performance of IDMass was tested on a data set containing 76 standard mixtures; the recall, precision, and F-score were 0.92, 0.81, and 0.86, respectively. IDMass was also used successfully to quantify the identified compounds in the 76 standard mixtures.
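The reported F-score is consistent with the balanced F-score (harmonic mean) of the stated precision and recall. The short check below assumes that this standard F1 definition is the one used; the function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported = round(f_score(precision=0.81, recall=0.92), 2)
```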
|