Summary: | Master's === National Defense Medical Center === Graduate Institute of Public Health === 98 === Background: Cigarette smoke is a complex mixture of many chemicals and an important risk factor for many diseases. Recent studies have demonstrated that gene regulation in human airway epithelium by smoking may be related to certain adverse health events. Furthermore, the regulatory effect of smoking persists for many years after smoking cessation, making it a potential biomarker of smoking history. However, existing studies are inconsistent, which suggests that a pooled-data analysis is needed to confirm the gene regulation effect of smoking on human airway epithelium.
Methods: Microarray data were obtained from the NCBI GEO database and published citations. The inclusion criteria were as follows: the microarray data addressed the gene regulation effect of smoking, the samples were taken from human airway epithelium, the experiments were not in vitro, and demographic data for the subjects were provided. A total of 302 microarray samples were included in this study.
Linear regression models were used to identify genes regulated by smoking and the reversibility of that regulation, controlling for age, gender, and data source. Ingenuity Systems IPA (Ingenuity Pathway Analysis) software was used to analyze the biological functions of genes in each reversibility category. SPSS Clementine was used to construct an ANN (artificial neural network) model for detecting smoking history and to evaluate the performance of models built with different numbers of genes.
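The per-gene regression described above can be illustrated with a minimal sketch, assuming one ordinary least squares fit per gene with indicator variables for current and former smokers (never smokers as the reference level) plus age, gender, and data source as covariates. The variable coding, the function names, and the toy samples are hypothetical and are not taken from the thesis; the expression values are simulated from known coefficients only to show that the fit recovers them.

```python
# Hypothetical sketch: ordinary least squares for a single gene, pure Python.
# Assumed model (not confirmed by the thesis):
#   expression ~ intercept + current + former + age + male + source_B


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, response):
    """Return OLS coefficients by solving the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(p)]
    return solve_linear_system(xtx, xty)


def design_row(status, age, is_male, source):
    """Encode one sample; never smokers and source 'A' are reference levels."""
    return [1.0, float(status == "current"), float(status == "former"),
            float(age), float(is_male), float(source == "B")]


# Toy samples (status, age, is_male, data source); purely illustrative.
samples = [
    ("never", 30, 1, "A"), ("never", 45, 0, "B"), ("never", 52, 1, "B"),
    ("current", 38, 0, "A"), ("current", 47, 1, "B"), ("current", 60, 1, "A"),
    ("former", 41, 0, "B"), ("former", 55, 1, "A"), ("former", 63, 0, "A"),
    ("never", 36, 0, "A"),
]
design = [design_row(*s) for s in samples]

# Simulated, noise-free expression values from known coefficients.
true_coefficients = [5.0, 2.0, 0.5, 0.01, 0.0, 0.3]
expression = [sum(b * x for b, x in zip(true_coefficients, row))
              for row in design]

coefficients = fit_ols(design, expression)
print([round(b, 4) for b in coefficients])
```

In this coding, the coefficient on the current-smoker indicator estimates the adjusted effect of smoking on expression, and the coefficient on the former-smoker indicator shows how much of that effect remains after cessation. How the thesis turned such estimates into the rapidly reversible, slowly reversible, and irreversible categories is not stated here, so no classification rule is sketched.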
Results and discussion: With the linear regression models, 149 genes were found to be regulated by smoking, including 72 rapidly reversible genes, 28 slowly reversible genes, and 49 irreversible genes. Genes regulated by smoking were mostly related to oxidoreduction and the metabolism of xenobiotics. Genes related to androgen or estrogen metabolism were also enriched among the slowly reversible and irreversible genes, and may contribute to the elevated risk of estrogen-related adverse health events among female current and former smokers.
The detection model with 20 genes performed best, with accuracy rates of 0.92 on the training set and 0.89 on the test set. Models whose training-set accuracy ranged from 0.87 to 0.92 generalized best.
Among the 20 genes chosen by the ANN model, the regulation of PI3 is considered to be related to the risk of COPD (chronic obstructive pulmonary disease). AKR1B10, AKR1C1, AKR1C3, ALDH3A1, CYP1B1, NQO1 and UCHL1 are considered to be related to the risk of lung cancer. Furthermore, the regulation of AKR1C1, ALDH3A1 and UCHL1 is related to drug resistance and metastasis of cancer cells, which might explain the worse prognosis of lung cancer patients who continue to smoke.
Conclusions: By analyzing publicly available data, we identified genes regulated by smoking and, among them, genes possibly associated with COPD and lung cancer. Artificial neural network models demonstrated how gene expression can be used to detect smoking history, and we also described the factors that influence model stability and generalization.
|