Fisher linear discriminant analysis for classification and prediction of genomic susceptibility to stomach and colorectal cancers based on six STR loci in a northern Chinese Han population

Bibliographic Details
Main Authors: Shuhong Hao, Ming Ren, Dong Li, Yujie Sui, Qingyu Wang, Gaoyang Chen, Zhaoyan Li, Qiwei Yang
Format: Article
Language: English
Published: PeerJ Inc. 2019-05-01
Series: PeerJ
Subjects:
STR
Online Access: https://peerj.com/articles/7004.pdf
Description
Summary:
Objective: Gastrointestinal cancer is the leading cause of cancer-related death worldwide. The aim of this study was to verify whether the genotype of six short tandem repeat (STR) loci including AR, Bat-25, D5S346, ER1, ER2, and FGA is associated with the risk of gastric cancer (GC) and colorectal cancer (CRC) and to develop a model that allows early diagnosis and prediction of inherited genomic susceptibility to GC and CRC.

Methods: Alleles of six STR loci were determined using the peripheral blood of six colon cancer patients, five rectal cancer patients, eight GC patients, and 30 healthy controls. Fisher linear discriminant analysis (FDA) was used to establish the discriminant formula to distinguish GC and CRC patients from healthy controls. Leave-one-out cross validation and receiver operating characteristic (ROC) curves were used to validate the accuracy of the formula. The relationship between the STR status and immunohistochemical (IHC) and tumor markers was analyzed using multiple correspondence analysis.

Results: D5S346 was confirmed as a GC- and CRC-related STR locus. For the first time, we established a discriminant formula on the basis of the six STR loci, which was used to estimate the risk coefficient of suffering from GC and CRC. The model was statistically significant (Wilks’ lambda = 0.471, χ² = 30.488, df = 13, and p = 0.004). The results of leave-one-out cross validation showed that the sensitivity of the formula was 73.7% and the specificity was 76.7%. The area under the ROC curve (AUC) was 0.926, with a sensitivity of 73.7% and a specificity of 93.3%. The STR status was shown to have a certain relationship with the expression of some IHC markers and the level of some tumor markers.

Conclusions: The results of this study complement clinical diagnostic criteria and present markers for early prediction of GC and CRC. This approach will aid in improving risk awareness of susceptible individuals and contribute to reducing the incidence of GC and CRC by prevention and early detection.
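The reported sensitivity and specificity can be read against the sample sizes given in the summary (6 + 5 + 8 = 19 patients and 30 controls). The sketch below is illustrative only: it searches for the whole-number counts of correctly classified subjects that reproduce each reported percentage. The function name `counts_matching` and the rounding tolerance are assumptions made for this example, and the resulting counts are back-calculated here; they are not stated in the summary.

```python
def counts_matching(percent, group_size, tolerance=0.05):
    """Return every count k (0..group_size) whose rate 100*k/group_size
    lies within `tolerance` percentage points of the reported percent.

    Hypothetical helper for illustration; the tolerance of 0.05 reflects
    percentages reported to one decimal place.
    """
    return [
        k
        for k in range(group_size + 1)
        if abs(100.0 * k / group_size - percent) <= tolerance
    ]


if __name__ == "__main__":
    n_patients = 6 + 5 + 8   # colon + rectal + gastric cancer patients
    n_controls = 30          # healthy controls

    # Sensitivity: share of patients classified as patients.
    print("sensitivity 73.7%:", counts_matching(73.7, n_patients), "of", n_patients)
    # Specificity under leave-one-out cross validation.
    print("specificity 76.7%:", counts_matching(76.7, n_controls), "of", n_controls)
    # Specificity at the ROC operating point reported with the AUC.
    print("specificity 93.3%:", counts_matching(93.3, n_controls), "of", n_controls)
```

Under these assumptions, each percentage corresponds to exactly one count: 14 of 19 patients for the 73.7% sensitivity, and 23 of 30 or 28 of 30 controls for the two specificity figures.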
ISSN: 2167-8359