Summary: | Melanin pigmentation is a complex trait governed by many genes. Variation in melanin pigmentation within and between populations makes it an important trait for the physical identification of individuals in forensic investigations. Using a training sample (n=789) of individuals from various ethnic groups and 75 SNPs in 24 genes previously implicated in human or animal pigmentation studies, I determined three-SNP multiple linear regression models that accounted for large proportions of pigmentation variation in skin reflectance (45.7%), eye color (76.4%), and hair [eumelanin-to-pheomelanin ratio (43.2%) and total melanin (76.3%)], independent of ethnic origin. To identify the three-SNP predictive models, I devised an algorithm that is likely more robust than stepwise regression. The algorithm consisted of two steps: the first step reduced the pool of 75 SNPs to 40 by selecting SNPs that were significant (p<0.05) by one-way ANOVA; the second step selected SNPs for model incorporation based on their frequency in the best-fitted models among all possible three-SNP combinations (i.e., 40 choose 3, or 9,880 models). Prediction models were validated using an independent cohort (n=242, test sample) that was very similar in ethnic composition to the training sample. Relative shrinkage was moderate for skin reflectance (23.4%), eye color (19.4%), and the eumelanin-to-pheomelanin ratio of hair (37.3%), and largest for total melanin of hair (67%). Additionally, I refined the model-building algorithm to enable visual comparison of the frequency of SNPs in the best-fitted models and of their collinearity due to linkage or co-inheritance. Application of the algorithm to the test sample yielded the same or similar models as the training sample. Two of the three SNPs in the models were the same, with some variability in the third SNP.
|
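The second step of the selection algorithm described in the summary can be illustrated with a minimal sketch. This is not the author's implementation: it assumes additive 0/1/2 genotype coding, ranks models by R², and treats the top `top_k` models as the "best-fitted" set, none of which is stated in the summary. The names `solve`, `r_squared`, `snp_frequency_in_best_models`, and `top_k` are hypothetical, the data are synthetic, and the ANOVA filter of the first step is assumed to have been applied already to produce the candidate list.

```python
import random
from collections import Counter
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(genotypes, y, snps):
    """R^2 of an ordinary least-squares fit of y on the given SNPs plus an intercept."""
    rows = [[1.0] + [float(genotypes[s][i]) for s in snps] for i in range(len(y))]
    p = len(snps) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def snp_frequency_in_best_models(genotypes, y, candidates, top_k=10):
    """Fit every three-SNP model and count SNP occurrences in the top_k by R^2."""
    ranked = []
    for trio in combinations(candidates, 3):
        try:
            ranked.append((r_squared(genotypes, y, trio), trio))
        except ValueError:
            continue  # skip collinear or monomorphic combinations
    ranked.sort(reverse=True)
    counts = Counter(s for _, trio in ranked[:top_k] for s in trio)
    return counts, ranked


# Synthetic illustration: 8 candidate SNPs, of which snp0..snp2 drive the trait.
random.seed(0)
n = 200
names = [f"snp{i}" for i in range(8)]
genotypes = {s: [random.choice([0, 1, 2]) for _ in range(n)] for s in names}
y = [2.0 * genotypes["snp0"][i] + 1.5 * genotypes["snp1"][i]
     + 1.0 * genotypes["snp2"][i] + random.gauss(0, 0.5) for i in range(n)]

counts, ranked = snp_frequency_in_best_models(genotypes, y, names, top_k=10)
print("best model:", ranked[0])
print("most frequent SNPs in top models:", counts.most_common(3))
```

Counting how often each SNP appears among the best-fitted models, rather than following a single greedy path as stepwise regression does, is what lets SNPs that recur across many good models stand out. With 40 candidates the same loop would evaluate 9,880 models.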