Master's thesis, National Cheng Kung University (國立成功大學), Department of Statistics, Master's and Doctoral Program (統計學系碩博士班), academic year 100.

Most work on next-generation sequencing (NGS) data analysis has focused on read assembly; relatively few studies address how bases are called and how base quality scores are assigned. Because base-calling accuracy affects the subsequent read assembly, and hence every later stage of analysis, it must be reliable if findings in biodiversity detection and downstream statistical analysis are to be trusted. Several authors, e.g., Giddings et al. (1993), have proposed using a cross-talk matrix, which improves base-call reliability and reduces the base-calling error rate. In this thesis, we propose using the signal-to-noise (SN) ratio to select the optimal number of rows used in estimating the cross-talk matrix.
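To make the cross-talk idea concrete, here is a minimal Python/NumPy sketch under stated assumptions: four channels in the order A, C, G, T; a linear mixing model y = M x, in which observed intensities y are the true dye signals x mixed by a 4×4 cross-talk matrix M; and calibration rows whose true base is known. The estimator (averaging each base's intensity profile, scaled so its own channel equals 1) and the helper names `estimate_crosstalk`, `correct_and_call`, and `simulate` are hypothetical illustrations. This is not necessarily the procedure of Giddings et al. (1993), and it does not implement the SN-ratio rule for choosing the number of rows.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order: A, C, G, T


def estimate_crosstalk(intensities, labels):
    """Hypothetical estimator of a 4x4 cross-talk matrix M from labelled rows.

    Column k is the mean intensity profile of rows whose true base is BASES[k],
    each row scaled so its own channel equals 1; M[j, k] then approximates the
    fraction of dye k's signal that leaks into channel j.
    """
    y = np.asarray(intensities, dtype=float)
    labels = np.asarray(list(labels))
    M = np.zeros((4, 4))
    for k, base in enumerate(BASES):
        rows = y[labels == base]
        M[:, k] = (rows / rows[:, [k]]).mean(axis=0)
    return M


def correct_and_call(intensities, M):
    """Undo cross-talk (y = M x, so x = M^-1 y) and call the largest channel."""
    y = np.asarray(intensities, dtype=float)
    x = np.linalg.solve(M, y.T).T  # solve one 4x4 system per row
    calls = "".join(BASES[i] for i in np.argmax(x, axis=1))
    return x, calls


def simulate(true_M, n, rng, noise_sd=1.0):
    """Toy data only: one dye per row with random amplitude, mixed by true_M plus noise."""
    labels = rng.choice(list(BASES), size=n)
    x = np.zeros((n, 4))
    x[np.arange(n), [BASES.index(b) for b in labels]] = rng.uniform(50, 100, size=n)
    y = x @ true_M.T + rng.normal(0.0, noise_sd, size=(n, 4))
    return y, "".join(labels)


if __name__ == "__main__":
    # Made-up mixing matrix for illustration, not an estimate from real data.
    true_M = np.array([[1.00, 0.30, 0.05, 0.00],
                       [0.20, 1.00, 0.25, 0.05],
                       [0.00, 0.20, 1.00, 0.30],
                       [0.00, 0.05, 0.20, 1.00]])
    rng = np.random.default_rng(0)
    y_cal, lab_cal = simulate(true_M, 400, rng)
    M_hat = estimate_crosstalk(y_cal, lab_cal)
    y_new, lab_new = simulate(true_M, 100, rng)
    _, calls = correct_and_call(y_new, M_hat)
    print(np.round(M_hat, 2))
    print("accuracy:", np.mean([a == b for a, b in zip(calls, lab_new)]))
```

The correction step uses `np.linalg.solve` instead of forming M⁻¹ explicitly, which is the usual numerically safer choice for small linear systems.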
For base quality, we use DNA data from Miscanthus to build a model and estimate its parameters, which are then used to simulate the scatter behavior of base signals. We construct quality scores from the statistic √(m^2 + d^2) proposed by Lawrence and Solovyev (1994) and from the extreme behavior of an index distribution suggested by Kao (2011). Finally, we measure the correlation between the SN ratio and these quality scores; the simulations show that √(m^2 + d^2) is positively correlated with the SN ratio.
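The last step of the paragraph above, computing √(m^2 + d^2) and correlating it with the SN ratio, can be sketched as follows. The definitions of m and d follow Lawrence and Solovyev (1994) and are not given here, so this sketch simply takes them as inputs; the SN values are likewise assumed to be precomputed. The function names `ls_statistic` and `pearson_r` are hypothetical, and the numbers in the demo are made-up toy values, not data or results from the thesis.

```python
import numpy as np


def ls_statistic(m, d):
    """Per-base statistic sqrt(m^2 + d^2); m and d are assumed precomputed inputs."""
    return np.hypot(np.asarray(m, dtype=float), np.asarray(d, dtype=float))


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    # Toy inputs only: NOT data or results from the thesis.
    m = [0.5, 1.0, 1.5, 2.0, 2.5]
    d = [0.2, 0.4, 0.9, 1.1, 1.6]
    snr = [2.0, 3.1, 4.2, 4.8, 6.0]
    q = ls_statistic(m, d)
    print(np.round(q, 3), round(pearson_r(q, snr), 3))
```

`np.hypot` computes √(m^2 + d^2) elementwise while avoiding overflow for large inputs; a rank correlation could be substituted if the relationship is expected to be monotone but not linear.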