Analyzing the relationship between the structure and fluorescence: Machine learning method
Master's === National Taiwan Normal University === Department of Chemistry === Academic year 106 === In quantitative structure-activity relationship (QSAR) studies, machine learning is used ever more widely for data mining, and modeling a chemical property with only a small number of descriptors has long been a very important...
Main Authors: | Chou, Yi-Ming 周弈銘 |
---|---|
Other Authors: | Tsai, Ming-Kang |
Format: | Others |
Language: | zh-TW |
Online Access: | http://ndltd.ncl.edu.tw/handle/ckpfug |
id |
ndltd-TW-106NTNU5065039 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NTNU5065039; 2019-05-16T00:52:38Z; http://ndltd.ncl.edu.tw/handle/ckpfug; Analyzing the relationship between the structure and fluorescence: Machine learning method; 以機器學習方法分析結構與螢光波長之關係 (Analyzing the relationship between structure and fluorescence wavelength with machine learning methods); Chou, Yi-Ming 周弈銘; Master's; National Taiwan Normal University; Department of Chemistry; 106; Tsai, Ming-Kang 蔡明剛; thesis; 70; zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan Normal University === Department of Chemistry === Academic year 106 === In quantitative structure-activity relationship (QSAR) studies, machine learning is used ever more widely for data mining, and modeling a chemical property with only a small number of descriptors has long been an important part of cheminformatics. After obtaining the data and a large number of descriptors from E-Dragon, this work set out to use machine learning to identify the descriptors and algorithms that best fit the fluorescence wavelengths of naphthalene and coumarin compounds bearing different substituents. Three descriptors, R3m, Ss, and R7u+, were selected from 1,664 candidate descriptors to fit the fluorescence wavelength, by comparing and voting among four machine learning algorithms: decision tree regression, random forest regression, gradient boosting decision tree (GBDT) regression, and extremely randomized trees (extra trees) regression. After comparing test-set accuracy, random forest regression, which handles nonlinear problems well, was chosen as the final modeling tool. The final random forest uses 19 layers (tree depth) and 65 weak learners, with these three descriptors as the inputs for predicting fluorescence wavelength.
After modeling, the mean absolute error (MAE) and the percentage error were analyzed for the training and test sets. The training set has an MAE of 16 nm and a percentage error of 4%; the test set has an MAE of 26 nm and a percentage error of 6%. The error analysis also showed that the degree of correlation between R3m and Ss depends on the complexity of the substituents, and that this complexity affects molecules in different wavelength regions differently. When the correlation is high, that is, when the substituents contain multiple bonds and are structurally complex, the model predicts better in the short-wavelength range (especially violet light); when high correlation occurs in long-wavelength molecules, the model's predictive power is weaker.
|
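The description says R3m, Ss, and R7u+ were chosen from 1,664 descriptors through "comparison and voting" among four tree-based regressors, but it does not state the voting rule. The sketch below is a minimal illustration under an assumed rule: each algorithm votes for its `top_k` descriptors ranked by feature importance, and ties in vote count are broken by importance summed across models. The function name `vote_descriptors`, the descriptor names `d1` to `d4`, and the importance scores are all hypothetical.

```python
from collections import Counter


def vote_descriptors(importances, top_k=3, n_select=3):
    """Select descriptors by voting across several fitted models (assumed rule).

    importances: dict mapping model name -> {descriptor name: importance}.
    Each model votes for its `top_k` most important descriptors; the
    `n_select` descriptors with the most votes are returned, with ties
    broken by the importance summed over all models.
    """
    votes = Counter()
    total_importance = Counter()
    for scores in importances.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        votes.update(ranked[:top_k])      # one vote per top-ranked descriptor
        total_importance.update(scores)   # adds the importance values
    ordered = sorted(votes, key=lambda d: (votes[d], total_importance[d]), reverse=True)
    return ordered[:n_select]


# Hypothetical importance scores, made up for illustration only.
importances = {
    "decision_tree": {"d1": 0.50, "d2": 0.30, "d3": 0.10, "d4": 0.10},
    "random_forest": {"d1": 0.40, "d2": 0.20, "d3": 0.25, "d4": 0.15},
    "gbdt": {"d1": 0.45, "d2": 0.05, "d3": 0.30, "d4": 0.20},
    "extra_trees": {"d1": 0.35, "d2": 0.30, "d3": 0.15, "d4": 0.20},
}
print(vote_descriptors(importances, top_k=2, n_select=2))  # → ['d1', 'd2']
```

In practice the per-model scores could come from the `feature_importances_` attribute of fitted scikit-learn tree models; whether the thesis used importances, test accuracy, or another criterion for its vote is not stated.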
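The final model is described as a random forest with 19 layers and 65 weak learners. Reading that as a maximum tree depth of 19 and 65 trees (an interpretation, not confirmed by the record), the technique can be sketched in pure Python: every regression tree is grown on a bootstrap resample, tries only a random subset of descriptors at each split, and the forest averages the trees' predictions. The names `fit_forest`, `predict_forest`, and the `max_features=1` default are hypothetical; a real model would more likely use scikit-learn's `RandomForestRegressor(n_estimators=65, max_depth=19)`.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared deviations from the mean (the split cost)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def _grow(rows, targets, depth, max_depth, max_features, rng):
    """Grow one regression tree; internal nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(targets) < 2 or len(set(targets)) == 1:
        return mean(targets)  # leaf: mean wavelength of the samples that reach it
    n_features = len(rows[0])
    best = None
    # Random-subspace step: only a random subset of descriptors is tried here.
    for f in rng.sample(range(n_features), min(max_features, n_features)):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [i for i, row in enumerate(rows) if row[f] <= threshold]
            right = [i for i, row in enumerate(rows) if row[f] > threshold]
            if not left or not right:
                continue
            cost = _sse([targets[i] for i in left]) + _sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, threshold, left, right)
    if best is None:  # every sampled descriptor is constant in this node
        return mean(targets)
    _, f, threshold, left, right = best
    left_tree = _grow([rows[i] for i in left], [targets[i] for i in left],
                      depth + 1, max_depth, max_features, rng)
    right_tree = _grow([rows[i] for i in right], [targets[i] for i in right],
                       depth + 1, max_depth, max_features, rng)
    return (f, threshold, left_tree, right_tree)


def fit_forest(rows, targets, n_estimators=65, max_depth=19, max_features=1, seed=0):
    """Bagging plus random subspaces: one tree per bootstrap resample."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        forest.append(_grow([rows[i] for i in idx], [targets[i] for i in idx],
                            0, max_depth, max_features, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees in the forest."""
    preds = []
    for node in forest:
        while isinstance(node, tuple):
            f, threshold, left, right = node
            node = left if row[f] <= threshold else right
        preds.append(node)
    return mean(preds)


# Toy data only: one descriptor and a step-shaped response in nm (made up).
rows = [[i / 20] for i in range(20)]
targets = [350.0 if i < 10 else 450.0 for i in range(20)]
forest = fit_forest(rows, targets)
print(predict_forest(forest, [0.1]), predict_forest(forest, [0.9]))
```

Averaging many deep, decorrelated trees is what lets a random forest fit nonlinear structure-property relationships while damping the variance of any single deep tree.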
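The record reports an MAE of 16 nm (training) and 26 nm (test) alongside percentage errors of 4% and 6%, without defining the percentage error. A minimal sketch, assuming it means the mean absolute percentage error; the helper names and the three wavelengths below are hypothetical illustration values, not data from the thesis.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute error, in the same unit as the inputs (nm here)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """Mean of |observed - predicted| / observed, expressed as a percentage."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return 100 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / len(observed)


observed = [400.0, 450.0, 500.0]   # hypothetical emission wavelengths (nm)
predicted = [410.0, 441.0, 520.0]  # hypothetical model predictions (nm)
print(mean_absolute_error(observed, predicted))
print(mean_absolute_percentage_error(observed, predicted))
```

If the thesis instead divided the MAE by the mean observed wavelength, the two definitions give similar but not identical numbers.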
author2 |
Tsai, Ming-Kang |
author |
Chou, Yi-Ming 周弈銘 |
title |
Analyzing the relationship between the structure and fluorescence: Machine learning method |
url |
http://ndltd.ncl.edu.tw/handle/ckpfug |