Summary: While the new paradigm of data-driven materials science has proven efficient at accelerating materials discovery, an open question is whether data-driven methods can deliver interpretable models that provide scientific insight in addition to accuracy. In this work, using the data-driven design of high-strength steels as an example, we compared the performance of the recently developed Sure Independence Screening and Sparsifying Operator (SISSO) method with several conventional machine learning methods: Support Vector Regression (SVR), Decision Tree (DTe), and Gradient Boosting Decision Tree (GBDT). The results show that SISSO yields simple, interpretable descriptors whose accuracy is comparable to that of the relatively “black-box” SVR, GBDT, and DTe models. The best SISSO descriptor was found to be scientifically consistent with the findings of previous studies. In addition, we show that, when combined with particle swarm optimization, the simple and explicit expression of the descriptor is also advantageous for reverse materials design, offering a general way for machine learning not only to predict properties but also to suggest the next action to take.
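To illustrate why an explicit descriptor suits reverse design, the following is a minimal sketch, not the authors' implementation: a basic global-best particle swarm optimization (PSO) searches for input features whose descriptor value matches a target strength. The `descriptor` formula, its two normalized input features, the target value, and the PSO hyperparameters (`w`, `c1`, `c2`) are all hypothetical placeholders; in practice the best SISSO descriptor would be substituted for `descriptor`.

```python
import numpy as np


def descriptor(x):
    # HYPOTHETICAL closed-form descriptor standing in for a SISSO expression.
    # x[0], x[1] are two hypothetical normalized features in [0, 1].
    return 1000.0 + 800.0 * x[0] * x[1] - 300.0 * x[0] ** 2


def pso_inverse_design(target, bounds, n_particles=30, n_iter=200,
                       w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO minimizing |descriptor(x) - target| within box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T  # lower and upper bound vectors
    dim = lo.size

    # Random initial positions inside the box, zero initial velocities.
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    def cost(p):
        return abs(descriptor(p) - target)

    # Personal bests and the swarm's global best.
    pbest = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    gbest_cost = pbest_cost.min()

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)  # keep candidates inside the bounds

        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]

        if pbest_cost.min() < gbest_cost:
            gbest = pbest[np.argmin(pbest_cost)].copy()
            gbest_cost = pbest_cost.min()

    return gbest, gbest_cost
```

Because the descriptor is a closed-form expression, each cost evaluation is cheap, so the swarm can query it thousands of times; for example, `pso_inverse_design(1200.0, [(0.0, 1.0), (0.0, 1.0)])` returns a candidate feature vector together with its absolute deviation from the target. A black-box model could be queried the same way, but the explicit formula additionally shows which features drive the result.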