Summary: | This study developed an interpretable machine learning model to predict the severity of coronavirus disease 2019 (COVID-19) and to identify the most important factors associated with deterioration. Clinical information, laboratory test results, and chest computed tomography (CT) scans at admission were collected. Two experienced radiologists reviewed the scans for the patterns, distribution, and CT scores of lung abnormalities. Six machine learning models were built to predict the severity of COVID-19. After parameter tuning and performance comparison, the best-performing model was interpreted using SHapley Additive exPlanations (SHAP) to identify the crucial factors. A total of 198 patients were enrolled and classified into mild (n=162; age 46.93±14.49 years) and severe (n=36; age 60.97±15.91 years) groups. Compared with the mild group, the severe group had a higher temperature (37.42±0.99°C vs. 36.75±0.66°C), CT score at admission, neutrophil count, and neutrophil-to-lymphocyte ratio. The XGBoost model ranked first among all models, with an area under the curve (AUC), sensitivity, and specificity of 0.924, 90.91%, and 97.96%, respectively. In the XGBoost model, the early stage of chest CT, the total CT score of the percentage of lung involvement, and age were the top three contributors to the prediction of deterioration. A higher total score on chest CT had a greater impact on the prediction. In conclusion, the XGBoost model predicted the severity of COVID-19 with excellent performance and identified the essential factors in the deterioration process, which may support early clinical intervention, improve prognosis, and reduce mortality.
|