Summary: | This study explores factors affecting the injury severity of passenger car drivers and truck drivers in passenger car-truck crashes. Police-reported crash data from 2007 to 2017 in Canada are collected, and two-vehicle crashes involving one truck and one passenger car are extracted for modeling. Because the injury severity levels are not equally represented, this study applies four data imbalance treatment approaches: over-sampling, under-sampling, a hybrid method, and cost-sensitive learning. To compare classifier performance, five classification models are used: multinomial logistic regression, Naive Bayes, Classification and Regression Tree, support vector machine, and eXtreme Gradient Boosting (XGBoost). In both the passenger car driver and truck driver injury severity analyses, XGBoost combined with cost-sensitive learning produces the best results in terms of G-mean, area under the curve, and overall accuracy. The SHapley Additive exPlanations (SHAP) approach is then adopted to interpret the results of the best-performing model. Most explanatory variables have similar effects on passenger car and truck driver fatality risks. However, six variables exhibit opposite effects: the age of the passenger car driver, crash hour, passenger car age, road surface condition, weather condition, and truck age. The results could provide valuable insights for improving truck traffic safety. For instance, properly installing traffic control devices could be an effective way to reduce fatality risks in passenger car-truck crashes, and passenger car drivers should be especially cautious when driving between midnight and 6 am on truck corridors.
|
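
As a rough illustration of two ideas named in the summary, the sketch below computes the G-mean (taken here as the geometric mean of per-class recall) and a set of inverse-frequency class weights of the kind often used for cost-sensitive learning. The severity labels, the toy data, and the weighting formula are assumptions made for illustration only; the summary does not state how the study defined its classes or its misclassification costs.

```python
from collections import Counter
from math import prod


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is one common weighting heuristic, not necessarily
    the cost scheme used in the study.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return prod(recalls) ** (1 / len(recalls))


# Hypothetical, imbalanced toy labels (not data from the study).
y_true = ["no_injury"] * 6 + ["injury"] * 3 + ["fatal"] * 1

# A classifier that always predicts the majority class.
y_majority = ["no_injury"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
weights = balanced_class_weights(y_true)

print("accuracy:", accuracy)
print("g-mean:", g_mean(y_true, y_majority))
print("class weights:", weights)
```

In this toy case the majority-class predictor is correct on 6 of 10 samples yet has zero recall on the two rarer classes, so its G-mean collapses to zero. That contrast is why G-mean is a useful companion to overall accuracy on imbalanced data, and why rare classes receive larger weights under the inverse-frequency heuristic.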