A New Video-Based Crash Detection Method: Balancing Speed and Accuracy Using a Feature Fusion Deep Learning Framework
Quick and accurate crash detection is important for saving lives and improving traffic incident management. In this paper, a feature fusion-based deep learning framework was developed for video-based urban traffic crash detection, aiming to balance detection speed and accuracy with limited computing resources. In this framework, a residual neural network (ResNet) combined with attention modules was proposed to extract crash-related appearance features from urban traffic videos (i.e., a crash appearance feature extractor), which were then fed to a spatiotemporal feature fusion model, Conv-LSTM (Convolutional Long Short-Term Memory), to simultaneously capture appearance (static) and motion (dynamic) crash features. The proposed model was trained on a set of video clips covering 330 crash and 342 noncrash events. Overall, the model achieved an accuracy of 87.78% on the testing dataset and an acceptable detection speed (FPS > 30 on a GTX 1060). Thanks to the attention modules, the proposed model captures localized appearance features of crashes (e.g., vehicle damage and fallen pedestrians) better than conventional convolutional neural networks. The Conv-LSTM module outperformed a conventional LSTM in capturing motion features of crashes, such as roadway congestion and pedestrians gathering after a crash. Compared to traditional motion-based crash detection models, the proposed model achieved higher detection accuracy. Moreover, it detected crashes much faster than other feature fusion-based models (e.g., C3D). The results show that the proposed model is a promising video-based urban traffic crash detection algorithm that could be used in practice.
Main Authors: Zhenbo Lu, Wei Zhou, Shixiang Zhang, Chen Wang
Affiliations: Intelligent Transportation Research Center, Southeast University, Nanjing, China (Lu, Zhou, Wang); China Design Group Co., Ltd., Nanjing, China (Zhang)
Format: Article
Language: English
Published: Hindawi-Wiley, 2020-01-01
Series: Journal of Advanced Transportation
ISSN: 0197-6729, 2042-3195
Online Access: http://dx.doi.org/10.1155/2020/8848874
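The key design choice described in the abstract is the Conv-LSTM: unlike a conventional LSTM, its gates are computed by convolutions, so the hidden and cell states remain 2D feature maps and preserve *where* in the frame motion cues (e.g., congestion forming behind a crash) occur. The sketch below is a minimal pure-Python illustration of one Conv-LSTM cell step, not the paper's implementation: the 3x3 kernel, 1-channel 4x4 map, shared pre-activation across gates, and all weight values are simplifying assumptions for readability (a real cell uses separate learned kernels per gate and many channels).

```python
# Minimal single-step Conv-LSTM cell on a 1-channel 2D feature map.
# Illustrative only: kernels, sizes, and the shared gate pre-activation
# are assumptions, not the paper's trained model.
import math

def conv2d(x, k):
    """'Same'-padded 3x3 convolution of a 2D list-of-lists."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + 1][dj + 1]
            out[i][j] = s
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def convlstm_step(x, h_prev, c_prev, kx, kh, bias):
    """One Conv-LSTM update. Gates are convolutions of the input frame
    features x and the previous hidden map h_prev, so the recurrent
    state stays spatial instead of being flattened to a vector."""
    H, W = len(x), len(x[0])
    cx, ch = conv2d(x, kx), conv2d(h_prev, kh)
    h_new = [[0.0] * W for _ in range(H)]
    c_new = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            z = cx[i][j] + ch[i][j] + bias
            i_g = sigmoid(z)   # input gate   (shared pre-activation z
            f_g = sigmoid(z)   # forget gate   for brevity; real cells
            o_g = sigmoid(z)   # output gate   use one kernel per gate)
            g = math.tanh(z)   # candidate state
            c_new[i][j] = f_g * c_prev[i][j] + i_g * g
            h_new[i][j] = o_g * math.tanh(c_new[i][j])
    return h_new, c_new
```

In the paper's pipeline, maps like `x` would come from the ResNet-with-attention appearance extractor, and the sequence of hidden maps over frames is what fuses static appearance with dynamic motion features before classification.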