Summary: | Minimally invasive surgery (MIS) is widely used in clinical practice to improve surgeons' efficiency and patient safety. Real-time surgical tool detection plays an important role in MIS. However, most existing surgical tool detection methods may not achieve a good trade-off between detection speed and accuracy. We propose a real-time attention-guided convolutional neural network (CNN) for frame-by-frame detection of surgical tools in MIS videos, which comprises a coarse detection module (CDM) and a refined detection module (RDM). The CDM coarsely regresses the location parameters to obtain refined anchors and performs binary classification, which determines whether each anchor is a tool or background. The RDM incorporates an attention mechanism to generate accurate detection results from the refined anchors produced by the CDM. Finally, we propose a light-head module for more efficient surgical tool detection. The proposed method is compared with eight state-of-the-art detection algorithms on two public datasets (EndoVis Challenge and ATLAS Dione) and on a new dataset that we introduce (Cholec80-locations), which extends the Cholec80 dataset with spatial annotations of surgical tools. Our approach runs in real time at 55.5 FPS and achieves 100%, 94.05%, and 91.65% mAP on these three datasets, respectively. Trained end to end on MIS videos, our method produces accurate, fast, and robust detection results. The results demonstrate the effectiveness of our method and its superiority over the eight state-of-the-art methods.
|
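
The coarse stage described in the summary (regress location parameters to refine anchors, then keep only anchors classified as tools) can be illustrated with a minimal sketch. The summary does not give the box parameterization, so this sketch assumes the center/size offset encoding common in anchor-based detectors; the function names `refine_anchor` and `coarse_stage`, the `(cx, cy, w, h)` layout, and the 0.5 score threshold are all hypothetical and are not taken from the authors' implementation.

```python
import math


def refine_anchor(anchor, offset):
    """Apply regressed offsets to one anchor.

    Assumption: anchors are (cx, cy, w, h) and offsets are (dx, dy, dw, dh),
    with center shifts scaled by anchor size and log-space size changes.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offset
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def coarse_stage(anchors, offsets, tool_scores, threshold=0.5):
    """Return refined anchors whose tool-vs-background score passes the threshold.

    The threshold value is illustrative only.
    """
    refined = []
    for anchor, offset, score in zip(anchors, offsets, tool_scores):
        if score >= threshold:  # binary decision: tool or background
            refined.append(refine_anchor(anchor, offset))
    return refined
```

In a two-stage design of this kind, the anchors returned by `coarse_stage` would then serve as the starting boxes for the refined stage, which predicts the final tool class and location.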