Summary: | Most trackers focus solely on robustness and accuracy. Visual tracking, however, is a long-term problem with strict time constraints. A tracker that is robust and accurate, remains reliable over long sequences, and runs in real time is of high research value and practical significance. In this paper, we consider these requirements jointly and propose a new tracker with state-of-the-art performance. EfficientNet-B0, an architecture obtained through neural architecture search, is adopted for the first time as the backbone network for the tracking task. This improves the feature extraction ability of the network and significantly reduces the number of parameters required by the tracker backbone. In addition, maximizing the Distance Intersection-over-Union (DIoU) is used as the target estimation method, which enhances network stability and speeds up the convergence of offline training. Channel and spatial dual attention mechanisms are employed in the target classification module to improve the discriminative ability of the tracker. Furthermore, a conjugate gradient optimization strategy speeds up the online learning of the target classification module. A two-stage search method combined with a screening module is proposed to enable the tracker to cope with sudden target movement and with reappearance of the target following a brief disappearance. Our proposed method has a clear speed advantage over pure global searching and achieves state-of-the-art performance on OTB2015, VOT2016, VOT2018-LT, UAV-123 and LaSOT while running at over 50 FPS.
|
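For readers unfamiliar with the Distance Intersection-over-Union measure mentioned in the summary, the following is a minimal sketch of the standard DIoU definition: the IoU of two boxes minus the squared distance between their centers, normalized by the squared diagonal of the smallest box enclosing both. It is not the target estimation network of the proposed tracker. The function name `diou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration only.

```python
def diou(box_a, box_b):
    """Distance-IoU of two axis-aligned boxes.

    Boxes are given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    This layout is an assumption of this sketch, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Squared distance between the two box centers.
    center_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
                   + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose_diag_sq = enclose_w ** 2 + enclose_h ** 2

    return iou - center_dist_sq / enclose_diag_sq
```

Unlike plain IoU, this quantity still varies when the boxes do not overlap, because the center-distance term remains informative. This is one commonly cited reason for DIoU-based objectives converging faster than IoU-based ones.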