YOLBO: You Only Look Back Once–A Low Latency Object Tracker Based on YOLO and Optical Flow

One common computer vision task is to track an object as it moves from frame to frame within a video sequence. There are myriad applications for such a capability, and the underlying technologies are well understood. More recently, deep convolutional neural networks have been employed not only to track, but also to classify objects as they are tracked from frame to frame. These models can be used in a paradigm known as tracking by detection and can achieve very high tracking accuracy. The major drawback of these deep neural networks is the large number of mathematical operations that must be performed for each inference, which reduces the number of tracked frames per second. For edge applications residing on size-, weight-, and power-limited platforms, such as unmanned aerial vehicles, high-frame-rate, low-latency, real-time tracking can be an elusive target. To overcome the limited power and computational resources of an edge compute device, various optimizations have been performed to trade off tracking speed, accuracy, power, and latency. Previous works on motion-based interpolation with neural networks either do not account for the latency accrued from camera image capture to tracking result, or they compensate for this latency but are bottlenecked by the motion interpolation operation instead. The algorithm presented in this work retains the performance speedup of previous motion-based neural network inference papers and also performs a novel look-back operation that is less cumbersome than competing motion interpolation methods.
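The look-back idea the abstract describes can be sketched as follows: while the slow YOLO inference runs on an older frame, newer frames keep arriving; when the detection finally returns, its bounding boxes are stale, so they are warped forward to the newest frame using optical flow. The Python sketch below is a minimal, hypothetical illustration of that pattern, assuming the look-back amounts to a single dense optical-flow computation (OpenCV Farneback) from the frame on which detection ran to the current frame, with each box shifted by the mean flow inside it. The detect stub and the mean-flow box shift are assumptions for illustration, not the authors' implementation.

import cv2
import numpy as np

def detect(frame):
    # Stand-in for a (slow) YOLO inference call.
    # Returns a list of boxes as (x, y, w, h). Hypothetical stub.
    raise NotImplementedError

def shift_box(box, flow):
    # Shift a box by the mean optical flow inside it
    # (an assumed heuristic, not necessarily the paper's method).
    x, y, w, h = box
    patch = flow[int(y):int(y + h), int(x):int(x + w)]
    if patch.size == 0:
        return box
    dx = float(patch[..., 0].mean())
    dy = float(patch[..., 1].mean())
    return (x + dx, y + dy, w, h)

def look_back_once(detect_frame, current_frame, boxes):
    # "Look back once": compute flow a single time, from the frame
    # on which detection ran to the newest frame, then move the
    # stale boxes forward to compensate for inference latency.
    prev = cv2.cvtColor(detect_frame, cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return [shift_box(b, flow) for b in boxes]

In use, detect(detect_frame) would be launched when detect_frame is captured, and look_back_once would be called with whatever frame is current when the inference result arrives, so the reported boxes align with the latest image rather than the one the detector actually saw.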


Bibliographic Details
Main Authors: Daniel S. Kaputa, Brian P. Landy (Rochester Institute of Technology, Rochester, NY, USA)
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3080136
Subjects: CNN, classifier, detector, neural network, low latency, tracker
Online Access: https://ieeexplore.ieee.org/document/9430527/