Automatic textual description of interactions between two objects in surveillance videos

Bibliographic Details
Main Authors: Wael F. Youssef, Siba Haidar, Philippe Joly
Format: Article
Language: English
Published: Springer, 2021-06-01
Series: SN Applied Sciences
Subjects:
Online Access: https://doi.org/10.1007/s42452-021-04534-3
Description
Summary: Abstract The purpose of our work is to automatically generate textual video description schemas from surveillance video scenes, compatible with police incident reports. Our proposed approach is based on a generic and flexible context-free ontology. The general schema is of the form [actuator] [action] [over/with] [actuated object] [+ descriptors: distance, speed, etc.]. We focus on scenes containing exactly two objects. Through a series of elaborated steps, we generate a formatted textual description. We try to identify the existence of an interaction between the two objects, including remote interaction that does not involve physical contact, and we point out when aggression takes place in these cases. We use supervised deep learning to classify scenes into interaction and no-interaction classes, and then into subclasses. The chosen descriptors used to represent the subclasses are key in surveillance systems: they help generate live alerts and facilitate offline investigation.
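The general schema quoted in the summary could be represented as a small data structure. The sketch below is illustrative only; the class, field names, and rendering format are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class InteractionDescription:
    """Hypothetical container for the [actuator] [action] [over/with]
    [actuated object] [+ descriptors] schema from the abstract."""
    actuator: str
    action: str
    preposition: str          # e.g. "over" or "with"
    actuated_object: str
    descriptors: dict = field(default_factory=dict)  # e.g. {"distance": "3 m"}

    def render(self) -> str:
        # Assemble the parts in the fixed order given by the schema,
        # appending any optional descriptors in brackets.
        text = " ".join([self.actuator, self.action,
                         self.preposition, self.actuated_object])
        if self.descriptors:
            extras = ", ".join(f"{k}: {v}" for k, v in self.descriptors.items())
            text += f" [{extras}]"
        return text

desc = InteractionDescription("person A", "throws", "over", "object B",
                              {"distance": "3 m", "speed": "fast"})
print(desc.render())
# → person A throws over object B [distance: 3 m, speed: fast]
```

A formatted description of this shape could then feed live-alert generation or offline search, as the summary suggests.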
ISSN: 2523-3963, 2523-3971