Compact video fingerprinting via an improved capsule net


Bibliographic Details
Main Authors: Li Xinwei, Xu Lianghao, Yang Yi
Format: Article
Language: English
Published: Taylor & Francis Group 2021-04-01
Series: Systems Science & Control Engineering
Subjects:
Online Access: http://dx.doi.org/10.1080/21642583.2020.1833782
Description
Summary: Robustness, distinctiveness and compactness are the three basic performance metrics for video fingerprinting, and they affect one another, which makes it challenging to improve all three simultaneously. For this reason, an end-to-end fingerprinting method based on a capsule net is proposed. To capture video features, a capsule net built on a 3D/2D mixed convolution module is designed, which maps raw data directly to a compact real-valued vector. A newly designed adaptive margin triplet loss function is introduced, which automatically adjusts the loss according to the sample distance; this helps reduce training difficulty and improve performance. Three open-access video datasets, FCVID, TRECVID and YouTube, are combined for training and testing, and extensive experimental results show that the proposed fingerprinting achieves better performance than traditional and deep learning methods.
ISSN: 2164-2583
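
The summary mentions a triplet loss whose margin adapts to the sample distance but does not give its formula. The following is only a minimal sketch of the general idea, not the authors' loss: the function names, the `base_margin` parameter and the sigmoid-based margin rule are all assumptions made for illustration. The margin grows for hard triplets (negative closer than positive) and shrinks toward zero for easy ones.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def adaptive_margin_triplet_loss(anchor, positive, negative, base_margin=1.0):
    """Triplet loss with a distance-dependent margin (illustrative only).

    ASSUMPTION: the margin rule below is hypothetical and is not taken
    from the paper. It scales base_margin by a sigmoid of the distance
    gap, so the margin lies in (0, 2 * base_margin).
    """
    d_ap = euclidean(anchor, positive)  # anchor-to-positive distance
    d_an = euclidean(anchor, negative)  # anchor-to-negative distance

    # Hypothetical adaptive margin: equals base_margin when d_ap == d_an,
    # approaches 2 * base_margin for hard triplets, and 0 for easy ones.
    margin = 2.0 * base_margin / (1.0 + math.exp(-(d_ap - d_an)))

    # Standard hinge form of the triplet loss with the adaptive margin.
    return max(0.0, d_ap - d_an + margin)
```

With a fixed margin, every triplet is pushed toward the same separation regardless of how far apart the samples already are; a distance-dependent margin like the one sketched here lets well-separated triplets contribute little or nothing to the loss.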