Summary: | Robustness, distinctiveness, and compactness are the three basic performance metrics for video fingerprinting, and they affect one another, which makes it challenging to improve all three simultaneously. To address this, an end-to-end fingerprinting method based on a capsule network is proposed. To capture video features, a capsule network built on a 3D/2D mixed convolution module is designed, which maps raw data directly to a compact real-valued vector. A newly designed adaptive margin triplet loss function is introduced, which automatically adjusts the loss according to the sample distance; this helps reduce training difficulty and improve performance. Three open-access video datasets, FCVID, TRECVID, and YouTube, are combined for training and testing, and extensive experimental results show that the proposed fingerprinting method achieves better performance than traditional and deep learning methods.
|
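
The summary does not give the form of the adaptive margin triplet loss, so the following is only a minimal sketch of one way a triplet margin could depend on sample distances. The sigmoid weighting, the Euclidean distance, the bounds `m_min` and `m_max`, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length real-valued fingerprints."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def adaptive_margin(d_ap, d_an, m_min=0.2, m_max=1.0):
    """Hypothetical margin rule (an assumption, not taken from the paper).

    Hard triplets, where the negative is closer than the positive, get a
    margin near m_min; easy triplets get a margin near m_max.
    """
    gap = d_an - d_ap
    weight = 1.0 / (1.0 + math.exp(-gap))  # sigmoid of the distance gap
    return m_min + (m_max - m_min) * weight


def adaptive_triplet_loss(anchor, positive, negative, m_min=0.2, m_max=1.0):
    """Standard hinge-style triplet loss with the distance-dependent margin."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    margin = adaptive_margin(d_ap, d_an, m_min, m_max)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    # Easy triplet: the negative is far away, so the hinge is inactive.
    print(adaptive_triplet_loss([0.0, 0.0], [0.0, 0.0], [10.0, 0.0]))
    # Hard triplet: the negative coincides with the anchor, so the loss is large.
    print(adaptive_triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))
```

In this sketch a smaller margin on hard triplets is one plausible reading of "reducing training difficulty"; other rules, such as a margin that scales with the anchor-positive distance alone, would fit the summary equally well.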