Video Frame Synthesis via Plug-and-Play Deep Locally Temporal Embedding

We propose a generative framework that tackles video frame interpolation. Conventionally, optical-flow methods can solve the problem, but their perceptual quality depends on the accuracy of the flow estimation. Nevertheless, a merit of these traditional methods is their remarkable generalization ability. Recently, deep convolutional neural networks (CNNs) have achieved good performance at the cost of computation. However, to deploy a CNN, it is necessary to train it on a large-scale dataset beforehand, not to mention the subsequent fine-tuning and adaptation. Also, despite their sharp motion results, their perceptual quality does not correlate well with their pixel-to-pixel difference metric performance due to the various artifacts created by erroneous warping. In this paper, we take advantage of both conventional and deep-learning models and tackle the problem from a different perspective. The framework, which we call deep locally temporal embedding (DeepLTE), is powered by a deep CNN and can be used instantly, like conventional models. DeepLTE fits an auto-encoding CNN to several consecutive frames and places constraints on the latent representations so that new frames can be generated by interpolating new latent codes. Unlike the current deep-learning paradigm, which requires training on large datasets, DeepLTE works in a plug-and-play, unsupervised manner and can generate an arbitrary number of frames from multiple given consecutive frames. We demonstrate that, without bells and whistles, DeepLTE outperforms existing state-of-the-art models in terms of perceptual quality.
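The abstract describes the mechanism only at a high level: fit an auto-encoding CNN to a few consecutive frames at test time, constrain the latent codes so they vary smoothly in time, and decode interpolated latent codes into new frames. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the network architecture, the specific smoothness constraint on the latent codes, the loss weight, and the names `TinyAutoencoder` and `fit_and_interpolate` are illustrative assumptions, not the authors' DeepLTE implementation.

```python
# Hypothetical sketch: test-time fitting of an autoencoder to a few frames,
# with a locally linear temporal constraint on the latent codes, then frame
# synthesis by decoding interpolated latents. Not the published DeepLTE code.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, channels=3, latent_channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_channels, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def fit_and_interpolate(frames, steps=200, n_new=3):
    """frames: (T, C, H, W) tensor of consecutive frames in [0, 1]."""
    model = TinyAutoencoder(channels=frames.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        recon, z = model(frames)
        rec_loss = torch.mean((recon - frames) ** 2)
        # Assumed "locally temporal" constraint: each latent code should stay
        # close to the midpoint of its temporal neighbours, so the codes vary
        # smoothly (roughly linearly) along the time axis.
        mid = 0.5 * (z[:-2] + z[2:])
        smooth_loss = torch.mean((z[1:-1] - mid) ** 2)
        loss = rec_loss + 0.1 * smooth_loss  # 0.1 is an arbitrary weight
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Synthesize new frames by decoding latent codes interpolated between the
    # codes of the two middle input frames.
    with torch.no_grad():
        _, z = model(frames)
        t0, t1 = len(frames) // 2 - 1, len(frames) // 2
        alphas = torch.linspace(0, 1, n_new + 2)[1:-1]
        new_z = torch.stack([(1 - a) * z[t0] + a * z[t1] for a in alphas])
        return model.decoder(new_z)

# Example: synthesize 3 frames between the middle pair of 4 random "frames".
new_frames = fit_and_interpolate(torch.rand(4, 3, 64, 64))
```

Because the small model is optimized directly on the given frames, nothing is pre-trained, which mirrors the plug-and-play, unsupervised usage described in the abstract.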

Bibliographic Details
Main Authors: Anh-Duc Nguyen, Woojae Kim, Jongyoo Kim, Weisi Lin, Sanghoon Lee
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Frame synthesis, video processing, manifold learning, convolutional neural network, unsupervised learning
Online Access: https://ieeexplore.ieee.org/document/8931794/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2959019
Volume/Pages: IEEE Access, vol. 7, pp. 179304-179319
Author Affiliations:
Anh-Duc Nguyen (https://orcid.org/0000-0001-9895-5347), Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Woojae Kim (https://orcid.org/0000-0002-8312-9736), Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Jongyoo Kim (https://orcid.org/0000-0002-2435-9195), Microsoft Research Asia, Beijing, China
Weisi Lin (https://orcid.org/0000-0001-9866-1947), School of Computer Science and Engineering, Nanyang Technological University, Singapore
Sanghoon Lee, Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea