Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks

Super-Resolving (SR) video is more challenging compared with image super-resolution because of the demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning me...


Bibliographic Details
Main Authors:	Zhi-Song Liu, Wan-Chi Siu, Yui-Lam Chan
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Video; deep learning; residual network; hierarchical structure; super-resolution
Online Access:	https://ieeexplore.ieee.org/document/9490661/

id	doaj-187faa97d4114fec93ad40a3f1a5ea80
record_format	Article
spelling	doaj-187faa97d4114fec93ad40a3f1a5ea80; 2021-08-09T23:00:36Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; vol. 9, pp. 106049-106064; 10.1109/ACCESS.2021.3098326; 9490661; Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks; Zhi-Song Liu (https://orcid.org/0000-0003-4507-3097), Wan-Chi Siu (https://orcid.org/0000-0001-8280-0367), Yui-Lam Chan (https://orcid.org/0000-0002-1473-094X); Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong; https://ieeexplore.ieee.org/document/9490661/; Video; deep learning; residual network; hierarchical structure; super-resolution
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Zhi-Song Liu Wan-Chi Siu Yui-Lam Chan
spellingShingle	Zhi-Song Liu Wan-Chi Siu Yui-Lam Chan Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks IEEE Access Video deep learning residual network hierarchical structure super-resolution
author_facet	Zhi-Song Liu Wan-Chi Siu Yui-Lam Chan
author_sort	Zhi-Song Liu
title	Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_short	Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_full	Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_fullStr	Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_full_unstemmed	Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_sort	efficient video super-resolution via hierarchical temporal residual networks
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2021-01-01
description	Super-resolving (SR) video is more challenging than image super-resolution because of its demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning methods to estimate the spatial and temporal information. This paper proposes a lighter residual network based on multi-stage back projection for multi-frame SR. We improve the back-projection-based residual block by adding weights for adaptive feature tuning, and add global and local connections to explore deeper feature representations. We jointly learn spatial-temporal feature maps by using the proposed Spatial Convolution Packing scheme as an attention mechanism to extract more information from both the spatial and temporal domains. Unlike other methods, our proposed network takes multiple low-resolution frames as input and produces multiple super-resolved frames simultaneously. We further improve the video SR quality with self-ensemble enhancement to handle videos with different motions and distortions. Extensive experiments show that our proposed approaches give a large improvement over other state-of-the-art video SR methods. Compared with recent CNN-based video SR works, our approaches can save up to 60% of the computation time and achieve a 0.6 dB PSNR improvement.
topic	Video; deep learning; residual network; hierarchical structure; super-resolution
url	https://ieeexplore.ieee.org/document/9490661/
work_keys_str_mv	AT zhisongliu efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks AT wanchisiu efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks AT yuilamchan efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks
_version_	1721213383497220096
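
To put the 0.6 dB PSNR improvement quoted in the description into perspective, the following minimal sketch converts a PSNR gain into the implied change in mean squared error, using the standard definition PSNR = 10 log10(peak^2 / MSE). The function names and the baseline MSE value are illustrative assumptions and are not taken from the article.

```python
import math


def psnr(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB for a given mean squared error and peak signal value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Ratio new_mse / old_mse implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


# Hypothetical baseline error, chosen only for illustration.
baseline_mse = 40.0
ratio = mse_ratio_for_gain(0.6)
improved_mse = baseline_mse * ratio

print(f"MSE ratio for +0.6 dB: {ratio:.3f}")
print(f"PSNR gain check: {psnr(improved_mse) - psnr(baseline_mse):.3f} dB")
```

Under this definition, a 0.6 dB gain corresponds to roughly a 13% reduction in mean squared error, whatever the baseline error is.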


Similar Items