Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks

Video super-resolution (SR) is more challenging than image super-resolution because of its demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning me...

Bibliographic Details
Main Authors: Zhi-Song Liu, Wan-Chi Siu, Yui-Lam Chan
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Video; deep learning; residual network; hierarchical structure; super-resolution
Online Access: https://ieeexplore.ieee.org/document/9490661/
id doaj-187faa97d4114fec93ad40a3f1a5ea80
record_format Article
spelling doaj-187faa97d4114fec93ad40a3f1a5ea80
2021-08-09T23:00:36Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
106049
106064
10.1109/ACCESS.2021.3098326
9490661
Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
Zhi-Song Liu 0 https://orcid.org/0000-0003-4507-3097
Wan-Chi Siu 1 https://orcid.org/0000-0001-8280-0367
Yui-Lam Chan 2 https://orcid.org/0000-0002-1473-094X
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong
Video super-resolution (SR) is more challenging than image super-resolution because of its demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning methods to estimate the spatial and temporal information. This paper proposes a lighter residual network based on multi-stage back projection for multi-frame SR. We improve the back-projection-based residual block by adding weights for adaptive feature tuning, and add global and local connections to explore deeper feature representations. We jointly learn spatial-temporal feature maps by using the proposed Spatial Convolution Packing scheme as an attention mechanism to extract more information from both the spatial and temporal domains. Unlike other methods, our proposed network can take multiple low-resolution frames as input and produce multiple super-resolved frames simultaneously. We can then further improve the video SR quality with self-ensemble enhancement to handle videos with different motions and distortions. Extensive experiments show that our proposed approaches give a large improvement over other state-of-the-art video SR methods. Compared to recent CNN-based video SR works, our approaches can save up to 60% of computation time and achieve a 0.6 dB PSNR improvement.
https://ieeexplore.ieee.org/document/9490661/
Video
deep learning
residual network
hierarchical structure
super-resolution
collection DOAJ
language English
format Article
sources DOAJ
author Zhi-Song Liu
Wan-Chi Siu
Yui-Lam Chan
spellingShingle Zhi-Song Liu
Wan-Chi Siu
Yui-Lam Chan
Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
IEEE Access
Video
deep learning
residual network
hierarchical structure
super-resolution
author_facet Zhi-Song Liu
Wan-Chi Siu
Yui-Lam Chan
author_sort Zhi-Song Liu
title Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_short Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_full Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_fullStr Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_full_unstemmed Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
title_sort efficient video super-resolution via hierarchical temporal residual networks
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Video super-resolution (SR) is more challenging than image super-resolution because of its demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning methods to estimate the spatial and temporal information. This paper proposes a lighter residual network based on multi-stage back projection for multi-frame SR. We improve the back-projection-based residual block by adding weights for adaptive feature tuning, and add global and local connections to explore deeper feature representations. We jointly learn spatial-temporal feature maps by using the proposed Spatial Convolution Packing scheme as an attention mechanism to extract more information from both the spatial and temporal domains. Unlike other methods, our proposed network can take multiple low-resolution frames as input and produce multiple super-resolved frames simultaneously. We can then further improve the video SR quality with self-ensemble enhancement to handle videos with different motions and distortions. Extensive experiments show that our proposed approaches give a large improvement over other state-of-the-art video SR methods. Compared to recent CNN-based video SR works, our approaches can save up to 60% of computation time and achieve a 0.6 dB PSNR improvement.
topic Video
deep learning
residual network
hierarchical structure
super-resolution
url https://ieeexplore.ieee.org/document/9490661/
work_keys_str_mv AT zhisongliu efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks
AT wanchisiu efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks
AT yuilamchan efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks
_version_ 1721213383497220096