Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks
Super-Resolving (SR) video is more challenging compared with image super-resolution because of the demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning me...
Main Authors: | Zhi-Song Liu, Wan-Chi Siu, Yui-Lam Chan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Subjects: | Video; deep learning; residual network; hierarchical structure; super-resolution |
Online Access: | https://ieeexplore.ieee.org/document/9490661/ |
id | doaj-187faa97d4114fec93ad40a3f1a5ea80 |
---|---|
record_format | Article |
spelling | doaj-187faa97d4114fec93ad40a3f1a5ea80; 2021-08-09T23:00:36Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 106049; 106064; 10.1109/ACCESS.2021.3098326; 9490661; Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks; Zhi-Song Liu (https://orcid.org/0000-0003-4507-3097); Wan-Chi Siu (https://orcid.org/0000-0001-8280-0367); Yui-Lam Chan (https://orcid.org/0000-0002-1473-094X); Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong; https://ieeexplore.ieee.org/document/9490661/; Video; deep learning; residual network; hierarchical structure; super-resolution |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Zhi-Song Liu; Wan-Chi Siu; Yui-Lam Chan |
spellingShingle | Zhi-Song Liu; Wan-Chi Siu; Yui-Lam Chan; Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks; IEEE Access; Video; deep learning; residual network; hierarchical structure; super-resolution |
author_facet | Zhi-Song Liu; Wan-Chi Siu; Yui-Lam Chan |
author_sort | Zhi-Song Liu |
title | Efficient Video Super-Resolution via Hierarchical Temporal Residual Networks |
title_sort | efficient video super-resolution via hierarchical temporal residual networks |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2021-01-01 |
description | Super-Resolving (SR) video is more challenging compared with image super-resolution because of the demanding computation time. To enlarge a low-resolution video, the temporal relationship among frames must be fully exploited. We can model video SR as a multi-frame SR problem and use deep learning methods to estimate the spatial and temporal information. This paper proposes a lighter residual network, based on a multi-stage back projection for multi-frame SR. We improve the back-projection-based residual block by adding weights for adaptive feature tuning, and add global and local connections to explore deeper feature representation. We jointly learn spatial-temporal feature maps by using the proposed Spatial Convolution Packing scheme as an attention mechanism to extract more information from both spatial and temporal domains. Unlike other methods, our proposed network can take multiple low-resolution frames as input and produce multiple super-resolved frames simultaneously. We can then further improve the video SR quality by self-ensemble enhancement to handle videos with different motions and distortions. Extensive experiments show that our proposed approaches give a large improvement over other state-of-the-art video SR methods. Compared to recent CNN-based video SR works, our approaches can save up to 60% of computation time and achieve a 0.6 dB PSNR improvement. |
topic | Video; deep learning; residual network; hierarchical structure; super-resolution |
url | https://ieeexplore.ieee.org/document/9490661/ |
work_keys_str_mv | AT zhisongliu efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks AT wanchisiu efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks AT yuilamchan efficientvideosuperresolutionviahierarchicaltemporalresidualnetworks |
_version_ | 1721213383497220096 |