Pre-Trained-Based Individualization Model for Real-Time Spatial Audio Rendering System
Spatial audio has attracted increasing attention in fields such as virtual reality (VR) and blind navigation. Individualized head-related transfer functions (HRTFs) play an important role in generating spatial audio with accurate localization perception, but existing individualization methods focus on a single database and do not fully exploit the information available across multiple databases.
Main Authors: Jinyan Lu (School of Electrical and Information Engineering, Henan University of Engineering, Zhengzhou, China); Xiaoke Qi (School of Information Management for Law, China University of Political Science and Law, Beijing, China; ORCID: https://orcid.org/0000-0003-2776-3297)
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, vol. 9, pp. 128722-128733
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3113133
Subjects: Head-related transfer functions; individualization; pre-trained model; real time; spatial hearing
Online Access: https://ieeexplore.ieee.org/document/9539178/
Description:
Spatial audio has attracted increasing attention in fields such as virtual reality (VR) and blind navigation. Individualized head-related transfer functions (HRTFs) play an important role in generating spatial audio with accurate localization perception. Existing methods focus on a single database and do not fully utilize the information available across multiple databases. In light of this, this paper proposes a pre-trained individualization model that predicts HRTFs for any target user, and implements a real-time spatial audio rendering system on a wearable device to produce an immersive virtual auditory display. The proposed method first builds a pre-trained model over multiple databases, using a DNN-based model combined with an autoencoder-based dimensionality reduction method. This model captures the nonlinear relationship between user-independent HRTFs and position-dependent features.
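The abstract gives no implementation details, but the pre-training step lends itself to a short illustration. Below is a minimal sketch, assuming PyTorch, HRTF log-magnitude vectors of 256 bins, a 32-dimensional autoencoder bottleneck, and 3-dimensional position features (azimuth, elevation, distance); all names and sizes are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: an autoencoder compresses HRTF magnitude spectra to a
# low-dimensional code, and a DNN maps source position to that code.
# Shapes and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

HRTF_BINS = 256   # assumed length of a log-magnitude HRTF vector
LATENT = 32       # assumed autoencoder bottleneck size

class HRTFAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(HRTF_BINS, 128), nn.ReLU(),
            nn.Linear(128, LATENT))
        self.decoder = nn.Sequential(
            nn.Linear(LATENT, 128), nn.ReLU(),
            nn.Linear(128, HRTF_BINS))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class PositionToLatent(nn.Module):
    """DNN regressor from position features (azimuth, elevation, distance)
    to the autoencoder's latent HRTF code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, LATENT))

    def forward(self, pos):
        return self.net(pos)

# Pre-training over pooled databases (toy random data stands in for
# measured HRTFs pooled from multiple datasets):
ae, reg = HRTFAutoencoder(), PositionToLatent()
opt = torch.optim.Adam(list(ae.parameters()) + list(reg.parameters()), lr=1e-3)
hrtfs = torch.randn(1024, HRTF_BINS)   # pooled HRTF magnitudes
positions = torch.rand(1024, 3)        # matching source positions
for _ in range(10):
    recon, z = ae(hrtfs)
    loss = nn.functional.mse_loss(recon, hrtfs) \
         + nn.functional.mse_loss(reg(positions), z.detach())
    opt.zero_grad(); loss.backward(); opt.step()
```

Training the regressor against the encoder's latent codes rather than raw HRTFs is what makes the autoencoder act as a dimensionality reducer for the DNN's target space.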
Then, fine-tuning is performed using a transfer learning technique on a limited number of layers of the pre-trained model. The key idea behind the fine-tuning is to adapt the pre-trained user-independent model into a user-dependent one based on the target user's anthropometric features.
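A sketch of the fine-tuning idea follows, continuing the classes from the previous sketch: freeze the pre-trained layers and adapt only the final layer to a specific user. Which layers are unfrozen, and how the anthropometric features enter the adaptation, are not specified in the abstract, so this sketch simply adapts from a small set of user-specific training pairs; treat it as one plausible realization, not the paper's method.

```python
# Hypothetical fine-tuning sketch (continues the previous sketch): freeze the
# pre-trained position-to-latent DNN except its last layer, then adapt that
# layer with a small amount of user-specific data. The choice of which layers
# to unfreeze is an assumption, not a detail from the paper.
import torch
import torch.nn as nn

def fine_tune(reg, user_positions, user_latents, steps=50):
    for p in reg.parameters():           # freeze the whole pre-trained model
        p.requires_grad = False
    last = list(reg.net.children())[-1]  # ...then unfreeze the final layer
    for p in last.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(last.parameters(), lr=1e-4)
    for _ in range(steps):
        loss = nn.functional.mse_loss(reg(user_positions), user_latents)
        opt.zero_grad(); loss.backward(); opt.step()
    return reg

# e.g., with the sketch above: reg = fine_tune(reg, torch.rand(32, 3), torch.randn(32, 32))
```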
Finally, real-time issues are discussed to guarantee a fluent auditory experience during dynamic scene updates, including fine-grained head-related impulse response (HRIR) acquisition, efficient spatial audio reproduction, and parallel synthesis and playback. These techniques keep the computational cost low and thus minimize processing delay. Experimental results show that the proposed model outperforms other methods on both subjective and objective metrics, and the rendering system runs on an HTC Vive with almost unnoticeable delay.
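The reproduction stage can also be illustrated. The following minimal sketch performs block-wise binaural synthesis with overlap-add convolution, assuming numpy, a 256-sample block, and 128-tap HRIRs (illustrative values; the abstract gives no implementation parameters). A new HRIR pair can be selected each block as the listener moves; in a real system, synthesis and playback would typically run on separate threads connected by a block queue, which is one way to realize the parallel synthesis and playback the abstract mentions.

```python
# Hypothetical sketch of block-wise binaural synthesis: each audio block is
# convolved with the current left/right HRIRs using overlap-add, so the HRIR
# pair can be swapped between blocks as the head moves. Block size and HRIR
# length are illustrative, not values from the paper.
import numpy as np

BLOCK = 256      # assumed samples per audio block
HRIR_LEN = 128   # assumed HRIR length in samples

def synthesize_block(mono_block, hrir_l, hrir_r, tail):
    """Convolve one mono block with the current HRIR pair (overlap-add).
    `tail` carries the convolution tail into the next block."""
    out = np.stack([np.convolve(mono_block, hrir_l),
                    np.convolve(mono_block, hrir_r)])  # (2, BLOCK+HRIR_LEN-1)
    out[:, :HRIR_LEN - 1] += tail
    return out[:, :BLOCK], out[:, BLOCK:]

# Toy usage: render a few blocks, picking a new HRIR pair per block
# (stand-ins for model-predicted, position-dependent HRIRs).
rng = np.random.default_rng(0)
signal = rng.standard_normal(BLOCK * 4)
tail = np.zeros((2, HRIR_LEN - 1))
rendered = []
for i in range(4):
    hrir_l = rng.standard_normal(HRIR_LEN) * 0.1  # per-block left HRIR
    hrir_r = rng.standard_normal(HRIR_LEN) * 0.1  # per-block right HRIR
    block, tail = synthesize_block(signal[i*BLOCK:(i+1)*BLOCK],
                                   hrir_l, hrir_r, tail)
    rendered.append(block)
stereo = np.concatenate(rendered, axis=1)          # (2, BLOCK*4) output
```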