Pre-Trained-Based Individualization Model for Real-Time Spatial Audio Rendering System

Spatial audio has attracted increasing attention in fields such as virtual reality (VR) and blind navigation. Individualized head-related transfer functions (HRTFs) play an important role in generating spatial audio with accurate localization perception. Existing methods focus on a single database and do not fully utilize the information available across multiple databases. In light of this, this paper proposes a pre-trained-based individualization model that predicts HRTFs for any target user, and implements a real-time spatial audio rendering system on a wearable device to produce an immersive virtual auditory display. The proposed method first builds a pre-trained model on multiple databases, using a DNN combined with an autoencoder-based dimensionality reduction method. This model captures the nonlinear relationship between user-independent HRTFs and position-dependent features. Then, fine-tuning is performed with a transfer learning technique on a limited number of layers of the pre-trained model. The key idea behind fine-tuning is to transfer the pre-trained user-independent model to a user-dependent one based on anthropometric features. Finally, real-time issues are addressed to guarantee a fluent auditory experience during dynamic scene updates, including fine-grained head-related impulse response (HRIR) acquisition, efficient spatial audio reproduction, and parallel synthesis and playback. These techniques allow the system to run with little computational cost, minimizing processing delay. Experimental results show that the proposed model outperforms other methods on both subjective and objective metrics. Additionally, the rendering system runs on an HTC Vive with almost unnoticeable delay.
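
The first stage described above couples an autoencoder with a DNN: the autoencoder compresses HRTF spectra into a low-dimensional latent code, and the DNN learns the mapping from source position to that code. The sketch below is a minimal illustration of such a pipeline, not the authors' implementation; all layer widths, the latent size, and the class names (`HRTFAutoencoder`, `PositionToLatentDNN`) are assumptions.

```python
# Illustrative sketch only: dimensions and architecture are assumptions,
# not the configuration from the paper.
import torch.nn as nn

N_FREQ = 128     # assumed number of HRTF magnitude bins per ear
N_LATENT = 16    # assumed latent code dimensionality

class HRTFAutoencoder(nn.Module):
    """Compresses an HRTF magnitude spectrum to a latent code and back;
    trained on HRTFs pooled from multiple databases."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(N_FREQ, 64), nn.ReLU(),
            nn.Linear(64, N_LATENT))
        self.decoder = nn.Sequential(
            nn.Linear(N_LATENT, 64), nn.ReLU(),
            nn.Linear(64, N_FREQ))

    def forward(self, hrtf):
        z = self.encoder(hrtf)
        return self.decoder(z), z

class PositionToLatentDNN(nn.Module):
    """Regresses the latent HRTF code from position features
    (here just azimuth and elevation)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_LATENT))

    def forward(self, pos):
        return self.net(pos)
```

Regressing the compact latent code rather than the full spectrum keeps the position-to-HRTF network small, which is the usual payoff of an autoencoder-based reduction step when pooling several databases.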
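For individualization, the abstract describes fine-tuning a limited number of layers of the pre-trained model via transfer learning, steered by the target user's anthropometric features. A freeze-and-retrain sketch in that vein, continuing from the model above, might look as follows; the number of trainable layers, the optimizer settings, and the form of the user-specific training pairs are all assumptions, and how anthropometric features enter the network is not specified here.

```python
# Hypothetical fine-tuning step: freeze the early layers of the pre-trained
# position-to-latent DNN and retrain only the last layer(s) on the target
# user's data. All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

def fine_tune(model, user_loader, n_trainable_layers=1, epochs=50):
    """Freeze all but the last n_trainable_layers Linear layers, then
    retrain on the target user's (position, latent-HRTF) pairs."""
    linear_layers = [m for m in model.net if isinstance(m, nn.Linear)]
    for layer in linear_layers[:-n_trainable_layers]:
        for p in layer.parameters():
            p.requires_grad = False        # keep pre-trained weights fixed
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for pos, target_latent in user_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(pos), target_latent)
            loss.backward()
            optimizer.step()
    return model
```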
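On the real-time side, the abstract highlights fine-grained HRIR acquisition, efficient spatial audio reproduction, and parallel synthesis and playback. One conventional way to realize that pattern is block-wise overlap-add convolution on a producer thread, with a bounded queue feeding a playback consumer, as sketched below; the block size, queue depth, and helper hooks (`get_hrir`, `write_to_device`) are hypothetical.

```python
# Sketch of a producer/consumer rendering loop under assumed parameters;
# the paper's actual buffering scheme is not described at this level.
import numpy as np
import queue

BLOCK = 512                       # assumed audio block size in samples
out_q = queue.Queue(maxsize=8)    # bounded queue decouples synthesis/playback

def synthesize(mono_blocks, get_hrir):
    """Producer: convolve each input block with the HRIRs for the current
    head pose and overlap-add the convolution tails across blocks."""
    tail = np.zeros((2, 0))
    for block in mono_blocks:
        hrir_l, hrir_r = get_hrir()     # hypothetical per-block pose lookup
        wet = np.stack([np.convolve(block, hrir_l),
                        np.convolve(block, hrir_r)])
        wet[:, :tail.shape[1]] += tail  # add tail of the previous block
        out_q.put(wet[:, :BLOCK])       # emit one finished binaural block
        tail = wet[:, BLOCK:]           # carry the remainder forward
    out_q.put(None)                     # end-of-stream marker

def playback(write_to_device):
    """Consumer (run on its own thread): stream blocks to the device."""
    while (frames := out_q.get()) is not None:
        write_to_device(frames)
```

Decoupling synthesis from playback this way hides per-block computation jitter from the audio device, which is one route to the near-unnoticeable delay the abstract reports.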

Bibliographic Details
Main Authors: Jinyan Lu (School of Electrical and Information Engineering, Henan University of Engineering, Zhengzhou, China); Xiaoke Qi (School of Information Management for Law, China University of Political Science and Law, Beijing, China)
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, vol. 9, pp. 128722-128733
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3113133
Subjects: Head-related transfer functions; individualization; pre-trained model; real time; spatial hearing
Online Access: https://ieeexplore.ieee.org/document/9539178/