A Preliminary Study on End-to-End Speech Synthesis System

Master's === National Taipei University of Technology === Graduate Institute of Electronic Engineering === 105 === A traditional two-stage speech synthesis system combines a text parser with HTS (the HMM-based Speech Synthesis System), so errors in the front-end text analysis propagate to the back-end synthesis stage. In this thesis, we propose an end-to-end speech synthesis system based on deep neural networks. We used four sub-networks to...


Bibliographic Details
Main Authors: Shu-Han Liao, 廖書漢
Other Authors: 廖元甫
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/j467fx
id ndltd-TW-105TIT05427109
record_format oai_dc
spelling ndltd-TW-105TIT05427109 2019-05-15T23:53:44Z http://ndltd.ncl.edu.tw/handle/j467fx A Preliminary Study on End-to-End Speech Synthesis System 端對端語音合成系統初步研究 Shu-Han Liao 廖書漢 Master's, National Taipei University of Technology, Graduate Institute of Electronic Engineering, 105 廖元甫 2017 thesis 101 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taipei University of Technology === Graduate Institute of Electronic Engineering === 105 === A traditional two-stage speech synthesis system combines a text parser with HTS (the HMM-based Speech Synthesis System), so errors in the front-end text analysis propagate to the back-end synthesis stage. In this thesis, we propose an end-to-end speech synthesis system based on deep neural networks. The system consists of four sub-networks: DNNG for grapheme-to-phoneme conversion, DNNC for extracting character classes, DNNT for modeling the timing relations between characters, and DNNS for synthesis. A professional announcer recorded the synthesis corpus, whose material comprises the Mendelian book and about 3,000 lines of online Chinese and English essays. In preference tests against the traditional system, the proposed system was preferred about 72%, 70%, and 61% of the time for intelligibility, naturalness, and similarity, respectively. On a subjective 5-point mean opinion score (MOS), the proposed system scored about 3.59, 3.1, and 3.18 for intelligibility, naturalness, and similarity, compared with 3.33, 3.03, and 2.9 for the traditional system. These results indicate that the proposed approach outperforms the traditional system on all three measures.
author2 廖元甫
author_facet 廖元甫
Shu-Han Liao
廖書漢
author Shu-Han Liao
廖書漢
spellingShingle Shu-Han Liao
廖書漢
A Preliminary Study on End-to-End Speech Synthesis System
author_sort Shu-Han Liao
title A Preliminary Study on End-to-End Speech Synthesis System
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/j467fx