A Preliminary Study on End-to-End Speech Synthesis System
Master's thesis, 國立臺北科技大學 (National Taipei University of Technology), 電子工程系研究所 (Graduate Institute of Electronic Engineering), academic year 105. The traditional two-stage speech synthesis system combines a text parser with HTS, so errors in front-end text analysis propagate to the back-end synthesis stage. In this thesis, we propose an end-to-end speech synthesis system based on deep neural networks. We use four sub-networks to...
Main Authors: | Shu-Han Liao 廖書漢 |
---|---|
Other Authors: | 廖元甫 |
Format: | Others |
Language: | zh-TW |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/j467fx |

id | ndltd-TW-105TIT05427109 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-105TIT05427109 2019-05-15T23:53:44Z http://ndltd.ncl.edu.tw/handle/j467fx A Preliminary Study on End-to-End Speech Synthesis System 端對端語音合成系統初步研究 Shu-Han Liao 廖書漢 碩士 (Master's) 國立臺北科技大學 電子工程系研究所 105 廖元甫 2017 學位論文 ; thesis 101 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis, 國立臺北科技大學 (National Taipei University of Technology), 電子工程系研究所 (Graduate Institute of Electronic Engineering), academic year 105. The traditional two-stage speech synthesis system combines a text parser with HTS (the HMM-based speech synthesis system), so errors in front-end text analysis propagate to the back-end synthesis stage. In this thesis, we propose an end-to-end speech synthesis system based on deep neural networks. It consists of four sub-networks: DNNG for grapheme-to-phoneme conversion, DNNC for extracting character classes, DNNT for modeling character timing, and DNNS for synthesis. A professional announcer recorded the synthesis corpus, whose material consists of the Mendelian book and about 3,000 lines of Chinese and English essays collected online. In preference tests, the proposed system was preferred over the traditional system in about 72%, 70%, and 61% of cases for intelligibility, naturalness, and similarity, respectively. On a subjective 5-point mean opinion score (MOS) scale, the proposed system scored about 3.59, 3.1, and 3.18 for intelligibility, naturalness, and similarity, respectively, higher than the traditional system's 3.33, 3.03, and 2.9. These results indicate that the proposed end-to-end system outperforms the traditional two-stage system on all three criteria. |
author2 | 廖元甫 |
author | Shu-Han Liao 廖書漢 |
title | A Preliminary Study on End-to-End Speech Synthesis System |
publishDate | 2017 |
url | http://ndltd.ncl.edu.tw/handle/j467fx |
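
The description field names four sub-networks (DNNG, DNNC, DNNT, DNNS) but does not say how they are connected. The following is a minimal Python sketch of one plausible wiring, assuming each sub-network can be treated as a per-character function and that DNNT's predicted durations feed DNNS. Every name here (`CharFeatures`, `synthesize`, `Frame`, and the `toy_*` stand-ins) is hypothetical, and the toy functions are placeholders rather than trained networks; the record does not describe the thesis's actual interfaces, features, or model structure.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # one acoustic feature vector (hypothetical representation)


@dataclass
class CharFeatures:
    """Per-character features passed between the hypothetical sub-network stubs."""
    char: str
    phonemes: List[str]  # DNNG: grapheme-to-phoneme output (assumed per character)
    char_class: int      # DNNC: character-class output (assumed integer label)
    n_frames: int = 0    # DNNT: predicted duration in frames (assumed)


def synthesize(
    text: str,
    dnn_g: Callable[[str], List[str]],
    dnn_c: Callable[[str], int],
    dnn_t: Callable[[CharFeatures], int],
    dnn_s: Callable[[CharFeatures], List[Frame]],
) -> List[Frame]:
    """Chain the stages: G2P and character class, then timing, then synthesis."""
    feats = [CharFeatures(ch, dnn_g(ch), dnn_c(ch)) for ch in text]
    for f in feats:
        f.n_frames = dnn_t(f)  # timing uses the earlier stages' outputs
    frames: List[Frame] = []
    for f in feats:
        frames.extend(dnn_s(f))  # emit n_frames feature vectors per character
    return frames


# Toy stand-ins so the sketch runs; real sub-networks would be trained DNNs.
def toy_g2p(ch: str) -> List[str]:
    return [ch.lower()]


def toy_class(ch: str) -> int:
    return 0 if ch.isascii() else 1  # hypothetical two-class split: ASCII vs. non-ASCII


def toy_timing(f: CharFeatures) -> int:
    return 2 + f.char_class


def toy_synth(f: CharFeatures) -> List[Frame]:
    return [[float(f.char_class)] * 3 for _ in range(f.n_frames)]


if __name__ == "__main__":
    out = synthesize("a你", toy_g2p, toy_class, toy_timing, toy_synth)
    print(len(out))  # 5: 2 frames for "a" plus 3 frames for "你"
```

Passing the stages in as callables keeps the wiring separate from the models, so trained networks could replace the toy functions without changing `synthesize`.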