A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
This paper proposes a deep learning model based on a recurrent neural network (RNN) to solve the problem of text normalization for speech synthesis. Traditional rule-based models cannot exploit contextual information and handle text that falls outside their rules poorly, whereas deep learning models cope with both problems better. Building on the seq2seq architecture, we construct a new deep learning model for text normalization that captures the context of words in a sentence using gated recurrent units (GRUs) and a local attention mechanism. In recent years, seq2seq models have achieved strong results across many areas of natural language processing; our work shows that well-constructed small networks can also achieve meaningful results in specific application domains. In our small network, the local attention mechanism reduces computational complexity without sacrificing accuracy. In the experiments, we compared our model with the attention-based model proposed by Sproat, with a model without attention, and with a model using standard (global) attention. The results show that our method reduces network size while still gathering the main contextual information, overcomes the high complexity of the traditional attention mechanism, and achieves higher accuracy. The experiments also show that, in text normalization, the most informative related words lie relatively close to the target word.
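The abstract describes a seq2seq-style network in which a GRU encoder is combined with a local attention mechanism that only scores encoder states near the target word. As a rough illustration of that idea, here is a minimal PyTorch sketch of windowed local attention over GRU states; the window size, the additive scoring function, the per-token classification head, and all dimensions are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
# Minimal sketch of local (windowed) attention over GRU encoder states.
# All hyperparameters and the model layout are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn


class LocalAttention(nn.Module):
    """Attends only to encoder states within +/- `window` of a center position."""

    def __init__(self, hidden_size: int, window: int = 3):
        super().__init__()
        self.window = window
        self.score = nn.Linear(hidden_size * 2, 1)  # additive-style scorer

    def forward(self, decoder_state, encoder_states, center):
        # decoder_state: (batch, hidden); encoder_states: (batch, seq, hidden)
        batch, seq_len, hidden = encoder_states.shape
        lo = max(0, center - self.window)
        hi = min(seq_len, center + self.window + 1)
        local = encoder_states[:, lo:hi, :]                      # (batch, win, hidden)
        query = decoder_state.unsqueeze(1).expand_as(local)      # broadcast the query
        scores = self.score(torch.cat([query, local], dim=-1))   # (batch, win, 1)
        weights = torch.softmax(scores, dim=1)                   # normalize over window
        context = (weights * local).sum(dim=1)                   # (batch, hidden)
        return context, weights.squeeze(-1)


class Normalizer(nn.Module):
    """GRU encoder + local attention; predicts a normalized-form class per token."""

    def __init__(self, vocab_size, n_classes, embed=64, hidden=128, window=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.encoder = nn.GRU(embed, hidden, batch_first=True)
        self.attn = LocalAttention(hidden, window)
        self.out = nn.Linear(hidden * 2, n_classes)

    def forward(self, tokens, target_pos):
        states, _ = self.encoder(self.embed(tokens))             # (batch, seq, hidden)
        query = states[:, target_pos, :]                         # state at target word
        context, _ = self.attn(query, states, target_pos)
        return self.out(torch.cat([query, context], dim=-1))


# Smoke test with random token ids.
model = Normalizer(vocab_size=1000, n_classes=10)
logits = model(torch.randint(0, 1000, (2, 12)), target_pos=5)
print(logits.shape)  # torch.Size([2, 10])
```

Restricting attention to a fixed window makes the attention cost proportional to the window size rather than the sequence length, which matches the abstract's claim that locality reduces computational complexity and fits its observation that the most informative context words lie close to the target word.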
Main Authors: | Lan Huang, Shunan Zhuang, Kangping Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Natural language processing; deep learning; text normalization |
Online Access: | https://ieeexplore.ieee.org/document/9001015/ |
Record details: | IEEE Access, vol. 8, pp. 36202-36209, 2020. ISSN 2169-3536. DOI: 10.1109/ACCESS.2020.2974674 (IEEE document 9001015). DOAJ record id: doaj-33e68ea21e9849bea51303d140274ab4. |
Author details: | Lan Huang (ORCID 0000-0003-3233-3777), Shunan Zhuang (ORCID 0000-0003-3567-7313), and Kangping Wang, all with the College of Computer Science and Technology, Jilin University, Changchun, China. |