A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism

This paper proposes a deep learning model based on a recurrent neural network (RNN) to solve the problem of text normalization for speech synthesis. Traditional rule-based models cannot exploit contextual information and handle text not covered by their rules poorly, whereas deep learning-based models cope with these problems better. Building on the seq2seq neural network, we construct a new deep learning model for text normalization that considers the context of words within sentences by combining gated recurrent units (GRUs) with a local attention mechanism. In recent years, seq2seq models have achieved strong results across many fields of natural language processing. Our work shows that a well-constructed small network can also achieve meaningful results in a specific application domain. In our small network, the local attention mechanism reduces computational complexity without decreasing accuracy. In the experiments, we compared our model with the attention-based model proposed by Sproat, a model without attention, and a model with standard (global) attention. The results show that our method reduces the network size while still capturing the main contextual information, overcomes the high complexity of the traditional attention mechanism, and achieves higher accuracy. The experiments also show that, in text normalization, the most important related words tend to lie close to the target word.
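The core idea in the abstract is to attend only to a small window of encoder states around the token being normalized, rather than to the whole sentence. The following is a minimal, hypothetical sketch of that idea and is not the authors' code: the function name local_attention, the dot-product scoring, and the default window size are assumptions made purely for illustration, and the paper's exact formulation may differ.

```python
# Minimal sketch (assumed, not the paper's implementation) of windowed
# "local" attention over a sequence of encoder hidden states, e.g. GRU outputs.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def local_attention(hidden_states, query, center, window=3):
    """Attend only to positions within `window` of `center`.

    hidden_states : (T, H) array of encoder states (e.g. GRU outputs)
    query         : (H,) query vector for the token being normalized
    center        : index of the target token in the sentence
    window        : half-width of the local attention window
    """
    T, H = hidden_states.shape
    lo, hi = max(0, center - window), min(T, center + window + 1)
    local = hidden_states[lo:hi]              # slice of at most 2*window+1 states
    scores = local @ query / np.sqrt(H)       # dot-product attention scores
    weights = softmax(scores)                 # attention distribution over the window
    context = weights @ local                 # weighted sum = local context vector
    return context, weights, (lo, hi)

# Toy usage: 10 "tokens" with 8-dimensional states, attending around position 4.
rng = np.random.default_rng(0)
states = rng.normal(size=(10, 8))
query = rng.normal(size=8)
ctx, w, span = local_attention(states, query, center=4, window=2)
print("window:", span, "weights:", np.round(w, 3))
```

Restricting the score computation to at most 2*window+1 positions keeps the per-token cost independent of sentence length, which is the kind of complexity reduction the abstract refers to, and it matches the observation that the most informative words lie close to the target word.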


Bibliographic Details
Main Authors: Lan Huang, Shunan Zhuang, Kangping Wang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Natural language processing; deep learning; text normalization
Online Access: https://ieeexplore.ieee.org/document/9001015/
id doaj-33e68ea21e9849bea51303d140274ab4
record_format Article
spelling doaj-33e68ea21e9849bea51303d140274ab4 2021-03-30T02:33:11Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 36202-36209, DOI 10.1109/ACCESS.2020.2974674, article no. 9001015
A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
Lan Huang (https://orcid.org/0000-0003-3233-3777), Shunan Zhuang (https://orcid.org/0000-0003-3567-7313), Kangping Wang; all with the College of Computer Science and Technology, Jilin University, Changchun, China
https://ieeexplore.ieee.org/document/9001015/
Natural language processing; deep learning; text normalization
collection DOAJ
language English
format Article
sources DOAJ
author Lan Huang
Shunan Zhuang
Kangping Wang
spellingShingle Lan Huang
Shunan Zhuang
Kangping Wang
A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
IEEE Access
Natural language processing
deep learning
text normalization
author_facet Lan Huang
Shunan Zhuang
Kangping Wang
author_sort Lan Huang
title A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_short A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_full A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_fullStr A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_full_unstemmed A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_sort text normalization method for speech synthesis based on local attention mechanism
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description This paper proposes a deep learning model based on a recurrent neural network (RNN) to solve the problem of text normalization for speech synthesis. Traditional rule-based models cannot exploit contextual information and handle text not covered by their rules poorly, whereas deep learning-based models cope with these problems better. Building on the seq2seq neural network, we construct a new deep learning model for text normalization that considers the context of words within sentences by combining gated recurrent units (GRUs) with a local attention mechanism. In recent years, seq2seq models have achieved strong results across many fields of natural language processing. Our work shows that a well-constructed small network can also achieve meaningful results in a specific application domain. In our small network, the local attention mechanism reduces computational complexity without decreasing accuracy. In the experiments, we compared our model with the attention-based model proposed by Sproat, a model without attention, and a model with standard (global) attention. The results show that our method reduces the network size while still capturing the main contextual information, overcomes the high complexity of the traditional attention mechanism, and achieves higher accuracy. The experiments also show that, in text normalization, the most important related words tend to lie close to the target word.
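For readers unfamiliar with the task described above, text normalization for speech synthesis maps written-form tokens to their spoken form. The toy mapping below is purely illustrative and is not drawn from the paper's data; the point of a context-aware model is that the correct expansion (for example of "2020") depends on the surrounding words.

```python
# Purely illustrative written-form -> spoken-form pairs for the TTS
# text-normalization task; not taken from the paper's dataset.
examples = {
    "Dr.":  "doctor",
    "2020": "twenty twenty",
    "$5":   "five dollars",
    "3.14": "three point one four",
}
for written, spoken in examples.items():
    print(f"{written!r} -> {spoken!r}")
```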
topic Natural language processing
deep learning
text normalization
url https://ieeexplore.ieee.org/document/9001015/
work_keys_str_mv AT lanhuang atextnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT shunanzhuang atextnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT kangpingwang atextnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT lanhuang textnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT shunanzhuang textnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT kangpingwang textnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
_version_ 1724184938448683008