A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism

This paper proposes a deep learning model based on a recurrent neural network (RNN) to solve the problem of text normalization for speech synthesis. Traditional rule-based models cannot exploit contextual information and handle text not covered by their rules poorly, whereas deep learning-based models cope with these problems better. Building on the seq2seq neural network, we construct a new deep learning model for text normalization that considers the context of words within sentences by combining gated recurrent units (GRUs) with a local attention mechanism. In recent years, seq2seq models have achieved strong results across many fields of natural language processing. Our work shows that a well-constructed small network can also achieve meaningful results in a specific application domain. In our small network, the local attention mechanism reduces computational complexity without decreasing accuracy. In the experiments, we compared our model with the attention-based model proposed by Sproat, a model without attention, and a model with standard (global) attention. The results show that our method reduces the network size while still capturing the main contextual information, overcomes the high complexity of the traditional attention mechanism, and achieves higher accuracy. The experiments also show that, in text normalization, the most important related words tend to lie close to the target word.
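The core idea in the abstract is to attend only to a small window of encoder states around the token being normalized, rather than to the whole sentence. The following is a minimal, hypothetical sketch of that idea and is not the authors' code: the function name local_attention, the dot-product scoring, and the default window size are assumptions made purely for illustration, and the paper's exact formulation may differ.

```python
# Minimal sketch (assumed, not the paper's implementation) of windowed
# "local" attention over a sequence of encoder hidden states, e.g. GRU outputs.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def local_attention(hidden_states, query, center, window=3):
    """Attend only to positions within `window` of `center`.

    hidden_states : (T, H) array of encoder states (e.g. GRU outputs)
    query         : (H,) query vector for the token being normalized
    center        : index of the target token in the sentence
    window        : half-width of the local attention window
    """
    T, H = hidden_states.shape
    lo, hi = max(0, center - window), min(T, center + window + 1)
    local = hidden_states[lo:hi]              # slice of at most 2*window+1 states
    scores = local @ query / np.sqrt(H)       # dot-product attention scores
    weights = softmax(scores)                 # attention distribution over the window
    context = weights @ local                 # weighted sum = local context vector
    return context, weights, (lo, hi)

# Toy usage: 10 "tokens" with 8-dimensional states, attending around position 4.
rng = np.random.default_rng(0)
states = rng.normal(size=(10, 8))
query = rng.normal(size=8)
ctx, w, span = local_attention(states, query, center=4, window=2)
print("window:", span, "weights:", np.round(w, 3))
```

Restricting the score computation to at most 2*window+1 positions keeps the per-token cost independent of sentence length, which is the kind of complexity reduction the abstract refers to, and it matches the observation that the most informative words lie close to the target word.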


Bibliographic Details
Main Authors: Lan Huang, Shunan Zhuang, Kangping Wang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Natural language processing; deep learning; text normalization
Online Access: https://ieeexplore.ieee.org/document/9001015/
id doaj-33e68ea21e9849bea51303d140274ab4
record_format Article
spelling doaj-33e68ea21e9849bea51303d140274ab4 2021-03-30T02:33:11Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 36202-36209, DOI 10.1109/ACCESS.2020.2974674, article no. 9001015
A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
Lan Huang (https://orcid.org/0000-0003-3233-3777), Shunan Zhuang (https://orcid.org/0000-0003-3567-7313), Kangping Wang; all with the College of Computer Science and Technology, Jilin University, Changchun, China
https://ieeexplore.ieee.org/document/9001015/
Natural language processing; deep learning; text normalization
collection DOAJ
language English
format Article
sources DOAJ
author Lan Huang
Shunan Zhuang
Kangping Wang
spellingShingle Lan Huang
Shunan Zhuang
Kangping Wang
A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
IEEE Access
Natural language processing
deep learning
text normalization
author_facet Lan Huang
Shunan Zhuang
Kangping Wang
author_sort Lan Huang
title A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_short A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_full A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_fullStr A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_full_unstemmed A Text Normalization Method for Speech Synthesis Based on Local Attention Mechanism
title_sort text normalization method for speech synthesis based on local attention mechanism
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description This paper proposes a deep learning model based on a recurrent neural network (RNN) to solve the problem of text normalization for speech synthesis. Traditional rule-based models cannot exploit contextual information and handle text not covered by their rules poorly, whereas deep learning-based models cope with these problems better. Building on the seq2seq neural network, we construct a new deep learning model for text normalization that considers the context of words within sentences by combining gated recurrent units (GRUs) with a local attention mechanism. In recent years, seq2seq models have achieved strong results across many fields of natural language processing. Our work shows that a well-constructed small network can also achieve meaningful results in a specific application domain. In our small network, the local attention mechanism reduces computational complexity without decreasing accuracy. In the experiments, we compared our model with the attention-based model proposed by Sproat, a model without attention, and a model with standard (global) attention. The results show that our method reduces the network size while still capturing the main contextual information, overcomes the high complexity of the traditional attention mechanism, and achieves higher accuracy. The experiments also show that, in text normalization, the most important related words tend to lie close to the target word.
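For readers unfamiliar with the task described above, text normalization for speech synthesis maps written-form tokens to their spoken form. The toy mapping below is purely illustrative and is not drawn from the paper's data; the point of a context-aware model is that the correct expansion (for example of "2020") depends on the surrounding words.

```python
# Purely illustrative written-form -> spoken-form pairs for the TTS
# text-normalization task; not taken from the paper's dataset.
examples = {
    "Dr.":  "doctor",
    "2020": "twenty twenty",
    "$5":   "five dollars",
    "3.14": "three point one four",
}
for written, spoken in examples.items():
    print(f"{written!r} -> {spoken!r}")
```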
topic Natural language processing
deep learning
text normalization
url https://ieeexplore.ieee.org/document/9001015/
work_keys_str_mv AT lanhuang atextnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT shunanzhuang atextnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT kangpingwang atextnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT lanhuang textnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT shunanzhuang textnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
AT kangpingwang textnormalizationmethodforspeechsynthesisbasedonlocalattentionmechanism
_version_ 1724184938448683008