Summary: | This paper proposes a deep learning model based on a recurrent neural network (RNN) to solve the problem of text normalization for speech synthesis. Traditional rule-based systems cannot exploit contextual information and handle text not covered by their rules poorly, whereas deep learning models cope with both problems better. Building on the seq2seq framework, we construct a new deep learning model for text normalization that captures the context of each word in a sentence using gated recurrent units (GRUs) and a local attention mechanism. In recent years, seq2seq models have achieved strong results across many areas of natural language processing; our work shows that a well-constructed small network can also achieve meaningful results in a specific application domain. In our small network, local attention reduces computational complexity without sacrificing accuracy: instead of attending to the whole source sequence at every decoding step, the decoder attends only to a fixed window of encoder states around the current position. In the experiments, we compare our model with the attention-based model proposed by Sproat, a baseline without attention, and a model with standard (global) attention. The results show that our method shrinks the network while still capturing the main contextual information, avoids the high computational complexity of the traditional global attention mechanism, and achieves higher accuracy. The experiments also indicate that, in text normalization, the most informative context words tend to lie close to the target word.
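
As a concrete illustration of the local attention idea summarized above, here is a minimal PyTorch sketch: the decoder scores only a fixed window of 2D+1 encoder states centered on the current position, so each step costs O(D) rather than O(source length). The class name, the Bahdanau-style additive scoring, the window size, and the monotonic choice of window center are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttention(nn.Module):
    """Attend to a window of 2*D+1 encoder states around a center position."""

    def __init__(self, hidden_size: int, window: int = 3):
        super().__init__()
        self.window = window  # D: half-width of the attention window (assumed value)
        self.attn = nn.Linear(hidden_size * 2, hidden_size)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_h: torch.Tensor, enc_outputs: torch.Tensor, pos: int):
        # dec_h:       (batch, hidden)          current decoder state
        # enc_outputs: (batch, src_len, hidden) all encoder states
        # pos:         window center; assumed here to track the decoder step
        src_len = enc_outputs.size(1)
        lo, hi = max(0, pos - self.window), min(src_len, pos + self.window + 1)
        local = enc_outputs[:, lo:hi]  # (batch, <=2D+1, hidden)
        # Bahdanau-style additive scoring, restricted to the local window.
        expanded = dec_h.unsqueeze(1).expand(-1, local.size(1), -1)
        energy = torch.tanh(self.attn(torch.cat([expanded, local], dim=-1)))
        weights = F.softmax(self.v(energy).squeeze(-1), dim=-1)
        # Context vector: weighted sum over the window only, O(D) per step.
        context = torch.bmm(weights.unsqueeze(1), local).squeeze(1)
        return context, weights

# Toy usage: batch of 2 sentences, 20 source tokens, hidden size 64.
attn = LocalAttention(hidden_size=64, window=3)
enc_outputs = torch.randn(2, 20, 64)
dec_h = torch.randn(2, 64)
context, weights = attn(dec_h, enc_outputs, pos=7)  # weights: (2, 7)
```

The window restriction is what connects the sketch to the summary's final observation: because the most informative context words lie near the target word, truncating attention to a small neighborhood can preserve accuracy while cutting the cost of the score computation.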
|