Frame-Based Alignment and Adaptive CRF for Personalized Spectral and Prosody Conversion

Bibliographic Details
Main Authors: Yu-Ting Chao, 趙郁婷
Other Authors: Chung-Hsien Wu
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/24672902976189437320
Description
Summary: Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering (graduate program), academic year 98 (ROC calendar).

Personalized speech synthesis has become a popular research topic in recent years. Personalized speech is generally characterized by two major factors: acoustic features and prosodic features. Traditionally, personalized acoustic features are obtained by transforming spectral features with voice conversion methods. However, frame-based voice conversion suffers from inaccurate phone-pair alignment when only spectral distance is used, and this leads to unsatisfactory conversion results. In this study, the feature vectors of the parallel corpus are transformed into codewords in an eigen-space. The occurrence distribution of these codewords is then used in the distance measure for dynamic time warping (DTW). Because both spectral distance and eigen-codeword distribution are considered, a more precise alignment can be obtained.

Prosodic features are also an important part of personalized speech synthesis. The same sentence has different prosodic boundaries when uttered by different speakers. To generate personalized prosodic boundaries, the boundaries are predicted using conditional random field (CRF) model adaptation.

The purpose of this study is to develop a personalized speech synthesis system based on voice conversion using a small parallel corpus. The system has two major parts: (1) improving personalized spectral and prosody conversion through parallel-corpus alignment that considers both spectral distance and eigen-codeword distribution, and (2) personalized prosodic boundary prediction using CRF model adaptation. Objective and subjective tests were performed to evaluate the proposed approach. The experimental results show that the proposed method improves the quality of personalized voice conversion.
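The abstract does not say exactly how the codeword occurrence distribution enters the DTW distance. The Python code below is a minimal sketch of one plausible reading. It assumes that frames have already been projected into the eigen-space and that a codebook is given. Each frame is mapped to its nearest codeword, and a normalized codeword histogram is built over a small window around it. The local DTW cost is then a weighted sum of the Euclidean spectral distance and the total-variation distance between the two histograms. The windowed histogram, the total-variation measure, the weight `alpha`, and the window size `half_win` are all hypothetical choices made for illustration. They are not taken from the thesis.

```python
# Hypothetical sketch: DTW alignment using spectral distance plus a
# codeword-distribution distance. Frames are assumed to be eigen-space
# projections already; the codebook, window, and weighting are illustrative.
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Spectral distance between two frames (plain Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each frame to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: euclidean(f, codebook[k]))
            for f in frames]


def local_distributions(codes: List[int], n_codes: int,
                        half_win: int) -> List[List[float]]:
    """Normalized codeword histogram over a window of +/- half_win frames."""
    dists = []
    for t in range(len(codes)):
        window = codes[max(0, t - half_win): t + half_win + 1]
        counts = Counter(window)
        dists.append([counts[k] / len(window) for k in range(n_codes)])
    return dists


def total_variation(p: List[float], q: List[float]) -> float:
    """Distance between two codeword distributions, in the range [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def align(src: List[Vector], tgt: List[Vector], codebook: List[Vector],
          alpha: float = 0.5,
          half_win: int = 2) -> Tuple[float, List[Tuple[int, int]]]:
    """DTW over a weighted sum of spectral and distribution distances.

    Returns (total_cost, path), where path lists (src_index, tgt_index) pairs.
    """
    n_codes = len(codebook)
    ps = local_distributions(quantize(src, codebook), n_codes, half_win)
    pt = local_distributions(quantize(tgt, codebook), n_codes, half_win)
    n, m = len(src), len(tgt)
    inf = float("inf")
    # acc[i][j] is the best cumulative cost aligning src[:i] with tgt[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (alpha * euclidean(src[i - 1], tgt[j - 1])
                     + (1 - alpha) * total_variation(ps[i - 1], pt[j - 1]))
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1],
                                    acc[i - 1][j - 1])
    # Backtrack from the end to recover the aligned frame pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    path.reverse()
    return acc[n][m], path
```

For example, `align(source_frames, target_frames, codebook, alpha=0.7)` returns the cumulative cost and the list of matched frame indices. Setting `alpha=1.0` reduces the method to ordinary spectral-only DTW, which makes it a convenient baseline for comparison. Lower values give more weight to agreement between the local codeword distributions.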