Machine Learning Based Expressive Violin Synthesis and its Subjective Listening Evaluation


Bibliographic Details
Main Authors: Chih-Hong Yang, 楊智弘
Other Authors: Sun-Yuan Hsieh
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/95771386589167894545
Description
Summary: Master's === National Cheng Kung University === Department of Computer Science and Information Engineering === 104 === Musicians continuously manipulate interpretational factors such as tempo, dynamics, and vibrato, drawing on their experience to convey different expressive intentions. Although characteristics and skills differ from individual to individual, professional violinists share certain viewpoints and understandings of expressive performance. We are curious about how the nuances that human performers introduce determine distinct expressions, and whether machines can one day control those interpretational factors nearly as well as, or even as well as, human musicians. This thesis combines several existing techniques into a synthesis system that automatically synthesizes distinct expressions from a deadpan performance by controlling expressive factors. We follow previous work on expressive musical term analysis and derive a subset of essential features as control parameters. The synthetic sounds are evaluated by a Support Vector Machine classification task (machine, objective) and by a listening test (human, subjective). Our classification results show that the synthesized sounds differ highly significantly from the original recordings of an amateur performer. Moreover, using an energy-curve model based on our statistical method greatly increases classification accuracy. Even with high classification accuracy, however, it remains unclear whether listeners would accept the expressivity of our synthesized sounds; machine judgment is only the first step in evaluating our system, so we also apply a listening test for subjective evaluation. The listening-test results are largely consistent with the classification results: distinct expressions are easier to distinguish in our synthetic versions than in the amateur performance, but the synthetic versions still show less expressivity than the professional performance. Our motivation for this work is to understand how humans perform expressively and to let machines manipulate expressive factors as human performers do.
The long-term goal is a synthesis system whose expressivity matches that of human performers, and which may eventually even synthesize the performances of late violinists.
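The objective evaluation described above trains an SVM to tell distinct expressions apart from interpretational features. As a minimal sketch of that idea (not the thesis code), the toy example below trains a linear SVM with Pegasos-style sub-gradient updates on two synthetic clusters of hypothetical features; the feature names, class labels, cluster centers, and trainer are all illustrative assumptions, while the thesis derives its features from expressive musical term analysis.

```python
# Illustrative sketch: linear SVM separating two expression classes
# from simple interpretational features. All data here is synthetic.
import random

random.seed(0)

def make_samples(center, label, n=100, spread=0.3):
    """Synthetic (vibrato_rate, energy) feature pairs for one expression class."""
    return [([random.gauss(center[0], spread), random.gauss(center[1], spread)], label)
            for _ in range(n)]

# Two hypothetical expression classes, e.g. "tranquil" (-1) vs "passionate" (+1).
data = make_samples((5.0, 0.3), -1) + make_samples((6.5, 0.8), +1)

def train_linear_svm(samples, lam=0.01, epochs=200):
    """Pegasos-style stochastic sub-gradient training of a linear SVM with bias."""
    w, b, t = [0.0, 0.0], 0.0, 0
    for _ in range(epochs):
        random.shuffle(samples)
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # decaying learning rate
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            # Regularization shrinkage on every step...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...plus a hinge-loss sub-gradient step when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

w, b = train_linear_svm(data)
correct = sum(1 for x, y in data if y * (w[0] * x[0] + w[1] * x[1] + b) > 0)
accuracy = correct / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

High separability in such a classifier is what the thesis uses as objective evidence that the synthesized expressions are distinct; the subjective listening test then checks whether human listeners agree.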