Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation

Oral evaluation is one of the most critical processes in children’s language learning. Traditionally, the Scoring Rubric is widely used in oral evaluation for providing a ranking score by assessing word accuracy, phoneme accuracy, fluency, and accent position of a tester. In recent years,...

Bibliographic Details
Main Authors:	Liu Zhang, Chao Shu, Jin Guo, Hanyi Zhang, Cheng Xie, Qing Liu
Format:	Article
Language:	English
Published:	MDPI AG 2020-03-01
Series:	Electronics
Subjects:	oral evaluation, generative adversarial network, neural audio caption, gated recurrent unit, long short-term memory
Online Access:	https://www.mdpi.com/2079-9292/9/3/424

id	doaj-d2d89d8848e04797b941135a2d34416b
record_format	Article
spelling	doaj-d2d89d8848e04797b941135a2d34416b | 2020-11-25T02:25:12Z | eng | MDPI AG | Electronics | 2079-9292 | 2020-03-01 | 9 | 3 | 424 | 10.3390/electronics9030424 | electronics9030424 | Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation | Liu Zhang, Chao Shu, Jin Guo, Hanyi Zhang, Cheng Xie, Qing Liu (all: School of Software, Yunnan University; Kunming 650504, China) | https://www.mdpi.com/2079-9292/9/3/424 | oral evaluation, generative adversarial network, neural audio caption, gated recurrent unit, long short-term memory
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Liu Zhang, Chao Shu, Jin Guo, Hanyi Zhang, Cheng Xie, Qing Liu
author_sort	Liu Zhang
title	Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation
publisher	MDPI AG
series	Electronics
issn	2079-9292
publishDate	2020-03-01
description	Oral evaluation is one of the most critical processes in children’s language learning. Traditionally, the Scoring Rubric is widely used in oral evaluation to produce a ranking score by assessing a test taker’s word accuracy, phoneme accuracy, fluency, and accent position. In recent years, driven by emerging market demand, oral evaluation has been expected to provide not only a single pronunciation score but also in-depth, meaningful comments based on content, context, logic, and understanding. However, producing such comments under the Scoring Rubric requires extensive manual work by oral evaluation experts, which is considered uneconomical and inefficient in the current market. Therefore, this paper proposes an automated approach to generating expert comments for oral evaluation. The approach first extracts oral features from the children’s audio and text features from the corresponding expert comments. A Gated Recurrent Unit (GRU) then encodes the oral features into the model. Next, a Long Short-Term Memory (LSTM) model is trained to learn the mapping between oral features and text features and to generate expert comments for new oral audio. Finally, a Generative Adversarial Network (GAN) is incorporated to improve the quality of the generated comments: it generates pseudo-comments to train the discriminator to recognize human-like comments. The proposed approach is evaluated on a real-world dataset of children’s oral audio collected by our collaborating company. It has also been integrated into a commercial application to generate expert comments for children’s oral evaluation. The experimental results and the lessons learned from real-world application show that the proposed approach is effective at providing meaningful comments for oral evaluation.
topic	oral evaluation, generative adversarial network, neural audio caption, gated recurrent unit, long short-term memory
url	https://www.mdpi.com/2079-9292/9/3/424
_version_	1724852457101590528

Similar Items