Self-Information Loss Compensation Learning for Machine-Generated Text Detection

Automatic text generation has long been an important task in natural language processing, but low-quality machine-generated text seriously degrades the user experience because of poor readability and vague information. Machine-generated text detection methods based on traditional machine learning rely on large numbers of hand-crafted features and detection rules, while general deep-learning text classifiers tend to focus on topical content and make poor use of the logical information between text sequences. To address this problem, we propose an end-to-end model that uses the self-information of text sequences to compensate for information lost during modeling and to learn the logical relations between sequences, framing machine-generated text detection as a text classification task. We experiment on a Chinese question-and-answer dataset collected from biomedical social media, which contains both human-written and machine-generated text. The results show that our method is effective and outperforms most baseline models.
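The abstract does not specify the architecture, so the following is only a minimal sketch of one possible reading of "self-information loss compensation": each token's self-information, -log p(x_t), is appended to its embedding before a sequence encoder and a binary (human vs. machine) classifier. The GRU encoder, layer sizes, and the source of the token probabilities are illustrative assumptions, not the authors' design.

```python
# Illustrative sketch only, not the paper's model: per-token self-information
# is concatenated to token embeddings so that information discarded by the
# encoder can be partially recovered by the classifier.
import torch
import torch.nn as nn

class SelfInfoCompensatedClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # +1 input feature for the per-token self-information value
        self.encoder = nn.GRU(emb_dim + 1, hidden,
                              batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids, token_probs):
        # token_probs: estimated p(x_t), e.g. from corpus unigram counts or a
        # pretrained language model (an assumption made for this sketch).
        self_info = -torch.log(token_probs.clamp_min(1e-12)).unsqueeze(-1)
        x = torch.cat([self.embed(token_ids), self_info], dim=-1)
        out, _ = self.encoder(x)       # (batch, seq_len, 2 * hidden)
        pooled = out.mean(dim=1)       # simple mean pooling over the sequence
        return self.classify(pooled)   # logits: human-written vs. machine-generated

# Example usage with dummy data
model = SelfInfoCompensatedClassifier(vocab_size=5000)
ids = torch.randint(0, 5000, (4, 32))        # batch of 4 sequences, length 32
probs = torch.rand(4, 32) * 0.99 + 0.01      # placeholder token probabilities
logits = model(ids, probs)                   # shape (4, 2)
```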

Bibliographic Details
Main Authors: Weikuan Wang, Ao Feng (Chengdu University of Information Technology)
Format: Article
Language: English
Published: Hindawi Limited, 2021-01-01
Series: Mathematical Problems in Engineering
ISSN: 1563-5147
Online Access: http://dx.doi.org/10.1155/2021/6669468