Identification of machine-generated reviews : 1D CNN applied on the GPT-2 neural language model

With recent advances in machine learning, computers can generate increasingly convincing text, raising concern about an increase in fake information on the internet. At the same time, researchers are creating tools for detecting computer-generated text. Researchers have been able to exploit flaws in...

Bibliographic Details
Main Authors:	Al-Kadhimi, Staffan; Löwenström, Paul
Format:	Others
Language:	English
Published:	KTH, Skolan för elektroteknik och datavetenskap (EECS) 2020
Subjects:	Computer Sciences; Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280335

id	ndltd-UPSALLA1-oai-DiVA.org-kth-280335
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-280335 | 2020-09-09T05:21:31Z | Identification of machine-generated reviews : 1D CNN applied on the GPT-2 neural language model | eng | Identifiering av maskingenererade recensioner : 1D CNN applicerat på den neurala språkmodellen GPT-2 | Al-Kadhimi, Staffan | Löwenström, Paul | KTH, Skolan för elektroteknik och datavetenskap (EECS) | 2020 | Computer Sciences | Datavetenskap (datalogi) | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280335 | TRITA-EECS-EX ; 2020:389 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer Sciences; Datavetenskap (datalogi)
description	With recent advances in machine learning, computers can generate increasingly convincing text, raising concern about an increase in fake information on the internet. At the same time, researchers are creating tools for detecting computer-generated text. Researchers have been able to exploit flaws in neural language models and turn those flaws against the models; for example, GLTR gives human users a visual representation of a text that helps them classify it as human-written or machine-generated. By training a convolutional neural network (CNN) on GLTR output from the analysis of machine-generated and human-written movie reviews, we take GLTR a step further and use it to perform this classification automatically. However, a CNN with GLTR as its main source of data does not appear to be enough to perform on par with the best existing approaches.
author	Al-Kadhimi, Staffan; Löwenström, Paul
title	Identification of machine-generated reviews : 1D CNN applied on the GPT-2 neural language model
publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
publishDate	2020
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280335

Similar Items