An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

Over the last decade industrial and academic communities have increased their focus on sentiment analysis techniques, especially applied to tweets. State-of-the-art results have been recently achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to be...


Bibliographic Details
Main Authors:	Marco Pota, Mirko Ventura, Rosario Catelli and Massimo Catelli
Format:	Article
Language:	English
Published:	MDPI AG 2021-12-01
Series:	Sensors
Subjects:	n/a
Online Access:	https://www.mdpi.com/1424-8220/21/1/133

id	doaj-193b3b27a80d40df9327a293f14069be
record_format	Article
spelling	doaj-193b3b27a80d40df9327a293f14069be | 2020-12-29T00:01:53Z | eng | MDPI AG | Sensors | 1424-8220 | 2021-12-01 | 21 | 1 | 133 | 133 | 10.3390/s21010133 | An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian | Marco Pota [0] | Mirko Ventura [1] | Rosario Catelli and Massimo Catelli [2] | Institute for High Performance Computing and Networking (ICAR), National Research Council, 80131 Naples, Italy | Institute for High Performance Computing and Networking (ICAR), National Research Council, 80131 Naples, Italy | Institute for High Performance Computing and Networking (ICAR), National Research Council, 80131 Naples, Italy | Over the last decade industrial and academic communities have increased their focus on sentiment analysis techniques, especially applied to tweets. State-of-the-art results have been recently achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle the Twitter jargon. This work aims to introduce a different approach for Twitter sentiment analysis based on two steps. Firstly, the tweet jargon, including emojis and emoticons, is transformed into plain text, exploiting procedures that are language-independent or easily applicable to different languages. Secondly, the resulting tweets are classified using the language model BERT, but pre-trained on plain text, instead of tweets, for two reasons: (1) pre-trained models on plain text are easily available in many languages, avoiding resource- and time-consuming model training directly on tweets from scratch; (2) available plain text corpora are larger than tweet-only ones, therefore allowing better performance. A case study describing the application of the approach to Italian is presented, with a comparison with other Italian existing solutions. The results obtained show the effectiveness of the approach and indicate that, thanks to its general basis from a methodological perspective, it can also be promising for other languages. | https://www.mdpi.com/1424-8220/21/1/133 | n/a
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Marco Pota Mirko Ventura Rosario Catelli and Massimo Catelli
spellingShingle	Marco Pota Mirko Ventura Rosario Catelli and Massimo Catelli An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian Sensors n/a
author_facet	Marco Pota Mirko Ventura Rosario Catelli and Massimo Catelli
author_sort	Marco Pota
title	An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian
title_sort	effective bert-based pipeline for twitter sentiment analysis: a case study in italian
publisher	MDPI AG
series	Sensors
issn	1424-8220
publishDate	2021-12-01
description	Over the last decade industrial and academic communities have increased their focus on sentiment analysis techniques, especially applied to tweets. State-of-the-art results have been recently achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle the Twitter jargon. This work aims to introduce a different approach for Twitter sentiment analysis based on two steps. Firstly, the tweet jargon, including emojis and emoticons, is transformed into plain text, exploiting procedures that are language-independent or easily applicable to different languages. Secondly, the resulting tweets are classified using the language model BERT, but pre-trained on plain text, instead of tweets, for two reasons: (1) pre-trained models on plain text are easily available in many languages, avoiding resource- and time-consuming model training directly on tweets from scratch; (2) available plain text corpora are larger than tweet-only ones, therefore allowing better performance. A case study describing the application of the approach to Italian is presented, with a comparison with other Italian existing solutions. The results obtained show the effectiveness of the approach and indicate that, thanks to its general basis from a methodological perspective, it can also be promising for other languages.
topic	n/a
url	https://www.mdpi.com/1424-8220/21/1/133
work_keys_str_mv	AT marcopota aneffectivebertbasedpipelinefortwittersentimentanalysisacasestudyinitalian AT mirkoventura aneffectivebertbasedpipelinefortwittersentimentanalysisacasestudyinitalian AT rosariocatelliandmassimocatelli aneffectivebertbasedpipelinefortwittersentimentanalysisacasestudyinitalian AT marcopota effectivebertbasedpipelinefortwittersentimentanalysisacasestudyinitalian AT mirkoventura effectivebertbasedpipelinefortwittersentimentanalysisacasestudyinitalian AT rosariocatelliandmassimocatelli effectivebertbasedpipelinefortwittersentimentanalysisacasestudyinitalian
_version_	1724368133434638336
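
The first step summarized in the description above (turning tweet jargon such as emojis and emoticons into plain text) could be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' procedure: the EMOTICONS lexicon and the function names are hypothetical, and using Unicode character names as emoji descriptions is an assumed stand-in that yields English text, whereas the case study targets Italian. The second step, classification with a BERT model pre-trained on plain text, is not shown.

```python
import re
import unicodedata

# Hypothetical emoticon lexicon (an assumption for illustration only);
# a real one would be larger and written in the target language.
EMOTICONS = {
    ":)": "happy face",
    ":-)": "happy face",
    ":(": "sad face",
    ":-(": "sad face",
    ";)": "winking face",
}

# Try longer emoticons first so ":-)" is matched as a whole.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def describe_symbol(ch):
    """Return a lowercase Unicode name for emoji-like symbols, else the char."""
    # General category "So" (Symbol, other) covers most emoji code points.
    if unicodedata.category(ch) == "So":
        name = unicodedata.name(ch, "")
        if name:
            return " " + name.lower() + " "
    return ch


def tweet_to_plain_text(tweet):
    """Replace emoticons and emojis with words and normalize whitespace."""
    text = _EMOTICON_RE.sub(lambda m: " " + EMOTICONS[m.group(0)] + " ", tweet)
    text = "".join(describe_symbol(ch) for ch in text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, tweet_to_plain_text("Che bella giornata :)") returns "Che bella giornata happy face". The resulting string is ordinary text, which is the kind of input a BERT model pre-trained on plain-text corpora would expect in the second step.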

Similar Items