Data and Text Mining Techniques for In-Domain and Cross-Domain Applications

In the big data era, a vast amount of data has been generated in different domains, from social media to news feeds, from health care to genomic functionalities. When addressing a problem, we usually need to harness multiple disparate datasets. Data from different domains may follow different modali...


Bibliographic Details
Main Author:	Domeniconi, Giacomo <1986>
Other Authors:	Moro, Gianluca
Format:	Doctoral Thesis
Language:	en
Published:	Alma Mater Studiorum - Università di Bologna 2016
Subjects:	ING-INF/05 Sistemi di elaborazione delle informazioni
Online Access:	http://amsdottorato.unibo.it/7494/

id	ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-7494
record_format	oai_dc
spelling	ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-7494 2016-09-06T05:02:38Z Alma Mater Studiorum - Università di Bologna Moro, Gianluca Sartori, Claudio 2016-05-12 Doctoral Thesis PeerReviewed application/pdf en http://amsdottorato.unibo.it/7494/ info:eu-repo/semantics/embargoedAccess info:eu-repo/date/embargoEnd/2017-02-28
collection	NDLTD
language	en
format	Doctoral Thesis
sources	NDLTD
topic	ING-INF/05 Sistemi di elaborazione delle informazioni
description	In the big data era, a vast amount of data has been generated in different domains, from social media to news feeds, from health care to genomic functionalities. When addressing a problem, we usually need to harness multiple disparate datasets. Data from different domains may follow different modalities, each of which has a different representation, distribution, scale and density. For example, text is usually represented as discrete sparse word count vectors, whereas an image is represented by pixel intensities. Many Data Mining and Machine Learning techniques have been proposed in the literature and have already achieved significant success in many knowledge engineering areas, including classification, regression and clustering. Nevertheless, some challenging issues remain when tackling a new problem. How should the problem be represented? Which approach is best among the many possibilities? What information should be used in the Machine Learning task, and how should it be represented? Are there other domains from which knowledge can be borrowed? This dissertation proposes some possible representation approaches for problems in different domains, from text mining to genomic analysis. In particular, one of the major contributions is a different way to represent a classical classification problem: instead of using one instance for each object to be classified (a document, a gene, a social post, etc.), it is proposed to use a pair of objects or an object-class pair, with the relationship between them as the label. This approach is tested on both flat and hierarchical text categorization datasets, where it potentially allows the efficient addition of new categories during classification. Furthermore, the same idea is used to extract conversational threads from an unregulated pool of messages and to classify the biomedical literature based on the genomic features treated.
author2	Moro, Gianluca
author	Domeniconi, Giacomo <1986>
title	Data and Text Mining Techniques for In-Domain and Cross-Domain Applications
publisher	Alma Mater Studiorum - Università di Bologna
publishDate	2016
url	http://amsdottorato.unibo.it/7494/
_version_	1718382820479467520
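
The pair-based representation mentioned in the description can be illustrated with a minimal sketch. It is not the method of the thesis: the single pair feature (cosine similarity between a document and an aggregated class profile), the nearest-profile decision rule, the toy data and all function names below are assumptions chosen only to show the idea of labelling a (document, class) pair with the relationship between its two members.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Represent a text as sparse word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_profiles(labelled_docs):
    """Aggregate the word counts of each class into one profile (assumption)."""
    profiles = {}
    for text, label in labelled_docs:
        profiles.setdefault(label, Counter()).update(bag_of_words(text))
    return profiles


def pair_instances(labelled_docs, profiles):
    """Build one instance per (document, class) pair.

    The label is not the class itself but the relationship
    "does this document belong to this class?".
    """
    instances = []
    for text, label in labelled_docs:
        doc = bag_of_words(text)
        for cls, profile in profiles.items():
            instances.append(((text, cls), cosine(doc, profile), cls == label))
    return instances


def predict(text, profiles):
    """Hypothetical decision rule: pick the class whose pair scores highest."""
    doc = bag_of_words(text)
    return max(profiles, key=lambda cls: cosine(doc, profiles[cls]))


# Toy data, invented for illustration only.
train = [
    ("gene expression protein sequence", "genomics"),
    ("protein folding gene regulation", "genomics"),
    ("football match goal score", "sports"),
    ("tennis match score set", "sports"),
]
profiles = class_profiles(train)
instances = pair_instances(train, profiles)

# A new category only needs a new profile; existing pairs are left unchanged.
profiles["finance"] = bag_of_words("stock market price shares")
```

Because every instance describes a pair rather than a single object, a category added later only contributes new pairs. In this sketch that is why `predict` can return `finance` without rebuilding anything for the existing classes.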


Similar Items