Making Thin Data Thick: User Behavior Analysis with Minimum Information
abstract: With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content have introduced the...
Other Authors: | Zafarani, Reza (Author) |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: | 2015 |
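Subjects: | Computer science; Minimum Information; Mining across Sites; Social Signatures; User Identification; Users across Sites |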
Online Access: | http://hdl.handle.net/2286/R.I.34813 |
id |
ndltd-asu.edu-item-34813 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-asu.edu-item-34813; 2018-06-22T03:06:30Z; Making Thin Data Thick: User Behavior Analysis with Minimum Information; Dissertation/Thesis; Zafarani, Reza (Author); Liu, Huan (Advisor); Kambhampati, Subbarao (Committee member); Xue, Guoliang (Committee member); Leskovec, Jure (Committee member); Arizona State University (Publisher); Computer science; Minimum Information; Mining across Sites; Social Signatures; User Identification; Users across Sites; eng; 205 pages; Doctoral Dissertation Computer Science 2015; Doctoral Dissertation; http://hdl.handle.net/2286/R.I.34813; http://rightsstatements.org/vocab/InC/1.0/; All Rights Reserved; 2015 |
collection |
NDLTD |
language |
English |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Computer science; Minimum Information; Mining across Sites; Social Signatures; User Identification; Users across Sites |
description |
abstract: With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days, and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content have given rise to big data on human behavior.
This big data has created countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox: the big data is a large collection of mostly limited individual-level information. Researchers must often derive meaningful insights about online user behavior from this limited information. Simply put, they have to make thin data thick.
This dissertation investigates how thin data on human behavior can be made thick. Its chief objective is to demonstrate how traces of human behavior can be efficiently gleaned from the often limited individual-level information, thereby introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information, in terms of link or content data, that is available for any social media user is determined. Using only minimum information in social media applications such as prediction or recommendation tasks allows for solutions that are (1) generalizable to all social media users and (2) easy to implement. However, are applications that employ only minimum information as effective as, or at least comparable to, applications that use more information?
This dissertation shows that common research challenges such as detecting malicious users or recommending friends (i.e., link prediction) can be addressed effectively using only minimum information. More importantly, it demonstrates that unique user identification can be achieved using minimum information. Theoretical boundaries of unique user identification are obtained by introducing social signatures, which allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites, and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information.
The findings in this dissertation allow the same user to be found across multiple sites, which in turn has several implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied. === Dissertation/Thesis === Doctoral Dissertation Computer Science 2015 |
author2 |
Zafarani, Reza (Author) |
title |
Making Thin Data Thick: User Behavior Analysis with Minimum Information |
publishDate |
2015 |
url |
http://hdl.handle.net/2286/R.I.34813 |
_version_ |
1718700855186685952 |