Latent Clustering Models for Outlier Identification in Telecom Data

Collected telecom data traffic has boomed in recent years, due to the development of 4G mobile devices and other similar high-speed machines. The ability to quickly identify unexpected traffic data in this stream is critical for mobile carriers, as it can be caused by either fraudulent intrusion or...

Bibliographic Details
Main Authors:	Ye Ouyang, Alexis Huet, J. P. Shim, Mantian (Mandy) Hu
Format:	Article
Language:	English
Published:	Hindawi Limited 2016-01-01
Series:	Mobile Information Systems
Online Access:	http://dx.doi.org/10.1155/2016/1542540

id	doaj-0b3094990fa24b51b27785b98d182bc5
affiliations	Ye Ouyang: Columbia University, New York, NY, USA; Alexis Huet: Nanjing Howso Technology, Nanjing, China; J. P. Shim: Georgia State University, Atlanta, GA, USA; Mantian (Mandy) Hu: Department of Marketing, The Chinese University of Hong Kong, Shatin, Hong Kong
collection	DOAJ
issn	1574-017X 1875-905X
description	Collected telecom data traffic has boomed in recent years, due to the development of 4G mobile devices and other similar high-speed machines. The ability to quickly identify unexpected traffic data in this stream is critical for mobile carriers, as it can be caused by either fraudulent intrusion or technical problems. Clustering models can help to identify issues by showing patterns in network data, which can quickly catch anomalies and highlight previously unseen outliers. In this article, we develop and compare clustering models for telecom data, focusing on those that include time-stamp information management. Two main models are introduced, solved in detail, and analyzed: Gaussian Probabilistic Latent Semantic Analysis (GPLSA) and time-dependent Gaussian Mixture Models (time-GMM). These models are then compared with other clustering models that do not use time-stamp information, such as the Gaussian model and GMM. We perform computation on both sample and telecom traffic data to show that the efficiency and robustness of GPLSA make it the superior method to detect outliers and provide results automatically with few tuning parameters and little expertise required.

Similar Items