Why so many people? Explaining Nonhabitual Transport Overcrowding With Internet Data

Bibliographic Details
Main Authors: Pereira, Francisco C. (Author), Rodrigues, Filipe (Author), Polisciuc, Evgheni (Author), Ben-Akiva, Moshe E. (Contributor)
Other Authors: Massachusetts Institute of Technology. Department of Civil and Environmental Engineering (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2016-02-29T19:05:36Z.
Description
Summary: Public transport smartcard data can be used for detection of large crowds. By comparing statistics on habitual behavior (e.g., average by time of day), one can specifically identify nonhabitual crowds, which are often very problematic for transport systems. While habitual overcrowding (e.g., peak hour) is well understood both by traffic managers and travelers, nonhabitual overcrowding hotspots can become even more disruptive and unpleasant because they are generally unexpected. By quickly understanding such cases, a transport manager can react and mitigate transport system disruptions. We propose a probabilistic data analysis model that breaks each nonhabitual overcrowding hotspot into a set of explanatory components. The potential explanatory components are initially retrieved from social networks and special events websites and then processed through text-analysis techniques. Finally, for each such component, the probabilistic model estimates a specific share in the total overcrowding counts. We first validate with synthetic data and then test our model with real data from the public transport system (EZLink) of Singapore, focused on three case study areas. We demonstrate that it is able to generate explanations that are intuitively plausible and consistent both locally (correlation coefficient, i.e., CC, from 85% to 99% for the three areas) and globally (CC from 41.2% to 83.9%). This model is directly applicable to any other domain sensitive to crowd formation due to large social events (e.g., communications, water, energy, waste).
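The first step mentioned in the summary, comparing observed counts with habitual behavior by time of day, can be illustrated with a minimal sketch. This is not the authors' model: the function name `find_nonhabitual_hotspots`, the "mean plus k standard deviations" threshold, and the sample counts are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_nonhabitual_hotspots(history, today, k=2.0):
    """Flag time slots whose count is well above the habitual level.

    history: list of (time_slot, count) pairs from past days (hypothetical data).
    today:   dict mapping time_slot -> observed count.
    k:       assumed threshold, in standard deviations above the habitual mean.
    Returns a dict time_slot -> excess count above the habitual mean.
    """
    by_slot = defaultdict(list)
    for slot, count in history:
        by_slot[slot].append(count)

    hotspots = {}
    for slot, observed in today.items():
        past = by_slot.get(slot)
        if not past:
            continue  # no habitual baseline for this slot
        baseline = mean(past)
        spread = pstdev(past)
        if observed > baseline + k * spread:
            hotspots[slot] = observed - baseline
    return hotspots


# Invented tap-out counts for one station, three past days per slot.
history = [("08:00", 900), ("08:00", 1000), ("08:00", 1100),
           ("21:00", 190), ("21:00", 200), ("21:00", 210)]
today = {"08:00": 1050, "21:00": 650}

hotspots = find_nonhabitual_hotspots(history, today)
print(hotspots)
```

In this sketch the busy 08:00 peak is not flagged because it matches habitual behavior, while the much smaller 21:00 count is flagged because it departs from its own baseline. The excess count per flagged slot is the kind of quantity that the probabilistic model described in the summary would then split among explanatory components.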
Singapore. National Research Foundation (Singapore-MIT Alliance for Research and Technology)
Fundação para a Ciência e a Tecnologia (Project PTDC/ECM-TRA/1898/2012)