Using K-Means Cluster Analysis and Decision Trees to Highlight Significant Factors Leading to Homelessness

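Homelessness has been a persistent social concern in the United States. A combination of political and economic events since the 1960s has driven increases in poverty that, by 1991, had surpassed 1928 Depression-era levels in some accounts. This paper explores how the emerging field of behavioral economics can use machine learning and data science methods to explore preventative responses to homelessness. In this study, machine learning data mining strategies, specifically K-means cluster analysis and, later, decision trees, were used to understand how environmental factors and resultant behaviors can contribute to the experience of homelessness. Preventing the first homeless event is especially important, as studies show that a person who has experienced homelessness once is 2.6 times more likely to have another homeless episode. Study findings demonstrate that when someone is at risk of not being able to pay utility bills at the same time as they experience challenges with two or more of the other social determinants of health, the individual is statistically significantly more likely to have their first homeless event. Additionally, men over 50 who are not in the workforce, have a health hardship, and experience two or more other social determinants of health hardships at the same time have a high, statistically significant probability of experiencing homelessness for the first time.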
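The abstract names K-means cluster analysis as the study's first data-mining step. As an illustration only, the Python sketch below runs Lloyd's K-means algorithm on a few hypothetical callers, each encoded as binary hardship flags. The feature names, the records, the choice of k = 2, and the deterministic "first k points" initialisation are all assumptions made for this example; they are not the study's dataset, variables, or settings, and the later decision-tree stage is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature order (illustrative only, not the study's variables):
# [utility_risk, housing_hardship, food_hardship, health_hardship, not_in_workforce]
Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Deterministic initialisation from the first k points (an assumption made for
    # reproducibility; k-means++ or random restarts are common in practice).
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose centroid is closest.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = centroid(members)
    return labels, centroids


# Hypothetical records, one per caller (1 = hardship reported, 0 = not reported).
records = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]

labels, centroids = kmeans(records, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

On this toy data, cluster 0 collects the three hypothetical callers who combine utility-bill risk with two other hardships, the kind of co-occurring pattern the abstract highlights, while cluster 1 collects callers reporting at most one hardship. A decision-tree learner fitted on such features would be a natural follow-on step, but it is not sketched here.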


Bibliographic Details
Main Authors: Andrea Yoder Clark, Nicole Blumenfeld, Eric Lal, Shikar Darbari, Shiyang Northwood, Ashkan Wadpey
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Mathematics
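Subjects: data science; machine learning; data mining; k-means; cluster analysis; decision trees
Online Access: https://www.mdpi.com/2227-7390/9/17/2045
ISSN: 2227-7390
DOI: 10.3390/math9172045
Affiliations: School of Business, University of San Diego, 5998 Alcala Park, San Diego, CA 92110, USA (Andrea Yoder Clark, Eric Lal, Shikar Darbari, Shiyang Northwood, Ashkan Wadpey); 2-1-1 San Diego, P.O. Box 420039, San Diego, CA 92124, USA (Nicole Blumenfeld)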