Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution

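In studies on the health impacts of air pollution, regression analysis continues to advance far beyond classical linear regression, which many scientists may have become familiar with in an introductory statistics course. With each new level of complexity, regression analysis may become less transparent, even to the analyst working with the data. This may be especially true in count data regression models, where the response variable (typically given the symbol y) is count data (i.e., takes on values of 0, 1, 2, …). In such models, the normal distribution (the familiar bell-shaped curve) for the residuals (i.e., the differences between the observed values and the values predicted by the regression model) no longer applies. Unless care is taken to correctly specify just how those residuals are distributed, the tendency to accept untrue hypotheses may be greatly increased. The aim of this paper is to present a simple histogram of predicted and observed count values (POCH), which, while rarely found in the environmental literature but presented in authoritative statistical texts, can dramatically reduce the risk of accepting untrue hypotheses. POCH can also increase the transparency of count data regression models to analysts themselves and to the scientific community in general.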


Bibliographic Details
Main Authors: John F. Joseph, Chad Furl, Hatim O. Sharif, Thankam Sunil, Charles G. Macias
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Applied Sciences
Subjects: count data; correlation; regression models
Online Access: https://www.mdpi.com/2076-3417/11/8/3375
id doaj-7b64606d02b544979fcbbdbdb9bac7c9
record_format Article
spelling doaj-7b64606d02b544979fcbbdbdb9bac7c9
2021-04-09T23:01:35Z
eng
MDPI AG
Applied Sciences
2076-3417
2021-04-01
11
3375
3375
10.3390/app11083375
Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution
John F. Joseph (0)
Chad Furl (1)
Hatim O. Sharif (2)
Thankam Sunil (3)
Charles G. Macias (4)
(0) Department of Civil and Environmental Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
(1) Department of Civil and Environmental Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
(2) Department of Civil and Environmental Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
(3) Department of Public Health, University of Tennessee, Knoxville, 1914 Andy Holt Ave., Knoxville, TN 37996, USA
(4) Center for Clinical Effectiveness and Evidence-Based Outcome Center, Baylor College of Medicine/Texas Children’s Hospital, 6621 Fannin St., Houston, TX 77030, USA
In studies on the health impacts of air pollution, regression analysis continues to advance far beyond classical linear regression, which many scientists may have become familiar with in an introductory statistics course. With each new level of complexity, regression analysis may become less transparent, even to the analyst working with the data. This may be especially true in count data regression models, where the response variable (typically given the symbol y) is count data (i.e., takes on values of 0, 1, 2, …). In such models, the normal distribution (the familiar bell-shaped curve) for the residuals (i.e., the differences between the observed values and the values predicted by the regression model) no longer applies. Unless care is taken to correctly specify just how those residuals are distributed, the tendency to accept untrue hypotheses may be greatly increased. The aim of this paper is to present a simple histogram of predicted and observed count values (POCH), which, while rarely found in the environmental literature but presented in authoritative statistical texts, can dramatically reduce the risk of accepting untrue hypotheses. POCH can also increase the transparency of count data regression models to analysts themselves and to the scientific community in general.
https://www.mdpi.com/2076-3417/11/8/3375
count data
correlation
regression models
collection DOAJ
language English
format Article
sources DOAJ
author John F. Joseph
Chad Furl
Hatim O. Sharif
Thankam Sunil
Charles G. Macias
author_sort John F. Joseph
title Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2021-04-01
topic count data
correlation
regression models
url https://www.mdpi.com/2076-3417/11/8/3375
_version_ 1721532310139961344
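
The abstract in this record describes the POCH as a histogram of predicted and observed count values. As a rough illustration only, the Python sketch below assumes a Poisson model and one common construction: for each count value k, the observed frequency is the number of observations equal to k, and the predicted frequency is the sum, over all observations, of the fitted Poisson probability of k. The data, the intercept-only fit, and the function names (`poisson_pmf`, `poch_table`, `print_poch`) are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def poisson_pmf(k, mu):
    """Poisson probability of count k for a mean mu (mu must be > 0)."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poch_table(observed, fitted_means, max_count=None):
    """Return rows (k, observed frequency, predicted frequency) for k = 0..max_count."""
    if max_count is None:
        max_count = max(observed)
    obs_freq = Counter(observed)
    rows = []
    for k in range(max_count + 1):
        # Expected number of observations equal to k under the fitted model.
        predicted = sum(poisson_pmf(k, mu) for mu in fitted_means)
        rows.append((k, obs_freq.get(k, 0), predicted))
    return rows


def print_poch(rows):
    """Print a text histogram with one observed and one predicted bar per count."""
    for k, obs, pred in rows:
        print(f"y={k:2d}  observed  {'#' * obs} {obs}")
        print(f"      predicted {'+' * round(pred)} {pred:.1f}")


# Hypothetical daily counts, made up for illustration (many zeros, a few large values).
observed = [0, 0, 0, 0, 0, 1, 0, 2, 0, 5, 0, 0, 1, 7, 0, 3, 0, 0, 6, 0]

# Assumption: an intercept-only Poisson model, whose maximum-likelihood
# fitted mean is the sample mean for every observation.
sample_mean = sum(observed) / len(observed)
fitted_means = [sample_mean] * len(observed)

rows = poch_table(observed, fitted_means)
print_poch(rows)
```

With these made-up counts, the observed bar at y = 0 is much longer than the predicted one, which is the kind of mismatch such a histogram is meant to expose. A different assumed distribution (for example, negative binomial) would need its own probability function in place of `poisson_pmf`.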