Mining Mozambique Health Data: The Case of Malaria: From Bayesian Incidence Risk to Incidence Case Predictions


Bibliographic Details
Main Author: Zacarias, Orlando P.
Format: Doctoral Thesis
Language: English
Published: Stockholms universitet, Institutionen för data- och systemvetenskap 2015
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-122672
http://nbn-resolving.de/urn:isbn:978-91-7649-304-5
Description
Summary: The health sector in Mozambique holds large volumes of data, including records of major public health diseases such as malaria and cholera. Scrutinizing such a mass of health data for useful information is challenging but essential for health authorities and professionals. Statistical learning and inferential approaches, with the analysis performed using Bayesian predictive techniques and data mining, can provide health decision makers with appropriate tools for disease diagnosis and assessment. The purpose of this thesis is to investigate how predictive data mining and Bayesian regression methods can be used effectively to extract useful knowledge from reported malaria health data in support of decision making and management.

In summary, effective Bayesian predictive methods based on spatial and space-time reported cases of malaria have been derived, allowing the main risk factors for malaria to be extracted. Predictive models that incorporate connections between consecutive time periods in the analysis of the space-time variation of the disease were found to be relevant when explicit modeling of seasonality is not required or is infeasible. The most effective ways to derive numerical predictive models were investigated using several regression methods. The conclusion is that new cases of the disease can be predicted effectively by training support vector machines with a time-window approach, in which the training sets span different numbers of years and the window is progressively shortened towards the test set. The best performance is obtained with a smaller time window. Another contribution of this thesis is determining the importance of predictors of malaria incidence, by applying the permutation accuracy strategy (from the random forests method) on the test set.
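The time-window approach described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: a one-predictor least-squares line stands in for the support vector machine so that the example needs no external libraries, and the records, the predictor name (`rainfall`) and the function names are hypothetical. The data are synthetic and constructed so that the rainfall-to-cases relationship drifts from year to year.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (stand-in for an SVM regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def mean_squared_error(model, xs, ys):
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def window_errors(records, test_year, max_window):
    """Train on the w years immediately before test_year, for w = 1..max_window.

    records: list of (year, rainfall, cases) tuples.
    Returns a dict mapping window length in years to test-set error.
    """
    test_x, test_y = zip(*[(x, y) for yr, x, y in records if yr == test_year])
    errors = {}
    for w in range(1, max_window + 1):
        train = [(x, y) for yr, x, y in records if test_year - w <= yr < test_year]
        train_x, train_y = zip(*train)
        errors[w] = mean_squared_error(fit_line(train_x, train_y), test_x, test_y)
    return errors


# Synthetic records (not real malaria data): cases per unit of rainfall
# increase a little every year, so older years are less representative.
records = [
    (year, rain, (year - 2000) * 0.1 * rain)
    for year in range(2008, 2014)
    for rain in (10.0, 20.0, 30.0, 40.0)
]

errors = window_errors(records, test_year=2013, max_window=5)
best_window = min(errors, key=errors.get)
```

With this synthetic drift, the shortest window gives the lowest test error by construction; on real surveillance data the best window length would have to be determined empirically.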
A further contribution is a significant reduction in predictive error, obtained by employing a sample bias correction strategy when testing the predictive models in regions other than those where they were initially developed.
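The permutation accuracy strategy mentioned in the summary can be sketched as follows: shuffle one predictor at a time in the test set and record how much the prediction error increases. This is a minimal illustration under stated assumptions, not the thesis code. The function name `permutation_importance`, the toy model and the test rows are all hypothetical, and mean squared error is assumed as the error measure.

```python
import random


def permutation_importance(predict, X_test, y_test, n_repeats=20, seed=0):
    """Mean increase in test-set MSE when each predictor column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - y) ** 2 for r, y in zip(rows, y_test)) / len(y_test)

    baseline = mse(X_test)
    importances = []
    for j in range(len(X_test[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X_test]
            rng.shuffle(column)  # break the link between predictor j and the target
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
            increases.append(mse(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical fitted model: depends on the first predictor only.
def predict(row):
    return 3.0 * row[0] + 0.0 * row[1]


# Synthetic test set (not real data): two predictors per row.
X_test = [[float(i), float((i * 7) % 5)] for i in range(10)]
y_test = [3.0 * row[0] for row in X_test]

importances = permutation_importance(predict, X_test, y_test)
```

A predictor the model ignores receives an importance of zero, while shuffling a predictor the model relies on raises the error. Computing the scores on the test set, as the summary describes, measures importance for out-of-sample prediction rather than for the fit to the training data.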