Thesis (Ph.D.)--Boston University

Cluster analysis is widely used in many disciplines, including biology, psychology, archaeology, geography, and marketing. Methods have been developed to extend cluster analysis to longitudinal data, clustering subject trajectories rather than observations at a single time point. Here, I examine two methods of longitudinal cluster analysis: k-means and model-based cluster analysis (the latter implemented using FlexMix in R). Using a simulation study, I compare the two methods on the Correct Classification Rate, that is, the proportion of subject trajectories that a method assigns to their true groups. Both methods perform well under most circumstances, but the model-based method outperforms the k-means approach in 64% of the scenarios examined.

Next, I examine three criteria that have been used to determine how many groups exist in the data: Akaike's Information Criterion (AIC), the Davies-Bouldin Index (DB), and the Calinski-Harabasz pseudo F-statistic (CH). The latter two were developed specifically for choosing the number of groups in a cluster analysis with a single observation per person, whereas the AIC was developed as a general model-fit statistic. Few studies have used these criteria in the context of longitudinal data, and no study has compared their efficacy. We found that the DB and CH fail to identify the correct number of groups in the majority of cases, while the AIC was better able to determine the correct number.

Finally, as no study has examined the addition of a covariate to cluster analysis, we compare the results of a cluster analysis that takes a covariate into account with those of one that ignores it. When a covariate is both time-dependent and associated with the outcome, it is important to take this variable into account in the analysis, regardless of the magnitude of the association.
If the covariate is associated with the outcome but is not time-dependent, it may or may not be necessary to account for it, depending on the magnitude of the association.

In summary, we present methods for clustering trajectories, evaluate methods for determining the number of groups, and determine the importance of adjusting for covariates in the cluster analysis of longitudinal data.
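As a rough illustration of the k-means side of the comparison and of the two single-observation criteria, the following Python sketch treats each subject's trajectory as one vector of outcomes. It is not the thesis's implementation (which used R, with FlexMix for the model-based method); it assumes every subject is observed at the same time points, so k-means can work directly on the trajectory vectors. The helpers `kmeans`, `correct_classification_rate`, `calinski_harabasz`, and `davies_bouldin` are hypothetical names. The Correct Classification Rate is computed here by matching cluster labels to true groups over all relabellings, which is an assumption since cluster numbers are arbitrary. The three-group simulated data are invented for illustration and are not the thesis's simulation design. CH is the pseudo F-statistic [B / (k - 1)] / [W / (n - k)], with B and W the between- and within-cluster sums of squares (larger is better). DB averages, over clusters, the worst ratio of combined within-cluster scatter to between-centroid distance (smaller is better). Neither the model-based fit nor the AIC is shown.

```python
import itertools

import numpy as np


def kmeans(X, k, n_init=20, max_iter=100, seed=0):
    """Lloyd's k-means on the rows of X (one row = one subject's trajectory).

    Keeps the restart with the smallest within-cluster sum of squares.
    """
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign each trajectory to the nearest center (squared Euclidean distance).
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Move each center to the mean trajectory of its members (keep it if empty).
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        wss = ((X - new_centers[labels]) ** 2).sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels


def correct_classification_rate(true, pred):
    """Share of subjects placed in their true group, maximised over relabellings.

    Brute force over all permutations of cluster numbers, so only for small k.
    """
    k = int(max(true.max(), pred.max())) + 1
    return max(np.mean(np.array(perm)[pred] == true)
               for perm in itertools.permutations(range(k)))


def calinski_harabasz(X, labels):
    """Pseudo F: [B / (k - 1)] / [W / (n - k)]; larger values favour this k."""
    groups = np.unique(labels)
    n, k = len(X), len(groups)
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for g in groups:
        members = X[labels == g]
        centroid = members.mean(axis=0)
        between += len(members) * ((centroid - grand_mean) ** 2).sum()
        within += ((members - centroid) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j); smaller favours this k."""
    groups = np.unique(labels)
    centroids = np.array([X[labels == g].mean(axis=0) for g in groups])
    # s_i: average distance from each member to its own cluster centroid.
    scatter = np.array([np.linalg.norm(X[labels == g] - c, axis=1).mean()
                        for g, c in zip(groups, centroids)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(len(groups)) if j != i)
             for i in range(len(groups))]
    return float(np.mean(worst))


# Hypothetical simulation (not the thesis's design): 3 groups x 40 subjects,
# 5 equally spaced visits, linear mean trajectories plus N(0, 1) noise.
rng = np.random.default_rng(1)
times = np.arange(5)
group_means = [(0.0, 1.0), (5.0, 0.0), (10.0, -1.0)]  # (intercept, slope) per group
true_groups = np.repeat(np.arange(3), 40)
X = np.array([group_means[g][0] + group_means[g][1] * times for g in true_groups])
X = X + rng.normal(0.0, 1.0, size=X.shape)

fits = {k: kmeans(X, k) for k in range(2, 7)}
ccr = correct_classification_rate(true_groups, fits[3])
ch_scores = {k: calinski_harabasz(X, labels) for k, labels in fits.items()}
db_scores = {k: davies_bouldin(X, labels) for k, labels in fits.items()}
print(f"CCR at k = 3: {ccr:.3f}")
print("k chosen by CH (max):", max(ch_scores, key=ch_scores.get))
print("k chosen by DB (min):", min(db_scores, key=db_scores.get))
```

Running the script prints the CCR for k = 3 and the number of groups each criterion selects for k from 2 to 6. Because the data are simulated with three well-separated groups, this is only a sanity check of the mechanics, not a reproduction of the thesis's findings.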