Robust estimation and inference under cellwise and casewise contamination

Cellwise outliers are likely to occur together with casewise outliers in datasets of relatively large dimension. Recent work has shown that traditional high breakdown point procedures may fail when applied to such datasets. In this thesis, we consider this problem when the goal is to (1) estimate mu...

Bibliographic Details
Main Author: Leung, Andy Chin Yin
Language: English
Published: University of British Columbia 2017
Online Access: http://hdl.handle.net/2429/60145
id ndltd-UBC-oai-circle.library.ubc.ca-2429-60145
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-60145 2018-01-05T17:29:29Z Robust estimation and inference under cellwise and casewise contamination Leung, Andy Chin Yin Science, Faculty of; Statistics, Department of; Graduate 2017-01-03T23:24:33Z 2017-01-21T04:02:11 2016 2017-02 Text Thesis/Dissertation http://hdl.handle.net/2429/60145 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ University of British Columbia
collection NDLTD
language English
sources NDLTD
description Cellwise outliers are likely to occur together with casewise outliers in datasets of relatively large dimension. Recent work has shown that traditional high breakdown point procedures may fail when applied to such datasets. In this thesis, we consider this problem when the goal is to (1) estimate the multivariate location and scatter matrix and (2) estimate regression coefficients and confidence intervals for inference, both of which are cornerstones of multivariate data analysis. To address the first problem, we propose a two-step procedure to deal with casewise and cellwise outliers, which generally proceeds as follows: first, it uses a filter to identify cellwise outliers and replace them with missing values; then, it applies a robust estimator to the incomplete data to down-weight casewise outliers. We show that the two-step procedure is consistent under the central model provided the filter is appropriately chosen. The two-step procedure for estimating the location and scatter matrix is then applied to regression with continuous covariates by adding a third step, which computes robust regression coefficients from the robust multivariate location and scatter matrix estimated in the second step. We show that the three-step estimator is consistent and asymptotically normal at the central model for the case of continuous covariates. Finally, the estimator is extended to handle both continuous and dummy covariates. Extensive simulation results and real data examples show that the proposed methods can handle both cellwise and casewise outliers similarly well. Science, Faculty of; Statistics, Department of; Graduate
author Leung, Andy Chin Yin
title Robust estimation and inference under cellwise and casewise contamination
publisher University of British Columbia
publishDate 2017
url http://hdl.handle.net/2429/60145
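
The three steps outlined in the description (filter cellwise outliers to missing values, estimate location and scatter from the incomplete data while down-weighting casewise outliers, then derive regression coefficients from those estimates) can be sketched in Python as below. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the median/MAD filter, the cutoff of 3, and the 90% retention fraction are all hypothetical choices, and step 2 substitutes a crude median imputation with one round of hard rejection for the robust incomplete-data estimator the abstract refers to. Only step 3 is a standard identity: the slopes solve S_xx b = s_xy and the intercept is mu_y - mu_x' b.

```python
import numpy as np


def filter_cells(Z, cutoff=3.0):
    """Step 1 (assumed filter): set cells with a large robust z-score to NaN."""
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)  # MAD scaled for normal data
    out = Z.astype(float)  # astype returns a copy, so Z is left untouched
    out[np.abs(Z - med) / mad > cutoff] = np.nan
    return out


def location_scatter(Zf, keep_frac=0.9):
    """Step 2 (crude stand-in, NOT the thesis's estimator).

    Fill missing cells with column medians, then drop the cases with the
    largest Mahalanobis distances and recompute the mean and covariance.
    """
    filled = np.where(np.isnan(Zf), np.nanmedian(Zf, axis=0), Zf)
    mu = filled.mean(axis=0)
    S = np.cov(filled, rowvar=False)
    diff = filled - mu
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)
    keep = d2 <= np.quantile(d2, keep_frac)  # hard rejection of distant cases
    return filled[keep].mean(axis=0), np.cov(filled[keep], rowvar=False)


def regression_from_moments(mu, S):
    """Step 3: coefficients from location and scatter (response in last column)."""
    beta = np.linalg.solve(S[:-1, :-1], S[:-1, -1])  # solves S_xx b = s_xy
    intercept = mu[-1] - mu[:-1] @ beta
    return intercept, beta


# Usage on synthetic data with a few contaminated cells (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)
Z = np.column_stack([X, y])
Z[:25, 0] = 50.0  # cellwise outliers in the first covariate

mu_hat, S_hat = location_scatter(filter_cells(Z))
intercept_hat, beta_hat = regression_from_moments(mu_hat, S_hat)
print(intercept_hat, beta_hat)
```

Treating the response as the last column of one joint data matrix keeps step 3 a pure function of the step 2 output, which mirrors the description's point that regression is obtained by adding a third step to the location and scatter procedure.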