Robust estimation of multivariate scatter in non-affine equivariant scenarios

We consider the problem of robust estimation of the scatter matrix of an elliptical distribution when observed data are corrupted in a cell-wise manner. The first half of the thesis develops a framework for dealing with data subjected to independent cell-wise contamination. Each data cell (as oppose...

Bibliographic Details
Main Author: Danilov, Mikhail
Language: English
Published: University of British Columbia 2010
Online Access: http://hdl.handle.net/2429/19462
id ndltd-UBC-oai-circle.library.ubc.ca-2429-19462
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-19462 2018-01-05T17:24:06Z Robust estimation of multivariate scatter in non-affine equivariant scenarios Danilov, Mikhail Science, Faculty of; Statistics, Department of; Graduate 2010-02-01T16:08:26Z 2010-02-01T16:08:26Z 2010 2010-05 Text Thesis/Dissertation http://hdl.handle.net/2429/19462 eng Attribution-ShareAlike 3.0 Unported http://creativecommons.org/licenses/by-sa/3.0/ University of British Columbia
collection NDLTD
language English
sources NDLTD
description We consider the problem of robust estimation of the scatter matrix of an elliptical distribution when observed data are corrupted in a cell-wise manner. The first half of the thesis develops a framework for dealing with data subjected to independent cell-wise contamination. Each data cell (as opposed to a data case in traditional robustness) can be contaminated independently of the rest of the case. Instead of downweighting the whole case, we attempt to identify the affected cells, remove the offending values, and treat them as missing at random for subsequent likelihood-based processing. We explore several variations of the detection procedure that take into account the multivariate structure of the data and end up with a heuristic algorithm that identifies and removes a large proportion of dangerous independent contamination. Although there are not many existing methods to measure against, the proposed covariance estimate compares favorably to naive alternatives such as pairwise estimates or univariate Winsorising. The cell-wise data corruption mechanism that we deal with in the second half of this thesis is missing data. Missing data on their own have been well studied, and likelihood methods are well developed. The new setting that we are interested in is when missing data come together with traditional case-wise contamination. Both issues have been studied extensively over the last few decades, but little attention has been paid to how to address them both at the same time. We propose a modification of the S-estimate that allows robust estimation of multivariate location and scatter matrix in the presence of missing completely at random (MCAR) data. The method is based on the idea of the maximum likelihood of the observed data and extends it into the world of S-estimates. The estimate comes complete with a computation algorithm, which is an adjusted version of the widely used Fast-S procedure. Simulation results and applications to real datasets confirm the superiority of our method over available alternatives. Preliminary investigation reported in the concluding chapter suggests that combining the two main ideas presented in this thesis can yield an estimate that is robust against case-wise and cell-wise contamination simultaneously.
Science, Faculty of; Statistics, Department of; Graduate
author Danilov, Mikhail
title Robust estimation of multivariate scatter in non-affine equivariant scenarios
publisher University of British Columbia
publishDate 2010
url http://hdl.handle.net/2429/19462