A Fast Algorithm for the Minimum Covariance Determinant Estimator

Peter J. Rousseeuw and Katrien Van Driessen (1999)

Abstract

The minimum covariance determinant (MCD) method of Rousseeuw (1984) is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. We discuss two important applications of larger size: one about a production process at Philips with n=677 objects and p=9 variables, and the other a data set from astronomy with n=137,256 objects and p=27 variables. To deal with such problems, we have developed a new algorithm for the MCD, called FAST-MCD. The basic ideas are an inequality involving order statistics and determinants, and techniques that we call 'selective iteration' and 'nested extensions'. For small data sets, FAST-MCD typically finds the exact MCD, whereas for larger data sets it gives more accurate results than existing algorithms and is faster by orders of magnitude. Moreover, FAST-MCD is able to detect an exact fit, i.e., a hyperplane containing h or more observations. The new algorithm makes the MCD method available as a routine tool for analyzing multivariate data. We also propose the distance-distance plot (or 'D-D plot'), which displays MCD-based robust distances versus Mahalanobis distances, and illustrate it with some examples.
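
As an illustration of the objective and of the determinant inequality mentioned above, the following Python sketch may help. It is not the FAST-MCD program: it omits selective iteration, nested extensions, and exact-fit detection, and it assumes NumPy. The names c_step and mcd_sketch, the use of random h-subsets as starting points, and the default number of starts are choices made for this illustration only. The idea shown is that refitting the mean and covariance on a subset, and then keeping the h observations with the smallest Mahalanobis distances, cannot increase the determinant.

```python
import numpy as np


def c_step(X, subset, h):
    """One concentration step: fit on `subset`, return the h closest points."""
    T = X[subset].mean(axis=0)            # location of the current subset
    S = np.cov(X[subset], rowvar=False)   # scatter of the current subset
    diff = X - T
    # Squared Mahalanobis distance of every observation to (T, S).
    d2 = np.sum(diff * np.linalg.solve(S, diff.T).T, axis=1)
    return np.sort(np.argsort(d2)[:h])


def mcd_sketch(X, h, n_starts=20, max_iter=100, seed=0):
    """Approximate the MCD subset by iterating c_step from random starts.

    Assumption for this sketch: starts are random h-subsets and the
    covariance of every visited subset is nonsingular.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_subset, best_logdet = None, np.inf
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_iter):
            new_subset = c_step(X, subset, h)
            if np.array_equal(new_subset, subset):
                break                     # subset no longer changes
            subset = new_subset
        # The log-determinant avoids overflow or underflow for larger p.
        _, logdet = np.linalg.slogdet(np.cov(X[subset], rowvar=False))
        if logdet < best_logdet:
            best_subset, best_logdet = subset, logdet
    return best_subset, best_logdet
```

Calling mcd_sketch(X, h) on an n-by-p array X returns the indices of the selected h observations and the log-determinant of their covariance matrix. The mean and covariance of those observations can then serve as robust location and scatter estimates, for example to compute the robust distances shown in a D-D plot.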

Keywords

Breakdown value, Multivariate location and scatter, Outlier detection, Regression, Robust estimation.


Antwerp Group on Robust & Applied Statistics
Department of Mathematics and Computer Sciences
University of Antwerp (UA)
Middelheimlaan 1, B-2020 Antwerpen, Belgium
agoras@mail.win.ua.ac.be
http://www.agoras.ua.ac.be/