Abstract
The minimum covariance determinant (MCD) method of Rousseeuw (1984) is a highly robust estimator of multivariate location and scatter.
Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant.
Until now applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions.
We discuss two important applications of larger size: one concerning a production process at Philips with n=677 objects and p=9 variables, and one from astronomy with n=137,256 objects and p=27 variables.
To deal with such problems we have developed a new algorithm for the MCD, called FAST-MCD.
The basic ideas are an inequality involving order statistics and determinants, and techniques which we call 'selective iteration' and 'nested extensions'.
For small data sets FAST-MCD typically finds the exact MCD, whereas for larger data sets it gives more accurate results than existing algorithms and is faster by orders of magnitude.
Moreover, FAST-MCD is able to detect an exact fit, i.e. a hyperplane containing h or more observations.
The new algorithm makes the MCD method available as a routine tool for analyzing multivariate data.
We also propose the distance-distance plot (or 'D-D plot') which displays MCD-based robust distances versus Mahalanobis distances, and illustrate it with some examples.
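As an illustration only, the following minimal sketch shows how an MCD-type subset search and the two coordinates of a D-D plot might be computed. It assumes that the inequality mentioned above can be used as a refinement step: fit the mean and covariance on a subset of h points, then keep the h points with the smallest distances to that fit, a step that does not increase the determinant. The function names (`mahalanobis_sq`, `concentration_step`, `approx_mcd`, `dd_plot_coordinates`), the random starting subsets of full size h, and the omission of any consistency correction of the covariance are illustrative assumptions. The sketch is not the FAST-MCD algorithm itself and leaves out selective iteration and nested extensions.

```python
import numpy as np


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distances of the rows of X to `center` under `cov`."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def concentration_step(X, subset, h):
    """Fit mean and covariance on `subset`, then keep the h closest observations."""
    center = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    d2 = mahalanobis_sq(X, center, cov)
    return np.sort(np.argsort(d2)[:h])


def approx_mcd(X, h, n_starts=20, max_iter=50, seed=0):
    """Approximate the MCD subset by iterating the step above from random starts."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_det, best_subset = np.inf, None
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_iter):
            new_subset = concentration_step(X, subset, h)
            if np.array_equal(new_subset, subset):
                break  # the subset no longer changes
            subset = new_subset
        det = np.linalg.det(np.cov(X[subset], rowvar=False))
        if det < best_det:
            best_det, best_subset = det, subset
    return best_subset, best_det


def dd_plot_coordinates(X, h):
    """Return (classical Mahalanobis distances, robust distances) for each row of X."""
    subset, _ = approx_mcd(X, h)
    classical = np.sqrt(mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False)))
    robust = np.sqrt(
        mahalanobis_sq(X, X[subset].mean(axis=0), np.cov(X[subset], rowvar=False))
    )
    return classical, robust
```

Plotting `robust` against `classical` gives the D-D plot: observations that are masked in the classical analysis tend to lie well above the diagonal, because their robust distance is much larger than their Mahalanobis distance.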
Keywords
Breakdown value, Multivariate location and scatter, Outlier detection, Regression, Robust estimation.