Computing LTS Regression for Large Data Sets

Peter J. Rousseeuw and Katrien Van Driessen (1999)

Abstract

Least trimmed squares (LTS) regression is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The LTS method was proposed by Rousseeuw (1984, p. 876) as a highly robust regression estimator, with breakdown value (n-h)/n. It turned out that the computation time of existing LTS algorithms grew too fast with the size of the data set, precluding their use for data mining. Therefore we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call 'selective iteration' and 'nested extensions'. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. Moreover, FAST-LTS runs faster than all programs for least median of squares (LMS). The new algorithm makes the LTS method available as a tool for robust regression in large data sets, e.g. in a data mining context.

Keywords

Breakdown value, Linear model, Outlier detection, Regression, Robust estimation.


Papers 2002 - Abstract - Program FAST-LTS - Program FAST-LTS IN MATLAB - Paper

Antwerp Group on Robust & Applied Statistics
Department of Mathematics and Computer Sciences
University of Antwerp (UA)
Middelheimlaan 1, B-2020 Antwerpen, Belgium
agoras@mail.win.ua.ac.be
http://www.agoras.ua.ac.be/