Least squares
Least squares is a mathematical optimization technique that seeks a "best fit" to a set of data by minimizing the sum of the squares of the differences (called residuals) between the fitted function and the data.
The least squares technique is commonly used in curve fitting. Many other optimization problems can also be cast in least squares form, for example by minimizing an energy or maximizing an entropy.
Formulation of the problem
Suppose that the data set consists of the points (xi, yi) with i = 1, 2, ..., n. We want to find a function f such that
- <math>f(x_i)\approx y_i.</math>
To attain this goal, we suppose that the function f has a particular form containing some parameters that need to be determined. For instance, suppose that it is quadratic, meaning that f(x) = ax² + bx + c, where a, b and c are not yet known. We now seek the values of a, b and c that minimize the sum of the squares of the residuals:
- <math> S = \sum_{i=1}^n (y_i - f(x_i))^2. </math>
This explains the name least squares.
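To make the objective concrete, the quantity S can be evaluated directly for any candidate parameters. The following sketch (in Python with NumPy, using made-up data points) defines S for the quadratic model above and evaluates it for one hypothetical choice of a, b and c.

```python
import numpy as np

# Made-up data points (xi, yi) used only for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 7.2, 12.8, 21.1])

def sum_of_squared_residuals(a, b, c):
    """Return S = sum_i (yi - f(xi))^2 for the quadratic f(x) = a*x^2 + b*x + c."""
    residuals = y - (a * x**2 + b * x + c)
    return np.sum(residuals**2)

# Evaluate S for one candidate set of parameters; least squares seeks the
# values of a, b and c that make this quantity as small as possible.
print(sum_of_squared_residuals(1.0, 1.0, 1.0))
```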
Solving the least squares problem
In the above example, f is linear in the parameters a, b and c. The problem simplifies considerably in this case and essentially reduces to a system of linear equations. This is explained in the article on linear least squares.
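For the quadratic example, the reduction can be made explicit: stacking the rows (xi², xi, 1) into a matrix gives an overdetermined linear system in (a, b, c), which a standard linear algebra routine solves in the least squares sense. A minimal sketch in Python/NumPy, using the same made-up data as above:

```python
import numpy as np

# The same made-up data points as in the previous sketch.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 7.2, 12.8, 21.1])

# Design matrix: row i is (xi^2, xi, 1), so the model a*xi^2 + b*xi + c
# is linear in the unknown parameters (a, b, c).
A = np.column_stack([x**2, x, np.ones_like(x)])

# Solve min ||A p - y||^2 for p = (a, b, c).
p, residual_sum, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
a, b, c = p
print(a, b, c)
```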
The problem is more difficult if f is not linear in the parameters to be determined. We then need to solve a general (unconstrained) optimization problem. Any algorithm for such problems, such as Newton's method or gradient descent, can be used. Another possibility is to apply an algorithm developed specifically for least squares problems, for instance the Gauss-Newton algorithm or the Levenberg-Marquardt algorithm.
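As an illustration of the nonlinear case, the sketch below fits a model that is nonlinear in its parameters, f(x) = a exp(bx), using SciPy's least_squares routine with the Levenberg-Marquardt method; the data and the initial guess are invented for the example.

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up data roughly following y = a * exp(b * x).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.1, 3.5, 5.4, 9.1, 14.8])

def residuals(params):
    """Residuals yi - f(xi) for the model f(x) = a * exp(b * x)."""
    a, b = params
    return y - a * np.exp(b * x)

# Levenberg-Marquardt iteratively minimizes the sum of squared residuals,
# starting from a rough initial guess for (a, b).
result = least_squares(residuals, x0=[1.0, 1.0], method="lm")
print(result.x)  # estimated (a, b)
```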
Least squares and regression analysis
In regression analysis, one replaces the relation
- <math>f(x_i)\approx y_i</math>
by
- <math>y_i = f(x_i) + \varepsilon_i,</math>
where the noise terms εi are random variables with mean zero. Again, we distinguish between linear regression, in which the function f is linear in the parameters to be determined (e.g., f(x) = ax² + bx + c), and nonlinear regression. As before, linear regression is much simpler than nonlinear regression. (It is tempting to think that the reason for the name linear regression is that the graph of the function f(x) = ax + b is a line. Fitting a curve f(x) = ax² + bx + c, estimating a, b and c by least squares, is an instance of linear regression because the vector of least squares estimates of a, b and c is a linear transformation of the vector of observations yi = f(xi) + εi.)
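In matrix form this is easy to see. Write the observations as a vector y and the model values as Xβ, where the rows of the design matrix X are (xi², xi, 1) in the quadratic example and β = (a, b, c); assuming X has full column rank, the least squares estimate is

- <math>\hat{\beta} = (X^T X)^{-1} X^T y,</math>

which is a linear transformation of y.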
One frequently estimates the parameters (a, b and c in the above example) by least squares: one takes the values that minimize S. The Gauss-Markov theorem states that, if we take f(x) = ax + b with a and b to be determined and the noise terms εi are independent and identically distributed, then the least squares estimates are optimal in the sense of being the best linear unbiased estimates (see the article for a more precise statement and less restrictive conditions on the noise terms).
External links
- http://www.physics.csbsju.edu/stats/least_squares.html
- http://www.zunzun.com
- http://www.orbitals.com/self/least/least.htm