Statistics 2 - Least Squares

Residuals and Least Squares

Developing a Model by Observation:
A simple type of regression equation is a straight line. A scatter plot of the data is drawn, two points that "appear" to lie on the line of best fit are chosen, the slope is determined, and the equation of the line is written. This is known as a freehand method of curve fitting. Unfortunately, different observers who choose different points may obtain different equations.
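
The freehand method can be sketched in a few lines of Python. This is only an illustration: the data points and the helper name line_through are made up for this example, and the two "observers" simply pick different pairs of points.

```python
def line_through(p, q):
    """Return (slope, intercept) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Hypothetical data, invented for illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

# Observer A judges that the first and last points lie on the best line.
observer_a = line_through(points[0], points[4])

# Observer B judges that the second and third points lie on the best line.
observer_b = line_through(points[1], points[2])

print("Observer A: y = %.1fx + %.1f" % observer_a)
print("Observer B: y = %.1fx + %.1f" % observer_b)
```

The two observers end up with different slopes and intercepts for the same data, which is the weakness of the freehand method.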

Developing a Model by Least Squares:
To avoid individual judgment in curve fitting, it is necessary to agree on a definition of a “best-fitting line” or curve. Consider the following set of points:

For a given value of x, say x1, there will be a difference between the observed value y1 and the corresponding value determined by the “best-fitting” curve. This vertical difference, D1, is referred to as a residual.

A residual is the difference between the actual y-value and the predicted y-value, which is obtained by substituting the corresponding x-value into the regression equation: residual = actual y - predicted y.
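
As a small sketch of this definition, the Python below computes one residual per data point. The data and the candidate line y = 1x + 1 are hypothetical values chosen for illustration, and the function name residuals is not taken from any calculator or library.

```python
def residuals(xs, ys, slope, intercept):
    """Return actual y minus predicted y for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

# Candidate regression line (an assumption for this example): y = 1x + 1
resid = residuals(xs, ys, slope=1.0, intercept=1.0)

for x, y, d in zip(xs, ys, resid):
    print("x = %d, actual y = %d, residual = %+.1f" % (x, y, d))
```

A positive residual means the point lies above the line, and a negative residual means it lies below.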

Using these residuals, the following definition has been developed:

Definition:
Of all curves approximating a given set of data points, the curve having the property that

$$D_1^2 + D_2^2 + \cdots + D_n^2$$

is a minimum is called a best-fitting curve.

A curve having this property is said to fit the data in the least-squares sense and is called a least-squares curve.
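
For the special case of a straight line, the least-squares slope and intercept have well-known closed-form expressions. The sketch below uses them on hypothetical data and compares the sum of squared residuals with that of a freehand line. The function names and data are assumptions made for this example, and this is not necessarily how a calculator performs the computation internally.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Return D1^2 + D2^2 + ... + Dn^2 for the given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

slope, intercept = least_squares_line(xs, ys)
best_sse = sum_squared_residuals(xs, ys, slope, intercept)

# A freehand line through the first and last points: y = 1x + 1
freehand_sse = sum_squared_residuals(xs, ys, 1.0, 1.0)

print("Least-squares line: y = %.2fx + %.2f" % (slope, intercept))
print("Sum of squared residuals (least squares): %.2f" % best_sse)
print("Sum of squared residuals (freehand):      %.2f" % freehand_sse)
```

No other straight line gives a smaller sum of squared residuals for this data than the least-squares line, so the freehand line can at best tie it.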

The graphing calculator uses this least-squares process to determine regression models. Whenever a regression model is computed, the residuals are automatically stored in a list called RESID.

Note: For a perfect fit, the residuals will all be zero, and ZOOM 9: ZoomStat will result in a WINDOW RANGE error since Ymin = 0 and Ymax = 0. If you still wish to see the plot, set Ymin = -1 and Ymax = 1 and then press GRAPH.
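
The same situation can arise when plotting residuals in software. The sketch below is a hypothetical Python helper, not calculator code: it chooses a vertical plotting range from a list of residuals and widens the range by 1 in each direction when every residual has the same value, which mirrors the manual fix described in the note.

```python
def residual_window(resid, pad=1.0):
    """Return (ymin, ymax) for plotting residuals.

    If every residual is identical (for example, all zero for a
    perfect fit), the range would have zero height, so it is
    widened by `pad` on each side.
    """
    ymin, ymax = min(resid), max(resid)
    if ymin == ymax:
        ymin, ymax = ymin - pad, ymax + pad
    return ymin, ymax

# Residuals from a perfect fit are all zero.
perfect_fit_resid = [0.0, 0.0, 0.0, 0.0]
window = residual_window(perfect_fit_resid)
print("Ymin = %.0f, Ymax = %.0f" % window)
```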