22.1 Least squares, residuals and the estimation criterion

One problem is, of course, the correct specification of the model, i.e. the question of whether the true model is really linear. In this graphic you can freely choose a functional relation. By default, I have set a quadratic relation. The data are then generated, and you can use the buttons "Show KQ-estimation" and "Show true model" (double click) to display the least-squares estimate and the true model. (KQ abbreviates the German "Kleinste Quadrate", i.e. least squares.) What do you notice about the residuals?

The table below shows you the true values of the parameters, the values estimated by least-squares regression, and the values you set yourself.

  true    estimated    set by you

In addition, we give you some more values for the residuals: the average of the residuals, $\frac{1}{n}\sum_{i=1}^{n} r_i$; the average of the absolute values of the residuals, $\frac{1}{n}\sum_{i=1}^{n} |r_i|$; and the average of the squares of the residuals, $\frac{1}{n}\sum_{i=1}^{n} r_i^2$. It is noticeable that the average of the residuals is always very small for the estimated model. This is because the positive and negative errors cancel each other out if you simply add them all up. This is not the case with the absolute values and the squares. Moreover, you will see that with some effort your own parameter choice can achieve a smaller average of the absolute values than the least-squares estimate, but never a smaller average of the squares.

                                                   true    estimated    set by you
Average of the residuals
Average of the absolute values of the residuals
Average of the squares of the residuals
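
The three averages can also be reproduced outside the interactive graphic. The following is a minimal Python sketch, assuming NumPy is available. The parameter values, the noise level, the seed and the helper name residual_summaries are illustrative assumptions and are not taken from this page; for simplicity the data are generated from a truly linear model.

```python
import numpy as np

def residual_summaries(x, y, alpha, beta):
    """Return (mean residual, mean absolute residual, mean squared residual)."""
    r = y - (alpha + beta * x)
    return r.mean(), np.abs(r).mean(), (r ** 2).mean()

rng = np.random.default_rng(0)            # fixed seed, arbitrary choice
n = 50
x = rng.uniform(0.0, 10.0, size=n)
alpha_true, beta_true = 1.0, 2.0          # hypothetical "true" parameters
y = alpha_true + beta_true * x + rng.normal(0.0, 1.0, size=n)

# Least-squares estimate; np.polyfit returns the highest-degree coefficient first.
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)

summary_true = residual_summaries(x, y, alpha_true, beta_true)
summary_ls = residual_summaries(x, y, alpha_hat, beta_hat)

print("true parameters:", summary_true)
print("least squares  :", summary_ls)
```

In such a run the mean residual of the least-squares fit should be zero up to rounding, and its mean squared residual should not exceed the one obtained with the true parameters.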

Why is that? It follows from the definition of what "fit as well as possible" should mean. Up to now we have not defined this at all, and as you can see in this example, there are several ways to do so. One is least-squares regression (KQ-regression). Here, the model is estimated so as to minimize the sum of the squared residuals (which is equivalent to minimizing their average, since n is fixed). Because least-squares regression minimizes this value, no other α and β give a smaller sum of squared residuals, not even the true parameters! If you take another criterion, for example the sum of the absolute values, other estimates of α and β are optimal. Explaining the advantages and disadvantages of the individual criteria would go beyond the scope of this website. You will learn this in your statistics lecture or from a good book.
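
To see that different criteria lead to different optimal α and β, the sketch below fits the same data twice: once by least squares and once by least absolute deviations. It is only an illustration under stated assumptions: the data, the heavy-tailed noise and all function names are hypothetical, and the least-absolute-deviations fit uses a simple brute-force search that relies on the known fact that, for a straight line, an optimal fit exists that passes through two data points. This is not how the graphic on this page computes anything.

```python
import itertools
import numpy as np

def mean_abs(x, y, a, b):
    """Average absolute residual of the line a + b*x."""
    return np.abs(y - (a + b * x)).mean()

def mean_sq(x, y, a, b):
    """Average squared residual of the line a + b*x."""
    return ((y - (a + b * x)) ** 2).mean()

def fit_least_squares(x, y):
    b, a = np.polyfit(x, y, deg=1)        # slope first, then intercept
    return a, b

def fit_least_absolute(x, y):
    """Brute force: try every line through two data points, keep the best."""
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # vertical line, not a valid candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = mean_abs(x, y, a, b)
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]

rng = np.random.default_rng(1)            # fixed seed, arbitrary choice
x = rng.uniform(0.0, 10.0, size=30)
y = 1.0 + 2.0 * x + rng.standard_t(df=2, size=30)   # heavy-tailed noise (assumption)

a_ls, b_ls = fit_least_squares(x, y)
a_lad, b_lad = fit_least_absolute(x, y)

print("least squares    :", a_ls, b_ls, mean_sq(x, y, a_ls, b_ls), mean_abs(x, y, a_ls, b_ls))
print("least abs. values:", a_lad, b_lad, mean_sq(x, y, a_lad, b_lad), mean_abs(x, y, a_lad, b_lad))
```

Each fit wins on its own criterion: the least-squares line has the smaller average of squares, the other line the smaller average of absolute values.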

Note 1: You already know these criteria from the one-dimensional case. If you want to find the "middle" value of a set of numbers, the average (arithmetic mean) minimizes the sum of the squared deviations and the median minimizes the sum of the absolute deviations.
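
Note 1 can be checked numerically with a small grid search. The numbers and the grid below are arbitrary choices made for illustration only.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # arbitrary example values
candidates = np.linspace(0.0, 12.0, 1201)      # candidate "middle" values, step 0.01

# Evaluate both criteria for every candidate value c.
sum_sq = np.array([((data - c) ** 2).sum() for c in candidates])
sum_abs = np.array([np.abs(data - c).sum() for c in candidates])

best_sq = candidates[sum_sq.argmin()]
best_abs = candidates[sum_abs.argmin()]

print(best_sq, data.mean())        # minimizer of squared deviations vs. mean
print(best_abs, np.median(data))   # minimizer of absolute deviations vs. median
```

The outlier 10 pulls the mean upwards, while the median is unaffected by how far away it lies.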

Note 2 (for those who are interested): The criterion of squares is called the L2 norm, and that of absolute values the L1 norm, according to the size of the exponent. Thus, one can also use an L3, L4 or LX norm as a criterion by minimizing the sum of $|r|^3$, $r^4$ or $|r|^X$. You can even define the L∞ norm, where α and β are chosen to minimize the largest absolute residual.
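
As a small illustration of Note 2, the following sketch evaluates these criteria for a fixed vector of residuals. The function name lp_criterion and the example residuals are hypothetical. Strictly speaking the norm is the p-th root of the sum, but minimizing the sum itself leads to the same α and β, so the root is omitted here.

```python
import numpy as np

def lp_criterion(residuals, p):
    """Sum of |r|^p for finite p; largest |r| for p = infinity."""
    r = np.abs(np.asarray(residuals, dtype=float))
    if np.isinf(p):
        return r.max()
    return (r ** p).sum()

r = [0.5, -1.0, 2.0]                       # arbitrary example residuals
for p in (1, 2, 3, 4, np.inf):
    print(p, lp_criterion(r, p))
```

The larger the exponent, the more the largest residual dominates the criterion; in the limit only the largest absolute residual counts.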

(c) by Christian Bauer
Prof. Dr. Christian Bauer
Chair of monetary economics
Trier University
D-54296 Trier
Tel.: +49 (0)651/201-2743
E-mail: Bauer@uni-trier.de
URL: https://www.cbauer.de