21.1 Introduction

Linear regression, regression methods in general, and inductive methods differ from descriptive analyses mainly in their reference to an underlying model. Regressions always refer to such a model and try to verify whether the model assumptions hold or must be rejected, and/or to quantify the effects. On the following pages, I do not want to present the details of the individual methods or their calculation steps; instead, I would like to give you a graphical representation of the idea behind regression and explain some aspects that aid your understanding.

In most cases, and for linear regression this is easiest to illustrate, the aim is to fit a curve (the model) to an existing data set as well as possible. In simple linear regression this curve is a straight line, which you can also fit manually to a data cloud and then compare with the regression result and the true relation. In addition, we provide the true and estimated parameter values to make the comparison visible.

We start by presenting the true relation and the generation of the data. In real problems, of course, neither is known; both are therefore omitted on the following pages.

In our example, the true relation is:

y = α + βx + 𝜖,

where β is the slope (linear relation), α is the constant, and 𝜖 is the measurement error or error term. That is, for each data point (measured value) the y-value depends on the x-value via y = α + βx, with the error term 𝜖 added on top. The x-value is thus multiplied by β, and α as well as a random error term 𝜖 are added.
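
To make this data-generating process concrete, here is a minimal sketch in Python. It only illustrates the idea; the concrete choices (α = 1, β = 2, x drawn uniformly, a standard normal error) are assumptions for the example, not the applet's actual settings.

    import numpy as np

    rng = np.random.default_rng(seed=42)   # fixed seed so the example is reproducible

    alpha_true = 1.0   # intercept alpha (assumed value for this illustration)
    beta_true = 2.0    # slope beta (assumed value for this illustration)
    n = 200            # number of data points, as suggested in the text

    x = rng.uniform(0.0, 10.0, size=n)     # x-values drawn at random
    eps = rng.normal(0.0, 1.0, size=n)     # random error term epsilon
    y = alpha_true + beta_true * x + eps   # true relation y = alpha + beta * x + epsilon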

If you click the button "Generate and estimate data", data will be generated according to the parameters of your model and the linear regression estimation will be performed.

Here, you can set the parameters α and β.

For technical reasons, the number of data points n is limited to 2000. Larger samples would require too much computational effort here and should be handled with suitable statistical software. For the didactic purpose, a value of 200 seems sufficient to us.

Number of data points:

In this graphic, data points are randomly generated around the given curve. For this purpose, x-values are drawn randomly, the corresponding y-value is calculated according to the true model, and a random error term is then added. A straight line is then estimated that fits the data cloud as well as possible. Both the type of the true relation (here a straight line) and the type of error (here additive and independent of the x-value) could be specified differently. The fit is performed here using the method of least squares, which we discuss in detail on the corresponding page.
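
Continuing the sketch above (same x, y, and true parameter values), the least-squares estimates of simple linear regression can be computed from the closed-form formulas. Again, this is only an illustration of the idea, not the applet's implementation.

    # Least-squares estimates for simple linear regression (closed form)
    x_mean, y_mean = x.mean(), y.mean()
    beta_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    alpha_hat = y_mean - beta_hat * x_mean

    print(f"alpha: true = {alpha_true:.2f}, estimated = {alpha_hat:.2f}")
    print(f"beta:  true = {beta_true:.2f}, estimated = {beta_hat:.2f}")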

Table: true ("wahr") versus estimated ("geschätzt") values of the parameters α and β, displayed after each estimation.

In our example, the relationship between the x- and y-values depends on two parameters: the y-axis intercept α (level) and the slope β. The table compares the true values with the estimated values.

If you enter β = 0 as the true value, the true relation is that the y-value no longer depends on the x-value: the line runs flat (horizontally). The estimated value should then also be close to 0. In statistics you will learn that the estimated value for β then does not differ significantly from 0, i.e. given the data you cannot say for sure whether the true β is exactly 0 (y does not depend on x) or only slightly different from 0 (y depends on x). You can find out more about this in your statistics lecture.
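
As a small, self-contained illustration of this point (a sketch using scipy.stats.linregress rather than the applet; the sample size and error distribution are assumptions for the example), one can generate data with a true slope of 0 and inspect the p-value of the estimated slope:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    n = 200
    x = rng.uniform(0.0, 10.0, size=n)
    y = 1.0 + 0.0 * x + rng.normal(0.0, 1.0, size=n)   # true slope is 0

    res = stats.linregress(x, y)
    # res.pvalue tests the null hypothesis "slope = 0"; with a true slope of 0
    # it is typically large, so the estimated slope is not significantly
    # different from 0, even though it is rarely exactly 0.
    print(f"estimated beta = {res.slope:.3f}, p-value = {res.pvalue:.3f}")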


(c) by Christian Bauer
Prof. Dr. Christian Bauer
Chair of Monetary Economics
Trier University
D-54296 Trier
Tel.: +49 (0)651/201-2743
E-mail: Bauer@uni-trier.de
URL: https://www.cbauer.de