

“This is the key to a new order. This code disk means freedom.”

Tron, Tron

- Transforming variables used in linear regression models
- Predicting variable values using linear models

At the end of this session you should be able to:

- visually check if the residuals of a linear model are normally distributed
- transform input variables of regression models if necessary
- predict values based on linear models

Predicting data values based on a linear regression model is quite straightforward. Given that the value of the independent variable x_{i} is available for position i (e.g. a research plot location), one can estimate (i.e. predict) the value of the dependent variable y_{i} using

$$y_{i} = a \, x_{i} + b$$

with a and b as the slope and intercept of the linear regression model.

Before we have a look at that, let's revisit the coefficient of determination.

R^{2} can be interpreted as the percentage of the variability of the dependent variable that is explained by the linear regression model. However, this interpretation requires that the residuals of the model are normally distributed (and also homoscedastic, but this is a different story).

Hence, before using a linear model for predictions, one has to make sure that the R^{2} is actually meaningful. This is especially true when no further validation of the prediction quality will be performed.
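To make the definition concrete, R^{2} can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares of the observed values. The following is a minimal sketch in plain Python; the data are made up for illustration (they are not from the Fogo survey), and the slope 1.99 and intercept 0.05 are the least-squares estimates for exactly these made-up values. The function name `r_squared` is hypothetical.

```python
def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_obs) / len(y_obs)
    # Residual sum of squares: variability NOT explained by the model
    ss_res = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
    # Total sum of squares: overall variability around the mean
    ss_tot = sum((yo - mean_y) ** 2 for yo in y_obs)
    return 1.0 - ss_res / ss_tot


# Made-up example data and the least-squares fit y = 1.99 * x + 0.05
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_fit = [1.99 * xi + 0.05 for xi in x]

r2 = r_squared(y, y_fit)
print(f"R^2 = {r2:.4f} ({100 * r2:.1f} % of the variability explained)")
```

Keep in mind that the printed number only carries the "percentage explained" meaning if the residual check discussed next does not reveal problems.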

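A simple visual test is to look at a Q-Q plot, in which the (standardized) residuals of the individual data pairs are plotted against the corresponding quantiles of a normal distribution. If the residuals are normally distributed, the points should fall on or close to the 1:1 line.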
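The points of such a Q-Q plot can be computed without a plotting library, as in the following minimal sketch. It assumes the common plotting positions (i - 0.5) / n and standardizes the residuals by their mean and standard deviation so that normal residuals lie near the 1:1 line. The residual values and the function name `qq_points` are hypothetical; the resulting pairs can be passed to any plotting tool.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    # Standardize and sort the residuals (sample quantiles)
    sample = sorted((r - m) / s for r in residuals)
    # Map plotting positions (i - 0.5) / n to standard normal quantiles
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical residuals of a fitted linear model
residuals = [0.06, -0.13, 0.18, -0.21, 0.10, 0.02, -0.05, 0.11, -0.08, -0.01]

for theo, samp in qq_points(residuals):
    print(f"{theo:6.2f}  {samp:6.2f}")
```

If the residuals are normally distributed, the two printed columns should have similar values, i.e. the points lie close to the 1:1 line.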

If this is not the case, it is quite likely that the input variables are not normally distributed. One possible solution is to transform one or both input variables, e.g. by taking the square root or by applying something more complicated such as the arcsine of the square root of the normalized variable. The latter is a good choice for percentage coverage values (here, normalization means rescaling values between 0 and 100 to values between 0 and 1).
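Both transformations mentioned above can be written in a few lines of Python. This is a minimal sketch assuming coverage values are given in percent (0 to 100); the function names `sqrt_transform` and `arcsine_sqrt` are hypothetical, and the arcsine result is returned in radians.

```python
import math


def sqrt_transform(value):
    """Square-root transformation of a non-negative value."""
    return math.sqrt(value)


def arcsine_sqrt(percent):
    """Arcsine of the square root of a percentage normalized to 0..1."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    p = percent / 100.0  # normalize 0..100 to 0..1
    return math.asin(math.sqrt(p))  # result in radians, between 0 and pi/2


coverage = [0.0, 25.0, 50.0, 100.0]  # hypothetical coverage values in percent
print([round(arcsine_sqrt(c), 4) for c in coverage])
print([sqrt_transform(v) for v in [4.0, 9.0, 16.0]])
```

The transformed values, not the raw ones, are then used when fitting the linear regression model.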

Computing a prediction based on a linear model is straightforward. Simply apply the linear model equation to the values of the independent variable for which a prediction of the dependent variable is required.
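A minimal sketch of this workflow in plain Python: first estimate the slope a and intercept b by ordinary least squares, then apply y_{i} = a x_{i} + b to new values of the independent variable. The data and the function names `fit_linear_model` and `predict` are hypothetical and only serve as an illustration.

```python
def fit_linear_model(x, y):
    """Ordinary least-squares estimates of slope a and intercept b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(a, b, x_new):
    """Apply the model equation y_i = a * x_i + b to each new x value."""
    return [a * xi + b for xi in x_new]


# Made-up training data (independent x, dependent y)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_linear_model(x, y)
y_new = predict(a, b, [6.0, 7.0])
print(f"a = {a:.2f}, b = {b:.2f}, predictions = {[round(v, 2) for v in y_new]}")
```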

If the dependent variable of the linear regression model has been transformed beforehand, the predicted values are on the transformed scale and have to be transformed back by applying the inverse of the transformation function (e.g. squaring if the original transformation was the square root).
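The back-transformation step can be sketched as follows. The slope and intercept are hypothetical values for a model assumed to have been fitted on square-root-transformed dependent values; the function names `inverse_sqrt` and `inverse_arcsine_sqrt` are likewise hypothetical. The second function undoes the arcsine square-root transformation of a percentage via 100 · sin²(z).

```python
import math


def inverse_sqrt(z):
    """Undo a square-root transformation by squaring."""
    return z ** 2


def inverse_arcsine_sqrt(z):
    """Undo arcsin(sqrt(p)) for p = percent / 100; returns a percentage."""
    return 100.0 * math.sin(z) ** 2


# Hypothetical model fitted on sqrt-transformed dependent values
a, b = 0.5, 1.0
x_new = [2.0, 4.0, 6.0]

pred_transformed = [a * xi + b for xi in x_new]       # on the square-root scale
pred = [inverse_sqrt(z) for z in pred_transformed]   # back on the original scale
print(pred)

# Round trip for the arcsine square-root transformation of 25 %
z = math.asin(math.sqrt(25.0 / 100.0))
print(round(inverse_arcsine_sqrt(z), 6))
```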

We will come back to validating predictions in the next session.

If you need more examples, have a look at C05-1 Predicting observations with linear models.

**Note on data used for illustrating analysis**
The analyses used for illustration on this site are based on data from a field survey of areas in the Fogo natural park conducted in 2007 by K. Mauer. For more information, please refer to this report.

en/learning/schools/s01/lecture-notes/ba-ln-05.1509355282.txt.gz · Last modified: 2017/10/30 10:21 by aziegler

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International