“All Programs have a desire to be useful.”
Master Control Program, Tron
At the end of this session you should be able to fit a simple linear regression model in R, visualize the underlying data, and judge the goodness and significance of the fitted model.
Linear models can only explain linear relationships, so they only make sense if such a relationship plausibly describes the dependency between the dependent variable and one or more independent variables. Since eyeball analysis is very effective at spotting dependencies in 2D space, plotting the dependent variable as a function of the independent variable should be a standard task in data analysis prior to fitting a model.
To visualize y as a function of x, a simple scatter plot can be plotted using R's plot() function:
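A minimal sketch, assuming `x` and `y` are numeric vectors; the data below are purely illustrative:

```r
# illustrative data: y depends roughly linearly on x, plus noise
set.seed(42)
x <- 1:50
y <- 2 * x + 5 + rnorm(50, sd = 5)

# scatter plot of y as a function of x
plot(x, y, xlab = "x", ylab = "y", main = "y vs. x")
```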
Linear regression models link a dependent variable (e.g. y) to one or more independent variables (e.g. x) using a linear function of type
y = a x + b.
One standard function used in R for computing linear regressions is the lm() function which allows both simple and multiple linear regression. A minimalistic call of the function looks like
lm(y ~ x)
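For example, using the illustrative vectors from above (the variable names are assumptions, not part of any fixed API), a model can be fitted and stored for later inspection:

```r
# illustrative data
set.seed(42)
x <- 1:50
y <- 2 * x + 5 + rnorm(50, sd = 5)

# fit the linear model y = a*x + b and store the result
model <- lm(y ~ x)

# the fitted coefficients: intercept (b) and slope (a)
coef(model)
```

Storing the result in a variable (here `model`) is the usual workflow, since all later diagnostics are applied to this object.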
While applying a linear model function to any distribution of x and y will always produce a regression line, the goodness of the fitted model can be anything between a complete disaster and a perfect result.
To estimate this goodness, one usually first looks at the coefficient of determination, R² (r-squared). In linear regression, R² is generally given by

R² = 1 - ∑(yᵢ - ŷᵢ)² / ∑(yᵢ - ymean)²
   = 1 - (sum of squared residuals) / (sum of squared deviations of all y from their mean)
with yi as the observed value corresponding to xi, ŷi as the result of the fitted model for xi, and ymean as the mean value over all yi in the sample.
Hence, R² ranges between 0 and 1 and expresses the explained variance as a proportion: one minus the ratio of the unexplained variance of the model (caused by residuals different from 0) to the total variance in the y sample.
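This definition can be verified directly in R; a sketch assuming a model object fitted with lm(y ~ x) as above:

```r
# illustrative data and model
set.seed(1)
x <- 1:50
y <- 2 * x + 5 + rnorm(50, sd = 5)
model <- lm(y ~ x)

# R^2 from its definition: 1 - SS_residual / SS_total
ss_res <- sum(residuals(model)^2)
ss_tot <- sum((y - mean(y))^2)
r2 <- 1 - ss_res / ss_tot

# matches the value reported by summary()
r2
summary(model)$r.squared
```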
Please note that the actual value of R² is irrelevant if it is not significant. The significance of a linear model is derived from testing the null hypothesis that the true slope (i.e. a) and the true intercept (i.e. b) of the linear model could just as well be 0, i.e. that the observed fit could have arisen by random chance.
To get R2 and the p-value of a linear model use the summary() function applied to your linear model variable, e.g.:
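A sketch continuing the example above (`model` fitted with lm(y ~ x)):

```r
# illustrative data and model
set.seed(1)
x <- 1:50
y <- 2 * x + 5 + rnorm(50, sd = 5)
model <- lm(y ~ x)

# full summary: coefficients with p-values, R^2, F-statistic
summary(model)

# individual values can also be extracted programmatically
summary(model)$r.squared                        # coefficient of determination
summary(model)$coefficients["x", "Pr(>|t|)"]    # p-value of the slope
```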