“I don't even know what I'm doing here.”
Chrom, Tron
At the end of this session you should be able to
Besides the p value and R², which you learned about in LN04-1 Regressions, there are further measures that help you interpret the success or characteristics of your model.
Comparing the minimum and maximum values of a prediction with those of the observed values indicates how well a model is able to predict extreme values (either low or high).
Comparing the mean and median values of a prediction to the observed values teaches you about a general over- or underestimation of the prediction: The mean value is calculated by summing up all values of the dataset of interest and dividing the sum by the number of observations. Though the mean value is widely used to characterize datasets, it has the major disadvantage of being highly affected by outliers. The median, in contrast, is the value located in the middle of an ordered dataset. Thus it is robust to outliers.
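To see the effect of an outlier on both measures, here is a minimal sketch. The numbers are made up purely for illustration and do not come from the course data.

```r
# Hypothetical example values, chosen only for illustration
values <- c(2, 3, 4, 5, 6)

# The same values plus one extreme outlier
values_outlier <- c(values, 100)

mean(values)            # 4
median(values)          # 4

mean(values_outlier)    # 20  (pulled strongly towards the outlier)
median(values_outlier)  # 4.5 (barely changes)
```

A single extreme value moves the mean from 4 to 20, while the median only shifts from 4 to 4.5.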
The standard deviation (sd) describes the spread of the data. It is the square root of the average squared deviation of the values from the mean, and can be read as the typical distance of a value from the mean of the distribution.
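The following sketch computes the standard deviation step by step and compares it with R's built-in `sd()`. The vector `x` is a hypothetical example. Note that `sd()` computes the sample standard deviation, which divides by n - 1 rather than n.

```r
# Hypothetical example values, chosen only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Squared deviations from the mean, summed, divided by n - 1, then square-rooted
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

sd_manual
sd(x)

# Both approaches should agree (up to floating point tolerance)
all.equal(sd_manual, sd(x))
```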
Luckily, as an R user you don't have to calculate these measures by hand. The functions
mean() max() min() median() sd()
will do it for you!
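As a small sketch of how these functions could be used to compare a prediction with the observations: the vectors `observed` and `predicted` and the helper `summarise_values()` are hypothetical names and values introduced only for this example.

```r
# Hypothetical observed and predicted values, chosen only for illustration
observed  <- c(1.2, 3.4, 2.8, 7.9, 5.1, 4.4)
predicted <- c(2.0, 3.1, 3.0, 6.2, 4.8, 4.1)

# Hypothetical helper: collect the five measures in one named vector
summarise_values <- function(x) {
  c(min = min(x), max = max(x), mean = mean(x), median = median(x), sd = sd(x))
}

# One row per dataset, one column per measure
comparison <- rbind(observed  = summarise_values(observed),
                    predicted = summarise_values(predicted))

round(comparison, 2)
```

Reading the rows side by side shows, for instance, whether the predicted range is narrower than the observed one, which would suggest that the model struggles with extreme values.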
A boxplot is a useful visualization of the measures shown in the section above. It is therefore often used to depict the differences between distributions, e.g. between predicted and observed values.
(Chen-Pan Liao [CC_BY_SA] via wikimedia.org)
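A boxplot of predicted against observed values could be drawn with base R's `boxplot()` as sketched below. The data are simulated under the assumption that the prediction has a compressed range; they are not results from a real model.

```r
# Simulated data, for illustration only
set.seed(42)
observed  <- rnorm(100, mean = 10, sd = 3)

# Assumed prediction: compressed towards the mean, plus a little noise
predicted <- observed * 0.6 + 4 + rnorm(100, sd = 0.5)

# Draw both distributions side by side; boxplot() invisibly returns its statistics
bp <- boxplot(list(observed = observed, predicted = predicted),
              ylab = "value",
              main = "Observed vs. predicted values")

# Per group: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
```

In this simulated case the box and whiskers of the prediction should be visibly shorter than those of the observations, reflecting the narrower spread.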
A boxplot shows several components: