
“I don't even know what I'm doing here.”

Chrom, Tron

- Describing and visualizing model results by boxplots and simple statistics

At the end of this session you should be able to

- Calculate characteristics of the model output
- Create boxplots
- Interpret model results based on boxplots

To interpret the performance or characteristics of your model, there are further measures besides the p value and R² that you learned about in LN04-1 Regressions.

The **minimum and maximum** values of a prediction indicate how well a model is able to predict extreme values (either low or high).

Comparing the mean and median of a prediction to those of the observed values tells you whether the model generally over- or underestimates:
The **mean value** is calculated by summing up all values of the dataset of interest and dividing the sum by the number of observations. Although the mean is widely used to characterize datasets, it has the major disadvantage of
being highly affected by outliers. The **median**, in contrast, is the value located in the middle of an ordered dataset (for an even number of observations, the average of the two middle values). Thus it is robust to outliers.
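
The effect of an outlier on the mean and the median can be seen in a small sketch. The values below are made up for illustration and do not come from any dataset used in this course.

```r
# Hypothetical example values (assumption, for illustration only)
x         <- c(2, 3, 4, 5, 6)
x_outlier <- c(2, 3, 4, 5, 60)  # same data, last value replaced by an outlier

mean(x)            # 4
median(x)          # 4

mean(x_outlier)    # 14.8, pulled upwards by the single outlier
median(x_outlier)  # 4, unchanged
```

A single extreme value shifts the mean considerably, while the median stays where it was.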

The **standard deviation (sd)** describes the spread of the data. It is the square root of the average squared deviation of the values from the mean of the distribution and can be read as the typical distance of a value from the mean.
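
As a minimal sketch of what the standard deviation measures, it can be computed step by step and compared with R's built-in `sd()`. The vector `x` is a made-up example. Note that `sd()` divides by n - 1 (the sample standard deviation), so the manual version does the same.

```r
# Hypothetical example values (assumption, for illustration only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Squared deviations from the mean, averaged with n - 1, then square root
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

sd_manual
sd(x)
all.equal(sd_manual, sd(x))  # TRUE
```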

Luckily, as an R user you don't have to calculate these measures by hand. The functions

mean() max() min() median() sd()

will do it for you!
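
One possible way to use these functions is to collect the measures for observed and predicted values in a small table. The vectors `observed` and `predicted` and the helper `describe()` are hypothetical names introduced for this sketch.

```r
# Hypothetical observed and predicted values (assumption, for illustration only)
observed  <- c(1.2, 2.5, 3.1, 4.8, 9.6)
predicted <- c(1.9, 2.4, 3.3, 4.5, 7.2)

# Helper that returns the five measures as a named vector
describe <- function(v) {
  c(min = min(v), max = max(v), mean = mean(v),
    median = median(v), sd = sd(v))
}

# One row per dataset, one column per measure
comparison <- rbind(observed = describe(observed),
                    predicted = describe(predicted))
round(comparison, 2)
```

In this made-up example the maximum of the prediction is clearly lower than the observed maximum, which would suggest that high extremes are underestimated.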

A boxplot is a useful visualization of the measures shown in the section above. It is therefore often used to depict the differences between distributions, e.g. between predicted and observed values.

(Chen-Pan Liao [CC_BY_SA] via wikimedia.org)
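
A boxplot comparing two distributions can be drawn with the base R function `boxplot()`. The following is a minimal sketch with made-up values; the vector names, the title and the axis label are assumptions chosen for this example.

```r
# Hypothetical observed and predicted values (assumption, for illustration only)
observed  <- c(1.2, 2.5, 3.1, 4.8, 9.6, 3.7, 2.9)
predicted <- c(1.9, 2.4, 3.3, 4.5, 7.2, 3.5, 3.0)

# A named list produces one box per element, labelled with the element name
bp <- boxplot(list(observed = observed, predicted = predicted),
              ylab = "Value",
              main = "Observed vs. predicted")

# boxplot() invisibly returns the statistics it used for drawing;
# each column of bp$stats holds lower whisker, lower hinge, median,
# upper hinge and upper whisker of one box
bp$stats
```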

A boxplot shows several components:

- The **box** spans the range from the first quartile (25th percentile) to the third quartile (75th percentile). It thus contains the middle 50% of the values, i.e. those closest to the median.
- The **median** is depicted by the line in the box. The whiskers and the representation of outliers show the spread of the values.
- The **whiskers** mark the remaining values which don't fall into the box. The length of the whiskers is not standardized. Often they extend to the most extreme value that lies within 1.5 times the interquartile range (IQR) of the box.
- The **interquartile range** is the distance between the first and the third quartile, i.e. the length of the box.
- All values which lie more than 1.5*IQR beyond the edges of the box are considered **outliers** and are usually marked by points above or below the whiskers, respectively.
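
The components listed above can also be calculated directly. The sketch below uses a made-up vector `x` and applies the 1.5*IQR rule by hand. It then compares the result with `boxplot.stats()`, which `boxplot()` uses internally. Note that `boxplot.stats()` works with hinges, which can differ slightly from the quartiles returned by `quantile()`.

```r
# Hypothetical example values (assumption, for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- quantile(x, 0.25, names = FALSE)  # first quartile, lower edge of the box
q3  <- quantile(x, 0.75, names = FALSE)  # third quartile, upper edge of the box
iqr <- IQR(x)                            # q3 - q1

# Limits beyond which values are treated as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- x[x < lower_fence | x > upper_fence]
outliers              # 30

# The same outlier is reported by boxplot.stats()
boxplot.stats(x)$out  # 30
```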

en/learning/schools/s01/lecture-notes/ba-ln-07.txt · Last modified: 2015/09/22 16:22 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International