Biodiversity data analysis with R

L09: Pimping plots

“You know, you look nothing like your pictures.”

Kevin Flynn, Tron

Things we cover in this session

  • Visual analytics
  • Basic plots (in R)

Things you need for this session

Things to take home from this session

At the end of this session you should be able to

  • explain the paradigm shift linked to visual analytics
  • pimp some visualizations using R's basic plot functions

Visual analytics

The term visual analytics refers to the interactive, computer-based analysis of data sets. As a scientific keyword it is relatively new: the Thomson Reuters Web of Knowledge lists only 394 entries, the earliest from 2002. If non-ISI journals and other sources are also considered, Google Scholar returns 9,640 results, but only 35 of them date from before 2000.

From a geographical point of view, one might think that interactive and visualization-based data analysis is quite common in the field of Geographical Information Science. However, many maps only show the final result of an investigation and are not actually used within the process of analysis. To illustrate this point, have a look at the well-known map of [Snow1855] below:

[Figure: John Snow's map of cholera deaths in London, 1854]

By mapping the cholera deaths, Snow concluded from the spatial pattern (black squares) that the Broad Street pump (roughly at the center of the outbreak) was the source of the 1854 cholera epidemic in London.

If you want to read more on the visual analytics paradigm, [Fox2011] is a good starting point. If you want to read an entire book with a variety of topic-related chapters, [Keim2010] is freely available as a PDF.

Visualization of non-spatial data

Certainly, every one of you has quite some experience in visualizing non-spatial data. In general, visualization should be guided by three principles [Kelleher2011]:

  • focus,
  • clearness and
  • simplicity.

Keep these guidelines in mind when you start visualizing data.

Before we focus on R's basic gallery of plotting types, we approach the subject with some more general examples.

Have a look at C09-4 - Visualization (traps) now for some notes on plots, color and animations.

While not all of the examples in E06-1 should be generally avoided, there is more to think about when it comes to visualization of data sets.

Have a look at [Kelleher2011] now for some short visualization guidelines.

While you surely know a variety of different plotting types, some visualization ideas might not come to your mind simply because you have never seen them before. Of course, search engines are your friend, but you might also have a look at e.g. flowingdata.com for some input on this subject. We noticed that web page in a presentation by Hadley Wickham, the author of ggplot2, who also offers some nice online courses.

Before starting with plotting functions in R, just one final remark: of course there are other ways to visualize your data, and you should take whatever works best for you (although that might well be R). For an overview of visualization tools aside from R, have a look at this page. For visualizing public data, you might also directly use Google's public data viewer.

Visualization using R

R offers a large palette of options for visualizing data. The generic plotting routines from the graphics package are probably the most frequently used functions. For specific purposes, however, especially when it comes to publication-quality figures, other packages are likely to be the better choice. Above all, the lattice and the ggplot2 packages will come to your attention if you look for visualization functions in this context.

As part of this course, we will focus on the generic plotting functions and also provide some help on the usage of the lattice package. Of course, all of our visualizations can also be produced with ggplot2.

The basic command structure for visualizing data using the generic functions is

<name of plotting function>(<x-axis data>, <y-axis data>,…)

while the structure for the lattice package is

<name of plotting function>(<y-axis data> ~ <x-axis data>, …)

which is not too difficult to distinguish.
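The two call structures can be compared side by side with a minimal sketch. The data below (elevation and richness for 50 sites) are made up purely for illustration and are not part of the course material; the lattice package is assumed to be installed, which is normally the case since it ships with R.

```r
# Hypothetical example data: 50 sites with invented elevation and richness values
set.seed(1)
elevation <- runif(50, min = 200, max = 2000)
richness  <- round(40 - 0.015 * elevation + rnorm(50, sd = 3))

# Generic way: x-axis data first, y-axis data second
plot(elevation, richness,
     xlab = "Elevation (m)", ylab = "Species richness")

# Lattice way: formula interface, y-axis data ~ x-axis data
library(lattice)
p <- xyplot(richness ~ elevation,
            xlab = "Elevation (m)", ylab = "Species richness")
print(p)  # lattice functions return an object; print() draws it
```

Note that the lattice call returns a plot object instead of drawing directly. At the interactive prompt the object is printed automatically, but inside scripts, loops or functions an explicit print() is needed.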

For visualizing non-spatial data sets, the following overview provides you with the most important plots/functions.

Plot type                      Generic plotting function   Lattice function
scatter plots                  plot()                      xyplot()
box and whisker plots          boxplot()                   bwplot()
histograms and density plots   hist()                      histogram() and densityplot()
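As a rough sketch of the generic functions from the overview, the following lines use R's built-in iris data set (chosen here only for convenience, it is not part of the course data). The density curve via plot(density()) is an addition that goes slightly beyond the table.

```r
# Box and whisker plot: one box per species (formula: values ~ groups)
b <- boxplot(Sepal.Length ~ Species, data = iris,
             ylab = "Sepal length (cm)")

# Histogram of all sepal lengths; breaks = 10 is only a suggestion to R
h <- hist(iris$Sepal.Length, breaks = 10,
          xlab = "Sepal length (cm)", main = "Histogram of sepal length")

# Kernel density estimate drawn with the generic plot() function
d <- density(iris$Sepal.Length)
plot(d, main = "Density of sepal length")
```

Both boxplot() and hist() invisibly return the statistics they draw (e.g. b$stats and h$counts), which can be handy if you need the numbers behind the figure.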

Have a look at C09-1 - Generic plotting functions now for more information on the generic plotting functions.

User-defined labels and ticks

As soon as you apply a transformation (e.g. log, square root) to your original data values, the axis scales of your visualization change accordingly. Hence, you can no longer directly read the original value at a certain position on the axis. Fortunately, there is a simple solution to this problem: define your own ticks (i.e. the positions at which a value is drawn on the axis) and labels (i.e. the character or numeric value which is drawn at a tick) and add them to the plot in place of the default axis, which shows the transformed values.

Please have a look at C09-2 - Generic axis labeling for an example on that topic using R's generic plotting functions.
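A minimal sketch of this idea with the generic functions might look as follows. The variables body_mass and abundance are hypothetical, simulated values; the tick positions are chosen by hand and would need adapting to real data.

```r
# Hypothetical, strongly skewed example data
set.seed(42)
body_mass <- rlnorm(100, meanlog = 3, sdlog = 1.5)
abundance <- rlnorm(100, meanlog = 2, sdlog = 1)

# Plot the log10-transformed values, but suppress the default axes
plot(log10(body_mass), log10(abundance),
     xaxt = "n", yaxt = "n",
     xlab = "Body mass (g)", ylab = "Abundance")

# Labels are given in original units, positions in transformed units
tick_labels <- c(0.1, 1, 10, 100, 1000, 10000)
tick_pos    <- log10(tick_labels)

axis(side = 1, at = tick_pos, labels = as.character(tick_labels))  # x-axis
axis(side = 2, at = tick_pos, labels = as.character(tick_labels))  # y-axis
```

Ticks that fall outside the plotted range are simply not drawn, so it does no harm to define a few more positions than strictly needed.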

Combining multiple plots

Another feature you might have missed so far is the ability to add e.g. multiple lines to the same scatter plot or to draw certain groups of symbols in certain colors. Fortunately, the solution to this problem is again quite straightforward. It always consists of two parts:

  1. Draw the first part of the plot you want (e.g. the first line or the first few symbols which have the same color) using one of the standard plotting functions.
  2. Add the remaining data to the plot using a special plotting function (generic way) or panels (lattice way).
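The two steps above can be sketched for the generic way as follows, again using the built-in iris data purely as an illustration. The colors and the added regression line are arbitrary choices, and points(), abline() and legend() are the "special" functions that add to an existing plot.

```r
species <- levels(iris$Species)
cols    <- c("darkgreen", "orange", "purple")

# Step 1: draw the first group with a standard plotting function.
# xlim/ylim cover the full data range so later groups are not cut off.
first <- iris[iris$Species == species[1], ]
plot(first$Sepal.Length, first$Petal.Length,
     col = cols[1], pch = 16,
     xlim = range(iris$Sepal.Length), ylim = range(iris$Petal.Length),
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")

# Step 2: add the remaining groups to the existing plot
for (i in 2:length(species)) {
  grp <- iris[iris$Species == species[i], ]
  points(grp$Sepal.Length, grp$Petal.Length, col = cols[i], pch = 16)
}

# Further additions: a fitted line and a legend
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
abline(fit, lty = 2)
legend("topleft", legend = species, col = cols, pch = 16)
```

The axis limits are fixed by the first call and do not adapt to data added later, which is why the full range is passed to plot() in step 1.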

Finally, you may also want to place not just one but multiple plots on a single page. Again, the solution is quite simple: divide the page into a grid by defining a number of rows and columns, and then draw an individual plot into each grid cell.

Please have a look at C09-3 - Multiple generic plots for an example on that topic using R's generic plotting functions.
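One common way to do this with the generic functions is the mfrow setting of par(). The following is a sketch using the built-in iris data, with the four plots chosen arbitrarily.

```r
# Split the page into 2 rows x 2 columns; cells are filled row by row.
# par() returns the previous settings so they can be restored later.
old_par <- par(mfrow = c(2, 2))

plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
boxplot(Sepal.Length ~ Species, data = iris, ylab = "Sepal length (cm)")
hist(iris$Sepal.Length, xlab = "Sepal length (cm)", main = "")
plot(density(iris$Petal.Length), main = "")

# Restore the original single-plot layout
par(old_par)
```

Restoring the old settings at the end keeps later plots from being squeezed into the grid unintentionally.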

Time for practice

en/learning/schools/s01/lecture-notes/ba-ln-09.txt · Last modified: 2017/10/30 10:23 by aziegler