Biodiversity data analysis with R

C00-2 Data frame basics

Creation of a data frame

A data frame is created from scratch by supplying vectors to the data.frame function. Here are some examples:

x <- c(2.5, 3.5, 3.4)
y <- c(5, 10, 1)
my_df <- data.frame(x, y)
my_df
##     x  y
## 1 2.5  5
## 2 3.5 10
## 3 3.4  1
colnames(my_df) <- c("Floats", "Integers")

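# stringsAsFactors = TRUE keeps the factor output shown below in R >= 4.0,
# where character vectors are no longer converted to factors by default
my_other_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"), stringsAsFactors = TRUE)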
my_other_df
##   X Y
## 1 2 A
## 2 3 B
## 3 4 C

The colnames function assigns column names to an existing data frame. Alternatively, the column names can be set within the data.frame function by passing the vectors as named arguments (X and Y in the second example above).
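
As a minimal sketch, the two naming approaches can be compared side by side. The names df_a and df_b are made up for this illustration:

```r
# Approach 1: create the data frame first, then rename with colnames()
df_a <- data.frame(c(2.5, 3.5, 3.4), c(5, 10, 1))
colnames(df_a) <- c("Floats", "Integers")

# Approach 2: name the columns directly in the data.frame() call
df_b <- data.frame(Floats = c(2.5, 3.5, 3.4), Integers = c(5, 10, 1))

colnames(df_a)
## [1] "Floats"   "Integers"
identical(colnames(df_a), colnames(df_b))
## [1] TRUE
```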

Dimensions of a data frame

To get the dimensions of a data frame, use the ncol (number of columns), nrow (number of rows) or str (structure) function:

ncol(my_other_df)
## [1] 2
nrow(my_other_df)
## [1] 3
str(my_other_df)
## 'data.frame':    3 obs. of  2 variables:
##  $ X: num  2 3 4
##  $ Y: Factor w/ 3 levels "A","B","C": 1 2 3
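
Both dimensions can also be queried in one call with the base R function dim, which is not used elsewhere on this page. A small sketch, with dims_df as a made-up example data frame:

```r
# Hypothetical example data frame with 3 rows and 2 columns
dims_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))

dim(dims_df)     # rows first, then columns
## [1] 3 2
dim(dims_df)[1]  # same as nrow(dims_df)
## [1] 3
dim(dims_df)[2]  # same as ncol(dims_df)
## [1] 2
```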

Displaying and accessing the content of a data frame

The content of a data frame is accessed either by position, given in square brackets (e.g. df[3,4]), or by column name, given after a $ sign (e.g. df$columnName). Here’s an example:

my_other_df[1,]  # Shows first row
##   X Y
## 1 2 A
my_other_df[,2]  # Shows second column
## [1] A B C
## Levels: A B C
my_other_df$Y  # Shows second column
## [1] A B C
## Levels: A B C

If position information is used, the order matters. A data frame always has exactly two dimensions, so if you think of it as a table, the following applies:

  • The first position inside the brackets refers to the row
  • The second position inside the brackets refers to the column

Other R objects, such as arrays, can have more dimensions and follow the same logic, but a data frame never has more than two.

Here are some possible combinations:

  • Single row, all columns: df[x,] with \(x \in \{1, \dots, \text{number of rows}\}\)
  • Single column, all rows: df[,y] with \(y \in \{1, \dots, \text{number of columns}\}\)
  • Single row and column: df[x,y] with \(x \in \{1, \dots, \text{number of rows}\}\) and \(y \in \{1, \dots, \text{number of columns}\}\)
  • All except one row, all columns: df[-x,] with \(x \in \{1, \dots, \text{number of rows}\}\)
  • Selected rows, all columns: df[c(x1, x2, x3),] with \(x1, x2, x3 \in \{1, \dots, \text{number of rows}\}\)
  • Continuous rows, all columns: df[c(x1:x2),] with \(x1, x2 \in \{1, \dots, \text{number of rows}\}\)

In summary, rows or columns that should be selected are given as positive numbers, those that should be excluded are given as negative numbers, and if all entries of a dimension should be selected, the field is simply left empty. If more than one row or column should be selected or excluded, the positions are supplied as a vector, which is created with the c function.

my_other_df[c(1,3),]  # Shows rows 1 and 3
##   X Y
## 1 2 A
## 3 4 C
my_other_df[c(1,2),]  # Shows rows 1 to 2
##   X Y
## 1 2 A
## 2 3 B
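
A few more of the combinations can be sketched in the same way. The data frame idx_df is made up for this illustration, and the last line additionally assumes that a column may be addressed by its name inside the brackets, which is standard R behaviour but not covered above:

```r
# Hypothetical example data frame
idx_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))

idx_df[2, 1]      # single element: row 2, column 1
## [1] 3
idx_df[-2, ]      # all rows except row 2, all columns
##   X Y
## 1 2 A
## 3 4 C
idx_df[2:3, "X"]  # rows 2 to 3 of the column named X
## [1] 3 4
```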

If you are interested in the first or last rows, you can also use the head or tail function. The default number of rows is 6, but this can be changed with the second argument. Let’s have a look at the first two rows:

head(my_other_df, 2)
##   X Y
## 1 2 A
## 2 3 B

And now at the last two rows:

tail(my_other_df, 2)
##   X Y
## 2 3 B
## 3 4 C
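
With only three rows, the example data frame is too short to show what happens when the second argument is omitted. A small sketch with a longer, made-up data frame (long_df is not part of the examples above) makes this visible:

```r
# Hypothetical data frame with ten rows
long_df <- data.frame(id = 1:10, value = (1:10)^2)

nrow(head(long_df))     # second argument omitted
## [1] 6
nrow(tail(long_df))     # second argument omitted
## [1] 6
head(long_df, 3)$value  # values of the first three rows
## [1] 1 4 9
```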

Changing, adding or deleting an element of a data frame

In order to change an element of a data frame (an individual value or an entire vector such as a row or a column), you have to access it following the logic above and assign the new content to it. To add a column, assign a vector to a new column name; to delete a column, assign NULL to it.

Other, more specific changes and adding rows will be covered later.

# overwrite an element
my_other_df$X[3] <- 400  # same as my_other_df[3,1] <- 400
my_other_df
##     X Y
## 1   2 A
## 2   3 B
## 3 400 C
# change an entire column
my_other_df[,1] <- c(200, 300, 401)  # same as my_other_df$X <- c(200, 300, 401)
my_other_df
##     X Y
## 1 200 A
## 2 300 B
## 3 401 C
# add a new column
my_other_df$z <- c(255, 300, 100)
my_other_df
##     X Y   z
## 1 200 A 255
## 2 300 B 300
## 3 401 C 100
# delete a column
my_other_df$z <- NULL
my_other_df
##     X Y
## 1 200 A
## 2 300 B
## 3 401 C

As with lists, a column is only actually deleted when it is set to NULL.
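
The same operations can be combined in a short sketch. The data frame chg_df and the column Z are made up for this illustration, and the double-bracket form is shown only as one possible alternative to the $ notation:

```r
# Hypothetical example data frame
chg_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))

chg_df$Z <- chg_df$X * 2  # add a column computed from an existing one
chg_df[2, "Z"] <- 0       # overwrite a single element
chg_df
##   X Y Z
## 1 2 A 4
## 2 3 B 0
## 3 4 C 8
chg_df[["Y"]] <- NULL     # delete column Y (same effect as chg_df$Y <- NULL)
colnames(chg_df)
## [1] "X" "Z"
```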

For more information, have a look at e.g. the respective data type page at Quick R. There you will also find an overview of how to get information about an object. Of course, looking into the package documentation or searching the web is always a good idea, too.

en/learning/schools/s01/code-examples/ba-ce-00-02.txt · Last modified: 2015/09/22 16:19 (external edit)