A data frame is created from scratch by supplying vectors to the data.frame function. Here are some examples:
x <- c(2.5, 3.5, 3.4)
y <- c(5, 10, 1)
my_df <- data.frame(x, y)
my_df
## x y
## 1 2.5 5
## 2 3.5 10
## 3 3.4 1
colnames(my_df) <- c("Floats", "Integers")
my_other_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))
my_other_df
## X Y
## 1 2 A
## 2 3 B
## 3 4 C
The colnames function assigns column names to an existing data frame. Alternatively, the column names can be set within the data.frame function by naming the arguments (X and Y in the second example above).
To get the dimensions of a data frame, use the ncol (number of columns) or nrow (number of rows) function; str (structure) reports both, along with the type of each column:
ncol(my_other_df)
## [1] 2
nrow(my_other_df)
## [1] 3
str(my_other_df)
## 'data.frame': 3 obs. of 2 variables:
## $ X: num 2 3 4
## $ Y: Factor w/ 3 levels "A","B","C": 1 2 3
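The Factor type shown for Y reflects R versions before 4.0.0, where data.frame converted character vectors to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the same call yields a character column. The following is a minimal sketch of the difference; the names chr_df and fct_df are chosen here for illustration only. It also shows dim, which returns both dimensions in one call.

```r
# On R >= 4.0.0, strings stay character by default
chr_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))
class(chr_df$Y)   # "character" on R >= 4.0.0, "factor" on older versions

# Request the factor conversion explicitly (works in any R version)
fct_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"),
                     stringsAsFactors = TRUE)
class(fct_df$Y)   # "factor"
levels(fct_df$Y)  # "A" "B" "C"

# dim() returns the number of rows and columns together
dim(fct_df)       # 3 2
```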
The content of a data frame is accessed either by position, given in square brackets (e.g. df[3,4]), or by column name, given after a $ sign (e.g. df$columnName). Here’s an example:
my_other_df[1,] # Shows first row
## X Y
## 1 2 A
my_other_df[,2] # Shows second column
## [1] A B C
## Levels: A B C
my_other_df$Y # Shows second column
## [1] A B C
## Levels: A B C
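Column names can also be used inside square brackets, which is handy when the name is stored in a variable. The sketch below uses a hypothetical data frame name_df and shows which forms return a plain vector and which keep a data frame.

```r
name_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))

name_df[, "Y"]                # column as a vector, same as name_df$Y
name_df[["Y"]]                # list-style access, also a vector
name_df["Y"]                  # single brackets without a comma keep a data frame
name_df[, "Y", drop = FALSE]  # drop = FALSE also keeps a one-column data frame
name_df[2, "X"]               # single value: row 2 of column X
```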
If positions are used, their order matters. If you think of a data frame as a table, the first index refers to the rows and the second to the columns, i.e. df[row, column].
Higher-dimensional objects such as arrays follow the same logic.
Here are some possible combinations:
df[x,] with \(x \in \{1, \dots, \text{number of rows}\}\)
df[,y] with \(y \in \{1, \dots, \text{number of columns}\}\)
df[x,y] with \(x \in \{1, \dots, \text{number of rows}\}\) and \(y \in \{1, \dots, \text{number of columns}\}\)
df[-x,y] with \(x \in \{1, \dots, \text{number of rows}\}\) and \(y \in \{1, \dots, \text{number of columns}\}\)
df[c(x1, x2, x3),] with \(x1, x2, x3 \in \{1, \dots, \text{number of rows}\}\)
df[c(x1:x2),] with \(x1, x2 \in \{1, \dots, \text{number of rows}\}\)
In summary, rows or columns to be selected are given as positive numbers, those to be hidden as negative numbers, and if all entries of a dimension should be selected, the field is simply left empty. If more than one row or column should be shown or hidden, the positions have to be supplied as a vector created with the c function.
my_other_df[c(1,3),] # Shows rows 1 and 3
## X Y
## 1 2 A
## 3 4 C
my_other_df[c(1:2),] # Shows rows 1 to 2
## X Y
## 1 2 A
## 2 3 B
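Negative positions and ranges follow the same pattern. Here is a short sketch on a hypothetical data frame idx_df; note that positive and negative positions cannot be mixed within one index vector.

```r
idx_df <- data.frame(X = c(2, 3, 4), Y = c("A", "B", "C"))

idx_df[-2, ]        # all rows except row 2
idx_df[-c(1, 2), ]  # drops rows 1 and 2, leaving row 3
idx_df[2:3, 1]      # rows 2 to 3 of the first column, returned as a vector
idx_df[, -1]        # everything except the first column
```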
If you are interested in the first or last rows, you can also use the head or tail function. The default number of rows is 6, but this can be changed with the second argument. Let’s have a look at the first two rows:
head(my_other_df, 2)
## X Y
## 1 2 A
## 2 3 B
And now at the last two rows:
tail(my_other_df, 2)
## X Y
## 2 3 B
## 3 4 C
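With only three rows, the default row count of head and tail never comes into play. The sketch below uses a hypothetical ten-row data frame long_df to make the default of six rows visible; it also shows that a negative n means "all but that many rows".

```r
long_df <- data.frame(id = 1:10, square = (1:10)^2)

nrow(head(long_df))    # 6, the default for n
nrow(tail(long_df))    # 6
head(long_df, n = 3)   # first three rows
head(long_df, n = -8)  # all but the last eight rows, i.e. rows 1 and 2
tail(long_df, n = 2)   # rows 9 and 10
```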
In order to change an element of a data frame (an individual value or an entire vector such as a row or column), you have to access it following the logic above and assign a new value. To add a column, assign a vector to a new column name; to delete a column, assign NULL to it.
Other (more specific) changes and adding rows will be covered later.
# overwrite an element
my_other_df$X[3] <- 400 # same as my_other_df[3,1] <- 400
my_other_df
## X Y
## 1 2 A
## 2 3 B
## 3 400 C
# overwrite an entire column
my_other_df[,1] <- c(200, 300, 401) # same as my_other_df$X <- c(200, 300, 401)
my_other_df
## X Y
## 1 200 A
## 2 300 B
## 3 401 C
# add a new column
my_other_df$z <- c(255, 300, 100)
my_other_df
## X Y z
## 1 200 A 255
## 2 300 B 300
## 3 401 C 100
# delete a column
my_other_df$z <- NULL
my_other_df
## X Y
## 1 200 A
## 2 300 B
## 3 401 C
As with lists, an element is actually deleted by setting it to NULL.
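The parallel with lists can be sketched as follows; my_list and wide_df are hypothetical names used only for this illustration.

```r
# Deleting a list element by assigning NULL
my_list <- list(a = 1, b = "two", c = TRUE)
my_list$b <- NULL
names(my_list)   # "a" "c"
length(my_list)  # 2

# A data frame is a list of columns, so the same assignment removes a column
wide_df <- data.frame(X = 1:3, Y = 4:6, Z = 7:9)
wide_df[["Z"]] <- NULL
names(wide_df)   # "X" "Y"
```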
For more information, have a look at, for example, the respective data type page at Quick R. There you will also find an overview of how to get information about an object. Of course, looking into the package documentation or searching the web is always a good idea, too.