Indexing


What follows is a refresher. You have presumably had enough exposure to R to be familiar with some of it. Because much of data processing involves data frames, or other tables of mixed data types, more time will be spent on slicing and dicing data frames. Even so, it would be impossible to use R effectively without knowing how to handle basic data types.

Base R Indexing Refresher

Slicing vectors

letters[4:6]  # lower case letters a-z
[1] "d" "e" "f"
letters[c(13, 10, 3)]
[1] "m" "j" "c"

Slicing matrices/data.frames

myMatrix[1, 2:3]  # matrix[rows, columns]
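
As a minimal sketch of the slice above, the following builds a small matrix and takes row 1, columns 2 through 3. The name example_matrix and its values are made up for illustration and are not part of the original material.

```r
# Hypothetical 3 x 4 matrix, filled column by column with 1:12
example_matrix = matrix(1:12, nrow = 3, ncol = 4)

# Row 1, columns 2 through 3; the result is simplified to a plain vector
row_slice = example_matrix[1, 2:3]
row_slice

# drop = FALSE keeps the result as a 1 x 2 matrix instead of a vector
row_slice_matrix = example_matrix[1, 2:3, drop = FALSE]
dim(row_slice_matrix)
```

Whether to keep the matrix shape with drop = FALSE depends on what the next step expects; the default simplification is convenient interactively but can surprise code that assumes two dimensions.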

Label-based indexing

mydf['row1', 'b']

Position-based indexing

mydf[1, 2]

Mixed indexing

mydf['row1', 2]

If the row/column value is empty, all rows/columns are retained.

mydf['row1', ]
mydf[, 'b']

Non-contiguous

mydf[c(1, 3), ]

Boolean

mydf[mydf$a >= 2, ]
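
The data frame indexing forms above can be tried on a small example. This is only a sketch: example_df, its row names, and its values are hypothetical, chosen so that the labels 'row1', 'a', and 'b' exist.

```r
# Hypothetical data frame with row names so that label-based indexing works
example_df = data.frame(
  a = c(1, 2, 3),
  b = c('x', 'y', 'z'),
  row.names = c('row1', 'row2', 'row3'),
  stringsAsFactors = FALSE  # keep b as character on older R versions too
)

by_label    = example_df['row1', 'b']       # label-based
by_position = example_df[1, 2]              # position-based
mixed       = example_df['row1', 2]         # mixed

one_row  = example_df['row1', ]             # all columns; still a data frame
one_col  = example_df[, 'b']                # all rows; simplified to a vector
two_rows = example_df[c(1, 3), ]            # non-contiguous rows
filtered = example_df[example_df$a >= 2, ]  # rows where column a is at least 2
```

The first three calls all reach the same cell, which is the point of the label, position, and mixed variants.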

List/Data.frame extraction

[ : grab a slice of elements/columns

[[ : grab a single element/column, by name or position

$ : grab a single element/column by name

@ : extract a slot from an S4 object

my_list_or_df[2:4]
my_list_or_df[['name']]
my_list_or_df$name
my_s4_object@name  # @ works on S4 objects, not on plain lists
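
To make the difference between the extraction operators concrete, here is a small sketch. The list example_list and the S4 class 'Person' are hypothetical and exist only for this illustration.

```r
library(methods)  # provides setClass() and new(); usually attached already

# Hypothetical list with three named elements
example_list = list(id = 1:3, name = c('a', 'b'), flag = TRUE)

sub_list  = example_list[2:3]        # [ returns a list, here of length 2
by_double = example_list[['name']]   # [[ returns the element itself
by_dollar = example_list$name        # $ does the same for a literal name

class(sub_list)    # "list"
class(by_double)   # "character"

# Hypothetical S4 class, defined only to demonstrate @
setClass('Person', slots = c(name = 'character'))
example_s4 = new('Person', name = 'Ada')
slot_value = example_s4@name
```

The practical distinction is the return type: [ keeps the container, while [[ and $ hand back what is inside it.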

In general, position-based indexing should be avoided, except in iterative programming of the sort that will be covered later. Uncommented positions become magic numbers: no one will know what they refer to. In addition, any change to the rows or columns of the data will render the numbers incorrect, whereas labels would still apply.
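
A short sketch of how a positional index can go stale follows. The data frame scores and its column names are made up for illustration.

```r
# Hypothetical data frame; column names are illustrative only
scores = data.frame(id = 1:3, score = c(90, 85, 70))

by_position_before = scores[, 2]        # relies on score being column 2
by_label_before    = scores[, 'score']

# A new column is added ahead of score, shifting its position
scores = data.frame(
  id    = scores$id,
  group = c('a', 'b', 'a'),
  score = scores$score
)

by_position_after = scores[, 2]         # now silently returns group
by_label_after    = scores[, 'score']   # still returns score
```

Note that the positional version does not fail after the change; it returns a different column without any warning, which is what makes this kind of error hard to spot.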

Indexing Exercises

The following is a refresher of base R indexing only.

Here are a matrix, a data.frame, and a list.

mymatrix = matrix(rnorm(100), 10, 10)
mydf = cars
mylist = list(mymatrix, thisdf = mydf)

Exercise 1

For the matrix, in separate operations, take a slice of rows, a selection of columns, and a single element.

Exercise 2

For the data.frame, grab a column in 3 different ways.

Exercise 3

For the list, grab an element by number and by name.