Add columns
We often want to add new columns to our data set and calculate their value based on existing columns. This chapter introduces how to do this with dplyr.
Last updated
Was this helpful?
We often want to add new columns to our data set and calculate their value based on existing columns. This chapter introduces how to do this with dplyr.
Last updated
Was this helpful?
This chapter introduces the following new concepts and functions:
The -function for adding new columns or changing existing ones.
The transmute
-function as an alternative to mutate
to only keep the new columns and the columns involved in the calculation.
Conversion functions as.character
, as.double
, as.integer
, as.logical, and
factor
to cast a data type onto a column.
Functions to extract parts of a date column, such as year
, month
, day
, weekday
, etc., mainly from the -package.
mutate
commandWe find ourselves in the need to expand the existing set of columns with new information. This information is usually derived from the existing columns. A good example is the turnover for our Campusbier data set. The table line_items
contains the price and the quantity, but not the turnover. Thus, we can calculate a new column turnover
as price * quantity
.
For working with date and time columns, the package lubridate
offers useful functions:
We can extract the calendar week with the function week
:
For the month, the equivalent function is month
:
A function with the corresponding name quarter
:
No surprises here:
We often want to bring a date to a higher level of aggregation. For example, if we aim to aggregate dates (daily) to full weeks, months, quarters, or years, we can use ceiling_date
and floor_date
to round a date down or up to achieve the desired format:
To perform some operations, such as statistical applications, or to visualize data properly, we may need to transform a column into a different data type. When loading a data set, the readr
functions like read_csv
are good at guessing the underlying data type for each column. Occasionally, they guess wrong, and that's when mutate comes to the rescue:
There is a function to cast a column to each existing data type in R:
as.character
to cast a column to a string. This should always work, as strings are the most general data type.
as.double
to cast a column to a number with decimal places. This only works when the original values are, in fact, numeric. Values that cannot be converted into a decimal number are replaced with NA
.
as.integer
to cast a column to a whole number. Analogous to as.double
, this only works when the values contain solely the characters 0-9.
as.logical
to cast a column to a boolean value. This works when the values contain strings with the values "TRUE" and "FALSE" or "0" and "1". Similar, the numeric values 0 and 1 can also be converted to TRUE
and FALSE
.
lubridate::as.Date
lubridate::as.POSIXct
A compact introduction is also provided in from the book "R for Data Science" by Hadley Wickham.