Calculating Time Difference in R

Recently, I needed to calculate the time difference between two times that were exported from a system. The times in the report were formatted like "07:30:00 AM" or "04:00:00 PM". The data in this particular data set does not span more than one calendar day.

After some digging around, I found that the best function to use for this was difftime(). You feed difftime() two timestamps and it returns the difference between them (the first minus the second) as a decimal value. If you test this out in the console, you will notice that the result is verbose and says "Time difference of X.XXXX secs". However, when you store the calculation in the data frame, it just displays the decimal value and does not contain the extra text.

difftime(Sys.time() + 30, Sys.time())
# > Time difference of 29.9999 secs
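
To see what difftime() actually hands back, here is a minimal sketch using two fixed, made-up timestamps (the values and the variable names start, end and d are assumptions, chosen so the result is reproducible). It shows that the result is a difftime object that carries its units, and that as.numeric() reduces it to a plain number.

```r
# Two hypothetical timestamps exactly 30 seconds apart
start <- as.POSIXct("2024-01-01 07:30:00", tz = "UTC")
end   <- start + 30

# First argument minus second argument, expressed in seconds
d <- difftime(end, start, units = "secs")

class(d)        # "difftime"
units(d)        # "secs"
as.numeric(d)   # 30
```

Wrapping the result in as.numeric() is handy if you want an ordinary numeric column rather than a difftime one.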

For my data set, I needed to convert the time to a full timestamp for the difftime() function to work properly. If you don’t feed it a full timestamp (YYYY-MM-DD HH:MM:SS) and only feed it the HH:MM:SS, it will throw an error because it cannot turn the value into a date-time (it doesn’t know the start/end date to perform the calculation).
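
As a rough illustration of that failure, the sketch below passes two time-only strings straight to difftime() and catches the error instead of letting it stop the script. The strings and the variable name result are made up for the example, and the exact wording of the message may differ between R versions.

```r
# Time-only strings have no date part, so base R cannot convert them to date-times
result <- tryCatch(
  difftime("04:00:00", "07:30:00"),
  error = function(e) conditionMessage(e)
)

# result now holds the error message as a string rather than a difftime
is.character(result)  # TRUE
```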

Convert the time to a full timestamp

In order to convert the time into a full timestamp, we can use the parse_date_time() function from the lubridate package. This function is really powerful: it parses a string into a date-time based on the layout you describe. The orders argument, which describes that layout, has a lot of options, so I recommend looking at the help guide for this one (type ?parse_date_time in the console).

Here we will feed the parse_date_time() function our time string and describe its layout (a 12-hour clock with an AM/PM marker). We get back a full timestamp, which prints in 24 hour format.

library(lubridate)
parse_date_time("07:30:00 PM", orders = "%I:%M:%S %p")
# > "0000-01-01 19:30:00 UTC"

Since we didn’t have a date in our value, the function fills it in with "0000-01-01", which for my dataset is perfectly acceptable.

Note: if you would rather not install the lubridate package, you can use the base R strptime() function to achieve similar results, but it fills in the current date instead of "0000-01-01".
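
A minimal sketch of the strptime() route, for anyone who wants to stay in base R. The tz = "UTC" argument and the variable names x and x_ct are assumptions added for the example, and the %p marker is locale dependent, so this assumes an English (or C) locale where "AM" and "PM" are recognised.

```r
# Parse the 12-hour string; the date part is filled in with the current date
x <- strptime("07:30:00 PM", format = "%I:%M:%S %p", tz = "UTC")

format(x, "%H:%M:%S")  # "19:30:00"
class(x)               # "POSIXlt" "POSIXt"

# strptime() returns POSIXlt; POSIXct is usually the safer type for a data frame column
x_ct <- as.POSIXct(x)
```

Because both the start and end times are parsed in the same run, they normally receive the same date, so the difference between them is unaffected by which date was filled in.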

Now, we’ll convert the start_time and end_time columns in the data frame to a full timestamp using:

dat$start_time = parse_date_time(dat$start_time, "%I:%M:%S %p")
dat$end_time = parse_date_time(dat$end_time, "%I:%M:%S %p")

The screenshots below show the before and after values:

Before:

A two column data frame in R showing start_times and end_times as a string

After:

A two column data frame in R showing start_times and end_times formatted as a timestamp with date and time values.

Calculate the time difference

Now that we have full timestamps, we can feed the difftime() function and have it calculate the time difference. With difftime() you can specify the units that the function returns. Instead of “hours”, you can specify “days” or “weeks” so that it returns a different unit. If you leave the units parameter out of the function, it will pick a “suitable set” for you.

From the help documentation: “If units = "auto", a suitable set of units is chosen, the largest possible (excluding "weeks") in which all the absolute differences are greater than one.”
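
To make the "auto" behaviour concrete, here is a small sketch with a made-up reference time t0 (an assumption for the example) and three gaps of different sizes. It only inspects which unit difftime() picks when none is specified.

```r
# Hypothetical reference timestamp
t0 <- as.POSIXct("2024-01-01 08:00:00", tz = "UTC")

units(difftime(t0 + 30, t0))             # "secs"
units(difftime(t0 + 90 * 60, t0))        # "hours"
units(difftime(t0 + 3 * 24 * 3600, t0))  # "days"

# Asking for a unit explicitly overrides the automatic choice
as.numeric(difftime(t0 + 90 * 60, t0, units = "mins"))  # 90
```

Since the automatic unit depends on the data, specifying units explicitly keeps a column like hrs_worked consistent from one export to the next.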

dat$hrs_worked = difftime(dat$end_time, dat$start_time, units = "hours")

And just like that, we have a new column hrs_worked that shows us the hours between the two values.

R data frame showing start_time, end_time, and hrs_worked columns
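
Putting the steps together, the following is a self-contained sketch that you can paste into a fresh session. The two rows of sample data are made up for illustration, and wrapping the result in as.numeric() is my own addition to get a plain numeric column; it assumes the lubridate package is installed.

```r
library(lubridate)

# Hypothetical sample data in the same "HH:MM:SS AM/PM" layout as the export
dat <- data.frame(
  start_time = c("07:30:00 AM", "08:00:00 AM"),
  end_time   = c("04:00:00 PM", "04:00:00 PM"),
  stringsAsFactors = FALSE
)

# Parse the strings into full timestamps (the date part defaults to 0000-01-01)
dat$start_time <- parse_date_time(dat$start_time, "%I:%M:%S %p")
dat$end_time   <- parse_date_time(dat$end_time, "%I:%M:%S %p")

# Hours between the two timestamps, stored as a plain numeric column
dat$hrs_worked <- as.numeric(difftime(dat$end_time, dat$start_time, units = "hours"))

dat$hrs_worked  # 8.5 8.0
```

Note that this only gives sensible results when, as in my data set, the start and end times fall on the same calendar day.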

Conclusion

To calculate the time difference between two time values in R, we first need to convert the time values to a full timestamp (YYYY-MM-DD HH:MM:SS) using the parse_date_time() function from the lubridate package. Once we have a full timestamp, we can feed the values into the difftime() function and have it calculate the number of hours between the two values.

This post is licensed under CC BY 4.0 by the author.