close up of calculator buttons and spreadsheet with pen and paperclip.

Counting the Number of Weekdays from a Date Range in R

Recently, I needed to be able to extract the number of weekdays from a date range and calculate a percentage so that accruals could be mapped to the appropriate month. I wasn’t able to find exactly what I needed to make this work, and it didn’t appear that any packages for R were available to make this any easier, so I built it myself.

The file I worked with contained thousands of records for payroll vouchers. Each row has a pay period start date and a pay period end date. From these two dates, I needed to extract the week days and then create a percentage that would be used for the accrual.

One caveat for you is that this is how I accomplished the task. This is not the only way to do this, but it’s how I did it. As with all code, your mileage may vary, and you will probably have to chop this up to make it work with your data set.

Code Breakdown

In order to get what I needed, I started with a loop that evaluates each row and does some calculations. Below is the full code snippet, and below this code snippet is the breakdown. This code example uses the following packages: dplyr and lubridate for some functions. Make sure to install those or add them to your script using the library() function

for (i in 1:nrow(dat)) {
  x = data.frame("dates_worked" = seq(ymd(dat$Pay.Period.Start[i]), 
      ymd(dat$Pay.Period.End[i]), by="days"))
  x$weekday = !(weekdays(as.Date(x$dates_worked)) %in% c('Saturday','Sunday'))
  x = x %>% filter(weekday == T)
  x$month = month(x$dates_worked)
  
  y = as.data.frame(table(x$month))
  y$Var1 = as.integer(as.character(y$Var1))
  y$tot = sum(y$Freq)
  y$pct = round(y$Freq / y$tot, digits = 1)
  
  min_y = y %>% filter(Var1 == min(Var1))
  max_y = y %>% filter(Var1 == max(Var1))
  
  dat$first_mo[i] = min_y$Var1
  dat$first_mo_pct[i] = min_y$pct
}

dat$accrual = dat$total * dat$first_mo_pct

Line-by-line Explanation

Line 1

for (i in 1:nrow(dat)) { ... }

This row is pretty simple, it starts the for loop that will evaluate all records in the data frame (dat).

Line 2 & 3

x = data.frame("dates_worked" = seq(ymd(dat$Pay.Period.Start[i]), ymd(dat$Pay.Period.End[i]), by="days"))

With this line, we create a new data frame of dates (x) by using the sequence (seq()) function and the Pay.Period.Start and Pay.Period.End date values. This is really the meat and potatoes of the calculation because it gives us the sequence of dates to start our calculations.

Data frame showing a sequence of dates in R

Line 4 & 5

x$weekday = !(weekdays(as.Date(x$dates_worked)) %in% c('Saturday','Sunday'))

This line adds the weekday column to the x data frame. This line of code converts the dates_worked column into a date value, applies the weekdays() function to the date, and asks R to return the inverse (!) of the weekdays that fall on Saturday and Sunday.

Essentially, if the weekday value is Saturday or Sunday, convert the value to FALSE. If the weekday value is Monday, Tuesday, Wednesday, Thursday, or Friday, then convert the value to TRUE.

Data frame showing dates_worked and weekday values in R

x = x %>% filter(weekday == T)

This line uses the filter() function from the dplyr package to filter the weekday column so that it only shows weekdays

Data frame showing dates_worked and weekday columns

Line 6

x$month = month(x$dates_worked)

Line 5 uses the months() function to add a month column to the data frame based on the dates_worked column

Data frame showing dates_worked, weekday, and month columns.

Line 8 & 9

y = as.data.frame(table(x$month))

Now we will create a new data frame (y) and set it to the summary of the month column from the x data frame by using the table() function.

Var1 is the month number, and
Freq is the frequency, or how many times a date appears in that month.

The table() function is a great way to quickly summarize results, but it’s not terribly descriptive. Below is the output of table(x$month). The month numbers are the first row, and the frequency is the second row.

results of table function on x$month value in R

Wrapping the table function in an as.data.frame() function makes for and easy way to do some additional calculations and sorting in the following steps.

Data frame showing Var1 and Freq columns in R

y$Var1 = as.integer(as.character(y$Var1))

When you use the table() function, it doesn’t parse the values or really know what it’s working with. So, by default, it stores the values in the Freq column as a factor. In this line of code, we convert the factor to a character and then convert it to an integer to prep it for some simple math in the next few lines.

You might be thinking, well why don’t you just drop the as.character() function and go straight to the as.integer() function. I had the same thought originally. If you were to convert the factor directly to an integer, you would end up with 1 and 2 instead of 2 and 8. This is because of how factors are stored/used in R.

Line 10 & 11

y$tot = sum(y$Freq)

y$pct = round(y$Freq / y$tot, digits = 1)

This is where the simple math comes into play. Here, we will create two new columns: tot and pct. These two lines total up the number of weekdays for the row being evaluated (tot) and then we create a percentage from the Freq column divided by the tot column.

Data frame showing Var1, Freq, tot, and pct columns.

This tells us that there were a total of 10 weekdays between these two dates and 20% of them were in April and the other 80% of the dates fell in May.

Line 13 & 14

min_y = y %>% filter(Var1 == min(Var1))

max_y = y %>% filter(Var1 == max(Var1))

The next two lines (12 and 13) identify the order of the months (using the month number). Here we filter the y data frame on Var1 (which is the month number) for the min and max values and set those rows to a new data frame (min_y and max_y). This gives us the month number and the percentage that we will add to our main data frame dat.

Line 16, 17, & 20

dat$first_mo[i] = min_y$Var1

dat$first_mo_pct[i] = min_y$pct

These two lines are where it all comes together. Up until these two lines of code, we’ve put a sequence of dates together, filtered out the weekends, counted the weekdays, split them up by month, and have the min and max values ready to go.

Now, we need to add them back to the main data frame (dat) and move on to the next row in the for loop. This is done fairly easily. Since we are still in an iteration of the for loop, all we have to do is tell R where to add the values we are interested in. In this case, we want to create two new columns, first_mo and first_mo_pct columns, and store the corresponding values from the min_y data frame.

Here is where the placement of the loop index [i] is very important. If we were to forget the [i] portion for dat$first_mo, it would store the min_y$Var1 for the last run of the loop in every cell in the data frame for the first_mo column. More than likely you would catch it when every row looks the same for the particular column, but it can be annoying to troubleshoot.

The dat$first_mo[i] = min_y$Var1 line of code can be interpreted as: take the Var1 value from the min_y data frame and store it in the first_mo column in the dat data frame on row i (i is the index that increments as the for loop runs).

Line 20 does a calculation outside the for loop to show how much should accrue and to what month.

final data frame showing pay period start pay_period_end, total, first_mo, first_mo_pct, and accrual total

Conclusion

Calculating the number of weekdays between two date ranges is fairly easy to achieve using some built-in functions in R. One caveat for you is that this is how I accomplished the task. This is not the only way to do this, but it’s how I did it. As with all code, your mileage may vary, and you will probably have to chop this up to make it work with your data set.

If you are just looking to count the number of weekdays between two date ranges, the following code snippet from the above example will help you out. At a high level, you have to create a sequence of dates, check the weekday, and then filter out the weekends. Once you have that, you can use your favorite aggregation method to count them up.

x = data.frame("dates_worked" = seq(ymd(dat$Pay.Period.Start[i]),
ymd(dat$Pay.Period.End[i]), by="days"))

x$weekday = !(weekdays(as.Date(x$dates_worked)) %in% c('Saturday','Sunday'))

x = x %>% filter(weekday == T)

table(x$weekday)