Recently, I needed to be able to extract the number of weekdays from a date range and calculate a percentage so that accruals could be mapped to the appropriate month. I wasn’t able to find exactly what I needed to make this work, and it didn’t appear that any packages for R were available to make this any easier, so I built it myself.

The file I worked with contained thousands of records for payroll vouchers. Each row has a pay period start date and a pay period end date. From these two dates, I needed to extract the week days and then create a percentage that would be used for the accrual.

One caveat for you is that this is how I accomplished the task. This is not the only way to do this, but it’s how I did it. As with all code, your mileage may vary, and you will probably have to chop this up to make it work with your data set.

## Code Breakdown

In order to get what I needed, I started with a loop that evaluates each row and does some calculations. Below is the full code snippet, and below this code snippet is the breakdown. This code example uses the following packages: `dplyr`

and `lubridate`

for some functions. Make sure to install those or add them to your script using the `library()`

function

for (i in 1:nrow(dat)) { x = data.frame("dates_worked" = seq(ymd(dat$Pay.Period.Start[i]), ymd(dat$Pay.Period.End[i]), by="days")) x$weekday = !(weekdays(as.Date(x$dates_worked)) %in% c('Saturday','Sunday')) x = x %>% filter(weekday == T) x$month = month(x$dates_worked) y = as.data.frame(table(x$month)) y$Var1 = as.integer(as.character(y$Var1)) y$tot = sum(y$Freq) y$pct = round(y$Freq / y$tot, digits = 1) min_y = y %>% filter(Var1 == min(Var1)) max_y = y %>% filter(Var1 == max(Var1)) dat$first_mo[i] = min_y$Var1 dat$first_mo_pct[i] = min_y$pct } dat$accrual = dat$total * dat$first_mo_pct

## Line-by-line Explanation

### Line 1

`for (i in 1:nrow(dat)) { ... }`

This row is pretty simple, it starts the `for`

loop that will evaluate all records in the data frame (`dat`

).

### Line 2 & 3

`x = data.frame("dates_worked" = seq(ymd(dat$Pay.Period.Start[i]), ymd(dat$Pay.Period.End[i]), by="days"))`

With this line, we create a new data frame of dates (`x`

) by using the sequence (`seq()`

) function and the `Pay.Period.Start`

and `Pay.Period.End`

date values. This is really the meat and potatoes of the calculation because it gives us the sequence of dates to start our calculations.

### Line 4 & 5

`x$weekday = !(weekdays(as.Date(x$dates_worked)) %in% c('Saturday','Sunday'))`

This line adds the `weekday`

column to the `x`

data frame. This line of code converts the `dates_worked`

column into a date value, applies the `weekdays()`

function to the date, and asks R to return the inverse (`!`

) of the `weekdays`

that fall on Saturday and Sunday.

Essentially, if the `weekday`

value is Saturday or Sunday, convert the value to `FALSE`

. If the weekday value is Monday, Tuesday, Wednesday, Thursday, or Friday, then convert the value to `TRUE`

.

`x = x %>% filter(weekday == T)`

This line uses the `filter()`

function from the `dplyr`

package to filter the `weekday`

column so that it only shows weekdays

### Line 6

`x$month = month(x$dates_worked)`

Line 5 uses the `months()`

function to add a month column to the data frame based on the `dates_worked`

column

### Line 8 & 9

`y = as.data.frame(table(x$month))`

Now we will create a new data frame (`y`

) and set it to the summary of the month column from the `x`

data frame by using the `table()`

function.

`Var1`

is the month number, and `Freq`

is the frequency, or how many times a date appears in that month.

The `table()`

function is a great way to quickly summarize results, but it’s not terribly descriptive. Below is the output of `table(x$month)`

. The month numbers are the first row, and the frequency is the second row.

Wrapping the table function in an `as.data.frame()`

function makes for and easy way to do some additional calculations and sorting in the following steps.

`y$Var1 = as.integer(as.character(y$Var1))`

When you use the `table()`

function, it doesn’t parse the values or really know what it’s working with. So, by default, it stores the values in the `Freq`

column as a factor. In this line of code, we convert the factor to a character and then convert it to an integer to prep it for some simple math in the next few lines.

You might be thinking, well why don’t you just drop the `as.character()`

function and go straight to the `as.integer()`

function. I had the same thought originally. If you were to convert the factor directly to an integer, you would end up with 1 and 2 instead of 2 and 8. This is because of how factors are stored/used in R.

### Line 10 & 11

`y$tot = sum(y$Freq)`

`y$pct = round(y$Freq / y$tot, digits = 1)`

This is where the simple math comes into play. Here, we will create two new columns: `tot`

and `pct`

. These two lines total up the number of weekdays for the row being evaluated (`tot`

) and then we create a percentage from the `Freq`

column divided by the `tot`

column.

This tells us that there were a total of 10 weekdays between these two dates and 20% of them were in April and the other 80% of the dates fell in May.

### Line 13 & 14

`min_y = y %>% filter(Var1 == min(Var1))`

`max_y = y %>% filter(Var1 == max(Var1))`

The next two lines (12 and 13) identify the order of the months (using the month number). Here we filter the `y`

data frame on `Var1`

(which is the month number) for the min and max values and set those rows to a new data frame (`min_y`

and `max_y`

). This gives us the month number and the percentage that we will add to our main data frame `dat`

.

### Line 16, 17, & 20

`dat$first_mo[i] = min_y$Var1`

`dat$first_mo_pct[i] = min_y$pct`

These two lines are where it all comes together. Up until these two lines of code, we’ve put a sequence of dates together, filtered out the weekends, counted the weekdays, split them up by month, and have the min and max values ready to go.

Now, we need to add them back to the main data frame (`dat`

) and move on to the next row in the `for`

loop. This is done fairly easily. Since we are still in an iteration of the for loop, all we have to do is tell R where to add the values we are interested in. In this case, we want to create two new columns, `first_mo`

and `first_mo_pct`

columns, and store the corresponding values from the `min_y`

data frame.

Here is where the placement of the loop index `[i]`

is very important. If we were to forget the `[i]`

portion for `dat$first_mo`

, it would store the `min_y$Var1`

for the last run of the loop in every cell in the data frame for the `first_mo`

column. More than likely you would catch it when every row looks the same for the particular column, but it can be annoying to troubleshoot.

The `dat$first_mo[i] = min_y$Var1`

line of code can be interpreted as: take the `Var1`

value from the `min_y`

data frame and store it in the `first_mo`

column in the `dat`

data frame on row `i`

(`i`

is the index that increments as the `for`

loop runs).

Line 20 does a calculation outside the for loop to show how much should accrue and to what month.

## Conclusion

Calculating the number of weekdays between two date ranges is fairly easy to achieve using some built-in functions in R. One caveat for you is that this is how I accomplished the task. This is not the only way to do this, but it’s how I did it. As with all code, your mileage may vary, and you will probably have to chop this up to make it work with your data set.

If you are just looking to count the number of weekdays between two date ranges, the following code snippet from the above example will help you out. At a high level, you have to create a sequence of dates, check the weekday, and then filter out the weekends. Once you have that, you can use your favorite aggregation method to count them up.

`x = data.frame("dates_worked" = seq(ymd(dat$Pay.Period.Start[i]),`

ymd(dat$Pay.Period.End[i]), by="days"))

`x$weekday = !(weekdays(as.Date(x$dates_worked)) %in% c('Saturday','Sunday'))`

`x = x %>% filter(weekday == T)`

`table(x$weekday)`