Two Ways to Benchmark Your R Scripts

Sometimes you need to know how long your scripts take to run, either to find places to improve or to estimate how long future runs will take. Most of the time, I couldn’t care less how long my scripts take because they generally finish in just a few seconds. Other times, it’s nice to know exactly how long a script took to run end-to-end. Regardless of the reason, I’ll show you the base R way to time your scripts, and then I’ll show you the tictoc package.

The Base R Way

The easiest way to time your scripts using base R functionality is to use the Sys.time() function. At the top of your script, you’ll store the current time in a variable. I generally use start_time. Similarly, at the end of your script, you’ll store the current time in a different variable, end_time. The variable names don’t matter; they just have to be different.

Below is a simple example that times how long it takes your computer to count from 1 to 1,000,000. When the entire code chunk is executed, R grabs the current time at the start of the script and stores it in the start_time variable, prints each number from 1 to 1,000,000, and then sets the end_time variable to the current time when it finishes. Finally, it calculates the time difference between the two variables.

In this case, it took my computer 17.26 seconds to process this chunk of code.

start_time = Sys.time()

for (i in 1:1000000) {
  print(i)
}

end_time = Sys.time()

end_time - start_time

------------------------
Console Export:
------------------------
> end_time - start_time
Time difference of 17.25818 secs

If you remove the print(i) line from the code and rerun it, you’ll notice a much faster processing time. This is because R doesn’t have to spend time printing each number to the console before it moves to the next iteration of the for loop. The processing time went from about 17 seconds to 0.0075 seconds.

start_time = Sys.time()

for (i in 1:1000000) {

}

end_time = Sys.time()

end_time - start_time

------------------------
Console Export:
------------------------
> end_time - start_time
Time difference of 0.007505178 secs
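
One detail worth knowing about end_time - start_time: the result is a difftime object, and R picks its units automatically, so a longer script may report minutes or hours rather than seconds. Below is a minimal sketch, assuming you want the elapsed time as a plain number of seconds. The Sys.sleep() call is only a stand-in for real work and the elapsed_secs name is my own.

```r
start_time <- Sys.time()

Sys.sleep(0.2)  # stand-in for the code you actually want to time

end_time <- Sys.time()

# Subtracting two times gives a difftime; its units are chosen automatically
elapsed <- end_time - start_time
units(elapsed)

# Ask for seconds explicitly and convert to a plain number
elapsed_secs <- as.numeric(difftime(end_time, start_time, units = "secs"))
elapsed_secs
```

Fixing the units this way keeps the numbers comparable when you record timings from scripts of very different lengths.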

Using the base R way to time your scripts is very simple to implement, but it can be a little annoying if you want to time multiple chunks of your script to see how long particular sections take. The example below illustrates this:

start_time = Sys.time()

for (i in 1:1000000) {
  print(i)
}

end_time = Sys.time()

for1 = end_time - start_time

start_time = Sys.time()

for (i in 1:1000000) {
  a = i+1
}

end_time = Sys.time()

for2 = end_time - start_time

paste0("For loop 1: ", for1)
paste0("For loop 2: ", for2)

------------------------
Console Export:
------------------------
> paste0("For loop 1: ", for1)
[1] "For loop 1: 19.3057639598846"
> paste0("For loop 2: ", for2)
[1] "For loop 2: 0.0614628791809082"

We can see that the first for loop took 19 seconds to run, and the second for loop took 0.06 seconds to run. It’s not hard to implement, but you have to put more thought into it when you’re timing your sections.
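
If you want to stay in base R, one way to cut down on the repetition is to wrap the start/stop pattern in a small function. The sketch below is only an illustration: time_block() is a hypothetical helper I made up for this post, not something that ships with R.

```r
# Hypothetical helper (not part of base R): times an expression and labels it
time_block <- function(label, expr) {
  start_time <- Sys.time()
  force(expr)  # arguments are lazy in R, so the code block actually runs here
  elapsed <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
  message(label, ": ", round(elapsed, 3), " sec elapsed")
  invisible(elapsed)  # return the seconds without printing them again
}

for2 <- time_block("For loop 2", {
  for (i in 1:1000000) {
    a <- i + 1
  }
})
```

Because the block is evaluated in the calling environment, variables such as a are still available after the timed block finishes.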

This is where the tictoc R package comes in to make things super easy.

The ‘tictoc’ Package

The tictoc package for R was built by Sergei Izrailev, and as of today, it is regularly maintained. Implementing the tictoc package in your script is fairly straightforward.

If you haven’t already done so, make sure to install the package by using the following command in the R or RStudio console window: install.packages('tictoc').

Once you have it installed, load the package by adding library(tictoc) at the top of your script.

Instead of wrapping each code chunk in Sys.time() calls, subtracting the two times, and then using paste0() to label the result, you’ll use the tic() and toc() commands.

The neat thing about the tic() function is that it takes a description for the chunk of code you will be timing. I’ll be timing both for loops in this example, so at the start of the first for loop, I’ll add tic("For loop 1").

The tictoc package automatically grabs the current time when it processes the tic(...) command. Once it reaches a toc() command, it grabs the new time, calculates the time difference, and prints it to the screen using the description from the matching tic(...) command.

library(tictoc)

tic("For loop 1")

for (i in 1:1000000) {
  print(i)
}

toc()

------------------------
Console Export:
------------------------
> toc()
For loop 1: 17.455 sec elapsed

The full script from above looks like this when switched over to use the tictoc package.

library(tictoc)

tic("For loop 1")

for (i in 1:1000000) {
  print(i)
}

toc()

tic("For loop 2")

for (i in 1:1000000) {
  a = i+1
}

toc()

------------------------
Console Export:
------------------------
...
[1] 1000000
> toc()
For loop 1: 17.455 sec elapsed
...
> toc()
For loop 2: 0.067 sec elapsed

I find the tictoc package to be much simpler and faster to use when I want to time the full script or just a portion of a script.
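
Timers in tictoc can also be nested, and toc() hands back its measurements if you would rather keep them than print them. Here is a rough sketch, assuming tictoc is installed. The labels and the inner, outer, and inner_secs names are my own choices for illustration.

```r
library(tictoc)

tic("Whole script")   # outer timer
tic("For loop 2")     # inner timer, stacked on top of the outer one

for (i in 1:1000000) {
  a <- i + 1
}

# Each toc() closes the most recent open tic(); quiet = TRUE skips the printing
inner <- toc(quiet = TRUE)
outer <- toc(quiet = TRUE)

# The returned list holds the start time, the end time, and the description
inner_secs <- unname(inner$toc - inner$tic)
inner$msg
inner_secs
```

This is handy when you want to collect timings in a data frame or log file instead of reading them off the console.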

By no means is the tictoc package the only benchmarking/timing package available for R–there are many packages that will do the same things and some with visualization functionality. The tictoc package has the perfect blend of simplicity and functionality that I need when I am trying to time my scripts.

Conclusion

In this post, I showed you two ways to benchmark your scripts or chunks of your script to see where improvements can be made. The first option is to use the Sys.time() function from the base R language. The Sys.time() function can be used at the start and end of your script (using different variables). At the end of the script, you’ll need to do the time difference calculation to see how long it took for your code to execute.

The second way we covered was to use the tictoc package. The tictoc package allows you to use tic(...) at the beginning and toc() at the end of your script. When your script reaches toc(), the tictoc package will automatically perform the time difference calculation and print the results to the screen. The other benefit of the tictoc package is that the tic() function takes a message argument that serves as a description for the code chunk being timed, which saves you from having to insert a paste0() command to identify the code being evaluated.

This post is licensed under CC BY 4.0 by the author.