The proffer package profiles R code to find bottlenecks. Visit https://r-prof.github.io/proffer/ for documentation. https://r-prof.github.io/proffer/reference/index.html has a complete list of available functions in the package.
This data processing code is slow.
system.time({
n <- 1e5
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
#> user system elapsed
#> 82.060 28.440 110.582
Why exactly does it take so long? Is it because for loops are slow as a general rule? Let us find out empirically.
library(proffer)
px <- pprof({
n <- 1e5
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
#> ● url: http://localhost:57517
#> ● host: localhost
#> ● port: 57517
When we navigate to http://localhost:57517 (the url printed above) and look at the flame graph, we see [<-.data.frame() (i.e. x[i, ] <- x[i, ] + 1) is taking most of the runtime.
So we refactor the code to avoid data frame row assignment. Much faster, even with a for loop!
system.time({
n <- 1e5
x <- rnorm(n)
y <- rnorm(n)
for (i in seq_len(n)) {
x[i] <- x[i] + 1
y[i] <- y[i] + 1
}
x <- data.frame(x = x, y = y)
})
#> user system elapsed
#> 0.036 0.000 0.041
Moral of the story: before you optimize, throw away your assumptions and run your code through a profiler. That way, you can spend your time optimizing where it counts!
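As a dependency-free cross-check, base R's own sampling profiler can give a text summary of the same hotspot. The sketch below uses Rprof() and summaryRprof() from the utils package rather than proffer, with a smaller n chosen here (an assumption) so that it finishes quickly. Exact timings and rankings will vary by machine.

```r
# Profile the slow data frame loop with base R's sampling profiler.
profile_file <- tempfile(fileext = ".out")
Rprof(profile_file, interval = 0.01)
n <- 1e4  # Smaller than in the example above so the run stays short.
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
  x[i, ] <- x[i, ] + 1
}
Rprof(NULL)  # Stop profiling.

# Rank functions by total time, including time spent in their callees.
by_total <- summaryRprof(profile_file)$by.total
head(by_total, 5)
```

If the data frame methods dominate the top rows, that agrees with what the flame graph shows.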
The pprof server is a background processx process, and you can manage it with the processx methods described here. Remember to terminate the process with $kill() when you are done with it.
# px is a process handler.
px <- pprof({
n <- 1e4
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
#> ● url: http://localhost:50195
#> ● host: localhost
#> ● port: 50195
# Summary of the background process.
px
#> PROCESS 'pprof', running, pid 10451.
px$is_alive()
#> [1] TRUE
# Error messages, some of which do not matter.
px$read_error()
#> [1] "Main binary filename not available.\n"
# Terminate the process when you are done.
px$kill()
As with Jupyter notebooks, you can serve pprof from one computer and use it from another computer on the same network. On the server, you must
- Find the server’s host name or IP address in advance.
- Supply "0.0.0.0" as the host argument.
system2("hostname")
#> mycomputer
px <- pprof({
n <- 1e4
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
}, host = "0.0.0.0")
#> ● url: http://localhost:61072
#> ● host: localhost
#> ● port: 61072
Then, on the client machine, navigate a web browser to the server’s host name or IP address and use the port number printed above, e.g. http://mycomputer:61072.
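A minimal sketch of composing that address on the server, using base R only. The port value is a hypothetical placeholder for whatever pprof() printed, and the http scheme is taken from the url line in the printed output.

```r
# Hypothetical port: replace with the value pprof() printed on the server.
port <- 61072L

# Sys.info() is base R; "nodename" is the machine's host name.
server_host <- Sys.info()[["nodename"]]

# Address to open in the browser on the client machine.
client_url <- sprintf("http://%s:%d", server_host, port)
client_url
```

If the host name does not resolve from the client, the server's IP address can be substituted for server_host.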
For old versions of proffer (0.0.2 and below) refer to these older installation instructions instead of the ones below.
The latest release of proffer is available on CRAN.
install.packages("proffer")
Alternatively, you can install the development version from GitHub.
# install.packages("remotes")
remotes::install_github("r-prof/proffer")
The proffer package requires the RProtoBuf package, which may require installation of additional system dependencies on Linux. See its installation instructions.
proffer requires
- Go: https://go.dev/doc/install
- Graphviz: https://www.graphviz.org/download
- pprof: https://github.com/google/pprof (already comes with Go)
pprof itself is already installed with Go. We highly recommend you use Go’s default copy of pprof because compatibility issues could arise if you install the latest pprof manually.
Mac and Windows installers of Go and Graphviz are available at the links above. On Linux, you can install Go (and thus pprof) directly from R:
library(proffer)
install_go() # Also installs pprof if on Linux.
First, run pprof_sitrep() to see if proffer can already find all the required non-R dependencies. Then, run test_pprof() to see if pprof actually works for you. If both checks pass, you are done with installation.
Otherwise, open your .Renviron file and define special environment variables that point to system dependencies. The edit_r_environ() function in the usethis package can help you. Configuration varies according to your platform and installation method.
# Linux (example paths):
PROFFER_PPROF_BIN=/home/YOU/go/pkg/tool/linux_amd64/pprof
PROFFER_GO_BIN=/home/YOU/go/bin/go
PROFFER_GRAPHVIZ_BIN=/usr/bin/dot
# Mac (example paths):
PROFFER_PPROF_BIN=/usr/local/bin/pprof
PROFFER_GO_BIN=/usr/local/bin/go
PROFFER_GRAPHVIZ_BIN=/usr/local/bin/dot
# Windows (example paths):
PROFFER_PPROF_BIN=C:\Go\pkg\tool\windows_amd64\pprof.exe
PROFFER_GO_BIN=C:\Go\bin\go.exe
PROFFER_GRAPHVIZ_BIN="C:\Program Files (x86)\Graphviz2.38\bin\dot.exe"
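After editing .Renviron, restart R so the file is read again. A quick way to confirm that the three variables reached the session and point at real files is sketched below with base R only. The status table is an illustrative helper, not part of proffer.

```r
# The three variable names come from the configuration examples above.
vars <- c("PROFFER_PPROF_BIN", "PROFFER_GO_BIN", "PROFFER_GRAPHVIZ_BIN")

# Sys.getenv() returns "" for variables that are not set.
paths <- Sys.getenv(vars, unset = "")

# Summarize which variables are set and whether each path exists on disk.
status <- data.frame(
  variable = vars,
  path = unname(paths),
  is_set = unname(nzchar(paths)),
  exists = unname(file.exists(paths)),
  stringsAsFactors = FALSE
)
status
```

A row with is_set TRUE but exists FALSE usually means a typo in the path or a different install location.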
Run pprof_sitrep() again to verify that everything is installed and configured correctly.
library(proffer)
pprof_sitrep()
#> • Call test_pprof() to test installation.
#>
#> ── Requirements ────────────────────────────────────────────────────────────────
#> ✔ pprof '/home/landau/go/pkg/tool/linux_amd64/pprof'
#> ✔ Graphviz '/usr/bin/dot'
#>
#> ── Go ──────────────────────────────────────────────────────────────────────────
#> ✔ Go binary '/home/landau/go/bin/go'
#> ✔ Go folder '/home/landau/go'
#>
#> ── Custom ──────────────────────────────────────────────────────────────────────
#> ✔ `PROFFER_PPROF_BIN` '/home/landau/go/pkg/tool/linux_amd64/pprof'
#> ✔ `PROFFER_GO_BIN` '/home/landau/go/bin/go'
#> ✔ `PROFFER_GRAPHVIZ_BIN` '/usr/bin/dot'
#>
#> ── System ──────────────────────────────────────────────────────────────────────
#> ℹ pprof system path missing '/home/landau/go/bin/pprof'
#> • See <https://github.com/google/pprof> to install pprof.
#> ✔ Go binary system path '/usr/bin/go'
#> ✔ Graphviz system path '/usr/bin/dot'
#>
#> ── Deprecated ──────────────────────────────────────────────────────────────────
#> ✔ `pprof_path` env variable omitted.
If all dependencies are accounted for, proffer should work. Test it out with test_pprof(). On a local machine, it should launch a browser window showing an instance of pprof.
library(proffer)
test_pprof()
We encourage participation through issues and pull requests. proffer has a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Profilers identify bottlenecks, but they do not offer solutions. It helps to learn about fast code in general so you can think of efficient alternatives to try.
- http://adv-r.had.co.nz/Performance.html
- https://www.r-bloggers.com/2016/01/strategies-to-speedup-r-code/
- https://www.r-bloggers.com/2013/04/faster-higher-stonger-a-guide-to-speeding-up-r-code-for-busy-people/
- https://cran.r-project.org/package=data.table/vignettes/datatable-intro.html
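As one illustration of the kind of alternative those resources suggest, the row-by-row update from the earlier example can be written as a single vectorized operation. This is a sketch with a small n chosen here so it runs quickly; the variable names are illustrative only.

```r
n <- 1e3
original <- data.frame(x = rnorm(n), y = rnorm(n))

# Row-by-row version, as in the slow example.
looped <- original
for (i in seq_len(n)) {
  looped[i, ] <- looped[i, ] + 1
}

# Vectorized version: one arithmetic operation on the whole data frame.
vectorized <- original + 1

# Check that both approaches agree up to floating point tolerance.
all.equal(looped, vectorized)
```

Profiling both versions is still the way to confirm which one is faster for your data.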
The profvis package is much easier to install than proffer and equally easy to invoke.
library(profvis)
profvis({
n <- 1e5
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
However, profvis-generated flame graphs can be difficult to read and slow to respond to mouse clicks. proffer uses pprof to create friendlier, faster visualizations.