This is just to track the performance of the Dense layer on CPU. I use the following script:
```julia
using BenchmarkTools, Flux
using Zygote: pullback
using LinearAlgebra
BLAS.set_num_threads(1)

function perf_test(n)
    r = rand(Float32, n, n)
    d = Dense(n, n)
    println(" FORW")
    @btime sum($d($r))
    println(" GRADIENT")
    @btime gradient(() -> sum($d($r)), $(Flux.params(d)))
    @btime gradient((d) -> sum(d($r)), $d)
    println(" PULLBACK")
    y, back = pullback((d) -> sum(d(r)), d)
    @btime pullback((d) -> sum(d($r)), $d)
    @btime $back(1f0)
end

println("SMALL NET n=2")
perf_test(2)
println("MEDIUM NET n=20")
perf_test(20)
println("LARGE NET n=200")
perf_test(200)
println("VERY LARGE NET n=2000")
perf_test(2000)
```
and on my system:
```julia
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.1 (ORCJIT, skylake)
```
```julia
(Flux) pkg> st
Project Flux v0.12.0-dev
Status `~/.julia/dev/Flux/Project.toml`
  [1520ce14] AbstractTrees v0.3.3
  [79e6a3ab] Adapt v2.3.0
  [052768ef] CUDA v2.3.0
  [944b1d66] CodecZlib v0.7.0
  [5ae59095] Colors v0.12.4
  [d9f16b24] Functors v0.1.0
  [e5e0dc1b] Juno v0.8.4
  [1914dd2f] MacroTools v0.5.6
  [872c559c] NNlib v0.7.7
  [189a3867] Reexport v0.2.0
  [2913bbd2] StatsBase v0.33.2
  [a5390f91] ZipFile v0.9.3
  [e88e6eb3] Zygote v0.5.15
  [8bb1440f] DelimitedFiles
  [37e2e46d] LinearAlgebra
  [44cfe95a] Pkg
  [de0858da] Printf
  [9a3f8284] Random
  [ea8e919c] SHA
  [10745b16] Statistics
  [8dfed614] Test
```
I obtain the following output:
```
SMALL NET n=2
 FORW
  99.930 ns (2 allocations: 192 bytes)
 GRADIENT
  2.096 μs (40 allocations: 2.92 KiB)
  1.045 μs (31 allocations: 1.77 KiB)
 PULLBACK
  167.077 ns (5 allocations: 512 bytes)
  814.164 ns (24 allocations: 928 bytes)
MEDIUM NET n=20
 FORW
  1.049 μs (2 allocations: 3.53 KiB)
 GRADIENT
  5.334 μs (38 allocations: 12.95 KiB)
  4.222 μs (31 allocations: 11.86 KiB)
 PULLBACK
  1.383 μs (5 allocations: 5.52 KiB)
  2.747 μs (24 allocations: 5.98 KiB)
LARGE NET n=200
 FORW
  205.632 μs (4 allocations: 312.66 KiB)
 GRADIENT
  643.443 μs (44 allocations: 941.05 KiB)
  626.491 μs (37 allocations: 939.95 KiB)
 PULLBACK
  219.883 μs (8 allocations: 469.20 KiB)
  405.049 μs (27 allocations: 470.39 KiB)
VERY LARGE NET n=2000
 FORW
  214.841 ms (4 allocations: 30.52 MiB)
 GRADIENT
  637.410 ms (44 allocations: 91.56 MiB)
  637.142 ms (37 allocations: 91.56 MiB)
 PULLBACK
  217.240 ms (8 allocations: 45.78 MiB)
  418.468 ms (27 allocations: 45.78 MiB)
```
Some observations:
- the expected O(n^3) asymptotic scaling only kicks in at the largest sizes
- the pullback is ~2x slower than the forward pass (and even slower at very small sizes)
- for the smallest networks, there is a significant speed difference between the two calling styles of `gradient`.
Maybe this should use `Dense(n, n, relu)`: `gradient(sum, rand(Float32, 3))[1] isa Zygote.Fill`, which I think gets you a generic `*` (but wouldn't happen in real use), while `gradient(x -> sum(relu, x), rand(Float32, 3))[1] isa Array`. This shaves off an order of magnitude at n=200.
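The Fill-vs-Array distinction above can be checked directly. A minimal sketch (assuming Flux and Zygote are installed, as in the benchmark script):

```julia
using Flux
using Zygote

x = rand(Float32, 3)

# The gradient of plain `sum` is a lazy fill of ones rather than a dense
# Array, so downstream multiplies in the pullback may hit generic methods.
g_fill = gradient(sum, x)[1]
println(typeof(g_fill))    # a Fill type, not a Vector

# With an activation inside the loss, the gradient is materialized as an Array.
g_dense = gradient(x -> sum(relu, x), x)[1]
println(typeof(g_dense))   # Vector{Float32}
```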