gradient buffer not declared in mnist test #153
The gradient buffer is commented out and then never used in training. This throws a compilation error when testing.
What version are you using?
julia> @time SimpleChains.train_batched!(
         p,
         lenetloss,
         xtrain4,
         SimpleChains.ADAM(3e-4),
         10
       )
5.798421 seconds (2.23 M allocations: 142.881 MiB, 1.38% gc time, 36.31% compilation time)
44426-element StrideArray{Float32, 1, (1,), Tuple{Int64}, Tuple{Nothing}, Tuple{Static.StaticInt{1}}, Vector{Float32}}:
-0.08823633
0.10486636
0.2803934
-0.18992418
-0.28120536
-0.15369925
0.09791076
-0.024629174
0.36833355
-0.1904188
-0.077544756
0.24522759
-0.22895455
-0.18702294
⋮
-0.3233441
-0.4580964
-0.12954155
0.11518178
0.008558591
0.00065564807
-0.015164471
-0.005364319
-0.02529709
0.038416866
-0.020364538
0.017151155
-0.024268592
0.019850327
julia> if VERSION >= v"1.10"
         @test_opt SimpleChains.train_batched!(
           p,
           lenetloss,
           xtrain4,
           SimpleChains.ADAM(3e-4),
           10
         )
       end
Test Passed
(@simplechainsm) pkg> st -m SimpleChains
Status `~/.julia/environments/simplechainsm/Manifest.toml`
  [de6bee2f] SimpleChains v0.4.6
You can check that tests pass on the main branch.
So maybe it is more complicated than I thought ;) I am using an AMD CPU, could that be the cause?
Output for the patched version of SimpleChains:
(@v1.9) pkg> dev SimpleChains
julia> import SimpleChains
(@v1.9) pkg> test SimpleChains
Platform Info:
This is the output after I remove my patched version of SimpleChains and add the official package back:
(@v1.9) pkg> rm SimpleChains
(@v1.9) pkg> add SimpleChains
(@v1.9) pkg> test SimpleChains
Platform Info:
[333055] signal (11.1): Speicherzugriffsfehler (German for "segmentation fault")
The earliest version where it works is 4.2:
(@v1.9) pkg> add [email protected]
julia> exit()
(@v1.9) pkg> test SimpleChains
Platform Info:
I just tested on the official build of Julia 1.9.3 with SimpleChains 0.4.6 and noticed that the tests fail if I use only one thread. When I start Julia with -t auto -p auto, the tests pass.
I can confirm this is true for me, too. I'll look into it.
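Since the failure depends on how Julia was started, it may help to record the parallel configuration of the session before running the tests. The following is a minimal sketch; `session_config` is a hypothetical helper name, not something from SimpleChains, and it only reports the thread count (set by `-t`) and the worker-process count (set by `-p`).

```julia
using Distributed  # standard library; provides nworkers()

# Hypothetical helper: summarize the parallel configuration of this session.
# threads reflects the -t flag, workers reflects the -p flag.
session_config() = (threads = Threads.nthreads(), workers = nworkers())

cfg = session_config()
println("threads = ", cfg.threads, ", workers = ", cfg.workers)
```

Including this output alongside a test log makes it easier to tell a single-threaded run from a multi-threaded one.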
This might be a LoopVectorization issue. It works without crashing or error when I start Julia with
# using LoopVectorization: matmul_params, @turbo
using LoopVectorization: matmul_params
macro turbo(ex)
    esc(ex)
end
macro turbo(ex0, ex1)
    esc(ex1)
end
macro turbo(ex0, ex1, ex2)
    esc(ex2)
end
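To illustrate what the replacement macros above do, here is a small self-contained sketch that does not load LoopVectorization at all. The stub `@turbo` methods drop any leading option arguments and return the loop unchanged, so the loop runs as ordinary Julia code with normal bounds checking. The functions `plain_sum` and `plain_dot` are hypothetical examples, not code from SimpleChains.

```julia
# Stand-ins for LoopVectorization's @turbo: each method discards any
# leading option argument and returns the final expression untouched.
macro turbo(ex)
    esc(ex)
end
macro turbo(opt, ex)
    esc(ex)
end

# Hypothetical example: the loop is compiled as a plain for loop.
function plain_sum(x)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Hypothetical example: the option `thread=true` is parsed as a separate
# macro argument and ignored by the two-argument stub.
function plain_dot(a, b)
    s = zero(eltype(a))
    @turbo thread=true for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end
```

Because the stubs remove the vectorization entirely, this kind of substitution is useful for isolating a crash but is expected to be slower than the real macro.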
That works, but it is also highly undesirable in terms of speed. The LoopVectorization.jl authors explicitly write that they don't do any memory or bounds checking. Since it also works when I manually preallocate the gradient buffer, maybe there is an error in the memory allocation in SimpleChains.jl before @turbo is used?
With @turbo (2 threads):
This comment was written before your fix. Bisecting leaves me at this commit of SimpleChains.jl:
I suspected StrideArraysCore, but changing to version 0.4.17 didn't help.
All of these seem to work fine with more than one thread.