Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup matrix multiplications #129

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

CarloLucibello
Copy link
Contributor

@CarloLucibello CarloLucibello commented Dec 4, 2020

While multiplication of generic matrices by Zeros is already handled,
multiplication by Ones is not, while for Fill we only have right multiplication.

As you can from the benchmarks below, currently we have the follwing:

  • For Ones: left and right multiplication are on par with Array mul. up to n=5, than they degrade very very badly
  • For Fill:
    • we currently handle right multiplication, with performance slightly worse than Array mul at any n
    • left mult. is as bas the Ones case

After this PR, we improve a lot on large matrices, but we have an hit on very small ones. I'm not sure what we should do here, maybe branch based on the input matrix size?

using FillArrays, BenchmarkTools
import Base: *
using Test

for n in [2, 5, 10, 20, 200, 2000]
    x, y = rand(n, n), rand(n, n)
    o, f = Ones(n, n), Fill(1.0, n, n)

    println("RIGHT n=$n")
    @btime $x * $y
    @btime $x * $o
    @btime $x * $f

    println("LEFT n=$n")
    @btime $x * $y
    @btime $o * $y
    @btime $f * $y

    println()
end

BEFORE this PR

RIGHT n=2
  41.888 ns (1 allocation: 112 bytes)
  37.016 ns (1 allocation: 112 bytes)
  70.724 ns (2 allocations: 224 bytes)
LEFT n=2
  39.934 ns (1 allocation: 112 bytes)
  35.955 ns (1 allocation: 112 bytes)
  38.029 ns (1 allocation: 112 bytes)

RIGHT n=5
  191.627 ns (1 allocation: 288 bytes)
  314.760 ns (5 allocations: 512 bytes)
  232.231 ns (2 allocations: 576 bytes)
LEFT n=5
  206.960 ns (1 allocation: 288 bytes)
  313.476 ns (5 allocations: 512 bytes)
  312.221 ns (5 allocations: 512 bytes)

RIGHT n=10
  361.573 ns (1 allocation: 896 bytes)
  916.708 ns (5 allocations: 1.09 KiB)
  421.156 ns (2 allocations: 1.75 KiB)
LEFT n=10
  349.810 ns (1 allocation: 896 bytes)
  1.000 μs (5 allocations: 1.09 KiB)
  1.017 μs (5 allocations: 1.09 KiB)

RIGHT n=20
  1.177 μs (1 allocation: 3.25 KiB)
  6.289 μs (5 allocations: 3.47 KiB)
  1.546 μs (2 allocations: 6.50 KiB)
LEFT n=20
  1.153 μs (1 allocation: 3.25 KiB)
  5.824 μs (5 allocations: 3.47 KiB)
  5.801 μs (5 allocations: 3.47 KiB)

RIGHT n=200
  125.403 μs (2 allocations: 312.58 KiB)
  6.416 ms (8 allocations: 312.91 KiB)
  157.784 μs (4 allocations: 625.16 KiB)
LEFT n=200
  119.907 μs (2 allocations: 312.58 KiB)
  6.929 ms (8 allocations: 312.91 KiB)
  6.752 ms (8 allocations: 312.91 KiB)

RIGHT n=2000
  288.344 ms (2 allocations: 30.52 MiB)
  9.324 s (8 allocations: 30.52 MiB)
  271.799 ms (4 allocations: 61.04 MiB)
LEFT n=2000
  291.239 ms (2 allocations: 30.52 MiB)
  8.874 s (8 allocations: 30.52 MiB)
  8.580 s (8 allocations: 30.52 MiB)

AFTER this PR

RIGHT n=2
  41.765 ns (1 allocation: 112 bytes)
  316.559 ns (7 allocations: 384 bytes)
  313.585 ns (7 allocations: 384 bytes)
LEFT n=2
  38.332 ns (1 allocation: 112 bytes)
  135.900 ns (3 allocations: 304 bytes)
  139.733 ns (3 allocations: 304 bytes)

RIGHT n=5
  189.016 ns (1 allocation: 288 bytes)
  401.985 ns (7 allocations: 624 bytes)
  383.285 ns (7 allocations: 624 bytes)
LEFT n=5
  184.575 ns (1 allocation: 288 bytes)
  224.525 ns (3 allocations: 544 bytes)
  225.150 ns (3 allocations: 544 bytes)

RIGHT n=10
  335.204 ns (1 allocation: 896 bytes)
  623.455 ns (7 allocations: 1.27 KiB)
  624.654 ns (7 allocations: 1.27 KiB)
LEFT n=10
  362.707 ns (1 allocation: 896 bytes)
  450.010 ns (3 allocations: 1.19 KiB)
  494.892 ns (3 allocations: 1.19 KiB)

RIGHT n=20
  1.153 μs (1 allocation: 3.25 KiB)
  1.223 μs (7 allocations: 3.80 KiB)
  1.288 μs (7 allocations: 3.80 KiB)
LEFT n=20
  1.159 μs (1 allocation: 3.25 KiB)
  1.399 μs (3 allocations: 3.72 KiB)
  1.400 μs (3 allocations: 3.72 KiB)

RIGHT n=200
  118.943 μs (2 allocations: 312.58 KiB)
  46.169 μs (8 allocations: 316.19 KiB)
  46.383 μs (8 allocations: 316.19 KiB)
LEFT n=200
  120.457 μs (2 allocations: 312.58 KiB)
  89.995 μs (4 allocations: 316.11 KiB)
  87.646 μs (4 allocations: 316.11 KiB)

RIGHT n=2000
  212.879 ms (2 allocations: 30.52 MiB)
  6.143 ms (8 allocations: 30.55 MiB)
  6.172 ms (8 allocations: 30.55 MiB)
LEFT n=2000
  232.102 ms (2 allocations: 30.52 MiB)
  20.061 ms (4 allocations: 30.55 MiB)
  20.516 ms (4 allocations: 30.55 MiB)

src/fillalgebra.jl Show resolved Hide resolved
*(x::ZerosMatrix, y::OnesMatrix) = mult_zeros(x, y)
*(x::ZerosMatrix, y::FillMatrix) = mult_zeros(x, y)
*(x::FillMatrix, y::ZerosMatrix) = mult_zeros(x, y)
*(x::OnesMatrix, y::ZerosMatrix) = mult_zeros(x, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are massively underestimating the number of ambiguities we need to take into account. For example we need to take into account AdjointAbsVec, TransposeAbsVec, AbstractTriangular, Adjoint, Transpose, .... A somewhat more exhaustive list is here:

https://github.com/JuliaMatrices/ArrayLayouts.jl/blob/9fbed3a8e49b26d4ccff5608635b8c986f702b7a/src/mul.jl#L158

It's also a bad idea to overload Ones and Fills seperatly since that just means more ambiguities downstream. The following is better:

*(a::AbstractMatrix, b::AbstractFill{<:Any,2}) = fill_mul(a, b)
fill_mul(a, b::Ones) = ...
fill_mul(a, b::Zeros) = ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully I addressed both concerns

Copy link
Contributor

@mcabbott mcabbott Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance that overloading something other than * might be simpler, and avoid all ambiguities? They are pretty thorny, as every other package that tries to play the same game introduces new possibilities...

Without special handling, it will end up after similar at some generic_mul!(C, A::Fill, B). Could one overload that instead?

Or maybe that's too late, since similar(::Filll)::Array, but is there somewhere in the chain between * and BLAS or whatever at which this could be caught?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is precisely what ArrayLayouts.jl is designed to avoid, and would also give us the mul! variants. One option would be to make FillArrays.jl depend on ArrayLayouts.jl (it’s currently the other way around).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait how does that work? Is this only if you call something special like ArrayLayouts.mul or does it stick its fingers into ordinary A * B?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would do AbstractFill <: LayoutArray which then catches all the necessary * ovverrides

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it. Overloading for one supertype stops all its children from stepping on each others' toes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although that's probably the right direction, It involves a lot of churn and the coordination of two packages, so I would merge this as it is for the time being

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the coordination of two packages

FYI I wrote both packages so this isn't quite so bad. The only reason I haven't done this is FillArrays.jl is clean and simple, so needs a good reason to make it more complicated. I'm not sure we are there yet.

@codecov
Copy link

codecov bot commented Dec 4, 2020

Codecov Report

Merging #129 (5a999ba) into master (ba49d16) will increase coverage by 0.05%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #129      +/-   ##
==========================================
+ Coverage   91.36%   91.41%   +0.05%     
==========================================
  Files           4        4              
  Lines         463      466       +3     
==========================================
+ Hits          423      426       +3     
  Misses         40       40              
Impacted Files Coverage Δ
src/fillalgebra.jl 99.20% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba49d16...8350786. Read the comment docs.

@CarloLucibello
Copy link
Contributor Author

Now I'm testing that we have no ambiguity error when multiplying a fill by Adjoint / Transpose / Triangular / Symmetric.
Is there any other type I'm missing?

@dlfivefifty
Copy link
Member

Please bump the minor version and I'll merge (needs to minor since new ambiguities counts as breaking)

@CarloLucibello
Copy link
Contributor Author

merge?

@dlfivefifty
Copy link
Member

Added a few more tests, lets see if they pass

@dlfivefifty
Copy link
Member

The tests I added picked up a ton of ambiguities. I think we also need similar tests for Symmetric{<:Any,<:AbstractFill}, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants