Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up dot product of symmetric matrices #24

Open
phui opened this issue Dec 11, 2019 · 3 comments
Open

Speed up dot product of symmetric matrices #24

phui opened this issue Dec 11, 2019 · 3 comments

Comments

@phui
Copy link

phui commented Dec 11, 2019

I am no expert in matrix computation.

For a symmetric matrix output, skipping computation of entries in the low-triangular area will speed up the computation by roughly a factor of two. This happens a lot when you are trying to find the best matches within the same set, aka self dot-product.

For example, at line 100 of https://github.com/ing-bank/sparse_dot_topn/blob/master/sparse_dot_topn/sparse_dot_topn_source.cpp, can we add a condition to check if head < i, with a flag check on an additional argument symmetric?

@sarimak
Copy link

sarimak commented Jan 10, 2020

While we can safely skip the diagonal for the self dot-product (which contains ones due to the L2-normalization of the rows and the absolute similarity to self is not a real neighbour), reusing values from upper-diagonal matrix for calculating the lower-diagonal matrix IMO is not that easy because of top-N pruning of the neighbors.

When calculating the similarity for (x, y) we can linearly scan the already calculated neighbors if they contain the similarity for (y, x) when x < y but even if not, we cannot assume it is zero or below-treshold and we need to calculate the similarity for (x, y) again. The reason is that maybe the similarity did not fit into top-N for row y but still may fit into top-N for row x.

And the likelihood that we will have to calculate the similarity anyway is quite high. So in most cases we would just make things worse by scanning the symmetric row's neighbours.

@sarimak
Copy link

sarimak commented Jan 30, 2020

Update: We can do it differently - iterate over all rows in the outermost loop, iterate over portion of a row in the inner loop (and the slice length is shorter by one every row = upper triangular matrix) and we can store each calculated similarity to both symmetric positions (x, y) and (y, x). This way we don't have to deal with top N trimming of the neighbors and similarities.

But this makes things more complicated when using multiple threads: While we could evenly divide the work among N threads by splitting the outermost loop (each thread processes its portion of rows), due to shortening of the "columns" in the inner loop the work would not be distributed evenly among the threads.

What's worse: While originally we had embarassingly parallel algorithm with no data races among threads (all threads read the same matrix but each thread writes to own portion of the resulting neighbors and similarities), writing the symmetric similarity now breaks this feature and the threads now need to lock the resulting neighbors/similarities to maintain thread safety. The additional complexity may still pay off but maybe not that much.

@phui
Copy link
Author

phui commented Jan 30, 2020

How about adding a set empty_entries that saves what pairs of (x, y) are below threshold, and always map (y, x) to (x, y) for y > x when performing saves and checks for symmetric output? This way you don't need to sync for empty_entries, since a not-up-to-date empty_entries would just add a bit computation, but doesn't impact correctness of the output.


Update:

Hmm, I realize this is a bad idea for sparse computation...

Maybe you can iterate (x, y) diagonally like

image

While iterating, save results to both (x, y) and (y, x).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants