Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN #123

Open · wants to merge 15 commits into master
Conversation

@manish7294 (Member) commented Jun 15, 2018

Benchmarking scripts for Mlpack, Shogun & Matlab's LMNN.

```python
raise Exception("unknown parameters")

# Perform LMNN.
prep = ShogunLMNN(feat, labels, k)
```
Member commented on this code:

Let's use self.k here.
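A minimal sketch of the suggested one-line fix (assuming this call sits inside a method that stores the neighbor count as self.k):

```python
prep = ShogunLMNN(feat, labels, self.k)
```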

Member Author (@manish7294) replied:

Right, I missed pushing the changes again :)

@vigsterkr commented:

@iglesias, @manish7294 has some questions about the LMNN implementation in Shogun :)

@manish7294 (Member Author) commented:

@iglesias Hi! Hope you are doing well :)
I was benchmarking LMNN on both mlpack and Shogun, using Shogun's KNN to measure accuracy on the transformed data. But here comes the problem: if the query set is the same as the training set, Shogun's KNN considers each query point to be its own first nearest neighbor (giving 100% accuracy for k = 1). mlpack's KNN behaves differently: it avoids returning the query point as its own neighbor, so Shogun's second nearest neighbor is mlpack's first nearest neighbor.
Can you think of a way to put both libraries on the same page for benchmarking purposes? Or is there a way in Shogun to avoid the point itself being considered the first nearest neighbor? Any kind of help will be appreciated :)

@iglesias commented:

Hi @manish7294. So this is about KNN.

What about the following: in mlpack use KNN with k = k', and in Shogun use KNN with k = k' + 1. Then you could compare the nearest neighbors given by mlpack with the 2nd through (k' + 1)th nearest neighbors given by Shogun.

Does this make sense to you?
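A minimal sketch of this offset comparison (assuming NumPy arrays of neighbor indices with one row per query point; the names mlpack_nn and shogun_nn are illustrative, not actual library output):

```python
import numpy as np

def neighbors_match(mlpack_nn, shogun_nn):
    # mlpack_nn: (n, k) neighbor indices; shogun_nn: (n, k + 1).
    # Drop Shogun's first column (each query point finds itself first),
    # so both matrices describe the same k true neighbors.
    return np.array_equal(mlpack_nn, shogun_nn[:, 1:])
```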

@manish7294 (Member Author) commented:

@iglesias Thanks for looking into this. We thought about that, but it would create a problem in calculating predictions: in Shogun's KNN prediction, the point itself will always be counted as a positive nearest neighbor, which will introduce some error into the accuracy measurement.
Let me take an example. Say we have 2 classes and we want k = 2, so we take mlpack's k as 2 and Shogun's k as 3. Suppose the first neighbor belongs to class 1 and the second to class 2. Ideally (as per what we do in mlpack's KNN) we should predict class 1 for the point. But since we took Shogun's k as 3, and the point itself (suppose it has label 2) is always counted in the prediction, Shogun predicts label 2, which differs from mlpack's KNN.
On second thought, if I assume that Shogun runs a distance-weighted KNN behind the scenes (not sure), then the output will always be the label of the point itself, no matter what k we choose.

@rcurtin (Member) commented Jun 19, 2018:

Would it be possible to just get the k + 1 nearest neighbors from Shogun itself? If so, we can do the classification ourselves (and ensure that we always do it the same way). If we can't get those from Shogun, maybe it is better to use a different technique for the nearest neighbor search. I don't think it makes a huge difference either way: since we aren't timing the nearest neighbor search, we are just using it to find the true nearest neighbors and calculate a metric.
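A minimal, library-independent sketch of that idea (assuming NumPy; X is an (n, d) data matrix and y an (n,) array of non-negative integer labels, both illustrative names):

```python
import numpy as np

def knn_accuracy(X, y, k):
    # Pairwise squared Euclidean distances (brute force).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Take the k + 1 closest points; column 0 is the query point itself,
    # so drop it to keep only the k true neighbors.
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    # Classify each point by majority vote over its neighbors' labels.
    preds = np.array([np.bincount(y[row]).argmax() for row in nn])
    return (preds == y).mean() * 100
```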

@manish7294 (Member Author) commented:

Thanks all for helping out. Hopefully, the new custom KNN accuracy predictor will work as expected :)

```python
# Count query points whose predicted label matches the true label.
count += 1

# Accuracy as a percentage; NN.shape[1] is the number of query points.
accuracy = (count / NN.shape[1]) * 100
return [time, accuracy]
```
Member commented on this code:

This looks correct to me. But I think maybe it would be useful to split this into a separate function that we could put into methods/metrics/? It seems to me that we could calculate a few metrics using variations of this code (and that code could all be one function):

  • 1-NN accuracy
  • 3-NN accuracy
  • distance-weighted 3-NN accuracy
  • 5-NN accuracy
  • distance-weighted 5-NN accuracy

And the input to the function could be the nearest neighbor indices, the distances, and the labels, plus the number of neighbors to use in the metric calculation and whether or not to use distance weighting. Does that make sense? What do you think?
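A minimal sketch of that shared function (a hypothetical helper under methods/metrics/, not the final implementation; assumes NumPy, with indices and distances of shape (n, k_max) holding each point's precomputed true neighbors, and an (n,) array of non-negative integer labels):

```python
import numpy as np

def knn_metric(indices, distances, labels, k, weighted=False):
    correct = 0
    for i in range(indices.shape[0]):
        neighbor_labels = labels[indices[i, :k]]
        if weighted:
            # Weight each neighbor's vote by its inverse distance.
            w = 1.0 / np.maximum(distances[i, :k], 1e-10)
            pred = np.bincount(neighbor_labels, weights=w).argmax()
        else:
            pred = np.bincount(neighbor_labels).argmax()
        correct += (pred == labels[i])
    return (correct / indices.shape[0]) * 100
```

With this, 1-NN, 3-NN, 5-NN, and their distance-weighted variants are just different (k, weighted) arguments over the same precomputed neighbors.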

Member Author (@manish7294) replied:

This seems about right. But one thing is bugging me: you said we can calculate 3-NN and 5-NN accuracy from this code, which is possible, but what significance would 5-NN have if we train our LMNN for, say, k = 3?

Member replied:

Right, so LMNN is optimizing for k = 3 if you run it with k = 3, but it may still improve 5-NN performance. Given that it would be so easy to add, I don't see any issue with doing that; but you are right, it may not be the most interesting metric to compare. Still, it is not a problem to add too many metrics. :)

@manish7294 (Member Author) commented:

No problem then, I will add it for sure :)

@manish7294 manish7294 changed the title Benchmarking Scripts for mlpack_lmnn & shogun_lmnn Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN Jun 21, 2018
@rcurtin (Member) commented Jun 27, 2018:

Everything looks good to me, but the build will fail until LMNN is part of the newest release of mlpack. So we can wait to merge until then, and we can be sure to release a newer version of mlpack soon-ish (before the end of the summer) with LMNN support. For now, you can use the branch to run timing tests.

@zoq (Member) commented Jun 28, 2018:

Do you think it would be useful to include mlpack-master as another library? That would allow us to merge this earlier.

@rcurtin (Member) commented Jul 9, 2018:

@mlpack-jenkins test this please

2 similar comments from @rcurtin followed on Jul 9, 2018.

@mlpack-jenkins (Collaborator) commented:

Can one of the admins verify this patch?
