Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN #123

Open · wants to merge 15 commits into master
Conversation

@manish7294 (Member) commented Jun 15, 2018

Benchmarking scripts for Mlpack, Shogun & Matlab's LMNN.

```python
raise Exception("unknown parameters")

# Perform LMNN.
prep = ShogunLMNN(feat, labels, k)
```
Member commented on this code:

Let's use self.k here.
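A minimal sketch of the suggested one-line fix (assuming this call sits inside a method that stores the neighbor count as self.k):

```python
prep = ShogunLMNN(feat, labels, self.k)
```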

Member Author (@manish7294) replied:

Right, I missed pushing the changes again :)

@vigsterkr commented:

@iglesias, @manish7294 has some questions about the LMNN implementation in Shogun :)

@manish7294 (Member Author) commented:

@iglesias Hi! Hope you are doing well :)
I was benchmarking LMNN on both mlpack and Shogun, using Shogun's KNN to measure accuracy on the transformed data. But here comes the problem: if the query set is the same as the training set, Shogun's KNN considers each query point to be its own first nearest neighbor (giving 100% accuracy for k = 1). mlpack's KNN behaves differently: it avoids returning the query point as its own neighbor, so Shogun's second nearest neighbor is mlpack's first nearest neighbor.
Can you think of a way to put both libraries on the same page for benchmarking purposes? Or is there a way in Shogun to avoid the point itself being considered the first nearest neighbor? Any kind of help will be appreciated :)

@iglesias commented:

Hi @manish7294. So this is about KNN.

What about the following: in mlpack use KNN with k = k', and in Shogun use KNN with k = k' + 1. Then you could compare the nearest neighbors given by mlpack with the 2nd through (k' + 1)th nearest neighbors given by Shogun.

Does this make sense to you?
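A minimal sketch of this offset comparison (assuming NumPy arrays of neighbor indices with one row per query point; the names mlpack_nn and shogun_nn are illustrative, not actual library output):

```python
import numpy as np

def neighbors_match(mlpack_nn, shogun_nn):
    # mlpack_nn: (n, k) neighbor indices; shogun_nn: (n, k + 1).
    # Drop Shogun's first column (each query point finds itself first),
    # so both matrices describe the same k true neighbors.
    return np.array_equal(mlpack_nn, shogun_nn[:, 1:])
```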

@manish7294 (Member Author) commented:

@iglesias Thanks for looking into this. We thought about that, but it would create a problem in calculating predictions: in Shogun's KNN prediction, the point itself will always be counted as a positive nearest neighbor, which will introduce some error into the accuracy measurement.
Let me take an example. Say we have 2 classes and we want k = 2, so we take mlpack's k as 2 and Shogun's k as 3. Suppose the first neighbor belongs to class 1 and the second to class 2. Ideally (as per what we do in mlpack's KNN) we should predict class 1 for the point. But since we took Shogun's k as 3, and the point itself (suppose it has label 2) is always counted in the prediction, Shogun predicts label 2, which differs from mlpack's KNN.
On second thought, if I assume that Shogun runs a distance-weighted KNN behind the scenes (not sure), then the output will always be the label of the point itself, no matter what k we choose.

@rcurtin (Member) commented Jun 19, 2018:

Would it be possible to just get the k + 1 nearest neighbors from Shogun itself? If so, we can do the classification ourselves (and ensure that we always do it the same way). If we can't get those from Shogun, maybe it is better to use a different technique for the nearest neighbor search. I don't think it makes a huge difference either way: since we aren't timing the nearest neighbor search, we are just using it to find the true nearest neighbors and calculate a metric.
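A minimal, library-independent sketch of that idea (assuming NumPy; X is an (n, d) data matrix and y an (n,) array of non-negative integer labels, both illustrative names):

```python
import numpy as np

def knn_accuracy(X, y, k):
    # Pairwise squared Euclidean distances (brute force).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Take the k + 1 closest points; column 0 is the query point itself,
    # so drop it to keep only the k true neighbors.
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    # Classify each point by majority vote over its neighbors' labels.
    preds = np.array([np.bincount(y[row]).argmax() for row in nn])
    return (preds == y).mean() * 100
```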

@manish7294 (Member Author) commented:

Thanks all for helping out. Hopefully, the new custom KNN accuracy predictor will work as expected :)

```python
# Count query points whose predicted label matches the true label.
count += 1

# Accuracy as a percentage; NN.shape[1] is the number of query points.
accuracy = (count / NN.shape[1]) * 100
return [time, accuracy]
```
Member commented on this code:

This looks correct to me. But I think maybe it would be useful to split this into a separate function that we could put into methods/metrics/? It seems to me that we could calculate a few metrics using variations of this code (and that code could all be one function):

  • 1-NN accuracy
  • 3-NN accuracy
  • distance-weighted 3-NN accuracy
  • 5-NN accuracy
  • distance-weighted 5-NN accuracy

And the input to the function could be the nearest neighbor indices, the distances, and the labels, plus the number of neighbors to use in the metric calculation and whether or not to use distance weighting. Does that make sense? What do you think?
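A minimal sketch of that shared function (a hypothetical helper under methods/metrics/, not the final implementation; assumes NumPy, with indices and distances of shape (n, k_max) holding each point's precomputed true neighbors, and an (n,) array of non-negative integer labels):

```python
import numpy as np

def knn_metric(indices, distances, labels, k, weighted=False):
    correct = 0
    for i in range(indices.shape[0]):
        neighbor_labels = labels[indices[i, :k]]
        if weighted:
            # Weight each neighbor's vote by its inverse distance.
            w = 1.0 / np.maximum(distances[i, :k], 1e-10)
            pred = np.bincount(neighbor_labels, weights=w).argmax()
        else:
            pred = np.bincount(neighbor_labels).argmax()
        correct += (pred == labels[i])
    return (correct / indices.shape[0]) * 100
```

With this, 1-NN, 3-NN, 5-NN, and their distance-weighted variants are just different (k, weighted) arguments over the same precomputed neighbors.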

Member Author (@manish7294) replied:

This seems about right. But one thing is bugging me: you said we can calculate 3-NN and 5-NN accuracy from this code, which is possible, but what significance would 5-NN have if we train our LMNN for, say, k = 3?

Member replied:

Right, so LMNN is optimizing for k = 3 if you run it with k = 3, but it may still improve 5-NN performance. Given that it would be so easy to add, I don't see any issue with doing that; but you are right, it may not be the most interesting metric to compare. Still, it is not a problem to add too many metrics. :)

@manish7294 (Member Author) commented:

No problem then, I will add it for sure :)

@manish7294 manish7294 changed the title Benchmarking Scripts for mlpack_lmnn & shogun_lmnn Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN Jun 21, 2018
@rcurtin (Member) commented Jun 27, 2018:

Everything looks good to me, but the build will fail until LMNN is part of the newest release of mlpack. So we can wait to merge until then, and we can be sure to release a newer version of mlpack soon-ish (before the end of the summer) with LMNN support. For now, you can use the branch to run timing tests.

@zoq (Member) commented Jun 28, 2018:

Do you think it would be useful to include mlpack-master as another library? That would allow us to merge this earlier.

@rcurtin (Member) commented Jul 9, 2018:

@mlpack-jenkins test this please

2 similar comments from @rcurtin followed on Jul 9, 2018.

@mlpack-jenkins (Collaborator) commented:

Can one of the admins verify this patch?
