retrain distilled images with minibatch-SGD #36

Open · JialunSong opened this issue Jan 13, 2021 · 4 comments
@JialunSong

Hi, I am very interested in this work and have some questions.
I used 20 images per class for MNIST dataset distillation by running

python main.py --mode distill_basic --dataset MNIST --arch LeNet \
    --distill_steps 1 --train_nets_type known_init --n_nets 1 \
    --test_nets_type same_as_train

and achieved 96.54% test accuracy.
But when I use these distilled images as training data to retrain a model with the same initialization as in the distillation step, this time with minibatch SGD, the test accuracy drops to 62% and the model overfits. My questions are:
(1) Is this just because of the different way of optimizing?
(2) Why does optimizing the network your way avoid overfitting, even with only 1 sample per class in MNIST dataset distillation?
(3) How can the distilled images be used to retrain a good model with a normal training procedure such as minibatch SGD? (A rough sketch of what I mean is below.)
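For reference, here is a rough sketch of the retraining setup I mean: plain minibatch SGD on the frozen distilled set. The tensor names and file paths are placeholders for however the distilled data is exported, and the stand-in network should be replaced with the same LeNet and initialization ("known_init") used during distillation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

# Placeholder tensors for the frozen distilled set
# (20 images/class on MNIST -> x: [200, 1, 28, 28], y: [200]).
distilled_x = torch.load("distilled_images.pt")   # hypothetical export path
distilled_y = torch.load("distilled_labels.pt")   # hypothetical export path

# Stand-in LeNet-style network; in practice this should be the same
# architecture and initialization used in the distillation run.
model = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(), nn.Linear(84, 10),
)

# Standard minibatch SGD with shuffling; only the network is updated,
# the distilled images stay fixed.
loader = DataLoader(TensorDataset(distilled_x, distilled_y),
                    batch_size=32, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model.train()
for epoch in range(50):
    for x, y in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
```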

@ssnl (Owner) commented Jan 14, 2021

The images are likely optimized to jointly give some particular gradient, so they depend on the fixed ordering and batching used during distillation. If you want them to be order/batch agnostic, you can try modifying the distillation procedure to apply the images in randomly ordered batches.
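Roughly, I mean something like the following for the inner loop that applies the distilled images during distillation. This is only a sketch of the idea, not the actual code in this repo: `params` is assumed to be a dict of the network's weight tensors, `lrs` the learned per-step learning rates, and it uses `torch.func.functional_call` from recent PyTorch as a shortcut for the differentiable unrolled updates.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def apply_distilled_data(model, params, distilled_x, distilled_y, lrs, batch_size=10):
    """One unrolled pass over the distilled data in a freshly shuffled order,
    so the images cannot overfit to a single fixed ordering/batching.
    Updates keep the graph (create_graph=True) so gradients can still flow
    back into distilled_x and lrs in the outer distillation loop."""
    perm = torch.randperm(distilled_x.size(0))
    for step, start in enumerate(range(0, len(perm), batch_size)):
        idx = perm[start:start + batch_size]
        out = functional_call(model, params, (distilled_x[idx],))
        loss = F.cross_entropy(out, distilled_y[idx])
        grads = torch.autograd.grad(loss, tuple(params.values()), create_graph=True)
        params = {name: w - lrs[step] * g
                  for (name, w), g in zip(params.items(), grads)}
    return params
```

Retraining from scratch would then use the same kind of random shuffling and batching, so the distilled images are applied at test time in the same way they were applied during distillation.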

@JialunSong (Author)

Thanks for your reply. Maybe I expressed my question the wrong way. I want to take the distilled images that were generated to achieve the best test performance (such as the MNIST distilled data that reached 96.54% accuracy) and use them to retrain a model from scratch.
I optimize the model on these distilled images with minibatch SGD (shuffle=True); the distilled images are frozen in this setting, and only the network parameters are updated, with the goal of good classification on the MNIST test data.
I don't aim to change the data distillation procedure to use randomly ordered batches.
Is it possible to use the distilled data to retrain a model that performs as well as the final model from the data distillation run?

@ssnl (Owner) commented Jan 14, 2021

You expressed it well, and I understood exactly what you meant. What I was saying is that if you want the images to be applicable in a certain way at retraining time (e.g., randomly ordered and batched), it is best to modify the distillation training to match, because the images can overfit to the fixed ordering and batching used during distillation. Hence randomizing the order and batching during distillation training is also important.

@JialunSong (Author)

I understand what you mean now. I'll try that and share the results. Thanks a lot.
