Loss becomes nan when I try to train MMT for 100 epochs #52

Open
TheBobbyliu opened this issue Mar 29, 2022 · 0 comments

Hello,
I downloaded your code and the Market1501-UDA-MMT config.yaml from your ModelZoo.
At first, I trained the model with the downloaded configuration unchanged and got the expected results on Market1501 after 50 epochs: mAP 81.0% / R-1 92.3%.
However, after changing the total number of epochs from 50 to 100, all losses become nan during epoch 64 (the first nan is logged at iteration 360 of 400).

Here's the log:
************************* Finished updating pseudo label *************************n
Epoch: [64][ 0/400] Time 0.618 (0.618) Acc@1 46.88% (46.88%) cross_entropy 5.810 (5.810) soft_entropy 5.974 (5.974) softmax_triplet 0.095 (0.095) soft_softmax_triplet 0.113 (0.113)
Epoch: [64][ 10/400] Time 0.412 (0.432) Acc@1 70.31% (52.41%) cross_entropy 4.323 (5.268) soft_entropy 5.372 (6.020) softmax_triplet 0.236 (0.333) soft_softmax_triplet 0.338 (0.314)
Epoch: [64][ 20/400] Time 0.410 (0.422) Acc@1 51.56% (53.65%) cross_entropy 5.469 (5.231) soft_entropy 6.135 (5.920) softmax_triplet 0.829 (0.415) soft_softmax_triplet 0.786 (0.417)
Epoch: [64][ 30/400] Time 0.411 (0.418) Acc@1 59.38% (53.43%) cross_entropy 5.301 (5.288) soft_entropy 6.075 (5.977) softmax_triplet 0.159 (0.350) soft_softmax_triplet 0.162 (0.367)
Epoch: [64][ 40/400] Time 0.403 (0.416) Acc@1 48.44% (53.58%) cross_entropy 5.748 (5.311) soft_entropy 6.841 (6.009) softmax_triplet 0.650 (0.363) soft_softmax_triplet 0.841 (0.390)
Epoch: [64][ 50/400] Time 0.411 (0.421) Acc@1 90.62% (56.43%) cross_entropy 4.243 (5.240) soft_entropy 6.419 (6.050) softmax_triplet 0.711 (0.392) soft_softmax_triplet 0.642 (0.412)
Epoch: [64][ 60/400] Time 0.411 (0.420) Acc@1 84.38% (60.40%) cross_entropy 4.272 (5.112) soft_entropy 5.915 (6.013) softmax_triplet 0.178 (0.400) soft_softmax_triplet 0.181 (0.413)
Epoch: [64][ 70/400] Time 0.411 (0.419) Acc@1 82.81% (63.34%) cross_entropy 4.216 (5.005) soft_entropy 5.374 (5.953) softmax_triplet 0.143 (0.400) soft_softmax_triplet 0.147 (0.416)
Epoch: [64][ 80/400] Time 0.411 (0.418) Acc@1 68.75% (65.12%) cross_entropy 4.149 (4.915) soft_entropy 4.728 (5.885) softmax_triplet 0.135 (0.395) soft_softmax_triplet 0.136 (0.408)
Epoch: [64][ 90/400] Time 0.410 (0.420) Acc@1 79.69% (66.86%) cross_entropy 4.661 (4.848) soft_entropy 6.480 (5.843) softmax_triplet 0.494 (0.403) soft_softmax_triplet 0.562 (0.410)
Epoch: [64][100/400] Time 0.409 (0.420) Acc@1 89.06% (68.83%) cross_entropy 3.709 (4.748) soft_entropy 5.128 (5.814) softmax_triplet 0.023 (0.396) soft_softmax_triplet 0.024 (0.410)
Epoch: [64][110/400] Time 0.409 (0.419) Acc@1 87.50% (70.33%) cross_entropy 3.880 (4.690) soft_entropy 5.349 (5.803) softmax_triplet 0.034 (0.400) soft_softmax_triplet 0.036 (0.420)
Epoch: [64][120/400] Time 0.412 (0.418) Acc@1 89.06% (71.73%) cross_entropy 4.067 (4.626) soft_entropy 6.327 (5.783) softmax_triplet 0.033 (0.378) soft_softmax_triplet 0.067 (0.399)
Epoch: [64][130/400] Time 0.403 (0.417) Acc@1 81.25% (72.52%) cross_entropy 4.177 (4.597) soft_entropy 6.546 (5.798) softmax_triplet 0.301 (0.375) soft_softmax_triplet 0.345 (0.396)
Epoch: [64][140/400] Time 0.410 (0.419) Acc@1 98.44% (73.85%) cross_entropy 3.169 (4.514) soft_entropy 4.617 (5.756) softmax_triplet 0.005 (0.365) soft_softmax_triplet 0.006 (0.388)
Epoch: [64][150/400] Time 0.411 (0.419) Acc@1 96.88% (75.03%) cross_entropy 3.230 (4.454) soft_entropy 5.414 (5.738) softmax_triplet 0.034 (0.358) soft_softmax_triplet 0.042 (0.380)
Epoch: [64][160/400] Time 0.411 (0.418) Acc@1 79.69% (75.90%) cross_entropy 3.972 (4.407) soft_entropy 5.432 (5.724) softmax_triplet 0.157 (0.362) soft_softmax_triplet 0.159 (0.386)
Epoch: [64][170/400] Time 0.411 (0.418) Acc@1 82.81% (76.35%) cross_entropy 3.553 (4.380) soft_entropy 4.720 (5.720) softmax_triplet 0.021 (0.358) soft_softmax_triplet 0.022 (0.381)
Epoch: [64][180/400] Time 0.411 (0.419) Acc@1 89.06% (77.10%) cross_entropy 3.552 (4.347) soft_entropy 5.564 (5.717) softmax_triplet 0.395 (0.358) soft_softmax_triplet 0.557 (0.379)
Epoch: [64][190/400] Time 0.412 (0.419) Acc@1 90.62% (77.83%) cross_entropy 3.651 (4.306) soft_entropy 5.183 (5.714) softmax_triplet 0.300 (0.359) soft_softmax_triplet 0.304 (0.381)
Epoch: [64][200/400] Time 0.412 (0.418) Acc@1 90.62% (78.39%) cross_entropy 3.595 (4.274) soft_entropy 4.808 (5.704) softmax_triplet 0.449 (0.358) soft_softmax_triplet 0.450 (0.381)
Epoch: [64][210/400] Time 0.413 (0.418) Acc@1 92.19% (78.92%) cross_entropy 3.538 (4.244) soft_entropy 5.474 (5.696) softmax_triplet 0.094 (0.354) soft_softmax_triplet 0.184 (0.376)
Epoch: [64][220/400] Time 0.750 (0.419) Acc@1 87.50% (79.31%) cross_entropy 3.932 (4.221) soft_entropy 6.867 (5.692) softmax_triplet 1.262 (0.357) soft_softmax_triplet 1.503 (0.380)
Epoch: [64][230/400] Time 0.412 (0.419) Acc@1 93.75% (79.82%) cross_entropy 3.531 (4.186) soft_entropy 5.543 (5.678) softmax_triplet 0.045 (0.351) soft_softmax_triplet 0.052 (0.374)
Epoch: [64][240/400] Time 0.412 (0.419) Acc@1 95.31% (80.30%) cross_entropy 3.225 (4.156) soft_entropy 5.377 (5.669) softmax_triplet 0.010 (0.344) soft_softmax_triplet 0.106 (0.369)
Epoch: [64][250/400] Time 0.451 (0.419) Acc@1 92.19% (80.66%) cross_entropy 3.677 (4.138) soft_entropy 6.138 (5.661) softmax_triplet 0.128 (0.346) soft_softmax_triplet 0.222 (0.369)
Epoch: [64][260/400] Time 0.441 (0.421) Acc@1 87.50% (80.92%) cross_entropy 3.759 (4.119) soft_entropy 5.217 (5.653) softmax_triplet 0.451 (0.345) soft_softmax_triplet 0.512 (0.367)
Epoch: [64][270/400] Time 0.453 (0.423) Acc@1 85.94% (81.24%) cross_entropy 3.600 (4.103) soft_entropy 5.220 (5.654) softmax_triplet 0.057 (0.350) soft_softmax_triplet 0.062 (0.372)
Epoch: [64][280/400] Time 0.451 (0.424) Acc@1 100.00% (81.68%) cross_entropy 2.953 (4.078) soft_entropy 5.246 (5.647) softmax_triplet 0.054 (0.342) soft_softmax_triplet 0.060 (0.365)
Epoch: [64][290/400] Time 0.452 (0.425) Acc@1 89.06% (81.97%) cross_entropy 3.536 (4.061) soft_entropy 4.614 (5.650) softmax_triplet 0.003 (0.345) soft_softmax_triplet 0.005 (0.368)
Epoch: [64][300/400] Time 0.449 (0.426) Acc@1 93.75% (82.25%) cross_entropy 3.347 (4.047) soft_entropy 5.860 (5.653) softmax_triplet 0.228 (0.353) soft_softmax_triplet 0.345 (0.373)
Epoch: [64][310/400] Time 0.413 (0.427) Acc@1 89.06% (82.50%) cross_entropy 3.414 (4.033) soft_entropy 5.487 (5.649) softmax_triplet 0.049 (0.352) soft_softmax_triplet 0.055 (0.374)
Epoch: [64][320/400] Time 0.411 (0.427) Acc@1 98.44% (82.80%) cross_entropy 2.885 (4.014) soft_entropy 4.540 (5.649) softmax_triplet 0.002 (0.353) soft_softmax_triplet 0.003 (0.373)
Epoch: [64][330/400] Time 0.413 (0.426) Acc@1 96.88% (82.99%) cross_entropy 3.345 (4.004) soft_entropy 5.611 (5.654) softmax_triplet 0.732 (0.361) soft_softmax_triplet 0.729 (0.379)
Epoch: [64][340/400] Time 0.410 (0.426) Acc@1 89.06% (83.21%) cross_entropy 3.622 (3.990) soft_entropy 5.257 (5.652) softmax_triplet 0.116 (0.360) soft_softmax_triplet 0.207 (0.379)
Epoch: [64][350/400] Time 0.406 (0.425) Acc@1 93.75% (83.40%) cross_entropy 3.257 (3.978) soft_entropy 5.324 (5.645) softmax_triplet 0.071 (0.359) soft_softmax_triplet 0.072 (0.376)
Epoch: [64][360/400] Time 0.188 (0.422) Acc@1 45.31% (82.93%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][370/400] Time 0.182 (0.416) Acc@1 46.88% (81.94%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][380/400] Time 0.183 (0.410) Acc@1 39.06% (80.98%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][390/400] Time 0.181 (0.404) Acc@1 45.31% (80.07%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
==> Val on the no.0 model

************************* Start validating market1501 on epoch 64 *************************n
Val: [ 0/18] Time 0.112 (0.112) Data 0.071 (0.071)
Val: [10/18] Time 0.030 (0.038) Data 0.000 (0.006)

Mean AP: 2.0%
CMC Scores:
top-1 0.4%
top-5 1.4%
top-10 1.4%
Validating time: 0:00:00.967822

************************* Finished validating *************************

==> Val on the no.1 model

************************* Start validating market1501 on epoch 64 *************************n
Val: [ 0/18] Time 0.109 (0.109) Data 0.068 (0.068)
Val: [10/18] Time 0.031 (0.038) Data 0.000 (0.006)

Mean AP: 96.5%
CMC Scores:
top-1 98.1%
top-5 99.5%
top-10 99.9%
Validating time: 0:00:01.165367

************************* Finished validating *************************

  • Finished epoch 64 mAP: 96.5% best: 96.5% *
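
To pin down where the losses first go bad in a long log like the one above, a small scan of the log file can help. The following is only a minimal sketch and not part of the repository: the function name `first_non_finite` and both regular expressions are my own assumptions, written against the line format shown in this log.

```python
import math
import re

# Matches the "Epoch: [E][ I/N]" prefix of the training log lines shown above.
HEADER = re.compile(r"Epoch: \[(\d+)\]\[\s*(\d+)/(\d+)\]")
# Matches "<name> <value> (<running average>)" entries such as
# "cross_entropy 3.257 (3.978)" or "cross_entropy nan (nan)".
# "Time 0.406 (0.425)" also matches, which is harmless for this check.
METRIC = re.compile(
    r"(\w+) (nan|[-+]?inf|[-+]?\d+\.\d+) \((?:nan|[-+]?inf|[-+]?\d+\.\d+)\)"
)


def first_non_finite(lines):
    """Return (epoch, iteration, names) for the first log line that
    contains a nan/inf metric value, or None if every value is finite."""
    for line in lines:
        header = HEADER.search(line)
        if header is None:
            continue  # separator or validation line, not a training step
        bad = [
            name
            for name, value in METRIC.findall(line)
            if not math.isfinite(float(value))
        ]
        if bad:
            return int(header.group(1)), int(header.group(2)), bad
    return None


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_nan.py path/to/train.log
    with open(sys.argv[1]) as f:
        print(first_non_finite(f))
```

Run against the log in this issue, this should point at epoch 64, iteration 360, which is the first logged step where all four losses are nan. Since the log is only printed every 10 iterations, the actual first bad step lies somewhere between iterations 351 and 360.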