V3 Performance Signal Detected by TorchBench Userbenchmark "torch-nightly" on '2.5.0.dev20240623+cu124' #2335

Open
xuzhao9 opened this issue Jun 26, 2024 · 0 comments


TorchBench CI has detected a performance signal or runtime regression and bisected it to the commit pairs below.

Control PyTorch commit: 92ca17d85def4a62aee04fcea3576cd0c07a0554
Control PyTorch version: 2.5.0.dev20240622+cu124

Treatment PyTorch commit: 920ebccca2644881ece4f9e07b4a4b4787b8f2b1
Treatment PyTorch version: 2.5.0.dev20240623+cu124

Bisection result:

[
    {
        "commit1": "92ca17d85d",
        "commit1_time": "2024-06-21 18:46:15 +0000",
        "commit1_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "92ca17d85def4a62aee04fcea3576cd0c07a0554",
                "pytorch_version": "2.5.0a0+git92ca17d",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "92ca17d85def4a62aee04fcea3576cd0c07a0554"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 51.747023,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 1.04296875,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875,
                "test_eval[maml-cuda-eager]_latency": 691.731945,
                "test_eval[maml-cuda-eager]_cmem": 0.875,
                "test_eval[maml-cuda-eager]_gmem": 2.64764404296875,
                "test_eval[maml_omniglot-cuda-eager]_latency": 1.183565,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5556640625,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875,
                "test_eval[speech_transformer-cuda-eager]_latency": 5105.80114,
                "test_eval[speech_transformer-cuda-eager]_cmem": 0.8349609375,
                "test_eval[speech_transformer-cuda-eager]_gmem": 1.72772216796875,
                "test_eval[tacotron2-cuda-eager]_latency": 1035.042174,
                "test_eval[tacotron2-cuda-eager]_cmem": 1.1416015625,
                "test_eval[tacotron2-cuda-eager]_gmem": 2.77069091796875,
                "test_train[lennard_jones-cuda-eager]_latency": 6.826406,
                "test_train[lennard_jones-cuda-eager]_cmem": 0.7734375,
                "test_train[lennard_jones-cuda-eager]_gmem": 1.54217529296875,
                "test_train[maml_omniglot-cuda-eager]_latency": 1577.187655,
                "test_train[maml_omniglot-cuda-eager]_cmem": 0.916015625,
                "test_train[maml_omniglot-cuda-eager]_gmem": 1.66131591796875,
                "test_train[tacotron2-cuda-eager]_latency": 2916.437102,
                "test_train[tacotron2-cuda-eager]_cmem": 1.3037109375,
                "test_train[tacotron2-cuda-eager]_gmem": 17.03631591796875
            }
        },
        "commit2": "9103b40a47",
        "commit2_time": "2024-06-21 20:53:52 +0000",
        "commit2_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "9103b40a4729e60c8c3c118fac6d11f5498f3f14",
                "pytorch_version": "2.5.0a0+git9103b40",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "9103b40a4729e60c8c3c118fac6d11f5498f3f14"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 59.186955,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 1.0048828125,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875
            }
        }
    },
    {
        "commit1": "9103b40a47",
        "commit1_time": "2024-06-21 20:53:52 +0000",
        "commit1_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "9103b40a4729e60c8c3c118fac6d11f5498f3f14",
                "pytorch_version": "2.5.0a0+git9103b40",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "9103b40a4729e60c8c3c118fac6d11f5498f3f14"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 59.186955,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 1.0048828125,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875
            }
        },
        "commit2": "40e8675fcb",
        "commit2_time": "2024-06-21 21:40:23 +0000",
        "commit2_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "40e8675fcbb233c98ec532607d5cd421ec850253",
                "pytorch_version": "2.5.0a0+git40e8675",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "40e8675fcbb233c98ec532607d5cd421ec850253"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 49.116422,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 0.9619140625,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875
            }
        }
    },
    {
        "commit1": "1c75ddff35",
        "commit1_time": "2024-06-21 23:29:20 +0000",
        "commit1_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "1c75ddff3576a0bd7ed664476c49020c35875ab5",
                "pytorch_version": "2.5.0a0+git1c75ddf",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "1c75ddff3576a0bd7ed664476c49020c35875ab5"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 49.31496,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 0.96484375,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875,
                "test_eval[maml_omniglot-cuda-eager]_latency": 1.216096,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5556640625,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875,
                "test_train[lennard_jones-cuda-eager]_latency": 6.985475,
                "test_train[lennard_jones-cuda-eager]_cmem": 0.7734375,
                "test_train[lennard_jones-cuda-eager]_gmem": 1.54217529296875,
                "test_train[tacotron2-cuda-eager]_latency": 2958.360509,
                "test_train[tacotron2-cuda-eager]_cmem": 1.28515625,
                "test_train[tacotron2-cuda-eager]_gmem": 17.03631591796875
            }
        },
        "commit2": "c5b9ee7408",
        "commit2_time": "2024-06-21 23:56:00 +0000",
        "commit2_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "c5b9ee7408e8ee093eb303b76d8d7fc31902fe5f",
                "pytorch_version": "2.5.0a0+gitc5b9ee7",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "c5b9ee7408e8ee093eb303b76d8d7fc31902fe5f"
            },
            "metrics": {
                "test_eval[maml_omniglot-cuda-eager]_latency": 1.041913,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5546875,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875
            }
        }
    },
    {
        "commit1": "c5b9ee7408",
        "commit1_time": "2024-06-21 23:56:00 +0000",
        "commit1_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "c5b9ee7408e8ee093eb303b76d8d7fc31902fe5f",
                "pytorch_version": "2.5.0a0+gitc5b9ee7",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "c5b9ee7408e8ee093eb303b76d8d7fc31902fe5f"
            },
            "metrics": {
                "test_eval[maml_omniglot-cuda-eager]_latency": 1.041913,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5546875,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875
            }
        },
        "commit2": "5b14943213",
        "commit2_time": "2024-06-22 02:13:28 +0000",
        "commit2_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "5b1494321341a5abd72076d8e984f0f9ff3bc69e",
                "pytorch_version": "2.5.0a0+git5b14943",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "5b1494321341a5abd72076d8e984f0f9ff3bc69e"
            },
            "metrics": {
                "test_eval[maml_omniglot-cuda-eager]_latency": 1.121915,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5556640625,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875,
                "test_train[lennard_jones-cuda-eager]_latency": 6.848617,
                "test_train[lennard_jones-cuda-eager]_cmem": 0.7744140625,
                "test_train[lennard_jones-cuda-eager]_gmem": 1.54217529296875,
                "test_train[tacotron2-cuda-eager]_latency": 2932.165867,
                "test_train[tacotron2-cuda-eager]_cmem": 1.2900390625,
                "test_train[tacotron2-cuda-eager]_gmem": 17.03631591796875
            }
        }
    },
    {
        "commit1": "858fb05dac",
        "commit1_time": "2024-06-22 02:57:44 +0000",
        "commit1_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "858fb05dac4136073db5a133f056b7475264518e",
                "pytorch_version": "2.5.0a0+git858fb05",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "858fb05dac4136073db5a133f056b7475264518e"
            },
            "metrics": {
                "test_eval[maml_omniglot-cuda-eager]_latency": 1.018404,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5556640625,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875,
                "test_train[lennard_jones-cuda-eager]_latency": 7.00891,
                "test_train[lennard_jones-cuda-eager]_cmem": 0.7744140625,
                "test_train[lennard_jones-cuda-eager]_gmem": 1.54217529296875,
                "test_train[tacotron2-cuda-eager]_latency": 2954.511054,
                "test_train[tacotron2-cuda-eager]_cmem": 1.2900390625,
                "test_train[tacotron2-cuda-eager]_gmem": 17.03631591796875
            }
        },
        "commit2": "f42d5b6dca",
        "commit2_time": "2024-06-22 04:05:55 +0000",
        "commit2_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "f42d5b6dca75ee020355fc75532347ca2734b117",
                "pytorch_version": "2.5.0a0+gitf42d5b6",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "f42d5b6dca75ee020355fc75532347ca2734b117"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 49.715281,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 0.96875,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875,
                "test_eval[maml_omniglot-cuda-eager]_latency": 0.973894,
                "test_eval[maml_omniglot-cuda-eager]_cmem": 0.5546875,
                "test_eval[maml_omniglot-cuda-eager]_gmem": 1.53826904296875,
                "test_train[lennard_jones-cuda-eager]_latency": 6.000149,
                "test_train[lennard_jones-cuda-eager]_cmem": 0.7724609375,
                "test_train[lennard_jones-cuda-eager]_gmem": 1.54217529296875,
                "test_train[maml_omniglot-cuda-eager]_latency": 1480.744462,
                "test_train[maml_omniglot-cuda-eager]_cmem": 0.9169921875,
                "test_train[maml_omniglot-cuda-eager]_gmem": 1.66131591796875,
                "test_train[tacotron2-cuda-eager]_latency": 2583.248778,
                "test_train[tacotron2-cuda-eager]_cmem": 1.2900390625,
                "test_train[tacotron2-cuda-eager]_gmem": 17.03631591796875
            }
        }
    },
    {
        "commit1": "88a35b5b64",
        "commit1_time": "2024-06-22 12:38:22 +0000",
        "commit1_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "88a35b5b641ed566f306071b173dde93ef3f9567",
                "pytorch_version": "2.5.0a0+git88a35b5",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "88a35b5b641ed566f306071b173dde93ef3f9567"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 48.726122,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 0.9619140625,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875
            }
        },
        "commit2": "72e3aca227",
        "commit2_time": "2024-06-22 12:38:22 +0000",
        "commit2_digest": {
            "name": "torch-nightly",
            "environ": {
                "pytorch_git_version": "72e3aca227ae1e3dc1b91aee415cf27b0cb22f2b",
                "pytorch_version": "2.5.0a0+git72e3aca",
                "device": "NVIDIA A100-SXM4-40GB",
                "git_commit_hash": "72e3aca227ae1e3dc1b91aee415cf27b0cb22f2b"
            },
            "metrics": {
                "test_eval[doctr_det_predictor-cuda-eager]_latency": 57.742862,
                "test_eval[doctr_det_predictor-cuda-eager]_cmem": 0.994140625,
                "test_eval[doctr_det_predictor-cuda-eager]_gmem": 3.49334716796875
            }
        }
    }
]
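As a hedged sketch of how the bisection output above can be consumed: the snippet below walks each bisected commit pair and flags latency metrics that moved by more than a threshold. The 7% threshold and this comparison logic are illustrative assumptions, not TorchBench's actual detection criteria; the `sample` data is an abbreviated excerpt of the first commit pair above.

```python
import json

# Assumption for illustration: flag latency changes larger than 7%.
THRESHOLD = 0.07

def latency_deltas(bisection_result):
    """Yield (test, commit1, commit2, relative_change) for latency metrics
    present in both digests of each bisected commit pair."""
    for pair in bisection_result:
        m1 = pair["commit1_digest"]["metrics"]
        m2 = pair["commit2_digest"]["metrics"]
        # Only compare metrics measured on both sides of the pair.
        for key in m1.keys() & m2.keys():
            if not key.endswith("_latency"):
                continue
            change = (m2[key] - m1[key]) / m1[key]
            if abs(change) > THRESHOLD:
                yield key, pair["commit1"], pair["commit2"], change

# Abbreviated excerpt of the first commit pair from the bisection result.
sample = json.loads("""
[
  {"commit1": "92ca17d85d", "commit2": "9103b40a47",
   "commit1_digest": {"metrics":
     {"test_eval[doctr_det_predictor-cuda-eager]_latency": 51.747023}},
   "commit2_digest": {"metrics":
     {"test_eval[doctr_det_predictor-cuda-eager]_latency": 59.186955}}}
]
""")

for test, c1, c2, change in latency_deltas(sample):
    print(f"{test}: {c1} -> {c2}: {change:+.1%}")
```

On the excerpt this reports the doctr_det_predictor eval latency moving from ~51.7 ms to ~59.2 ms, a roughly +14% regression, which matches the signal bisected in the first pair above.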

cc @xuzhao9
