
Parallel runner in use #448

Merged
merged 3 commits into main on Sep 16, 2024

Conversation

@wpietri (Contributor) commented Sep 14, 2024

Applying the parallel runner to the CLI. Removing some unused CLI commands. Assorted minor tidying.

@wpietri wpietri requested a review from a team as a code owner September 14, 2024 01:29
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@rogthefrog (Contributor) left a comment:

👍🏻 with just a couple of questions.

@@ -81,7 +70,7 @@ def cli() -> None:
     help="Path to directory containing custom branding.",
 )
 @click.option("--anonymize", type=int, help="Random number seed for consistent anonymization of SUTs")
-@click.option("--parallel", default=False, help="Experimentally run SUTs in parallel")
+@click.option("--parallel", default=False, help="Obsolete flag, soon to be removed")
A Contributor commented:

I was hoping Click provided a deprecated argument for options, but unfortunately it doesn't (natively). It does for commands. An enterprising soul added that for options here:

sqlfluff/sqlfluff@a38512f

Do we care?

@wpietri (Author) replied:

I looked for that argument as well. That particular code requires you to have an option you'd rather they use, which isn't the case here. There is an open issue for this: pallets/click#2263

I've explained our use case, so perhaps one day they'll support it.
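In the meantime, one way to approximate a deprecated option is a plain warning from an option `callback`. Here is a minimal sketch under stated assumptions: the command name `benchmark` and the callback name are hypothetical, not from this PR, and this uses only standard Click features (`callback`, `is_flag`).

```python
import warnings

import click


def _warn_parallel_obsolete(ctx, param, value):
    # Hypothetical callback: Click has no native `deprecated` argument
    # for options, so warn manually when the obsolete flag is passed.
    if value:
        warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")
    return value


@click.command()
@click.option(
    "--parallel",
    is_flag=True,
    default=False,
    callback=_warn_parallel_obsolete,
    help="Obsolete flag, soon to be removed",
)
def benchmark(parallel):
    pass  # real CLI work would go here
```

The callback fires during option parsing, so the warning appears before any benchmark output.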

A Collaborator commented:

Would it make sense to emit a deprecation warning if this option is used?

https://docs.python.org/3/library/warnings.html#warnings.warn

@wpietri (Author) replied Sep 16, 2024:

I tried it out. If I do it without a specific warning, it prints this:

Version 0.5 of this benchmark is a proof of concept only. Results are not intended to indicate actual levels of AI system safety.

/home/william/projects/mlcommons/modelbench/src/modelbench/run.py:98: UserWarning: --parallel option is unnecessary; benchmarks are now always run in parallel
  warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")

And if I add DeprecationWarning to it, it doesn't print anything at all. (That appears to be on purpose; I think you have to run Python with -X dev to get it to show deprecation warnings.)

Overall, I prefer the bare message.
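A standalone illustration of the behavior described above (the function name is hypothetical): a bare `warnings.warn` issues a `UserWarning`, which Python's default filters print to stderr, whereas a `DeprecationWarning` raised from non-`__main__` code is hidden unless the interpreter runs with `-X dev` or `-W default`.

```python
import warnings


def run_benchmark(parallel=False):
    # Hypothetical stand-in for the CLI entry point. With no category
    # argument, warnings.warn defaults to UserWarning, which is shown
    # by default; passing category=DeprecationWarning here would be
    # silenced under Python's default warning filters.
    if parallel:
        warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")
```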

runner.secrets = load_secrets_from_config()
runner.benchmarks = benchmarks
runner.suts = suts
runner.max_items = max_instances
A Contributor commented:

Can max_instances default to something sensible based on the machine's capabilities?

@wpietri (Author) replied:

Speed doesn't vary much based on the user's hardware; it's about the capabilities of service providers like TogetherAI and what the user already has cached. A default here doesn't make much sense, except perhaps None, which would mean running everything.

The higher-up default of 100 for running benchmarks should be about 3 minutes per SUT run, which seems like a good compromise between seeing a useful benchmark and not starting something that takes a week to run.

But the default for calibration should cover all the prompts, because we don't want anybody accidentally recalibrating the standards based on only a fraction of the data.



class TestRunTrackers:
def test_null(self, capsys):
A Contributor commented:

This is cool!

@wpietri (Author) replied:

Thanks! The capsys thing was new to me too. Nice to just have it handled.
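For readers unfamiliar with it: `capsys` is a built-in pytest fixture that captures stdout/stderr per test with no setup. A minimal sketch, assuming a hypothetical `NullRunTracker` class standing in for whatever the null tracker in this PR actually is:

```python
class NullRunTracker:
    """Hypothetical tracker that deliberately reports nothing."""

    def update(self, finished: int) -> None:
        pass  # a real tracker would print progress here


def test_null(capsys):
    # pytest injects capsys automatically; readouterr() returns
    # everything written to stdout/stderr since the last call.
    NullRunTracker().update(5)
    out, err = capsys.readouterr()
    assert out == "" and err == ""
```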

@dhosterman (Collaborator) left a comment:

Looks great!

@wpietri merged commit c974925 into main on Sep 16, 2024. 4 checks passed.
github-actions bot locked and limited conversation to collaborators Sep 16, 2024
3 participants