
Parallel runner in use #448

Merged
merged 3 commits into main on Sep 16, 2024

Conversation

@wpietri (Contributor) commented Sep 14, 2024

Applying the parallel runner to the CLI. Removing some unused CLI commands. Assorted minor tidying.

@wpietri wpietri requested a review from a team as a code owner September 14, 2024 01:29
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@rogthefrog (Contributor) left a comment:

👍🏻 with just a couple of questions.

@@ -81,7 +70,7 @@ def cli() -> None:
     help="Path to directory containing custom branding.",
 )
 @click.option("--anonymize", type=int, help="Random number seed for consistent anonymization of SUTs")
-@click.option("--parallel", default=False, help="Experimentally run SUTs in parallel")
+@click.option("--parallel", default=False, help="Obsolete flag, soon to be removed")
A Contributor commented:

I was hoping Click provided a deprecated argument for options, but unfortunately it doesn't (natively). It does for commands. An enterprising soul added that for options here:

sqlfluff/sqlfluff@a38512f

Do we care?

@wpietri (Author) replied:

I looked for that argument as well. That particular code requires you to have an option you'd rather they use, which isn't the case here. There is an open issue for this: pallets/click#2263

I've explained our use case, so perhaps one day they'll support it.
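In the meantime, one way to approximate a deprecated option is a plain warning from an option `callback`. Here is a minimal sketch under stated assumptions: the command name `benchmark` and the callback name are hypothetical, not from this PR, and this uses only standard Click features (`callback`, `is_flag`).

```python
import warnings

import click


def _warn_parallel_obsolete(ctx, param, value):
    # Hypothetical callback: Click has no native `deprecated` argument
    # for options, so warn manually when the obsolete flag is passed.
    if value:
        warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")
    return value


@click.command()
@click.option(
    "--parallel",
    is_flag=True,
    default=False,
    callback=_warn_parallel_obsolete,
    help="Obsolete flag, soon to be removed",
)
def benchmark(parallel):
    pass  # real CLI work would go here
```

The callback fires during option parsing, so the warning appears before any benchmark output.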

A Collaborator commented:

Would it make sense to emit a deprecation warning if this option is used?

https://docs.python.org/3/library/warnings.html#warnings.warn

@wpietri (Author) replied Sep 16, 2024:

I tried it out. If I do it without a specific warning, it prints this:

Version 0.5 of this benchmark is a proof of concept only. Results are not intended to indicate actual levels of AI system safety.

/home/william/projects/mlcommons/modelbench/src/modelbench/run.py:98: UserWarning: --parallel option is unnecessary; benchmarks are now always run in parallel
  warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")

And if I add DeprecationWarning to it, it doesn't print anything at all. (That appears to be on purpose; I think you have to run Python with -X dev to get it to show deprecation warnings.)

Overall, I prefer the bare message.
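A standalone illustration of the behavior described above (the function name is hypothetical): a bare `warnings.warn` issues a `UserWarning`, which Python's default filters print to stderr, whereas a `DeprecationWarning` raised from non-`__main__` code is hidden unless the interpreter runs with `-X dev` or `-W default`.

```python
import warnings


def run_benchmark(parallel=False):
    # Hypothetical stand-in for the CLI entry point. With no category
    # argument, warnings.warn defaults to UserWarning, which is shown
    # by default; passing category=DeprecationWarning here would be
    # silenced under Python's default warning filters.
    if parallel:
        warnings.warn("--parallel option is unnecessary; benchmarks are now always run in parallel")
```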

runner.secrets = load_secrets_from_config()
runner.benchmarks = benchmarks
runner.suts = suts
runner.max_items = max_instances
A Contributor commented:

Can max_instances default to something sensible based on the machine's capabilities?

@wpietri (Author) replied:

Speed doesn't vary much based on the user's hardware; it's about the capabilities of service providers like TogetherAI and what the user already has cached. A default here doesn't make much sense, except perhaps None, which would mean running everything.

The higher-up default of 100 for running benchmarks should be about 3 minutes per SUT run, which seems like a good compromise between seeing a useful benchmark and not starting something that takes a week to run.

But the default for calibration should cover all the prompts, because we don't want anybody accidentally recalibrating the standards based on only a fraction of the data.



class TestRunTrackers:
def test_null(self, capsys):
A Contributor commented:

This is cool!

@wpietri (Author) replied:

Thanks! The capsys thing was new to me too. Nice to just have it handled.
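For readers unfamiliar with it: `capsys` is a built-in pytest fixture that captures stdout/stderr per test with no setup. A minimal sketch, assuming a hypothetical `NullRunTracker` class standing in for whatever the null tracker in this PR actually is:

```python
class NullRunTracker:
    """Hypothetical tracker that deliberately reports nothing."""

    def update(self, finished: int) -> None:
        pass  # a real tracker would print progress here


def test_null(capsys):
    # pytest injects capsys automatically; readouterr() returns
    # everything written to stdout/stderr since the last call.
    NullRunTracker().update(5)
    out, err = capsys.readouterr()
    assert out == "" and err == ""
```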

@dhosterman (Collaborator) left a comment:

Looks great!

@wpietri merged commit c974925 into main on Sep 16, 2024. 4 checks passed.
github-actions bot locked and limited conversation to collaborators Sep 16, 2024
3 participants