Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR]: Support running formatters in parallel #355

Open
calebzulawski opened this issue Jul 31, 2024 · 4 comments
Open

[FR]: Support running formatters in parallel #355

calebzulawski opened this issue Jul 31, 2024 · 4 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@calebzulawski
Copy link
Contributor

calebzulawski commented Jul 31, 2024

What is the current behavior?

Files are passed via xargs to the formatter tool and run serially

Describe the feature

I have multiple C++ projects with large numbers of files (~1000) that take a very long time to format serially. We can use xargs -P to invoke multiple instances of the formatter. I'm not sure how the maximum number of processes should be determined.

@calebzulawski calebzulawski added the enhancement New feature or request label Jul 31, 2024
@alexeagle
Copy link
Member

It's unusual to want to format all ~1000 files - perhaps you should call the formatter from a tool like pre-commit that knows which files have been changed?
You probably saw that we use rules_multirun which already runs different formatter tools in parallel. So I imagine there's a way to extend this to run a single formatter tool over parallel shards of the files for that language. However I suspect the added complexity won't be worth maintaining, given that the intended "typical" usage is to only format changed files.

@alexeagle alexeagle added the question Further information is requested label Aug 3, 2024
@calebzulawski
Copy link
Contributor Author

calebzulawski commented Aug 3, 2024

Does format_test only check changed files since the last run of the test? If that's the case, then I agree that it's probably not a common use case. The situation I ran into was the test exceeding the 300 second timeout on the first run.

Regarding sharding, I think using xargs -n max_files -P processes (the format script already uses xargs) would handle that, but I'm not sure what a reasonable default would be.

@alexeagle
Copy link
Member

format_test can't tell which files have changed. It would need some connection to version control, which is out of scope here. Also if you list the files to check, then it's hermetic and should be a cache hit, so vcs logic would make it non-hermetic (you could get a cache hit on a run that only checked a subset of files)

@alexeagle
Copy link
Member

More thoughts:

  • I think the best answer is what we've done in Aspect Workflows https://docs.aspect.build/workflows/features/lint#formatting - you can run a free trial of our whole platform, or you could just make a CI step that does the equivalent bazel run format && git status --porcelain
  • Bazel has built-in sharding, so we could make format_test support the shard_count attribute. Again though I'm loathe to add anything to format_test, I don't really want to support it at all
  • You can do "userland sharding" in your BUILD file, something like
SRCS = glob("**/*.c")

[
  format_test(
    name = "check_format_" + i,
    srcs = SRCS.subset(...)
  )
  for i in range(0, 10)
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants