CI workflow file (validator.yml) is in need of maintenance (dependency versions, disk space, speed) #167

Open
joshuacwnewton opened this issue Jun 25, 2024 · 2 comments · May be fixed by #168

Comments

@joshuacwnewton
Member

joshuacwnewton commented Jun 25, 2024

The validator.yml GitHub Actions CI workflow file is currently failing on all PRs, blocking merges even for approved PRs.

@mguaypaq helped to summarize many of the various issues:

12:37 PM
It's been a while, but from what I remember:

  • The code is unnecessarily slow to run (because it re-runs many calls to pybids, in particular)
  • It's still pinned to the very much EOL python 3.7
  • Some of its dependencies are not backwards compatible
  • It relies on an old file naming scheme that we've since changed
  • It may be running out of disk space on the github actions runner? But there's a code comment somewhere in the workflow files about how to get more space?
  • Some of the checks that it does seem... wrong? Like, a missing abs() around a comparison of two floating point numbers for approximate equality, for example
  • It also looks like the return code of the checker is just ignored by the workflow file? But it doesn't even install anymore, and that results in a workflow failure
  • No automatic retries on failed gets (with -J8)
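To illustrate the missing-abs() point from the list above, here is a minimal sketch of how such a check goes wrong. The function names and the tolerance are hypothetical and are not taken from the validator code; they only show the general pattern.

```python
import math


def approx_equal_buggy(a: float, b: float, tol: float = 1e-3) -> bool:
    # Bug: without abs(), any a < b gives a negative difference,
    # which is always below the tolerance, so the check passes.
    return (a - b) < tol


def approx_equal(a: float, b: float, tol: float = 1e-3) -> bool:
    # abs() makes the comparison symmetric in a and b.
    return abs(a - b) < tol


print(approx_equal_buggy(1.0, 5.0))  # True, although the values differ by 4
print(approx_equal(1.0, 5.0))        # False
# The standard library offers the same kind of check directly:
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=0.0, abs_tol=1e-9))  # True
```

If the validator compares values this way, math.isclose with an explicit abs_tol may be the simplest replacement, but that depends on what tolerance the checks are meant to enforce.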

I'm very interested in fixing the workflow to help unblock current and future PRs! (I plan to start with the more "maintenance"-y tasks, then move on to the "correctness" tasks related to the validation itself.)

@joshuacwnewton joshuacwnewton added the bug Something isn't working label Jun 25, 2024
@joshuacwnewton joshuacwnewton self-assigned this Jun 25, 2024
@mguaypaq
Member

Another small point: this line is very flaky, and tends to make the workflow fail:

Usually it's just a transient network error while downloading a few of the files. So, we could make it more robust with a simple:

# try a second time if a few downloads failed the first time
git annex get -J8 || git annex get
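The same "try once more" idea can also be sketched in Python, in case the retry ends up living in a script instead of the workflow file. This is only an illustration: run_with_retry is a hypothetical helper, and the demonstration uses stand-in commands in place of git annex so that it runs anywhere.

```python
import subprocess
import sys


def run_with_retry(first_cmd, retry_cmd):
    """Run first_cmd; if it exits non-zero, run retry_cmd once.

    Returns the exit code of the last command that ran, so the caller
    can propagate a failure instead of silently ignoring it.
    """
    result = subprocess.run(first_cmd)
    if result.returncode == 0:
        return 0
    return subprocess.run(retry_cmd).returncode


# Stand-in commands: the first always fails, the second always succeeds.
always_fails = [sys.executable, "-c", "raise SystemExit(1)"]
always_succeeds = [sys.executable, "-c", "pass"]
print(run_with_retry(always_fails, always_succeeds))  # 0
```

For the case discussed here, the call would presumably be run_with_retry(["git", "annex", "get", "-J8"], ["git", "annex", "get"]), with the returned code passed to sys.exit so that the workflow step fails when both attempts fail.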

@joshuacwnewton
Member Author

A hopeful start: The disk space issues have a very quick, very neat solution. 🎉

At the start of the workflow, things look like this:

  66G   74G /mnt           /dev/sdb1
  21G   73G /              /dev/root
  99M  105M /boot/efi      /dev/sda15

Our PWD is on the root disk (/), which has only 21G free (the first column is free space, the second is total size). But note /mnt, which has 66G free (!!!). (Notably, it looks like this temp disk used to be 14GB.)

We can take advantage of all of this extra space using the Maximize build disk space action. After running this action, df looks like this:

  87G   87G /home/runner/work/data-multi-subject/data-multi-subject /dev/mapper/buildvg-buildlv
 512M   73G /                                                       /dev/root
 100M   74G /mnt                                                    /dev/sdb1
  99M  105M /boot/efi                                               /dev/sda15

Thanks to the Logical Volume Manager (LVM), we now have access to the entirety of the 87G, with a step that takes 2 seconds, as opposed to the 3+ minutes it takes to remove unwanted software.

I was a little concerned about whether this would result in slower RW times (thanks to LVM), but the git annex step seems to take ~10m either way. :)
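For anyone who wants to check the same numbers from inside a Python step, here is a rough sketch using only the standard library. The helper name and the list of paths are assumptions for illustration; /mnt may not exist on every runner or machine, so the loop skips paths it cannot read.

```python
import shutil

GIB = 2**30  # df -h reports "G" in gibibytes


def free_and_total_gib(path):
    """Return (free, total) space in GiB for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / GIB, usage.total / GIB


# "." is the working directory; "/" and "/mnt" mirror the df output above.
for mount in (".", "/", "/mnt"):
    try:
        free, total = free_and_total_gib(mount)
    except OSError:
        print(f"{mount}: not available on this machine")
        continue
    print(f"{free:6.1f}G free of {total:6.1f}G  {mount}")
```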
