GitHub Actions error collection script only reads latest attempt #437

Closed
timmc-edx opened this issue Sep 5, 2023 · 7 comments

@timmc-edx
Member

timmc-edx commented Sep 5, 2023

A/C:

The Actions error collection script only collects the status of the most recent attempt on each job. Since we re-run most of our failed jobs, this script can't see most of the error information we're interested in.

See openedx/edx-platform#32671 for an example of where this information would have been useful.
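For illustration, here is a minimal sketch of what walking every attempt of a run (rather than only the latest) might look like. It assumes the REST endpoint that lists jobs for a specific run attempt and the `run_attempt` number reported on a workflow run; `fetch_json` is a hypothetical helper (URL in, parsed JSON out) standing in for whatever HTTP client the script uses. Pagination is ignored, and the sketch does not establish that each attempt's data actually exposes the errors of interest.

```python
API_ROOT = "https://api.github.com"


def attempt_job_urls(owner, repo, run_id, latest_attempt):
    """Build one jobs-listing URL per attempt, from attempt 1 up to the latest."""
    return [
        f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/attempts/{n}/jobs"
        for n in range(1, latest_attempt + 1)
    ]


def failed_jobs_by_attempt(owner, repo, run_id, latest_attempt, fetch_json):
    """Map each attempt number to the names of its failed jobs.

    fetch_json is a hypothetical callable: it takes a URL and returns the
    decoded JSON body (assumed here to be a dict with a "jobs" list).
    Only the first page of results is read.
    """
    failures = {}
    urls = attempt_job_urls(owner, repo, run_id, latest_attempt)
    for number, url in enumerate(urls, start=1):
        jobs = fetch_json(url).get("jobs", [])
        failures[number] = [
            job["name"] for job in jobs if job.get("conclusion") == "failure"
        ]
    return failures
```

Passing the fetcher in as an argument keeps the sketch independent of any particular client library and makes it easy to exercise with canned responses.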

While we're in there, it might also be useful to turn this into a multi-stage script with caching. Currently there's a risk of getting rate-limited partway through a run, at which point all of the in-memory collected information is lost. It might be better to split the script so that it first gets all of the commits in the desired time range, writes that to file, and then gets job and attempt information -- but only for jobs that it hasn't already cached on disk. This would speed up future runs.
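A rough sketch of the two-stage idea, assuming JSON files on disk as the cache: stage one stores the commit list for a time range, stage two fetches job data only for commits with no cache file yet. All names here (`load_or_fetch`, `collect`, `range_key`, and the `list_commits` / `get_jobs` callables) are hypothetical placeholders, not part of the existing script.

```python
import json
from pathlib import Path


def load_or_fetch(cache_dir, key, fetch):
    """Return the cached JSON value for key, calling fetch() only on a miss."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted run never leaves
    # a half-written cache entry behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data))
    tmp.replace(path)
    return data


def collect(cache_dir, range_key, list_commits, get_jobs):
    """Stage 1: commits for the time range. Stage 2: jobs per commit.

    list_commits() and get_jobs(sha) are hypothetical callables that hit
    the API; each result is written to disk as soon as it arrives.
    """
    cache_dir = Path(cache_dir)
    shas = load_or_fetch(cache_dir / "commits", range_key, list_commits)
    jobs = {}
    for sha in shas:
        jobs[sha] = load_or_fetch(
            cache_dir / "jobs", sha, lambda sha=sha: get_jobs(sha)
        )
    return jobs
```

If a run gets rate-limited partway through, everything fetched so far is already on disk, and the next run picks up where the last one stopped.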

@RafayGhafoor

@robrap @timmc-edx, I would like to work on this task, and I am thinking of using the PyGithub library for the integration. Please let me know if I can take it on.

@robrap
Contributor

robrap commented Oct 16, 2023

@RafayGhafoor: That sounds good and we're here to answer questions. Good luck.

@rgraber
Contributor

rgraber commented Nov 16, 2023

@RafayGhafoor are you still working on this?

@RafayGhafoor

RafayGhafoor commented Nov 16, 2023

@rgraber, I had been working on this and set out to send a PR enabling the GitHub CLI to rerun failed jobs based on their annotation messages, but the related issue I created for that PR didn't get any traction.

What I had in mind was to integrate the GitHub CLI (gh) into the workflow so that it automatically reruns a job when the failed job has an annotation message of "Lost communication....".

Since the issue didn't get any follow-up, I have lost motivation to work on it. Still, I think a custom script with permission to rerun failed jobs could be a possible solution, as long as it only acts on jobs that failed because they lost communication with the server. These are the steps I had thought of adding as the last step of the CI:

  • Get the currently running event and supply it to the custom script.
  • Have the custom script check whether any failed job has an annotation message of "Lost communication..." and, if so, trigger a rerun.
  • Wrap this whole logic in a retry so the action is retried at least x times, with a delay of y, to ensure a successful run.
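The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: annotations are taken to be dicts with a `message` field, the match is a case-insensitive substring check on "lost communication" (the full message text is elided in this thread), and `trigger_rerun` is a hypothetical callable that would ask GitHub to rerun the failed jobs.

```python
import time

# Assumed fragment of the annotation text; the thread only quotes its start.
MARKER = "lost communication"


def should_rerun(annotations):
    """True if any annotation message mentions the lost-communication marker."""
    return any(MARKER in (a.get("message") or "").lower() for a in annotations)


def retry(action, attempts, delay_seconds, sleep=time.sleep):
    """Call action() up to `attempts` times, sleeping between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return action()
        except Exception:
            if i == attempts - 1:
                raise  # out of tries: surface the last error
            sleep(delay_seconds)


def rerun_if_lost_communication(annotations_by_job, trigger_rerun,
                                attempts=3, delay_seconds=30.0,
                                sleep=time.sleep):
    """Trigger a rerun only when some failed job lost communication.

    annotations_by_job maps a failed job's name to its annotations.
    trigger_rerun is a hypothetical callable that requests the rerun.
    Returns True if a rerun was requested, False otherwise.
    """
    if not any(should_rerun(a) for a in annotations_by_job.values()):
        return False
    retry(trigger_rerun, attempts, delay_seconds, sleep)
    return True
```

Keeping the decision (`should_rerun`) separate from the side effect (`trigger_rerun`) means ordinary test failures never cause a rerun, and the matching rule can be tightened later without touching the retry logic.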

@robrap
Contributor

robrap commented Jan 19, 2024

@feanil: This might be a useful ticket for the Maintenance WG as well, because it would give you a view of issues across PRs to help with prioritization of tickets like #528.

@robrap assigned timmc-edx and openback and unassigned openback Jan 26, 2024
@timmc-edx
Member Author

I made an attempt at fixing this in #544, which also includes some other improvements. But... it turns out all of the attempt objects for a workflow run are pointing to the same check suite! (The most recent one, naturally -- which means we lose any errors that provoke someone to re-run their tests.) This is blocked unless we can find a solution. I've posted about the issue at https://github.com/orgs/community/discussions/103026.

@jristau1984

This is now a Product Feedback submission, since it appears that this is just a bug in the API: https://github.com/orgs/community/discussions/124000

@jristau1984 closed this as not planned Aug 5, 2024