Update workflow spec files in WMAgents upon site list changes #12039

Open
amaltaro opened this issue Jul 11, 2024 · 2 comments

Comments

@amaltaro
Contributor

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
This is a sub-task of meta-issue #8323, which works towards a feature that allows workflow site lists to be changed while workflows are active in the system.

Describe the solution you'd like
The expected behavior for this ticket is simple, but how it is supposed to be implemented is still not completely clear.

What is expected from this ticket is: whenever a workflow is active in the system (from assigned to running) and it is updated with new site lists (SiteWhitelist/SiteBlacklist), the agent(s) working on that given workflow need to update the local workflow spec file. A potential component candidate for this would be WorkflowUpdater, if we can make this development simple enough.

Describe alternatives you've considered
How does an agent know whether it has to update the workflow spec? I would say the following are valid options:
a) it does not have to know; it will simply download/update spec files every x hours (12? 24?)
b) if there is a document timestamp, we might be able to use that information.

Otherwise, we would need to keep this record somewhere and have agents use that information to decide which workflows need to have their spec updated.
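
To make the two options concrete, here is a minimal sketch of the per-workflow decision. It is only an illustration: the function name, the `last_refresh` record and the 12-hour default are assumptions, not existing WMCore code, and it presumes the agent can obtain a last-modified time for the request document when option (b) applies.

```python
from datetime import datetime, timedelta, timezone


def needs_spec_refresh(last_refresh, now, doc_updated=None, max_age=timedelta(hours=12)):
    """Decide whether the local spec of one workflow should be refreshed.

    last_refresh: when the agent last downloaded the spec (hypothetical record)
    doc_updated:  last-modified time of the request document, if one exists
    max_age:      refresh interval used when no timestamp is available
    """
    if doc_updated is not None:
        # Option (b): refresh only if the document changed after our copy was taken.
        return doc_updated > last_refresh
    # Option (a): no timestamp to rely on, so refresh once the copy is old enough.
    return now - last_refresh >= max_age


now = datetime(2024, 9, 18, 12, 0, tzinfo=timezone.utc)
print(needs_spec_refresh(now - timedelta(hours=13), now))
print(needs_spec_refresh(now - timedelta(hours=1), now,
                         doc_updated=now - timedelta(minutes=30)))
```

With option (b) the agent only touches workflows that actually changed; with option (a) it pays the download cost for every active workflow on each interval.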

Additional context
None

@amaltaro amaltaro changed the title Update workflow spec files in WMAgents Update workflow spec files in WMAgents upon site list changes Jul 11, 2024
@vkuznet vkuznet self-assigned this Sep 17, 2024
@vkuznet
Contributor

vkuznet commented Sep 17, 2024

Alan, in order to proceed with this issue please clarify the following:

Therefore, solution (b) will require modifying the WMBS database schema, while solution (a) requires storing the previous state somehow.

Please clarify which approach should be implemented, as I don't want to spend development time on something that will not be required. Since WorkflowUpdater already fetches all workflows in its algorithm, it seems logical to proceed with option (a), but as I mentioned we will need to keep a previous list of workflows somewhere. If this is the desired option, please clarify where to store this information and how.

@vkuznet
Contributor

vkuznet commented Sep 18, 2024

After a discussion with Alan, we ended up with two possible solutions:

  1. Use a PubSub model and a NATS (or similar) server, where new changes to a workflow are published on the ReqMgr2 side and consumed by WMAgents
    • here ReqMgr2 will be the publisher, while WMAgents will be the subscribers
    • we can also extend it to other components
  2. Use a polling model, fetching docs from ReqMgr2/CouchDB to the agents. This workflow should compare specs and act upon any changes
    • here we will follow a synchronous model, i.e. someone will post an update to ReqMgr2, the WMAgents will run a polling cycle to fetch all docs and compare them with the information present in ReqMgr2, then act upon it.
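
The compare step of solution (2) could look roughly like the sketch below. It assumes, purely for illustration, that both the local spec and the fetched ReqMgr2 document can be viewed as plain dictionaries carrying `SiteWhitelist` and `SiteBlacklist`; the function name and the example site names are hypothetical, and how the real spec file is loaded and saved is left out.

```python
def site_list_changes(local_spec, reqmgr_doc, keys=("SiteWhitelist", "SiteBlacklist")):
    """Return {key: new_list} for every site list that differs from the local copy."""
    changes = {}
    for key in keys:
        local_sites = set(local_spec.get(key, []))
        remote_sites = set(reqmgr_doc.get(key, []))
        # Compare as sets so that a mere reordering is not reported as a change.
        if local_sites != remote_sites:
            changes[key] = sorted(remote_sites)
    return changes


local = {"SiteWhitelist": ["T1_US_FNAL", "T2_CH_CERN"], "SiteBlacklist": []}
remote = {"SiteWhitelist": ["T2_CH_CERN", "T1_US_FNAL", "T2_DE_DESY"], "SiteBlacklist": []}
print(site_list_changes(local, remote))
# {'SiteWhitelist': ['T1_US_FNAL', 'T2_CH_CERN', 'T2_DE_DESY']}
```

An empty result means the agent can skip the workflow without rewriting its spec file.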

Solution (1) requires setting up a NATS (or similar) server (this is an infrastructure matter, see CMS NATS) and developing publisher and subscriber clients. To simplify this, I provided a very basic example which can be run on any laptop using Python. Please find it in this document.
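
Independently of that example, the publisher/subscriber roles can be shown with a toy in-process broker. This is not a NATS client and uses no NATS API; the class, the subject name and the message layout are all assumptions made for illustration only.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a message broker; it only mimics publish/subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, subject, callback):
        # An agent registers a callback for a subject it is interested in.
        self._subscribers[subject].append(callback)

    def publish(self, subject, message):
        # The publisher pushes a message to every callback on that subject.
        for callback in self._subscribers[subject]:
            callback(message)


broker = InMemoryBroker()
received = []  # plays the role of one WMAgent's inbox

broker.subscribe("reqmgr2.workflow.updated", received.append)
broker.publish("reqmgr2.workflow.updated",
               {"RequestName": "wf_example", "SiteWhitelist": ["T2_CH_CERN"]})
print(received)
```

The point of this model is that agents react only when something is published, instead of scanning all workflows on every cycle.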

Solution (2) will require careful evaluation of scalability issues like:

  • fetching O(1000) workflows at each polling cycle (concurrency issue and RAM utilization)
    • if we send O(1000) requests to ReqMgr2 from each agent, we must guarantee that it will sustain such a load (requests can be sent in chunks of 100)
    • if we pack all docs from CouchDB into a single payload, we'll face a RAM spike at the WMAgent consuming such a document, or we must introduce streaming via NDJSON to avoid this
  • walking through O(1000) workflows to identify changes and act upon them
    • the sequential loop can take a long time to walk through each spec
    • the async loop requires proper error management, etc.
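
The two mitigations mentioned above (requests in chunks of 100 and NDJSON streaming) can be sketched with the standard library alone. The helper names are hypothetical, and the sketch assumes the server would emit one JSON document per line; the `io.StringIO` object merely stands in for a streamed HTTP response body.

```python
import io
import json


def chunked(names, size=100):
    """Yield successive slices of at most `size` workflow names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def iter_ndjson(stream):
    """Yield one decoded document per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            # Only the current line is held in memory, never the full payload.
            yield json.loads(line)


names = [f"wf_{i}" for i in range(250)]
print([len(chunk) for chunk in chunked(names)])  # [100, 100, 50]

body = io.StringIO('{"RequestName": "wf_1"}\n{"RequestName": "wf_2"}\n')
for doc in iter_ndjson(body):
    print(doc["RequestName"])
```

Because `iter_ndjson` is a generator, the agent can compare and discard each document as it arrives, which keeps memory use roughly constant regardless of the number of workflows.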

Projects
Status: In Progress
Development

No branches or pull requests

2 participants