Handling forking of process and thread within DYAD. #60

Open
hariharan-devarajan opened this issue Jan 10, 2024 · 2 comments

Comments

@hariharan-devarajan
Collaborator

Currently, when a fork happens, the dyad context is not re-initialized, which potentially causes UCX endpoint creation errors. We have to investigate what to reinitialize.

Current thoughts

  1. Reinitialize the DYAD CTX.
  2. Check whether UCX can be reinitialized from the forked process.
@hariharan-devarajan hariharan-devarajan self-assigned this Jan 10, 2024
@ilumsden
Collaborator

I'm pretty sure we will have to reinitialize everything. At a bare minimum, we will need to reinitialize the DTL, because the UCX context and worker cannot be shared across processes. I am also pretty sure that anything Flux-related (e.g., the flux_t handle) will need to be reinitialized.
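One way to picture "reinitialize everything" is a holder that records which process opened its handles and reopens them when it is used from a different process. The sketch below is only an illustration of that PID-guard idea, not DYAD's code: `ProcessLocalResources`, its fields, and the placeholder strings standing in for the Flux handle and the DTL are all hypothetical.

```python
import os


class ProcessLocalResources:
    """Hypothetical holder for per-process handles (not DYAD's real API)."""

    def __init__(self):
        self._pid = None       # PID of the process that opened the handles
        self.handles = None
        self.init_count = 0    # how many times the handles were (re)opened

    def _open_all(self):
        # Placeholder strings stand in for a real flux_t handle and a
        # real UCX-backed DTL; both are opened fresh for this process.
        pid = os.getpid()
        self.handles = {"flux": f"flux-handle@{pid}", "dtl": f"dtl@{pid}"}
        self._pid = pid
        self.init_count += 1

    def get(self):
        # Handles opened by another process (e.g., inherited across a
        # fork) are treated as stale and replaced as a group.
        if self._pid != os.getpid():
            self._open_all()
        return self.handles
```

With this shape, the first call to `get()` in a forked child sees a PID mismatch and reopens both handles together, so the child never uses the parent's UCX or Flux state.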

@JaeseungYeom
Contributor

JaeseungYeom commented Jan 17, 2024

We will support two modes of child process creation: forking and spawning. We will not support threading until we are confident that multi-process support is robust.
For process creation, it is important to understand the various mechanisms by which a new process is created so that we can identify ways to trigger initialization upon creation. Python multiprocessing's fork start method seems to rely on the system fork, while its spawn start method does not. Python multiprocessing supports a custom at-fork callback. According to Hari, PyTorch offers a similar capability itself. In some cases, we may need to intercept process creation calls and add DYAD initialization. I will at least add a call to reinitialize, and define an environment variable that selects re-initialization instead of the default behavior, in which initialization is skipped if the DYAD context object exists. PR #63
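A minimal sketch of the at-fork callback combined with an environment-variable switch, using Python's standard `os.register_at_fork` (POSIX only). Everything DYAD-specific here is an assumption made for illustration: `FakeContext`, `init_context`, and the variable name `EXAMPLE_DYAD_REINIT` are hypothetical and are not taken from PR #63.

```python
import os

REINIT_ENV = "EXAMPLE_DYAD_REINIT"  # hypothetical variable name


class FakeContext:
    """Hypothetical stand-in for a per-process DYAD context."""

    def __init__(self):
        self.owner_pid = os.getpid()


_ctx = None


def init_context(env=os.environ):
    """Create the context; skip if one exists unless the switch is set."""
    global _ctx
    force = env.get(REINIT_ENV, "0") == "1"
    if _ctx is not None and not force:
        return _ctx  # default behavior: initialization is skipped
    _ctx = FakeContext()
    return _ctx


# Run the initializer in every child created through os.fork().
os.register_at_fork(after_in_child=init_context)


def child_sees_fresh_context():
    """Fork once; report whether the child's context belongs to the child."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            fresh = _ctx is not None and _ctx.owner_pid == os.getpid()
            os.write(w, b"1" if fresh else b"0")
        finally:
            os._exit(0)  # never fall through into the parent's code path
    os.close(w)
    data = os.read(r, 1)
    os.close(r)
    os.waitpid(pid, 0)
    return data == b"1"
```

The callback only fires for children created with `os.fork()`. A spawned child starts a fresh interpreter, so it would go through the normal first-time initialization path instead.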

@JaeseungYeom JaeseungYeom added the enhancement New feature or request label Jan 17, 2024
@JaeseungYeom JaeseungYeom self-assigned this Jan 17, 2024