
add multiprocessing #92

Open
wants to merge 99 commits into base: developer

Conversation


@LFT18 LFT18 commented Apr 19, 2024

No description provided.

target_idx = con_dataset_names.index(target_dataset_name) # dataset index
splits = np.cumsum([0] + baseline_dataset.con_shapes)
slice_ = slice(*splits[target_idx : target_idx + 2])

num_features = baseline_dataset.con_shapes[target_idx]
#num_features = baseline_dataset.con_shapes[target_idx]
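A minimal sketch (with hypothetical dataset names and shapes) of what the lines above compute: the cumulative sum over `con_shapes` gives the split points of the concatenated feature matrix, and `slice(*splits[target_idx : target_idx + 2])` selects the column range belonging to one dataset:

```python
import numpy as np

# Hypothetical dataset names and per-dataset feature counts
con_dataset_names = ["proteomics", "metagenomics", "metabolomics"]
con_shapes = [3, 5, 2]
target_dataset_name = "metagenomics"

target_idx = con_dataset_names.index(target_dataset_name)  # dataset index
splits = np.cumsum([0] + con_shapes)                       # [0, 3, 8, 10]
slice_ = slice(*splits[target_idx : target_idx + 2])       # columns 3..7

data = np.arange(10).reshape(1, 10)  # one sample, 10 concatenated features
target_block = data[:, slice_]       # the 5 "metagenomics" columns
```

`num_features` then equals `con_shapes[target_idx]`, i.e. the width of that slice.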
Member

Is this correct? Or does it still need to be changed?

Author

I think it's correct now. I cannot check right now because I've been having problems connecting to Esrum all morning, but I'll check as soon as it works again (hopefully soon)

Author

I still can't connect, it's very annoying :(. I'll let you know as soon as I can again, but I think the code should be fine.

Author

I was finally able to connect today at noon :). The file was correct, because those functions are not used at all for multiprocessing; I had just changed that to test some things with the previous functions. But it is true that it led to confusion, so I reverted the changes so that the unused functions have their original code.

Henry added 4 commits July 9, 2024 10:18
- see if results between single run and multi-run match.
- decide whether to log2 transform
- default: do not, in order to allow negative features, which are then standard normalized
# This will flatten the array, so we get all bayes_abs for all perturbed features
# vs all continuous features in one 1D array.
# Then we sort them and get the indexes in the flattened array, i.e. a
# list of sorted indexes into the flattened array.
sort_ids = np.argsort(bayes_abs, axis=None)[::-1] # 1D: N x C
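A small illustration (with a toy matrix, not the project's data) of what `axis=None` does here: the array is flattened in row-major order before sorting, and the resulting flat indexes can be mapped back to (perturbed, continuous) pairs with `np.unravel_index`:

```python
import numpy as np

# Toy |Bayes K| matrix: 2 perturbed features x 3 continuous features
bayes_abs = np.array([[0.2, 0.9, 0.1],
                      [0.5, 0.3, 0.8]])

# axis=None flattens before sorting; [::-1] gives descending order
sort_ids = np.argsort(bayes_abs, axis=None)[::-1]  # 1D: N x C flat indexes

# Map each flat index back to its (perturbed, continuous) position
rows, cols = np.unravel_index(sort_ids, bayes_abs.shape)
```

The strongest association (0.9) sits at flat index 1, i.e. perturbed feature 0 vs continuous feature 1.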
Member

So the flattening here means that the probabilities and FDR are calculated across perturbations (meaning that perturbations actually increase the number of total probabilities)?

Member

I think so, yes.

@enryH
Member

enryH commented Jul 9, 2024

To solve: reloading trained models from a single process with a single process does not yield exactly the same FDR array every time. The multiprocess version yields in general higher FDR values -> we need to figure out why.

@enryH
Member

enryH commented Jul 10, 2024

I finally found the issue. The multiprocessing did not yet have categorical vs. continuous perturbations implemented. I moved the masking into the main function and updated the bayes_worker fct accordingly. I think it's ready to be checked now @ri-heme

@enryH enryH marked this pull request as ready for review July 10, 2024 12:15
@enryH enryH requested review from mpielies and ri-heme July 12, 2024 09:21
Collaborator

@ri-heme ri-heme left a comment

Hi Henry. Left some comments here and there (some type hints were changed, some comments are still there, and some questions/suggestions).

Thanks for your help!

.gitignore Outdated Show resolved Hide resolved
src/move/tasks/encode_data.py Outdated Show resolved Hide resolved
)
if reconstruction_path.exists():
Collaborator

This is never True, right? Because saving the reconstructions is commented out.

Member

Yes, but like this it is easier to compare bayes_approach with bayes_parallel

@@ -270,7 +259,8 @@ def _bayes_approach(
  mask = feature_mask[:, [i]] | nan_mask  # 2D: N x C
  diff = np.ma.masked_array(mean_diff[i, :, :], mask=mask)  # 2D: N x C
  prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # 1D: C
- bayes_k[i, :] = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
+ computed_bayes_k = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
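A self-contained sketch of the step under discussion, using toy numbers and a hand-made mask instead of the project's `feature_mask`/`nan_mask`: the masked column is dropped by `np.ma.compressed`, and the per-feature probability of a positive shift is converted to log odds (Bayes K), with `1e-8` guarding against `log(0)`:

```python
import numpy as np

# Toy reconstruction differences for one perturbed feature:
# 4 samples x 3 continuous features
mean_diff_i = np.array([[ 0.5, -0.2, 0.3],
                        [ 0.3, -0.1, 0.2],
                        [ 0.4,  0.2, 0.1],
                        [ 0.6, -0.3, 0.4]])

# Mask out one column (e.g. the perturbed feature itself)
mask = np.zeros_like(mean_diff_i, dtype=bool)
mask[:, 2] = True

diff = np.ma.masked_array(mean_diff_i, mask=mask)      # 2D: N x C
prob = np.ma.compressed(np.mean(diff > 1e-8, axis=0))  # masked column dropped
bayes_k_row = np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)
```

Column 0 shifts up in all samples (prob 1.0, large positive Bayes K); column 1 in only one of four (prob 0.25, negative Bayes K).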
Collaborator

Why create this variable?

Member

To show where a worker function could be called.
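A hypothetical sketch of what such a worker extraction could look like, assuming a `bayes_worker` that computes one row of `bayes_k` per perturbed feature and is fanned out over a `multiprocessing.Pool` (the signature and shapes here are illustrative, not the project's actual API):

```python
import numpy as np
from multiprocessing import Pool

def bayes_worker(args):
    """Compute one bayes_k row for perturbed feature i (hypothetical signature)."""
    i, diff_i = args                              # diff_i: 2D, N x C
    prob = np.mean(diff_i > 1e-8, axis=0)         # 1D: C
    return i, np.log(prob + 1e-8) - np.log(1 - prob + 1e-8)

def compute_bayes_k(mean_diff):                   # mean_diff: 3D, P x N x C
    bayes_k = np.empty((mean_diff.shape[0], mean_diff.shape[2]))
    with Pool(processes=2) as pool:
        # imap_unordered returns rows as workers finish; i keeps them ordered
        for i, row in pool.imap_unordered(bayes_worker, enumerate(mean_diff)):
            bayes_k[i, :] = row
    return bayes_k
```

Returning the index alongside the row lets results arrive in any order without scrambling `bayes_k`.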

src/move/tasks/identify_associations.py Show resolved Hide resolved
src/move/tasks/analyze_latent.py Outdated Show resolved Hide resolved
src/move/tasks/identify_associations.py Outdated Show resolved Hide resolved
src/move/conf/schema.py Outdated Show resolved Hide resolved
@enryH
Member

enryH commented Jul 12, 2024

@ri-heme just merge if you think it's fine now :)
