
Forcing generation scales memory usage unexpectedly #113

Open
raehik opened this issue Dec 13, 2023 · 2 comments

Comments
raehik commented Dec 13, 2023

Forcing generation (see lib.data.compute_forcings_and_coarsen_cm2_6()) is done per time point, independently of any other time point. We operate on "lazy" Dask arrays, which only download backing data when scheduled and can stream outputs to file.

Since we don't need to hold forcings in memory after calculation (we can just write them to file), changing the number of time points we compute forcings for (--ntimes) should have little impact on memory usage. But that doesn't appear to be the case: when testing with a single Dask worker, peak memory usage roughly doubled between --ntimes 50 and --ntimes 100.

Note that this "should" relies on Dask scheduling operations efficiently, which is not guaranteed. A user can guide the scheduler in a few ways. See #107, where this cropped up.
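The intended streaming pattern can be sketched without Dask. In this sketch, compute_forcing is a hypothetical stand-in for the real per-timepoint computation; the point is that writing each time point's forcing to disk as soon as it is computed should keep peak memory independent of the number of time points:

```python
import os
import numpy as np

def compute_forcing(snapshot):
    # Placeholder for the real per-timepoint forcing computation.
    return snapshot * 2.0

def stream_forcings(data, out_dir):
    # Process each time point independently and write it out immediately,
    # so only one time point's forcing is ever held in memory at once.
    paths = []
    for t in range(data.shape[0]):
        forcing = compute_forcing(data[t])
        path = os.path.join(out_dir, f"forcing_{t:04d}.npy")
        np.save(path, forcing)
        paths.append(path)
    return paths
```

With lazy Dask arrays the analogous behaviour depends on the scheduler choosing to compute and flush one time chunk before loading the next, which is exactly the part that does not seem to be happening here.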

@dorchard

@CemGultekin1 have you come across any memory issues with the gz code as well in your own explorations?

@dorchard

@raehik can you isolate this and make an MWE that we could ask the Dask developers about?
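A starting point for such an MWE might look like the following sketch, which chunks a lazy array one time point per chunk and applies a placeholder forcing function (the real MWE would swap in the actual computation and measure peak memory, e.g. with a profiler, while varying ntimes):

```python
import dask.array as da

def forcing(block):
    # Stand-in for the real per-timepoint forcing computation.
    return block * 2.0

def run(ntimes, ny=64, nx=64):
    # One chunk per time point, so each can be computed independently.
    data = da.random.random((ntimes, ny, nx), chunks=(1, ny, nx))
    result = data.map_blocks(forcing)
    # Reducing to a scalar forces full evaluation; ideally the scheduler
    # retires each time chunk before loading the next, keeping peak
    # memory flat as ntimes grows.
    return float(result.sum().compute())
```

Comparing peak memory for, say, run(50) versus run(100) should reproduce the scaling behaviour in a form the Dask developers can act on.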
