
Forcing generation scales memory usage unexpectedly #113

Open
raehik opened this issue Dec 13, 2023 · 2 comments

Comments
raehik commented Dec 13, 2023

Forcing generation (see lib.data.compute_forcings_and_coarsen_cm2_6()) is done per time point, independently of any other time point. We operate on "lazy" Dask arrays, which only download backing data when scheduled and can stream outputs to file.

Since we don't need to hold forcings in memory after calculation (we can just write them to file), changing the number of time points we compute forcings for (--ntimes) should have little impact on memory usage. But that doesn't appear to be the case: when testing with a single Dask worker, peak memory usage roughly doubled between --ntimes 50 and --ntimes 100.

Note that this "should" relies on Dask scheduling operations efficiently, which is not guaranteed. A user can guide the scheduler in a few ways. See #107, where this cropped up.
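The intended streaming pattern can be sketched without Dask. In this sketch, compute_forcing is a hypothetical stand-in for the real per-timepoint computation; the point is that writing each time point's forcing to disk as soon as it is computed should keep peak memory independent of the number of time points:

```python
import os
import numpy as np

def compute_forcing(snapshot):
    # Placeholder for the real per-timepoint forcing computation.
    return snapshot * 2.0

def stream_forcings(data, out_dir):
    # Process each time point independently and write it out immediately,
    # so only one time point's forcing is ever held in memory at once.
    paths = []
    for t in range(data.shape[0]):
        forcing = compute_forcing(data[t])
        path = os.path.join(out_dir, f"forcing_{t:04d}.npy")
        np.save(path, forcing)
        paths.append(path)
    return paths
```

With lazy Dask arrays the analogous behaviour depends on the scheduler choosing to compute and flush one time chunk before loading the next, which is exactly the part that does not seem to be happening here.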

@dorchard

@CemGultekin1 have you come across any memory issues with the gz code as well in your own explorations?

@dorchard

@raehik can you isolate this and make an MWE that we could ask the Dask developers about?
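A starting point for such an MWE might look like the following sketch, which chunks a lazy array one time point per chunk and applies a placeholder forcing function (the real MWE would swap in the actual computation and measure peak memory, e.g. with a profiler, while varying ntimes):

```python
import dask.array as da

def forcing(block):
    # Stand-in for the real per-timepoint forcing computation.
    return block * 2.0

def run(ntimes, ny=64, nx=64):
    # One chunk per time point, so each can be computed independently.
    data = da.random.random((ntimes, ny, nx), chunks=(1, ny, nx))
    result = data.map_blocks(forcing)
    # Reducing to a scalar forces full evaluation; ideally the scheduler
    # retires each time chunk before loading the next, keeping peak
    # memory flat as ntimes grows.
    return float(result.sum().compute())
```

Comparing peak memory for, say, run(50) versus run(100) should reproduce the scaling behaviour in a form the Dask developers can act on.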
