Memory error when store_window=True in pspec_run #354

Open
adeliegorce opened this issue Apr 20, 2022 · 4 comments

Comments

@adeliegorce
Contributor

The store_window=True option in pspec_run, which computes and stores (approximate) window functions, leads to a memory overload.
For example, for two datasets combined into a UVPSpec object with uvp.Ntimes = 8052, uvp.Nblpairs = 63, uvp.Npols = 1 and one spectral window (164 frequency channels), the difference in memory usage between running with and without window functions is about 25 GB.
Note: uvp.window_function_array is defined as an array of float64.
First spotted in the validation notebook hera-validation/test-series/0/test-0.2.0.ipynb.
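
To see how the footprint scales, here is a rough back-of-the-envelope sketch. It assumes the window functions are stored as one dense float64 array of shape (Nblpairts, Ndlys, Ndlys, Npols), with Nblpairts = Ntimes * Nblpairs and Ndlys equal to the number of frequency channels. These are assumptions about the layout, and `wf_nbytes` is a hypothetical helper, so the result is not expected to reproduce the 25 GB figure exactly. The point is only that the cost grows linearly with the number of blpair-times.

```python
def wf_nbytes(n_blpairts, n_dlys, n_pols, itemsize=8):
    """Bytes needed for a dense array of assumed shape
    (n_blpairts, n_dlys, n_dlys, n_pols); itemsize=8 means float64."""
    return n_blpairts * n_dlys * n_dlys * n_pols * itemsize


# Numbers quoted in this issue; Ndlys = Nfreqs is an assumption.
n_times, n_blpairs, n_pols, n_dlys = 8052, 63, 1, 164

full = wf_nbytes(n_times * n_blpairs, n_dlys, n_pols)  # one WF per blpair-time
single = wf_nbytes(1, n_dlys, n_pols)                  # one WF in total

print(f"all blpair-times: {full / 1e9:.1f} GB")
print(f"single window function: {single / 1e6:.2f} MB")
print(f"ratio: {full // single}")
```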

@nkern
Member

nkern commented Apr 20, 2022

This is kind of an insurmountable issue if one wants to compute the window functions for all baselines and times. For many estimators, however, the wf doesn't change along these axes, so in the H1C analysis, for example, we run the pipeline with store_window=False, run our bl and time averaging, then go back and compute the wf once.

I think keeping the current capability to compute and store all bl-times is needed for future estimators that may not have a time- and bl-independent wf, but if you want to take this problem on, we could write a method that identifies time- and blpair-independent estimators and just stores one or a few wfs in memory.

@adeliegorce
Contributor Author

Okay, I see. I guess I was thinking about the second option you mention, where you do not store the same array a thousand times, but it might be difficult to make it general enough. Also, do you think it needs to be float64 rather than float32?
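
As a quick illustration of what the float32 option would buy, here is a small NumPy sketch on a made-up array (the shape and values are placeholders, not real window functions). Casting halves the memory and introduces a rounding error at roughly the 1e-7 level for values of order one. Whether that precision is enough for the downstream analysis is a separate question this does not answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder array with values in [0, 1); NumPy defaults to float64.
wf64 = rng.random((4, 16, 16, 1))
wf32 = wf64.astype(np.float32)

# Largest absolute rounding error introduced by the cast.
max_err = float(np.max(np.abs(wf64 - wf32.astype(np.float64))))

print(f"float64: {wf64.nbytes} bytes, float32: {wf32.nbytes} bytes")
print(f"max abs rounding error: {max_err:.2e}")
```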

@nkern
Member

nkern commented Apr 20, 2022

Yeah I think it should be fairly straightforward to do this.

Two things can be done to make it easier to store blpair and time-independent window funcs.

  1. For a time-independent WF, shrink the "time" axis of the WF to length 1 in memory (i.e. in uvp.window_function_array). When you then request the window function via uvp.get_window_func, it is automatically inflated to the full Ntimes for you. This could be directed by a uvp.time_dep_wf=False flag, which would be set by the user in pspec_run.

  2. For a blpair-independent WF, this could be as simple as storing the WF for a single blpair; the get_window_func method would then assume that only one blpair is stored and modify the blpairts indexing array to accommodate that. This would require that uvpspec hold only a single WF per object, which could similarly be directed by a uvp.blpair_dep_wf=False flag.
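
A minimal sketch of the two ideas above, using plain NumPy rather than the actual uvpspec code. `CompactWindowStore` and its `inflate` method are hypothetical names for illustration only, and for simplicity the blpair and time axes are kept separate instead of being flattened into a single blpairts axis. The flag names mirror the ones proposed above.

```python
import numpy as np


class CompactWindowStore:
    """Hypothetical container (not part of hera_pspec).

    Stores wf with shape (Nblpairs or 1, Ntimes or 1, Ndlys, Ndlys, Npols);
    an axis has length 1 when the WF does not depend on it.
    """

    def __init__(self, wf, n_blpairs, n_times,
                 time_dep_wf=True, blpair_dep_wf=True):
        expected = (n_blpairs if blpair_dep_wf else 1,
                    n_times if time_dep_wf else 1)
        if wf.shape[:2] != expected:
            raise ValueError(
                f"leading axes {wf.shape[:2]} do not match {expected}")
        self.wf = wf
        self.n_blpairs = n_blpairs
        self.n_times = n_times

    def inflate(self):
        """Return the WF expanded to the full (Nblpairs, Ntimes, ...) shape.

        np.broadcast_to returns a read-only view, so no data is copied.
        """
        full_shape = (self.n_blpairs, self.n_times) + self.wf.shape[2:]
        return np.broadcast_to(self.wf, full_shape)


# Toy sizes: one identity-matrix WF shared by every blpair and time.
n_blpairs, n_times, n_dlys = 3, 5, 4
single = np.eye(n_dlys)[None, None, :, :, None]  # shape (1, 1, 4, 4, 1)

store = CompactWindowStore(single, n_blpairs, n_times,
                           time_dep_wf=False, blpair_dep_wf=False)
full = store.inflate()

print(full.shape)                          # logical shape seen by the caller
print(store.wf.nbytes)                     # bytes actually held in memory
print(np.shares_memory(full, store.wf))    # view, not a copy
```

Because the inflated array is a read-only view, any code that modifies window functions in place would need an explicit copy first.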

These changes might require updates to the check function for the case where the above kwargs are False.
Let me know if all of this makes sense. No pressure to actually do this if you are not able to take it on. You can always do the hack I did for the H1C analysis, but if you'd like to see this handled more elegantly in uvpspec then this is certainly one way to go. Let me know, thanks!

@adeliegorce
Contributor Author

Thanks for the suggestions! I'll keep it in the back of my head and do it eventually :)
