build_datacontainers is slow #859

Open
steven-murray opened this issue Jan 12, 2023 · 0 comments

In the same profile for which #858 was reported, I also found that ~3k seconds (about 15% of total time, or 20% of total 'read' time) was spent in the build_datacontainers method, where it seems that the _get_slice calls take most of the time. Looking into this a bit further, I think most of that time goes to copying the data_array contents. Copying is definitely the right default, because you don't want a view of your array hanging around waiting to be inadvertently modified, but in some cases it doesn't matter whether we have a copy or not, so I think it would be useful to have an option to make this non-copying.

Total time: 3129.79 s
File: /lustre/aoc/projects/hera/heramgr/anaconda3/envs/h6c/lib/python3.10/site-packages/hera_cal/io.py
Function: build_datacontainers at line 685

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   685                                               def build_datacontainers(self):
   686                                                   '''Turns the data currently loaded into the HERAData object into DataContainers.
   687                                                   Returned DataContainers include useful metadata specific to the data actually
   688                                                   in the DataContainers (which may be a subset of the total data). This includes
   689                                                   antenna positions, frequencies, all times, all lsts, and times and lsts by baseline.
   690                                           
   691                                                   Returns:
   692                                                       data: DataContainer mapping baseline keys to complex visibility waterfalls
   693                                                       flags: DataContainer mapping baseline keys to boolean flag waterfalls
   694                                                       nsamples: DataContainer mapping baseline keys to interger Nsamples waterfalls
   695                                                   '''
   696                                                   # build up DataContainers
   697      2383      14136.0      5.9      0.0          data, flags, nsamples = odict(), odict(), odict()
   698      2383   19193325.0   8054.3      0.6          meta = self.get_metadata_dict()
   699   2474287    5506177.0      2.2      0.2          for bl in meta['bls']:
   700   2471904  884136579.0    357.7     28.2              data[bl] = self._get_slice(self.data_array, bl)
   701   2471904  865655205.0    350.2     27.7              flags[bl] = self._get_slice(self.flag_array, bl)
   702   2471904  864163095.0    349.6     27.6              nsamples[bl] = self._get_slice(self.nsample_array, bl)
   703      2383   25802682.0  10827.8      0.8          data = DataContainer(data)
   704      2383   24716964.0  10372.2      0.8          flags = DataContainer(flags)
   705      2383   27281524.0  11448.4      0.9          nsamples = DataContainer(nsamples)
   706                                           
   707                                                   # store useful metadata inside the DataContainers
   708      9532      30303.0      3.2      0.0          for dc in [data, flags, nsamples]:
   709     71490     378699.0      5.3      0.0              for attr in ['ants', 'data_ants', 'antpos', 'data_antpos', 'freqs', 'times', 'lsts', 'times_by_bl', 'lsts_by_bl']:
   710     64341  412908902.0   6417.5     13.2                  setattr(dc, attr, copy.deepcopy(meta[attr]))
   711                                           
   712      2383       4949.0      2.1      0.0          return data, flags, nsamples
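
To make the copy-versus-view trade-off concrete, here is a minimal sketch of what an opt-in non-copying slice could look like. It is not hera_cal's actual `_get_slice`: the `get_slice` helper, its `copy` keyword, and the toy array shape are all hypothetical, and it assumes the baseline's rows can be selected with a plain slice, which NumPy returns as a view.

```python
import numpy as np


def get_slice(array, rows, copy=True):
    """Select `rows` (a slice object) from `array`.

    Hypothetical stand-in for the slicing step, not hera_cal's _get_slice.
    With copy=True (the safe default) the result owns its data; with
    copy=False it is a view that shares memory with `array`.
    """
    out = array[rows]  # basic slicing: no data is copied here
    return out.copy() if copy else out


# Toy stand-in for data_array: 6 "blts" rows by 4 frequency channels.
data_array = np.zeros((6, 4), dtype=complex)
bl_rows = slice(0, 6, 2)  # assumed row pattern for one baseline

copied = get_slice(data_array, bl_rows)              # independent copy
viewed = get_slice(data_array, bl_rows, copy=False)  # view, no copy made

shares_copy = np.shares_memory(copied, data_array)
shares_view = np.shares_memory(viewed, data_array)

# Writing through the view changes the parent array; the copy is unaffected.
viewed[0, 0] = 1 + 2j
```

The view avoids the per-baseline copy, but any in-place modification of the returned waterfall would also change the parent array, which is why copying would remain the default in this sketch.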