Irregular number of data points considered for each LST time bin #746

matyasmolnar opened this issue Oct 21, 2021 · 0 comments

In attempting to return unaveraged LST-binned data files, I noticed that the number of data points considered for each time bin was irregular.

I'm working on the 18 nights of H1C_IDR2.2, with the only change being ntimes_per_file = 30 in the lstbin_grp1of1.toml file.

If I turn on return_no_avg=True and pickle the final data container, as I do here, I don't get the expected 2 (time integrations) x 18 (days) = 36 data points for each time bin.

Inspecting the unaveraged data file:

import pickle

with open('/lustre/aoc/projects/hera/mmolnar/LST_bin/binned_files/no_avg/zen.grp1.of1.LST.1.40949.HH.OCRSLU.uvh5.pkl', 'rb') as f:
    no_avg_dc = pickle.load(f)

# trying a sample baseline
for count, i in enumerate(no_avg_dc[(12, 13, 'ee')]):
    print(count, len(i))

returning

0 36
1 36
2 36
3 36
4 36
5 36
6 36
7 36
8 36
9 36
10 36
11 36
12 36
13 37
14 37
15 37
16 37
17 37
18 37
19 37
20 37
21 37
22 37
23 37
24 37
25 37
26 37
27 37
28 37
29 37
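As a quick sanity check, the printed lengths can be summarised with collections.Counter. This is a minimal sketch that uses the values printed above as a stand-in list, since the pickled file itself is not attached:

```python
from collections import Counter

# Stand-in for the per-bin point counts printed above; the real values
# come from len(i) over no_avg_dc[(12, 13, 'ee')].
bin_lengths = [36] * 13 + [37] * 17

expected = 2 * 18  # 2 time integrations x 18 nights
print(Counter(bin_lengths))
print(all(n == expected for n in bin_lengths))
```

This makes the irregularity explicit: 13 bins hold 36 points and 17 bins hold 37, so not every bin matches the expected 36.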

Further changing this line to

bin_count.append(np.sum(np.ones_like(d, dtype=bool) * n_c, axis=0))

confirms this, as the bin counts obtained here are also 36 and 37.

I also tried to hack the lst_bin function to return unaveraged data without using return_no_avg = True, by appending the unaveraged data to lists and creating arrays out of them (see from this line here, specifically the real_unavg and d_unavg variables). This still returned an inconsistent number of data points: 36 for the first 13 bins (indices 0-12) and 37 for the others.
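A synthetic sketch of this list-append approach (the shapes here are illustrative, not taken from the real files) shows why the result cannot be packed into a regular array:

```python
import numpy as np

# Synthetic per-bin data mimicking the ragged 36/37-point counts seen above.
rng = np.random.default_rng(0)
per_bin = [rng.normal(size=(36, 4)) for _ in range(13)] + \
          [rng.normal(size=(37, 4)) for _ in range(17)]

# Append each bin's unaveraged data to lists, as in the hack.
d_unavg = []
real_unavg = []
for d in per_bin:
    d_unavg.append(d)
    real_unavg.append(d.real)

# With 36 vs 37 points per bin, the per-bin arrays cannot be stacked
# into one regular (Nbins, Npoints, Nfreqs) array.
print(sorted({d.shape[0] for d in d_unavg}))
```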

In summary:

  • d in the lst_bin function does not contain a regular number of data points for each time bin
  • n somehow zeroes out the extra row of data, so that when the average is taken (e.g. real_avg.append(np.sum(d.real * n, axis=0) / norm) here), 2 x no_nights points are effectively considered, which is why this effect is not easily caught
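The second point can be illustrated with a toy weighted average (hypothetical numbers, not the actual hera_cal internals): a 37th row whose weight in n is zero drops out entirely, so the normalised result still reflects 36 samples.

```python
import numpy as np

# 37 stacked data rows for one LST bin, one frequency channel.
d = np.arange(37, dtype=float).reshape(37, 1)

# Weights: the extra 37th row is zero-weighted.
n = np.ones_like(d)
n[-1] = 0.0

norm = np.sum(n, axis=0)                    # effective sample count: 36
real_avg = np.sum(d.real * n, axis=0) / norm

print(norm[0])      # 36.0
print(real_avg[0])  # 17.5, the mean of rows 0..35 only
```

This is consistent with the averaged products looking correct even though the unaveraged containers hold an irregular number of points.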