Should the cookbook default to stricter requirements when merging/concatenating data? #319

dougiesquire · 2023-01-25T00:15:10Z

Motivating example here: COSIMA/cosima-recipes#229. Files with the same naming in a single experiment are on different domains depending on the output* directory. Should the cookbook check whether indexes are the same for data being merged/concatenated?

For example, passing join="exact" to the open_mfdataset() call within the cosima-cookbook makes xarray raise an error when the indexes to be aligned are not equal. This can currently be passed through kwargs, but should it be the default?
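To illustrate the difference, here is a minimal in-memory sketch. It uses xr.concat as a stand-in for open_mfdataset (which accepts the same join argument), and the dataset layout, the make_ds helper, and the coordinate values are made up for illustration rather than taken from the actual experiment:

```python
import numpy as np
import xarray as xr


def make_ds(time, lon):
    """Build a small synthetic dataset (hypothetical stand-in for one output file)."""
    data = np.ones((len(time), len(lon)))
    return xr.Dataset(
        {"u": (("time", "xt_ocean"), data)},
        coords={"time": time, "xt_ocean": lon},
    )


# Same variable name, but the second "file" only covers part of the domain
full = make_ds([0, 1], [0.0, 1.0, 2.0, 3.0])
subset = make_ds([2, 3], [2.0, 3.0])

# join="outer" takes the union of the xt_ocean indexes and pads with NaN
padded = xr.concat([full, subset], dim="time", join="outer")
n_missing = int(padded["u"].isnull().sum())

# join="exact" refuses to align mismatched indexes and raises instead
try:
    xr.concat([full, subset], dim="time", join="exact")
    raised = False
except ValueError as err:
    raised = True
    print(f"join='exact' raised: {err}")

print(f"NaN values introduced by join='outer': {n_missing}")
```

With join="outer" the mismatch only shows up later as unexpected NaNs, whereas join="exact" flags it at load time.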


rmholmes · 2023-01-25T00:24:57Z

Thanks for catching this @dougiesquire.

I'm not sure this is a cookbook issue. I think it's more an issue with the data itself. I don't think it's a good idea to have output defined on different regions using the same file name. I'd suggest that a good way to deal with this issue is to rename the ocean_daily_3d_u_%.nc files in output196-output279 to something like ocean_daily_3d_u_southern_ocean_%.nc. They can then be separated using the ncfile argument to cc.querying.getvar.

But I guess even then, it would still be useful to flag it so that the user knows they have to use ncfile.

dougiesquire · 2023-01-25T00:43:47Z

Thanks @rmholmes. I wasn't meaning to suggest that the issue is with the cookbook, but having join="exact" as default would've saved me a bunch of time yesterday. I.e. it could be useful for helping to find/flag issues with the data.

My guess is that most uses of the cookbook are to query/load datasets that should have consistent indexes. So having join="exact" as default could make sense - users could always override the kwarg if they want to merge inconsistent data.
But, I'm probably just not across the full range of cookbook use cases.

dougiesquire · 2023-01-25T00:46:21Z

But yes, for fixing the specific issue with 01deg_jra55v13_ryf9091, changing the name of the nc files sounds sensible to me. Who would be in charge of doing that?

rmholmes · 2023-01-25T00:47:05Z

Sorry @dougiesquire, I didn't completely take in your comment here as I'd copied my response across from the cosima-recipes issue you'd put up. I'd support a move to the stricter requirements.

I think @AndyHoggANU ran that simulation.

angus-g · 2023-01-25T01:02:26Z

I agree that join="exact" seems like a sensible default. It's a bit crazy that xarray tries to concatenate datasets like that in the first place!

aidanheerdegen · 2023-01-25T01:38:39Z

Is there any time penalty with join="exact"? I seem to recall folks complaining about xarray doing time-consuming checks on coordinates under some circumstances, but I may be misremembering, or the issue may no longer be a problem.

angus-g · 2023-01-25T01:45:14Z

I think the checks controlled by compat are more expensive than those for join (which only compares the dimension indexes being aligned)? It's probably one of those things where it's best to just benchmark it.
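A rough way to benchmark this could look like the sketch below. It times xr.concat on synthetic in-memory datasets that share an identical index, so it only hints at the alignment cost and says nothing about file I/O in open_mfdataset. The make_pieces helper and the sizes are arbitrary assumptions:

```python
import timeit

import numpy as np
import xarray as xr


def make_pieces(n_pieces=20, n_time=10, n_x=500):
    """Synthetic stand-ins for per-output files that all share one x index."""
    x = np.linspace(0.0, 360.0, n_x)
    pieces = []
    for i in range(n_pieces):
        time = np.arange(i * n_time, (i + 1) * n_time)
        data = np.zeros((n_time, n_x))
        pieces.append(
            xr.Dataset({"u": (("time", "x"), data)}, coords={"time": time, "x": x})
        )
    return pieces


pieces = make_pieces()

# Time each join mode; take the best of a few repeats to reduce noise
timings = {}
for join in ("outer", "exact"):
    runs = timeit.repeat(
        lambda join=join: xr.concat(pieces, dim="time", join=join),
        number=5,
        repeat=3,
    )
    timings[join] = min(runs)

for join, seconds in timings.items():
    print(f"join={join!r}: {seconds:.4f} s for 5 concatenations")
```

A more representative test would run the cookbook's own query against a real experiment with and without join="exact".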

dougiesquire · 2023-01-25T01:51:21Z

Agreed. I wouldn't expect any difference in speed for data that can be joined.

As @angus-g mentioned, there are other kwargs that can be changed to improve performance, but they require making some assumptions about the data being loaded. I don't know whether these are justified for the COSIMA data?

EDIT: see the Note here: https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets
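As far as I recall, the kwargs suggested in that note are data_vars="minimal", coords="minimal" and compat="override". The sketch below, again using xr.concat on made-up in-memory datasets rather than real files, shows the kind of assumption this involves: variables without the concatenation dimension are taken from the first dataset without being compared. The area variable and its values are purely illustrative:

```python
import numpy as np
import xarray as xr


def make_ds(time, area):
    """Synthetic dataset with a time-varying field and a static 'area' variable."""
    x = [0.0, 1.0]
    return xr.Dataset(
        {
            "u": (("time", "x"), np.ones((len(time), len(x)))),
            "area": ("x", area),
        },
        coords={"time": time, "x": x},
    )


first = make_ds([0, 1], [10.0, 10.0])
second = make_ds([2, 3], [99.0, 99.0])  # static variable differs between "files"

# compat="override" skips the equality check on variables that are not
# concatenated and keeps the copy from the first dataset;
# join="exact" still guards against mismatched indexes.
combined = xr.concat(
    [first, second],
    dim="time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
    join="exact",
)

area_values = combined["area"].values.tolist()
print(area_values)  # values come from `first`; `second`'s are silently ignored
```

So these settings are only safe if static variables really are identical across all the files being combined.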

angus-g added the 👨‍👩‍👧‍👦👨‍👩‍👦‍👦👨‍👩‍👧‍👧 user experience label Jan 25, 2023
