Filename extension issues in cat.search() #74

Closed
headmetal opened this issue May 18, 2023 · 2 comments

Comments

@headmetal

I'm not sure if this is an intended behaviour or a bug (and I apologise in advance if this is in fact intended):

In the following case, if I attempt to explicitly nominate a given filename ocean_month.1mon from the available keys() it returns an empty data_dict:

image

However, if I add a wildcard and exclude .1mon from the filename, the data_dict is populated as expected:

image

I'm guessing the .1mon isn't the real file extension, but is in fact part of the filename - so is messing up loading the file?

@dougiesquire
Collaborator

Thanks @headmetal. ocean_month.1mon is not a filename, it's a "key" for a dataset in the intake-esm datastore. The keys are intake-esm's way of knowing how to concatenate all the files in the datastore into "datasets". For ACCESS-OM2 datastores, the keys are made up of two fields from the table (inspect subcat.df to see the table):

  • the file_id, which is parsed from the filename (the ocean_month part in your case)
  • the frequency (the 1mon part)
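
As a rough illustration of the idea (not intake-esm's actual implementation), the sketch below builds keys by joining the two fields with a "." and groups file paths under each key. The rows, the paths, and the helper dataset_key are hypothetical stand-ins for the real table in subcat.df:

```python
# Hypothetical rows standing in for the datastore table (subcat.df).
rows = [
    {"path": "output000/ocean/ocean_month.nc", "file_id": "ocean_month", "frequency": "1mon"},
    {"path": "output001/ocean/ocean_month.nc", "file_id": "ocean_month", "frequency": "1mon"},
    {"path": "output000/ocean/ocean_daily.nc", "file_id": "ocean_daily", "frequency": "1day"},
]


def dataset_key(row, fields=("file_id", "frequency"), sep="."):
    """Join the chosen fields of one row into a single dataset key."""
    return sep.join(str(row[field]) for field in fields)


# Files that share a key are the ones that would be combined into one dataset.
datasets = {}
for row in rows:
    datasets.setdefault(dataset_key(row), []).append(row["path"])

for key, paths in datasets.items():
    print(key, len(paths))
```

Here the two ocean_month files end up under the single key ocean_month.1mon, which is why that string looks like a filename with an extension but is not one.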

If you want to load a dataset directly by key, you can use something like:

subcat["ocean_month.1mon"].to_dataset_dict()

Alternatively, in this specific case where the filename ocean_month alone defines a unique dataset, you could get the same data by querying on the filename field as you have done above, e.g.:

subcat.search(filename="ocean_month").to_dataset_dict()

However, it isn't always guaranteed that filename alone defines a unique dataset. E.g. I've come across model runs containing two files with the same name containing data at different frequencies. That's why the frequency info is also needed in the key.
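
To make the collision concrete, here is a small sketch using made-up rows (the paths, the file name "ocean", and the helper group_paths are all hypothetical). Grouping on the file name alone lumps monthly and daily files together, while adding the frequency separates them:

```python
from collections import defaultdict

# Hypothetical table rows: the same file name appears at two frequencies.
rows = [
    {"path": "run/monthly/output000/ocean.nc", "filename": "ocean", "frequency": "1mon"},
    {"path": "run/monthly/output001/ocean.nc", "filename": "ocean", "frequency": "1mon"},
    {"path": "run/daily/output000/ocean.nc", "filename": "ocean", "frequency": "1day"},
]


def group_paths(rows, fields):
    """Group file paths by a key built from the given fields."""
    groups = defaultdict(list)
    for row in rows:
        key = ".".join(row[field] for field in fields)
        groups[key].append(row["path"])
    return dict(groups)


# File name alone: one group that mixes monthly and daily files.
by_name = group_paths(rows, ["filename"])

# File name plus frequency: one group per distinct dataset.
by_name_and_freq = group_paths(rows, ["filename", "frequency"])

print(sorted(by_name))
print(sorted(by_name_and_freq))
```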

I hope this helps and doesn't make things even less clear. Some more info on keys in intake-esm can be found here: https://intake-esm.readthedocs.io/en/stable/how-to/understand-keys-and-how-to-change-them.html

@dougiesquire
Collaborator

This issue originated from a confusing line in example_usage.ipynb. I've added a note for how to improve this in #75 (comment), so I'm closing this issue.
