require_all_on fails with a derived variable registry #436

Open
3 tasks done
aulemahal opened this issue Jan 18, 2022 · 0 comments


aulemahal commented Jan 18, 2022

Here's a quick checklist of what to include:

  • Include a detailed description of the bug or suggestion
  • Output of intake_esm.show_versions()
  • Minimal, self-contained copy-pastable example that generates the issue if possible. Please be concise with code posted.

Description

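When searching a catalog that has a derived variable registry and specifying require_all_on, the search result is incorrect: some rows that should have been filtered out are returned, and some expected rows are missing.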

csv file:

simulation_id,ensemble_id,model_institution_id,model_id,experiment_id,timestep_id,domain_id,member_id,variable_id,file
A,CMIP6,CCCma,CanESM,historical,day,NAM,r1i1p1,tasmax,file1
A,CMIP6,CCCma,CanESM,historical,day,NAM,r1i1p1,pr,file2
B,CMIP6,CCCmb,CanESM,historical,day,NAM,r1i1p1,pr,file3
C,CMIP6,CCCmc,CanESM,historical,day,NAM,r1i1p1,tasmax,file4
D,CMIP6,CCCmd,CanESM,historical,day,NAM,r1i1p1,pr,file5
D,CMIP6,CCCmd,CanESM,historical,day,NAM,r1i1p1,prsn,file6
D,CMIP6,CCCmd,CanESM,historical,day,NAM,r1i1p1,tasmax,file7

json file:

{
    "esmcat_version": "0.1.0",
    "assets": {
        "column_name": "file",
        "format": "netCDF"
    },
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["simulation_id", "domain_id", "timestep_id"],
        "aggregations": [
            {"type": "join_new", "attribute_name": "member_id"},
            {"type": "union", "attribute_name": "variable_id"}
        ]
    },
    "attributes" : [],
    "catalog_file": "test.csv"
}
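
To reproduce this locally, the two files above can be written to disk with a short helper. This is only a sketch: `write_catalog_files` is a hypothetical name (not part of intake_esm), and it targets a temporary directory here; pass "." instead to match the relative 'test.json' path used in the snippet further down. The JSON is produced with `json.dump`, so it is strictly valid.

```python
import json
import tempfile
from pathlib import Path

# Catalog table, copied from the csv file above.
CSV_TEXT = """\
simulation_id,ensemble_id,model_institution_id,model_id,experiment_id,timestep_id,domain_id,member_id,variable_id,file
A,CMIP6,CCCma,CanESM,historical,day,NAM,r1i1p1,tasmax,file1
A,CMIP6,CCCma,CanESM,historical,day,NAM,r1i1p1,pr,file2
B,CMIP6,CCCmb,CanESM,historical,day,NAM,r1i1p1,pr,file3
C,CMIP6,CCCmc,CanESM,historical,day,NAM,r1i1p1,tasmax,file4
D,CMIP6,CCCmd,CanESM,historical,day,NAM,r1i1p1,pr,file5
D,CMIP6,CCCmd,CanESM,historical,day,NAM,r1i1p1,prsn,file6
D,CMIP6,CCCmd,CanESM,historical,day,NAM,r1i1p1,tasmax,file7
"""

# Catalog description, same fields as the json file above.
CATALOG_SPEC = {
    "esmcat_version": "0.1.0",
    "assets": {"column_name": "file", "format": "netCDF"},
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["simulation_id", "domain_id", "timestep_id"],
        "aggregations": [
            {"type": "join_new", "attribute_name": "member_id"},
            {"type": "union", "attribute_name": "variable_id"},
        ],
    },
    "attributes": [],
    "catalog_file": "test.csv",
}


def write_catalog_files(directory):
    """Write test.csv and test.json into `directory`; return the json path."""
    directory = Path(directory)
    (directory / "test.csv").write_text(CSV_TEXT)
    with open(directory / "test.json", "w") as f:
        json.dump(CATALOG_SPEC, f, indent=4)
    return directory / "test.json"


# Use write_catalog_files(".") to get the layout assumed by the issue.
json_path = write_catalog_files(tempfile.mkdtemp())
print(json_path)
```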

What I Did

import ast
import intake
import intake_esm

dvr = intake_esm.DerivedVariableRegistry()

@dvr.register(variable='prsn', query={'variable_id': 'pr'})
def prsn(ds):
    return ds.pr

cat = intake.open_esm_datastore('test.json', registry=dvr)
cat.search(variable_id=['tasmax', 'prsn'], require_all_on=['simulation_id']).df

gives

  simulation_id ensemble_id model_institution_id model_id experiment_id timestep_id domain_id member_id variable_id   file
0             D       CMIP6                CCCmd   CanESM    historical         day       NAM    r1i1p1        prsn  file6
1             D       CMIP6                CCCmd   CanESM    historical         day       NAM    r1i1p1      tasmax  file6
2             A       CMIP6                CCCma   CanESM    historical         day       NAM    r1i1p1          pr  file2
3             B       CMIP6                CCCmb   CanESM    historical         day       NAM    r1i1p1          pr  file3
4             D       CMIP6                CCCmd   CanESM    historical         day       NAM    r1i1p1          pr  file5

  • Row 4 (D, pr) is unneeded, but I can live with this.
  • Row 3 (B, pr) is erroneous: B does not have "tasmax", so it shouldn't be in the output.
  • The "tasmax" row for A is missing. A does have "tasmax" (file1), so I expected it to be there.
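
The expected behaviour described in these points can be pinned down with a small plain-Python sketch. It is not intake_esm's search implementation and not a proposed fix: `expected_search` and `DERIVED_DEPENDENCIES` are hypothetical names, the catalog is reduced to the three relevant columns, and the rule used (a group qualifies if every requested variable is either present or derivable from variables that are present) is one reading of what require_all_on should mean with a registry.

```python
import csv
import io

# Catalog rows from the issue, reduced to the columns that matter here.
CATALOG_CSV = """\
simulation_id,variable_id,file
A,tasmax,file1
A,pr,file2
B,pr,file3
C,tasmax,file4
D,pr,file5
D,prsn,file6
D,tasmax,file7
"""

# Hypothetical stand-in for the registry: derived variable -> inputs it needs.
DERIVED_DEPENDENCIES = {"prsn": {"pr"}}


def expected_search(rows, requested, derived, group_key="simulation_id"):
    """Keep only groups that can provide every requested variable."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)

    selected = []
    for group_rows in by_group.values():
        available = {r["variable_id"] for r in group_rows}
        needed = set()
        satisfiable = True
        for var in requested:
            if var in available:
                # The variable exists on disk for this group.
                needed.add(var)
            elif var in derived and derived[var] <= available:
                # The variable can be derived: keep its inputs instead.
                needed |= derived[var]
            else:
                satisfiable = False
                break
        if satisfiable:
            selected.extend(r for r in group_rows if r["variable_id"] in needed)
    return selected


rows = list(csv.DictReader(io.StringIO(CATALOG_CSV)))
result = expected_search(rows, ["tasmax", "prsn"], DERIVED_DEPENDENCIES)
for r in result:
    print(r["simulation_id"], r["variable_id"], r["file"])
```

Under these assumptions the sketch keeps A (tasmax plus pr, from which prsn can be derived) and D (tasmax and prsn), and drops B and C entirely.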

I don't know how to solve this one though...

Version information: output of intake_esm.show_versions()

INSTALLED VERSIONS

cftime: 1.5.1.1
dask: 2021.12.0
fastprogress: 0.2.7
fsspec: 2021.11.1
gcsfs: 2021.11.1
intake: 0.6.4
intake_esm: 2021.8.17.post43+dirty
netCDF4: 1.5.8
pandas: 1.3.5
requests: 2.26.0
s3fs: 2021.11.1
xarray: 0.20.2
zarr: 2.10.3

aulemahal changed the title from "require_all_on fails with a derived variable reigstry" to "require_all_on fails with a derived variable registry" on Jan 18, 2022