[DATA REQUEST] Add OM4_025.JRA_RYF #175

Open
2 of 5 tasks
anton-seaice opened this issue Jun 25, 2024 · 24 comments
Labels: data request (Add data to the catalog)

Comments

@anton-seaice
Collaborator

Description of the data product

<Please replace this text with a description of the data product to add to the ACCESS-NRI catalog. What data does it contain? What format is it in? Who is it useful for?>

Location of the data product on Gadi

Checklist

Add a "x" between the brackets to all that apply

  • This data product is stable (unlikely to change substantially or move)
  • This data product is of use to the broader community
  • This data product is documented:
  • This data product is licensed under
  • Those who want to access this data can be added to the project that houses it
@anton-seaice added the "data request" (Add data to the catalog) label on Jun 25, 2024
@anton-seaice
Collaborator Author

Following on from COSIMA/cosima-recipes#369, I am suggesting adding OM4_025.JRA_RYF to the intake catalog.

@dougiesquire - As this is a different model configuration, I guess this would require a new datastore "builder", so maybe it's not worth the effort? The runs are used in cosima recipes to show examples of handling MOM6 data.

@adele-morrison - Are there companion runs to OM4_025.JRA_RYF that should also be added? Can you help with the "Description of the data product" and "Location of the data product on Gadi" sections? I will then edit the original post.

@anton-seaice
Collaborator Author

I tried using the access-om3 builder, and got these errors when using builder.parser:

{'INVALID_ASSET': '/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ice_daily.nc', 'TRACEBACK': 'Traceback (most recent call last):\n File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.04/lib/python3.10/site-packages/access_nri_intake/source/builders.py", line 329, in parser\n raise ParserError(f"Cannot determine realm for file {file}")\naccess_nri_intake.source.builders.ParserError: Cannot determine realm for file /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ice_daily.nc\n'}

{'INVALID_ASSET': '/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ocean_daily.nc', 'TRACEBACK': 'Traceback (most recent call last):\n File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.04/lib/python3.10/site-packages/access_nri_intake/source/builders.py", line 329, in parser\n raise ParserError(f"Cannot determine realm for file {file}")\naccess_nri_intake.source.builders.ParserError: Cannot determine realm for file /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ocean_daily.nc\n'}
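
Both errors come from a parser that cannot map these filenames to a realm. As a rough illustration of what a dedicated parser would need to do, here is a minimal standalone sketch. It is not the access_nri_intake implementation: the regular expression, the `parse_realm` helper and the realm labels (`"ocean"`, `"seaIce"`) are all assumptions made for this example, based only on the two filenames quoted above.

```python
import re
from pathlib import PurePosixPath

# Assumed filename layout, inferred from "19000101.ice_daily.nc":
#   <YYYYMMDD>.<component>_<anything>.nc
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})\.(?P<component>[a-z]+)_(?P<rest>\w+)\.nc$"
)

# Hypothetical mapping from the component prefix to a realm label.
REALM_BY_COMPONENT = {"ocean": "ocean", "ice": "seaIce"}


def parse_realm(path):
    """Return (date, realm) for a file, or raise ValueError if it cannot be parsed."""
    name = PurePosixPath(path).name
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised filename: {name}")
    component = match["component"]
    if component not in REALM_BY_COMPONENT:
        raise ValueError(f"Cannot determine realm for file {path}")
    return match["date"], REALM_BY_COMPONENT[component]


date, realm = parse_realm(
    "/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF/output000/19000101.ice_daily.nc"
)
print(date, realm)
```

A real builder would need to cover every filename variant found in the output directories, not just the two shown here.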

@dougiesquire
Collaborator

Ah, yet another permutation of file naming. It might be safest just to write a dedicated builder, which is straightforward. I guess it would be an Om4Builder?

Is this output structured in a similar way to the regional MOM6 output? If so, it may be worth thinking about writing a builder that handles both?

@adele-morrison

Apologies for being slow. Yes, let's add the panan experiments to Intake. We'd still like to delete a bunch of the daily data for the 1/20th panan; is that OK to do after it's added to Intake? After that frees up space on ol01, ideally I'd also like to move the 1/10th panan from ik11 to ol01. But the current locations are as follows:
/g/data/ol01/outputs/mom6-panan/panant-0025-zstar-ACCESSyr2/
and
/g/data/ol01/outputs/mom6-panan/panant-005-zstar-ACCESSyr2/
and
/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/

@dougiesquire
Collaborator

We'd still like to delete a bunch of the daily data for the 1/20th panan; is that OK to do after it's added to Intake?

I think if we know this is going to happen then it would be better to wait until it is done. We can get a Builder set up and ready to go though.

@marc-white
Collaborator

@anton-seaice could you please add the precise location(s) of the data on Gadi?

@anton-seaice
Collaborator Author

@adele-morrison is more on top of this than I am. Note the comments above about possibly moving it.

/g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF seems to be an appropriate place to get some sample data, if that's what you are after?

@adele-morrison

Yes that’s the right location. Would be great to get this in the catalog so we can keep switching all the COSIMA recipes over. What do you need in terms of documentation?

@adele-morrison

I don’t think there are any plans to move OM4_025.JRA_RYF. The panan data location is still in flux. I will try to keep that moving forward.

@marc-white
Collaborator

OK, I'll start taking a look at the current data structure and builders to see what needs to happen to get these data ingested. Stay tuned...

@marc-white
Collaborator

The filenames all look pretty coherent, but there are a couple of things I haven't been able to work out on my own:

  • What is the 'static' frequency, e.g., 19000101.ocean_static.nc? I'm assuming this is some sort of snapshot - should this file be ingested?
  • There are 'scalar' versions of some files, e.g., 19000101.ocean_annual.nc and 19000101.ocean_scalar_annual.nc. Again, what do these represent, and should they be ingested?

@minghangli-uni

What is the 'static' frequency, e.g., 19000101.ocean_static.nc

It contains fields that do not change over time, such as grid-related data. It is saved once per run.

19000101.ocean_annual.nc

contains annually-averaged 2d fields

19000101.ocean_scalar_annual.nc

contains annually-averaged 0d fields

@anton-seaice
Collaborator Author

I think we want all of those files - there is a frequency = 'fx' for the static files which exists in OM2 and OM3 datastores (and maybe others)
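
To make the naming discussed above concrete, the following is a small sketch of how the trailing part of each filename could be mapped to a catalog frequency, with static files mapped to 'fx' as described. The `parse_frequency` helper and the lookup table are hypothetical, and the labels "1day" and "1yr" are assumed here for illustration; only 'fx' is confirmed in this thread.

```python
# Hypothetical lookup from the last filename token to a frequency label.
# Only "fx" for static files is taken from the discussion; the others are assumed.
FREQUENCY_BY_SUFFIX = {
    "daily": "1day",
    "annual": "1yr",
    "static": "fx",
}


def parse_frequency(filename):
    """Return (frequency, is_scalar) for names like '19000101.ocean_scalar_annual.nc'."""
    stem = filename.rsplit(".", 1)[0]      # drop the ".nc" extension
    tokens = stem.split("_")               # e.g. ["19000101.ocean", "scalar", "annual"]
    suffix = tokens[-1]
    if suffix not in FREQUENCY_BY_SUFFIX:
        raise ValueError(f"Cannot determine frequency for file {filename}")
    # "scalar" files hold 0d fields, so flag them separately from the frequency.
    return FREQUENCY_BY_SUFFIX[suffix], "scalar" in tokens[1:-1]


for name in (
    "19000101.ocean_static.nc",
    "19000101.ocean_annual.nc",
    "19000101.ocean_scalar_annual.nc",
):
    print(name, parse_frequency(name))
```

Keeping the scalar flag separate from the frequency means that `ocean_annual` and `ocean_scalar_annual` share a frequency but can still be told apart.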

@marc-white
Collaborator

Ah yes, I've found the fx frequency down in the utils package - I might pull it out into a named variable so it's clearer

@marc-white
Collaborator

Dumping this here so I can find it later (for building workable test data): https://stackoverflow.com/questions/15141563/python-netcdf-making-a-copy-of-all-variables-and-attributes-but-one

@marc-white
Collaborator

I now have what I think is a functional AccessOm4Builder that works on /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF. Are there other data locations that I should test it against as a check?

@dougiesquire
Collaborator

@marc-white, we definitely don't want to call this AccessOm4Builder. The "OM4" data at /g/data/ik11/outputs/mom6-om4-025/OM4_025.JRA_RYF is from GFDL OM4 (I think - @adele-morrison can you confirm?), not an ACCESS model.

I'd suggest seeing if the data mentioned in this comment can use the same builder. If so, then we could possibly call the builder Mom6Builder

@marc-white
Collaborator

/g/data/ol01/outputs/mom6-panan/panant-0025-zstar-ACCESSyr2/
and
/g/data/ol01/outputs/mom6-panan/panant-005-zstar-ACCESSyr2/
and
/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/

I've updated the Builder to be able to read the filenames found in these directories. However, I've come across an interesting conundrum whilst trying to test the resulting catalog: the data in those three directories are, when ingested into the catalog, pretty much identical, to the point where I can't figure out how to, say, get only the data from 0025-zstar (without resorting to the obvious solution of building a catalog only from that directory). This is causing issues when forming a Dask array, because the catalog doesn't understand how to merge the files (I think it is ending up with three 'layers' of the same time series, and bombs out).

For the uninitiated like myself, what is the difference between these three runs, and how can I differentiate between them in an intake/access-nri-intake way?

@dougiesquire
Collaborator

@marc-white, each of the experiments should be separate intake-esm datastores within the catalog.
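
One way to picture this separation is to group files by experiment directory before building anything, so that each group feeds its own datastore. The sketch below uses only the standard library; `group_by_experiment` is a hypothetical helper, and the file names under each experiment root are made up for illustration. It does not call intake-esm or access-nri-intake.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Experiment roots quoted earlier in this thread.
EXPERIMENT_ROOTS = [
    "/g/data/ol01/outputs/mom6-panan/panant-0025-zstar-ACCESSyr2/",
    "/g/data/ol01/outputs/mom6-panan/panant-005-zstar-ACCESSyr2/",
    "/g/data/ik11/outputs/mom6-panan/panant-01-zstar-ACCESSyr2/",
]


def group_by_experiment(paths, experiment_roots):
    """Map each experiment name to the files that live under its root directory."""
    roots = [PurePosixPath(root) for root in experiment_roots]
    grouped = defaultdict(list)
    for path in map(PurePosixPath, paths):
        for root in roots:
            if root in path.parents:
                grouped[root.name].append(str(path))
                break
    return dict(grouped)


# Hypothetical file names; the real directory contents may differ.
files = [
    EXPERIMENT_ROOTS[0] + "output000/example.ocean_daily.nc",
    EXPERIMENT_ROOTS[1] + "output000/example.ocean_daily.nc",
    EXPERIMENT_ROOTS[2] + "output000/example.ocean_daily.nc",
]
for experiment, members in group_by_experiment(files, EXPERIMENT_ROOTS).items():
    print(experiment, len(members))
```

With one group per experiment, files that share a name and time range never end up in the same datastore, which avoids the overlapping 'layers' described above.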

@marc-white
Collaborator

Hi @anton-seaice and @adele-morrison, I'm now at the point where I'm ready to try an all-up ingest of the data. However, the metadata.yaml for OM4_025.JRA_RYF is incomplete, and none exists for the mom6-panan datasets. Could you please add one for each dataset? Instructions are here: https://access-nri-intake-catalog.readthedocs.io/en/latest/management/building.html#metadata-yaml-files
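
As a rough aid for spotting an incomplete metadata.yaml, a check along the following lines could be run on the parsed file contents. This is only a sketch: `missing_fields` is a hypothetical helper, the metadata is shown as an already-loaded dictionary, and the field names in `EXAMPLE_REQUIRED` are assumptions for illustration. The linked instructions remain the authority on which fields are actually required.

```python
# Assumed field names for illustration only; check the linked instructions
# for the fields that are actually required.
EXAMPLE_REQUIRED = ["name", "experiment_uuid", "description", "long_description"]


def missing_fields(metadata, required):
    """Return the required keys that are absent or empty in a metadata mapping."""
    empty_values = (None, "", [], {})
    return [key for key in required if metadata.get(key) in empty_values]


# A partially filled example, standing in for a parsed metadata.yaml.
metadata = {
    "name": "OM4_025.JRA_RYF",
    "description": "",
}
print(missing_fields(metadata, EXAMPLE_REQUIRED))
```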

@adele-morrison

adele-morrison commented Aug 30, 2024

I've updated the metadata.yaml for OM4_025.JRA_RYF. I think @AndyHoggANU ran it, so some of the entries are currently just me guessing what the simulation is.

We're not quite ready to add the panan simulations ending in zstar-ACCESSyr2 to Intake yet (as above), because we still need to delete a bunch of that data and shift the 1/10th deg to ol01.

But we could add /g/data/ik11/outputs/mom6-panan/panant-01-zstar-v13 and panant-01-hycom1-v13 to Intake now.

@adele-morrison

@AndyHoggANU any chance you want to create the metadata.yaml files for panant-01-zstar-v13 and panant-01-hycom1-v13? Or @julia-neme, perhaps you could do the panant-01-zstar-v13 one? That's what you used in your paper, right?

@adele-morrison

I've confirmed with @AndyHoggANU and metadata.yaml for OM4_025.JRA_RYF is good to go.

@AndyHoggANU

OK, there are metadata.yaml files for both panant experiments now.


7 participants