Searchable database for CODA #22

Open
ChrisC28 opened this issue May 15, 2024 · 5 comments

@ChrisC28

Can we produce a searchable database for CODA?

Issues to think about:

  • File format? Parquet?
  • Do we need to change the format or structure of the underlying CODA netCDF files (i.e., is grouping yearly, by data originator and platform, optimal for a searchable database? Would day-by-day or profile-by-profile work better?)
  • How do we interrogate it? Polars? https://docs.pola.rs/
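
One way to make the structure question concrete is to write down what a partitioned layout would look like. The following is a minimal sketch, assuming a hive-style `key=value` directory convention (common for partitioned Parquet datasets, but not something decided in this thread). The root name `coda_parquet`, the choice of partition keys, and the file name `part-0.parquet` are all hypothetical.

```python
from pathlib import PurePosixPath


def partition_path(root: str, originator: str, platform: str, year: int) -> PurePosixPath:
    """Build a hive-style (key=value) path for one Parquet partition.

    The keys mirror the current CODA grouping (originator, platform, year);
    a day-by-day layout would simply add further keys such as month and day.
    """
    return (
        PurePosixPath(root)
        / f"originator={originator}"
        / f"platform={platform}"
        / f"year={year}"
        / "part-0.parquet"  # hypothetical file name
    )


example = partition_path("coda_parquet", "WOD2018", "pfl", 2012)
print(example)
```

With a layout like this, a query restricted to one platform and year only needs to open the matching directories, which is the main argument for keeping the grouping keys visible in the path.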
@Thomas-Moore-Creative Thomas-Moore-Creative self-assigned this May 15, 2024
@Thomas-Moore-Creative

Thomas-Moore-Creative commented Jul 18, 2024

Yesterday @ChrisC28, @BecCowley and I discussed this issue and talked about the following use case:

For all data of a given type (ARGO, CTD, or ...) in a given region, over a time period, over a depth range, and for a given variable:

  • quickly and easily make plots of where this data is on a map
  • grab all the data, record by record in a tabular format
  • grab all the data as analysis-ready profiles
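
The use case above can be sketched in plain Python to pin down what the query has to do, independent of which file format or query engine is eventually chosen. This is only an illustration: the `Record` fields, the `select` and `to_profiles` helpers, and the sample values are all hypothetical and not taken from the CODA files.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    profile_id: int
    platform: str   # data type, e.g. "ctd" or "pfl"
    time: datetime
    lat: float
    lon: float
    depth: float    # metres, positive downwards
    variable: str
    value: float


def select(records, platform, lat_range, lon_range, time_range, depth_range, variable):
    """Return the records that match every criterion (all ranges inclusive)."""
    return [
        r for r in records
        if r.platform == platform
        and r.variable == variable
        and lat_range[0] <= r.lat <= lat_range[1]
        and lon_range[0] <= r.lon <= lon_range[1]
        and time_range[0] <= r.time <= time_range[1]
        and depth_range[0] <= r.depth <= depth_range[1]
    ]


def to_profiles(records):
    """Group record-by-record rows into profiles, each sorted by depth."""
    profiles = defaultdict(list)
    for r in records:
        profiles[r.profile_id].append((r.depth, r.value))
    return {pid: sorted(levels) for pid, levels in profiles.items()}


# Made-up sample rows, for illustration only.
records = [
    Record(1, "ctd", datetime(2012, 3, 1), -35.0, 150.0, 10.0, "Temperature", 18.4),
    Record(1, "ctd", datetime(2012, 3, 1), -35.0, 150.0, 5.0, "Temperature", 18.9),
    Record(1, "ctd", datetime(2012, 3, 1), -35.0, 150.0, 500.0, "Temperature", 9.1),  # too deep
    Record(2, "pfl", datetime(2012, 3, 2), -35.5, 151.0, 5.0, "Temperature", 19.0),   # other platform
]

hits = select(
    records,
    platform="ctd",
    lat_range=(-40.0, -30.0),
    lon_range=(145.0, 155.0),
    time_range=(datetime(2012, 1, 1), datetime(2012, 12, 31)),
    depth_range=(0.0, 100.0),
    variable="Temperature",
)
positions = sorted({(r.lat, r.lon) for r in hits})  # points to plot on a map
profiles = to_profiles(hits)                        # analysis-ready profiles
```

Here `hits` is the tabular, record-by-record result, `positions` is what a map plot would need, and `profiles` is the same data regrouped per profile.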

@Thomas-Moore-Creative

Thomas-Moore-Creative commented Aug 14, 2024

Starting with 2012 as an example: /g/data/es60/users/thomas_moore/CODA/2012

(base) tm4888@gadi-login-04 /g/data/es60/users/thomas_moore/CODA/2012 ls -l
total 119157796
-rw-rw-r-- 1 tm4888 es60     2621287 May 28 16:13 AIMS_CODA_2012_ctd.nc
-rw-rw-r-- 1 tm4888 es60    39738448 May 21 10:40 MNF_CODA_2012_ctd.nc
-rw-rw-r-- 1 tm4888 es60    49086082 May 21 10:39 RAN_CODA_2012_ctd.nc
-rw-rw-r-- 1 tm4888 es60 15309446859 May 15 21:31 WOD2018_CODA_2012_ctd.nc
-rw-rw-r-- 1 tm4888 es60 33029760782 May 15 23:46 WOD2018_CODA_2012_gld.nc
-rw-rw-r-- 1 tm4888 es60    85704294 May 15 23:05 WOD2018_CODA_2012_mrb.nc
-rw-rw-r-- 1 tm4888 es60   129700151 May 15 21:21 WOD2018_CODA_2012_osd.nc
-rw-rw-r-- 1 tm4888 es60 71305005747 May 15 22:54 WOD2018_CODA_2012_pfl.nc
-rw-rw-r-- 1 tm4888 es60  2066312325 May 15 21:35 WOD2018_CODA_2012_xbt.nc
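
To see where the bytes are, the listing above can be tallied with a short script. This is a sketch that assumes the file names follow the pattern `<originator>_CODA_<year>_<platform>.nc`, which is inferred from the listing rather than documented here; the `parse_name` helper is hypothetical, and the sizes are copied from the `ls -l` output.

```python
from collections import defaultdict

# File sizes in bytes, copied from the `ls -l` output above.
listing = {
    "AIMS_CODA_2012_ctd.nc": 2621287,
    "MNF_CODA_2012_ctd.nc": 39738448,
    "RAN_CODA_2012_ctd.nc": 49086082,
    "WOD2018_CODA_2012_ctd.nc": 15309446859,
    "WOD2018_CODA_2012_gld.nc": 33029760782,
    "WOD2018_CODA_2012_mrb.nc": 85704294,
    "WOD2018_CODA_2012_osd.nc": 129700151,
    "WOD2018_CODA_2012_pfl.nc": 71305005747,
    "WOD2018_CODA_2012_xbt.nc": 2066312325,
}


def parse_name(filename):
    """Split '<originator>_CODA_<year>_<platform>.nc' into its parts."""
    stem, _extension = filename.rsplit(".", 1)
    originator, _coda, year, platform = stem.split("_")
    return originator, int(year), platform


bytes_by_platform = defaultdict(int)
for name, size in listing.items():
    _originator, _year, platform = parse_name(name)
    bytes_by_platform[platform] += size

total = sum(listing.values())
for platform, size in sorted(bytes_by_platform.items(), key=lambda kv: -kv[1]):
    print(f"{platform}: {size / 1e9:8.2f} GB ({100 * size / total:4.1f}%)")
print(f"total: {total / 1e9:.2f} GB")
```

The sizes sum to 122,017,375,975 bytes (about 122 GB) for 2012 alone, with the `pfl` file accounting for roughly 58% of that.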

@Thomas-Moore-Creative

Heya @ChrisC28 do variables like Temperature_WODflag need to be float64?
[Screenshot attached: CleanShot 2024-08-14 at 15 01 21@2x]
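
For a sense of what is at stake in the dtype question, here is a small standard-library sketch comparing storage for the same number of values as 64-bit floats versus 8-bit integers. It assumes the flag values are small integers that fit in a signed 8-bit type, which this thread does not confirm, and `n_values` is an arbitrary, hypothetical count.

```python
from array import array

n_values = 1_000_000  # hypothetical number of flag values

# Same number of elements, stored as float64 ("d") and as signed int8 ("b").
as_float64 = array("d", [0.0]) * n_values
as_int8 = array("b", [0]) * n_values

bytes_float64 = as_float64.itemsize * len(as_float64)
bytes_int8 = as_int8.itemsize * len(as_int8)
ratio = bytes_float64 // bytes_int8

print(f"float64: {bytes_float64} bytes, int8: {bytes_int8} bytes, ratio: {ratio}x")
```

One caveat: integer types have no NaN, so if missing flags are currently stored as NaN, an integer encoding would need a sentinel value or a nullable column instead.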

@Thomas-Moore-Creative

@ChrisC28 ... can you see / reach /oa-decadal-climate/work/observations/CARSv2_ancillary/CODA/CODAv1/?

tube locked up on me and then suddenly everything from /oa-decadal-climate/work/ was "gone"?

@BecCowley

The CODA netCDF files contain different numbers of variables for each type (e.g., WOD files have more variables than MNF files).
When converting to Parquet, the schemas will differ between types and, as far as my limited knowledge of Parquet collections goes, that will mean loading each type separately. Or will we fill in missing variables with empty values (e.g., WOD_id filled with NaN in MNF files) to make a consistent and complete dataset?
Thoughts @Thomas-Moore-Creative, @ChrisC28?
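
The "fill in the missing variables" option can be sketched in plain Python as a union of columns with nulls where a source lacks a variable. This is only an illustration of the idea, not a proposal for the actual conversion code: the `unify` helper, the `source` column and the sample values are hypothetical, and only `WOD_id` is taken from the comment above.

```python
def unify(tables):
    """Give every row the union of all columns, filling gaps with None.

    `tables` maps a source name (e.g. "WOD2018", "MNF") to a list of rows,
    where each row is a dict of variable name -> value.
    """
    # Collect the union of column names, keeping first-seen order.
    columns = []
    for rows in tables.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    unified = []
    for source, rows in tables.items():
        for row in rows:
            full = {name: row.get(name) for name in columns}  # None if absent
            full["source"] = source  # remember where the row came from
            unified.append(full)
    return columns, unified


# Made-up rows: the MNF row has no WOD_id.
tables = {
    "WOD2018": [{"Temperature": 12.5, "WOD_id": 1001}],
    "MNF": [{"Temperature": 13.1}],
}
columns, rows = unify(tables)
```

After this step every row has the same columns, so the sources can be stored and queried as one dataset; the cost is null-filled columns for sources that never had the variable. Polars offers a comparable union on DataFrames through `pl.concat(..., how="diagonal")`.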

Thomas-Moore-Creative added a commit that referenced this issue Sep 23, 2024