
Can the AODN help us fill in key data gaps in the Australian EEZ / region? #1

Open
Thomas-Moore-Creative opened this issue Feb 15, 2023 · 9 comments
Labels
question Further information is requested

Comments

@Thomas-Moore-Creative
Contributor

Work done by @ChrisC28 & @BecCowley has identified gaps in even the most common baseline ocean observations ( temperature & salinity ) over all time for large areas of the Australian EEZ / region in state-of-the-art global databases like the WOD.

For example, this preliminary plot from @ChrisC28 shows the lack of any observations in the WOD for much of the Australian shelf and coast over all time. And even regions with coverage may still suffer from seasonal aliasing.

[Screenshot: preliminary WOD coverage plot, 15 Feb 2023]

But are we missing local Australian data holdings not in WOD?

Important considerations for using the AODN could include:

  1. How can we search across all observations and all platforms and over all time in these specific regions?
  2. Does AODN data have any flags or metadata on what is / is not in the WOD or other global databases?
  3. If we can identify observations not seen in the WOD, can we make that selection across all platforms and grab a single data object with rich metadata on its provenance?
  4. Double checking for any duplication in WOD data already available.
@Thomas-Moore-Creative Thomas-Moore-Creative added the question Further information is requested label Feb 15, 2023
@mhidas

mhidas commented Feb 23, 2023

Thanks @Thomas-Moore-Creative - good questions, and a great use case for us to know about!

@mhidas

mhidas commented Feb 23, 2023

As I mentioned in our meeting today, in the short term you might find it helpful to query our catalogue of all moorings data files (from both the National Mooring Network and Deep Water Moorings facilities of IMOS). Here's some info I copied from a private Wiki; it's a bit outdated, but mostly still relevant. I'll try to update the essential info...

Introduction

Moorings facility operators often want to know details of the data files they have provided. A common question is "Which files have I already uploaded?". There is an easy way to find answers to these questions directly, by accessing AODN web services (more specifically a Web Feature Service).

What is a web service?

A web service is a system that accepts requests and returns results over the Web (via HTTP). A request can be typed directly into the address bar of a browser, or given as an argument to a command-line tool (like curl). Often requests are generated by other software that interacts with the service.

The AODN Portal

As an example, web services make it possible to find, preview and download data using the AODN Portal. Behind the scenes, the portal combines three services that it talks to via the web:

  • A GeoNetwork metadata catalogue, to find data collections;
  • A Web Map Service (WMS) to generate map tiles (served by GeoServer)
  • A Web Feature Service (WFS) to provide data downloads (served by GeoServer).

These services can also be accessed directly.

Web Feature Service (WFS)

  • A standard of the Open Geospatial Consortium (OGC)
  • Allows geographic features (spatial extent + data) to be accessed via the Web.
  • Allows filtering based on spatial extent and attributes.
  • Served by GIS software (e.g. GeoServer, QGIS, etc...) based on data in a database or files.
  • GIS software can also import data from WFS.

In this context a "feature" is a spatial entity (e.g. a point or line) with a set of attributes attached to it (the data). Think of it as a row in a table, where each column is one of the attributes, and one of the columns is the "geometry" specifying the spatial extent of the feature (in the horizontal plane only).

Information about published moorings files

We have set up a WFS called imos:moorings_all_map which allows you to obtain metadata about all currently public data files from the IMOS moorings facilities (National Mooring Network, and Deep Water Moorings). Each feature/row refers to a single file and provides the following details:

  • file_id: This is just an id in the database, not very useful to you.
  • url: This is the full path of the file within the AODN storage hierarchy.
    Appending this path to 'https://data.aodn.org.au/' generates a downloadable URL (paste it in a browser's address
    bar or use command-line tools like wget or curl).
  • date_created: Date of file creation (from the global attribute)
  • date_published: Date the file was first added to our database.
  • date_updated: Date its details were last updated (e.g. if a new file of the same name is uploaded - usually only happens for real-time files.)
  • size: File size in bytes.
  • feature_type (global attribute for Discrete Sampling Geometries e.g. "timeSeries", "profile")
  • file_version: Just the last digit from the "FV0x" label (see IMOS File Naming Convention).
  • toolbox_version: Version of the IMOS Matlab Toolbox used to process the file (global attribute)
  • toolbox_input_file: Name of input file used in the IMOS Matlab Toolbox to generate the file (global attribute)
  • compliance_checks_passed: Checks applied by the pipeline before publishing. Usually "cf" and "imos:1.4".
    (global attribute) NO LONGER USED
  • compliance_checker_version: Code version of the IOOS Compliance Checker used. (global attribute) NO LONGER USED
  • compliance_checker_imos_version: Code version of the IMOS checker plugin used. (global attribute) NO LONGER USED
  • realtime: (true/false)
  • data_mode: "real-time" or "delayed"
  • site_code (global attribute)
  • platform_code (global attribute)
  • deployment_code (global attribute)
  • data_category: A general category for the type of data in the file, e.g. "Temperature", "CTD_timeseries",
    "Velocity", etc... (currently not clearly defined for some types of file; may be updated or removed in the future)
  • instrument (global attribute)
  • instrument_serial_number (global attribute)
  • instrument_nominal_depth (global attribute)
  • time_deployment_start (global attribute)
  • time_deployment_end (global attribute)
  • time_coverage_start (global attribute)
  • time_coverage_end (global attribute)
  • latitude (from LATITUDE variable)
  • longitude (from LONGITUDE variable)
  • variables: Comma-separated list of variable names in the file
  • standard_names: Comma-separated list of the standard_name attributes of the variables in the file (where applicable)
  • long_names: Comma-separated list of the long_name attributes of the variables in the file
  • geom: The "geometry" of the data in the file, which is a simple point for all mooring timeseries and profiles.

These are boolean properties to allow easier filtering on the presence of certain types of parameter in the file:

  • has_water_temperature
  • has_air_temperature
  • has_salinity
  • has_water_pressure
  • has_air_pressure
  • has_sea_water_velocity
  • has_oxygen
  • has_chlorophyll
  • has_fluorescence
  • has_wave_parameters
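
These boolean columns make parameter-based selection straightforward. For instance (a sketch, assuming the server accepts standard CQL boolean comparisons as in the filter examples further down), a filter for files containing both temperature and salinity can be composed like this:

```python
# Combine the boolean has_* columns into one CQL filter clause
# (assumes standard CQL syntax, as in the cql_filter examples below).
parameters = ["has_water_temperature", "has_salinity"]
cql_filter = " AND ".join(f"{p} = true" for p in parameters)
print(cql_filter)  # has_water_temperature = true AND has_salinity = true
```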

An additional column (FID, always first) is added by the server and can simply be ignored. Refer to the [IMOS NetCDF Conventions](https://s3-ap-southeast-2.amazonaws.com/content.aodn.org.au/Documents/IMOS/Conventions/IMOS_NetCDF_Conventions.pdf) for the meaning of the global attributes harvested.

How to query the moorings_all_map WFS?

You can download the entire table in comma-separated-values format (CSV - can be opened in e.g. Excel) by pasting the following request into your browser's address bar. (I have broken it up so it's a bit easier to see what the request is made up of, but you have to put it all on one line, with no spaces):

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv

To save you copy/pasting, here is a direct link to the same request. However, this will tell you everything about all ~40,000 files (download size about 22 MB), which is probably a lot more than you're interested in. Instead, you can apply filters to the table.

For example, to get the list of files for the Palm Passage mooring (GBRPPS) published since the start of 2018, add a cql_filter like this (only the last line is new):

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv&
        cql_filter=date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'

Again, combine the request into one line, with no spaces between the arguments. Since the filter itself needs to contain spaces, replace them with the code '%20'. Or just click (or copy & edit) this link. Now you'll only get the rows you're interested in.
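
Python's standard library can do the '%20' encoding for you. A minimal sketch (the characters listed in `safe` are left unescaped because the CQL filter needs them verbatim):

```python
from urllib.parse import quote

base = ("http://geoserver-123.aodn.org.au/geoserver/imos/ows?"
        "service=WFS&version=1.0.0&request=GetFeature&"
        "typeName=imos:moorings_all_map&outputFormat=csv")

# quote() percent-encodes the spaces in the filter as '%20'.
cql = "date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'"
url = base + "&cql_filter=" + quote(cql, safe="=':-")
print(url)
```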

You can also select which columns (properties) you're interested in by adding a propertyName argument, and the downloaded file will only include those columns. E.g. if you only want the file path, deployment code and instrument details for all delayed-mode files:

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv&
        cql_filter=realtime = FALSE&
        propertyName=url,deployment_code,instrument,instrument_serial_number,instrument_nominal_depth


How to make it easier

Of course, after a while it would become quite tedious typing these long requests into your browser, so it's better to get a program to do it. Here are a couple of examples of WFS access from a Python script:
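A minimal stdlib-only sketch of such a script (assuming only the endpoint, layer and column names described above; not the AODN's own example code):

```python
import csv
import io
import urllib.parse
import urllib.request

WFS_URL = "http://geoserver-123.aodn.org.au/geoserver/imos/ows"

def build_params(cql_filter=None, properties=None):
    """Assemble GetFeature query parameters for imos:moorings_all_map."""
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": "imos:moorings_all_map",
        "outputFormat": "csv",
    }
    if cql_filter:
        params["cql_filter"] = cql_filter
    if properties:
        params["propertyName"] = ",".join(properties)
    return params

def get_moorings_table(cql_filter=None, properties=None):
    """Fetch the (filtered) table as a list of dicts; urlencode handles
    the percent-encoding of spaces in the filter."""
    query = urllib.parse.urlencode(build_params(cql_filter, properties))
    with urllib.request.urlopen(WFS_URL + "?" + query) as resp:
        return list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

# e.g. delayed-mode files with temperature at one site:
# rows = get_moorings_table(
#     cql_filter="realtime = FALSE AND site_code='GBRPPS' AND has_water_temperature = true",
#     properties=["url", "deployment_code", "time_coverage_start"],
# )
```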

More info

See the GeoServer WFS documentation for more details and advanced features.

@mhidas

mhidas commented Feb 23, 2023

Here's a much more recent example where I'm using the same WFS (actually a subset of it) to query some metadata related to mooring configurations: https://github.com/aodn/aodn-public-notebooks/blob/d9ee9785221a7d75dbf58371a02db6aaa6ff2687/moorings/common.py#L16

@croachutas

And similar issues no doubt apply elsewhere in the world. I know for a fact that NZ has a lot of data that hasn't made it into its public repositories; colleagues at MetOcean Solutions chased a lot of it up, but I'm unsure whether they're able to make it available to us. MetOcean also has T profiles from the Mangōpare sensors developed and deployed during the Moana Project, which they can't share publicly due to various arrangements with the fisheries industry.

In that light it might be worthwhile thinking about CARS v2.0 not just as a product in the form of a 'static' atlas but as a code base to allow other groups to create regional versions of the atlas with data that might not be generally available.

@Thomas-Moore-Creative
Contributor Author

@mhidas - ignorant question. Given https://data.aodn.org.au/imos-data is an S3 bucket should we be able to access it via s3fs? For example with NOAA S3 buckets I'm used to being able to do something like the following:

    import s3fs
    # Initialize s3 client with s3fs
    fs = s3fs.S3FileSystem(anon=True)
    # list contents of bucket
    fs.ls('https://data.aodn.org.au/imos-data')

I'm no expert in the s3fs package or using S3 but this fails for me?

@mhidas

mhidas commented Apr 3, 2023

@Thomas-Moore-Creative You're correct, it's a public S3 bucket, and its name is just imos-data (https://data.aodn.org.au/imos-data is a web front-end to it).
s3fs only needs the bucket name (and optional prefix). So try this:

    import s3fs
    fs = s3fs.S3FileSystem(anon=True)
    # list contents of the bucket by name (no URL needed)
    fs.ls('imos-data')
    fs.ls('imos-data/IMOS/ANMN')
    # etc...

@Thomas-Moore-Creative
Contributor Author

@Thomas-Moore-Creative ... So try this:

Thanks for helping me with those basics, @mhidas!

@mhidas

mhidas commented Apr 3, 2023

👍 No worries.

Something worth noting is that there are often multiple data products in there based on the same original observations. I'm not so familiar with the other IMOS facilities, but for the moorings there are generally at least 4-5 levels of product:

  • Original "raw" data - these are the closest to raw data we publish in NetCDF format. Some pre-processing may be done in instrument-specific software before conversion to NetCDF, but otherwise nothing has been altered or removed, and no quality control has been applied. These files have the label "FV00" in the file name (and global attribute file_version set to 0).
  • "Quality-controlled" data - these include the same data as above, plus additional computed variables (e.g. depth, salinity) and quality-control flags based on somewhat standard automated QC tests. These files have the label "FV01" in the file name (and global attribute file_version set to 1). Each file holds just one deployment's worth of data from one instrument, so there are usually hundreds of them per mooring site.
  • "Aggregated timeseries" - These simply aggregate the above FV01 files into a single file per site and measured parameter.
  • "Hourly timeseries" - These aggregate all data at a site into 2 files (one for all current velocity data, one for everything else), picking out only the "good" data and re-binning it to a common hourly timestamp.
  • "Gridded timeseries" - These take the hourly product and interpolate vertically to a set of pre-defined depths. We only do this where the water column is sampled at multiple depths, which means it only includes temperature at this stage.

You can learn more about the last three products here.
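
Since the product level is encoded in the file name ("FV00", "FV01"), one quick way to separate levels when scanning the bucket is to parse that label. A sketch (hypothetical helper; the example file name is illustrative, and the underscore-delimited FVxx label is per the IMOS file naming convention mentioned above):

```python
import re

# Pull the product level out of an IMOS file name, e.g. "..._FV01_..." -> 1.
def file_version(filename):
    match = re.search(r"_FV(\d{2})_", filename)
    return int(match.group(1)) if match else None

name = "IMOS_ANMN-QLD_TZ_20180101_GBRPPS_FV01_Temperature_END-20180201.nc"
file_version(name)  # -> 1
```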

@Thomas-Moore-Creative
Contributor Author

Thomas-Moore-Creative commented May 7, 2024

2024 AODN hackathon

Where we'd like to be - are we already there?

  1. a data pipeline for ocean observations that works with python tools on any laptop, supercomputer, or cloud instance.
  2. a searchable catalog of all data that can be filtered by variable, platform/instrument type, time, space, WOD inclusion, other metadata.
  3. returning a lazy object with rich metadata - provenance, QC, in/out WOD
  4. returning a lazy object in xarray? pandas dask dataframe? parquet?

    my_data = get_aodn(variable='temperature', time=('2005-01', '2018-12'),
                       latitude=(-40, -20), longitude=(140, 180), WOD=False)
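
No such get_aodn function exists today, but as a rough illustration of how its arguments could map onto the moorings WFS described earlier (the column names are real; the WOD-membership flag is not an existing column, so it's omitted here):

```python
def build_cql(variable=None, time=None, latitude=None, longitude=None):
    """Hypothetical translation of get_aodn-style arguments into a CQL
    filter for imos:moorings_all_map (month boundaries approximated)."""
    clauses = []
    if variable == "temperature":
        clauses.append("has_water_temperature = true")
    if time:
        clauses.append(f"time_coverage_start AFTER {time[0]}-01T00:00:00")
        clauses.append(f"time_coverage_end BEFORE {time[1]}-01T00:00:00")
    if latitude:
        clauses.append(f"latitude BETWEEN {latitude[0]} AND {latitude[1]}")
    if longitude:
        clauses.append(f"longitude BETWEEN {longitude[0]} AND {longitude[1]}")
    return " AND ".join(clauses)

cql = build_cql(variable="temperature", time=("2005-01", "2018-12"),
                latitude=(-40, -20), longitude=(140, 180))
```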

Goals:

  • What backend service is future-proof for AODN services? (s3?)
  • What is a MVP we can build now to show the promise of this capability? XBT? CTD? Combination?
  • Identify any gaps and barriers at the AODN level that could be addressed over time to enable this for all ocean observations in the archive.
