Update ESACCI-SST cmorizer to v3.0 #3697

Open
wants to merge 45 commits into
base: main
45 commits
caf735a
add downloader
LisaBock Mar 19, 2024
7658254
first try with daily data
LisaBock Mar 19, 2024
88450d9
modify downloader for daily data
LisaBock Mar 20, 2024
8678f73
add fix type
LisaBock Mar 27, 2024
50bae58
Merge remote-tracking branch 'public/main' into update_esacci_sst
LisaBock May 29, 2024
3d13203
fix cmorizer
LisaBock May 30, 2024
a03421e
first try downloader v3.0
LisaBock May 30, 2024
c9b6f5d
Merge remote-tracking branch 'public/main' into update_esacci_sst
LisaBock Jun 28, 2024
4c279fb
clean downloader
LisaBock Jun 28, 2024
04e1d62
add monthly sst
LisaBock Jun 28, 2024
f3dd97d
update cmor_config
LisaBock Jul 1, 2024
1104156
update reference
LisaBock Jul 1, 2024
f3af2f7
save monthly data
LisaBock Jul 1, 2024
756b409
fix codacy
LisaBock Jul 1, 2024
faa6ffb
fix codacy
LisaBock Jul 1, 2024
576dc8a
tosStderr added
LisaBock Jul 1, 2024
2922a3a
update doc table
LisaBock Jul 1, 2024
d7d64f9
update time period
LisaBock Jul 1, 2024
5cbbf57
fix codacy
LisaBock Jul 1, 2024
820e4da
fix codacy
LisaBock Jul 1, 2024
7080d8d
fix
LisaBock Jul 1, 2024
f04e9cf
fix memory issue
LisaBock Jul 10, 2024
cf05609
rm hardcoded years
LisaBock Jul 10, 2024
1ef0ab8
fix
LisaBock Jul 10, 2024
c785eb8
fix
LisaBock Jul 10, 2024
28866f2
fix
LisaBock Jul 10, 2024
02d0aef
fix date
LisaBock Jul 10, 2024
82c47f7
fix date
LisaBock Jul 10, 2024
c358b72
fix style
LisaBock Jul 11, 2024
96dc7a5
Update doc/sphinx/source/input.rst
LisaBock Jul 11, 2024
2a835d2
Merge branch 'update_esacci_sst' of github.com:ESMValGroup/ESMValTool…
LisaBock Jul 11, 2024
b3d9e75
Update years
LisaBock Jul 11, 2024
4590c1b
Update years
LisaBock Jul 11, 2024
0ee1513
specify global attrs
LisaBock Jul 15, 2024
53df005
fix syntax
LisaBock Jul 16, 2024
d3829fc
changes regarding the review
LisaBock Jul 16, 2024
fd2057d
Merge branch 'fix_set_global_atts' into update_esacci_sst
LisaBock Jul 16, 2024
fb29914
recipes/examples/recipe_check_obs.yml
LisaBock Jul 16, 2024
cf88af8
clear
LisaBock Jul 16, 2024
fe950be
add comment
LisaBock Jul 16, 2024
c13daad
rm import
LisaBock Jul 16, 2024
77de1fe
adjust years
LisaBock Jul 16, 2024
0384623
Merge remote-tracking branch 'public/main' into update_esacci_sst
LisaBock Jul 17, 2024
6e9fc0d
rm regridding of uncertainty field
LisaBock Jul 17, 2024
6346b30
reduce time period for tosStderr in recipe_check_obs.yml
LisaBock Jul 17, 2024
2 changes: 1 addition & 1 deletion doc/sphinx/source/input.rst
@@ -310,7 +310,7 @@ A list of the datasets for which a CMORizers is available is provided in the fol
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| ESACCI-SOILMOISTURE | sm (Eday, Lmon), smStderr (Eday) | 2 | Python |
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| ESACCI-SST | ts, tsStderr (Amon) | 2 | NCL |
| ESACCI-SST | tos (Omon, Oday), tosStderr (Oday) | 3 | Python |
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| ESACCI-WATERVAPOUR | prw (Amon) | 3 | Python |
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
42 changes: 21 additions & 21 deletions esmvaltool/cmorizers/data/cmor_config/ESACCI-SST.yml
@@ -1,29 +1,29 @@
---
# Common global attributes for Cmorizer output
filename: '{year}{month}15_regridded_sst.nc'

attributes:
dataset_id: ESACCI-SST
version: '2.2'
tier: 2
version: 3.0-L4-analysis
tier: 3
modeling_realm: sat
project_id: OBS
source: 'http://surftemp.net/regridding/index.html'
reference: ["esacci-sst", "esacci-sst-bias-correction"]
comment: "Note that the variable tsStderr is an uncertainty not a standard error."
project_id: OBS6
source: 'dx.doi.org/10.5285/4a9654136a7148e39b7feb56f8bb02d2'
reference: ["esacci-sst"]

# Variables to cmorize (here use only filename prefix)
# Variables to cmorize (here use only filename ending)
variables:
ts:
mip: Amon
raw: sst
file: ESACCI-SST_sat_L4-GHRSST-SSTdepth-OSTIA-GLOB
tsStderr:
mip: Amon
raw: sst_uncertainty
file: ESACCI-SST_sat_L4-GHRSST-SSTdepth-OSTIA-GLOB
tos:
mip: [Oday, Omon]
raw: analysed_sst
frequency: day
filename: ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_CDR3.0-v02.0-fv01.0.nc
start_year: 1980
end_year: 2021

# uncomment this part to produce sst cmorized data for ocean realm (Omon, tos)
# tos:
# mip: Omon
# raw: sst
# file: ESACCI-SST_sat_L4-GHRSST-SSTdepth-OSTIA-GLOB
tosStderr:
mip: [Oday]
raw: analysed_sst_uncertainty
frequency: day
filename: ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_CDR3.0-v02.0-fv01.0.nc
start_year: 1980
end_year: 2021
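
The per-variable entries above carry both the CMOR tables (`mip`) and the default period (`start_year`, `end_year`). As a minimal sketch of how such an entry might be consumed, assuming the YAML is parsed into a plain dict, the example below derives the default date range and separates the daily table from the optional monthly one. The helper names `default_period` and `split_mips` are hypothetical and not part of ESMValTool.

```python
from datetime import datetime

# Mirrors the `tos` entry of the YAML above as a plain dict (assumption:
# the cmorizer receives an equivalent structure after parsing).
TOS_CFG = {
    "mip": ["Oday", "Omon"],
    "raw": "analysed_sst",
    "frequency": "day",
    "start_year": 1980,
    "end_year": 2021,
}


def default_period(vals, start_date=None, end_date=None):
    """Return (start, end), falling back to the configured years."""
    start = start_date or datetime(vals["start_year"], 1, 1)
    end = end_date or datetime(vals["end_year"], 12, 31)
    return start, end


def split_mips(vals):
    """Return (daily_mip, monthly_mip); monthly_mip is None if absent."""
    mips = vals["mip"]
    return mips[0], (mips[1] if len(mips) > 1 else None)


start, end = default_period(TOS_CFG)
daily_mip, monthly_mip = split_mips(TOS_CFG)
```

With this layout, a variable such as `tosStderr` that lists only `Oday` yields no monthly table, which matches the config's choice to provide the uncertainty as daily data only.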
8 changes: 4 additions & 4 deletions esmvaltool/cmorizers/data/datasets.yml
@@ -549,12 +549,12 @@ datasets:
Put all files under a single directory (no subdirectories with years).

ESACCI-SST:
tier: 2
source: ftp://anon-ftp.ceda.ac.uk/neodc/esacci/sst/data/
last_access: 2019-02-01
tier: 3
source: ftp3.ceda.ac.uk/neodc/eocis/data/global_and_regional/sea_surface_temperature/
last_access: 2024-07-01
info: |
Download the data from:
lt/Analysis/L4/v01.1/
CDR_v3/Analysis/L4/v3.0.1/
Put all files under a single directory (no subdirectories with years).

ESACCI-WATERVAPOUR:
74 changes: 74 additions & 0 deletions esmvaltool/cmorizers/data/downloaders/datasets/esacci_sst.py
@@ -0,0 +1,74 @@
"""Script to download ESACCI-SST."""
# Import required python modules
import logging
import os

from datetime import datetime

from dateutil import relativedelta

from esmvaltool.cmorizers.data.downloaders.ftp import FTPDownloader

logger = logging.getLogger(__name__)


def download_dataset(config, dataset, dataset_info, start_date, end_date,
overwrite):
"""Download dataset.

Parameters
----------
config : dict
ESMValTool's user configuration
dataset : str
Name of the dataset
dataset_info : dict
Dataset information from the datasets.yml file
start_date : datetime
Start of the interval to download
end_date : datetime
End of the interval to download
overwrite : bool
Overwrite already downloaded files
"""
if start_date is None:
start_date = datetime(1980, 1, 1)
if end_date is None:
end_date = datetime(2021, 12, 31)

loop_date = start_date

user = os.environ.get("ceda-user")
if user is None:
user = str(input("CEDA user name? "))
if user == "":
errmsg = ("A CEDA account is required to download CCI SST data."
" Please visit https://services.ceda.ac.uk/cedasite/"
"register/info/ to create an account at CEDA if needed.")
logger.error(errmsg)
raise ValueError(errmsg)

passwd = os.environ.get("ceda-passwd")
if passwd is None:
passwd = str(input("CEDA-password? "))

downloader = FTPDownloader(
config=config,
server='ftp3.ceda.ac.uk',
dataset=dataset,
dataset_info=dataset_info,
overwrite=overwrite,
user=user,
passwd=passwd,
)

downloader.connect()
downloader.set_cwd('neodc/eocis/data/global_and_regional/'
'sea_surface_temperature/CDR_v3/Analysis/L4/v3.0.1/')

while loop_date <= end_date:
year = loop_date.year
month = loop_date.strftime("%m")
day = loop_date.strftime("%d")
downloader.download_folder(f'./{year}/{month}/{day}/')
loop_date += relativedelta.relativedelta(days=1)
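
The download loop above walks the remote archive one day at a time, since the files are organised in `year/month/day` folders. A minimal sketch of that path generation, using only the standard library (`timedelta` in place of `dateutil.relativedelta`, which is a substitution made here for self-containment) and a hypothetical helper name `daily_folders`:

```python
from datetime import datetime, timedelta


def daily_folders(start_date, end_date):
    """Yield one './YYYY/MM/DD/' remote folder per day, end date inclusive."""
    current = start_date
    while current <= end_date:
        yield current.strftime("./%Y/%m/%d/")
        current += timedelta(days=1)


# Example spanning the end of February in a leap year.
folders = list(daily_folders(datetime(2020, 2, 27), datetime(2020, 3, 1)))
```

Each yielded string could then be passed to a call like `downloader.download_folder(...)`, as in the loop above.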
10 changes: 8 additions & 2 deletions esmvaltool/cmorizers/data/downloaders/ftp.py
@@ -35,16 +35,22 @@ class FTPDownloader(BaseDownloader):
overwrite : bool
Overwrite already downloaded files
"""
def __init__(self, config, server, dataset, dataset_info, overwrite):
def __init__(self, config, server, dataset, dataset_info, overwrite,
user=None, passwd=None):
super().__init__(config, dataset, dataset_info, overwrite)
self._client = None
self.server = server
self.user = user
self.passwd = passwd

def connect(self):
"""Connect to the FTP server."""
self._client = ftplib.FTP(self.server)
logger.info(self._client.getwelcome())
self._client.login()
if self.user is None:
self._client.login()
else:
self._client.login(user=self.user, passwd=self.passwd)

def set_cwd(self, path):
"""Set current working directory in the remote.
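
The change to `connect()` keeps anonymous login as the default and only sends credentials when a user name was supplied. The sketch below isolates that decision in a standalone function so it can be exercised without a network connection. The function `login` and the class `_RecordingClient` are illustrative stand-ins introduced here; in real use the client would be an `ftplib.FTP` instance, whose `login()` accepts `user` and `passwd` keyword arguments.

```python
def login(client, user=None, passwd=None):
    """Log in anonymously unless a user name is given."""
    if user is None:
        return client.login()
    return client.login(user=user, passwd=passwd)


class _RecordingClient:
    """Test double standing in for ftplib.FTP; records login arguments."""

    def __init__(self):
        self.calls = []

    def login(self, **kwargs):
        self.calls.append(kwargs)
        return "230 ok"  # placeholder reply, not a real server response


fake = _RecordingClient()
login(fake)                                  # anonymous login
login(fake, user="alice", passwd="secret")   # authenticated login
```

Because `user` and `passwd` default to `None`, existing downloaders that construct `FTPDownloader` without credentials should behave as before.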
181 changes: 124 additions & 57 deletions esmvaltool/cmorizers/data/formatters/datasets/esacci_sst.py
@@ -1,45 +1,35 @@
"""ESMValTool CMORizer for ESACCI-SST data.

Tier
Tier 2: other freely-available dataset.
Tier 3: need to register at CEDA

Source
http://surftemp.net/regridding/index.html
https://catalogue.ceda.ac.uk/uuid/4a9654136a7148e39b7feb56f8bb02d2

Last access
20201214
20240628

Download and processing instructions
Download the following files:
Go to http://surftemp.net/regridding/index.html
and request regridded data with the following options:
Time Resolution: monthly
Longitude Resolution: 0.5
Latitude Resolution: 0.5
Start Date: 1982-01-01
End Date: 2019-12-31
Exclude data above sea ice threshold: True
(Threshold: 100 %)
Include post-hoc SST bias adjustments: True
Output Absolute or Anomaly SST: absolute
Generate Sea Ice Fraction: True
Error Correlation in Time (Days): 7
Error Correlation In Space (Degrees): 3.0

Modification history
20201204-roberts_charles: written.
20201214-predoi_valeriu: approved.
20201214-lauer_axel: approved.
A downloader is provided by ESMValTool. First you need
to register.
Go to https://services.ceda.ac.uk/cedasite/register/info/
and create an account at CEDA if needed.

"""

import copy
import glob
import logging
import os

import iris
from datetime import datetime
from esmvalcore.cmor.fixes import get_time_bounds
from esmvalcore.preprocessor import regrid
from esmvaltool.cmorizers.data import utilities as utils
from esmvalcore.preprocessor import concatenate

from ...utilities import (
convert_timeunits,
fix_coords,
fix_var_metadata,
save_variable,
@@ -49,49 +39,126 @@
logger = logging.getLogger(__name__)


def extract_variable(var_info, raw_info, attrs, year):
def extract_variable(raw_info):
"""Extract the requested raw variable from the input file."""
rawvar = raw_info['name']
constraint = iris.NameConstraint(var_name=rawvar)
try:
cube = iris.load_cube(raw_info['file'], constraint)
except iris.exceptions.ConstraintMismatchError as constraint_error:
raise ValueError(f"No data available for variable {rawvar}"
f"and year {year}") from constraint_error

# Fix cube
fix_var_metadata(cube, var_info)
convert_timeunits(cube, year)
if rawvar == 'analysed_sst_uncertainty':
tmp_cube = iris.load_cube(raw_info['file'],
iris.NameConstraint(var_name='analysed_sst'))
ancillary_var = tmp_cube.ancillary_variable('sea_water_temperature'
' standard_error')
cube = tmp_cube.copy(ancillary_var.core_data())
else:
try:
cube = iris.load_cube(raw_info['file'], constraint)
except iris.exceptions.ConstraintMismatchError as constraint_error:
raise ValueError(f"No data available for variable {rawvar} in file"
f" {raw_info['file']}") from constraint_error

# Remove ancillary data
for ancillary_variable in cube.ancillary_variables():
cube.remove_ancillary_variable(ancillary_variable)
return cube


def get_monthly_cube(cfg, var, vals, raw_info, attrs,
inpfile_pattern, year, month):
data_cubes = []
month_inpfile_pattern = inpfile_pattern.format(
year=str(year)+"{:02}".format(month))
logger.info("Pattern: %s", month_inpfile_pattern)
inpfiles = sorted(glob.glob(month_inpfile_pattern))
if not inpfiles:
logger.error("Could not find any files with this"
" pattern %s", month_inpfile_pattern)
raise ValueError(f"No input files match {month_inpfile_pattern}")
logger.info("Found input files: %s", inpfiles)

for inpfile in inpfiles:
raw_info['file'] = inpfile
logger.info("CMORizing var %s from file type %s", var,
raw_info['file'])
data_cubes.append(extract_variable(raw_info))

cube = concatenate(data_cubes)

# regridding from 0.05x0.05 to 0.5x0.5 (not for uncertainty field)
if 'Stderr' not in var:
cube = regrid(cube, target_grid='0.5x0.5', scheme='area_weighted')

# Fix dtype
utils.fix_dtype(cube)
# Fix units
cmor_info = cfg['cmor_table'].get_variable(vals['mip'][0], var)
cube.convert_units(cmor_info.units)
# Fix metadata
fix_var_metadata(cube, cmor_info)
# Fix coordinates
fix_coords(cube)
cube.coord('time').long_name = 'time'
cube.coord('latitude').long_name = 'latitude'
cube.coord('longitude').long_name = 'longitude'
# Fix monthly time bounds
time = cube.coord('time')
time.bounds = get_time_bounds(time, vals['frequency'])

# set global attributes
set_global_atts(cube, attrs)
# add comment to tosStderr
if var == 'tosStderr':
cube.attributes['comment'] = ('Note that the variable tosStderr is an '
'uncertainty not a standard error.')

return cube
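
`get_monthly_cube` selects all daily files of one month by prefixing the configured filename ending with `YYYYMM` and a wildcard. A minimal sketch of that pattern construction follows, checked against file names with `fnmatch` instead of the file system. The sample file names assume a `YYYYMMDDhhmmss-` prefix and are illustrative only; `month_pattern` is a hypothetical helper.

```python
import fnmatch
import os

# Filename ending as given in the cmor_config for this dataset.
FILENAME_ENDING = "ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_CDR3.0-v02.0-fv01.0.nc"


def month_pattern(in_dir, year, month, ending=FILENAME_ENDING):
    """Build the glob pattern selecting all daily files of one month."""
    template = os.path.join(in_dir, "{year}*" + ending)
    # The 'year' placeholder receives year and zero-padded month together.
    return template.format(year=f"{year}{month:02d}")


# Assumed example names; the timestamp prefix layout is an assumption.
names = [
    "19800101120000-" + FILENAME_ENDING,
    "19800201120000-" + FILENAME_ENDING,
]
pattern = month_pattern("", 1980, 1)
matched = fnmatch.filter(names, pattern)
```

Zero-padding the month matters here: without it, a pattern for January such as `19801*` would also match October to December.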


def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
"""Cmorization func call."""
cmor_table = cfg['cmor_table']
glob_attrs = cfg['attributes']
glob_attrs = copy.deepcopy(cfg['attributes'])

# run the cmorization
for var, vals in cfg['variables'].items():
var_info = cmor_table.get_variable(vals['mip'], var)
glob_attrs['mip'] = vals['mip']
raw_info = {'name': vals['raw'], 'file': vals['file']}
inpfile = os.path.join(in_dir, cfg['filename'])
logger.info("CMORizing var %s from file type %s", var, inpfile)
years = range(1982, 2020)
months = ["0" + str(mo) for mo in range(1, 10)] + ["10", "11", "12"]
for year in years:
monthly_cubes = []
for month in months:
raw_info['file'] = inpfile.format(year=year, month=month)
logger.info("CMORizing var %s from file type %s", var,
raw_info['file'])
cube = extract_variable(var_info, raw_info, glob_attrs, year)
monthly_cubes.append(cube)
yearly_cube = concatenate(monthly_cubes)
save_variable(yearly_cube,
var,
out_dir,
glob_attrs,
unlimited_dimensions=['time'])
if not start_date:
start_date = datetime(vals['start_year'], 1, 1)
if not end_date:
end_date = datetime(vals['end_year'], 12, 31)
raw_info = {'name': vals['raw']}
inpfile_pattern = os.path.join(in_dir, '{year}*' + vals['filename'])
logger.info("CMORizing var %s from file type %s", var, inpfile_pattern)
mon_cubes = []
for year in range(start_date.year, end_date.year + 1):
logger.info("Processing year %s", year)
glob_attrs['mip'] = vals['mip'][0]
for month in range(start_date.month, end_date.month + 1):
monthly_cube = get_monthly_cube(cfg, var, vals, raw_info,
glob_attrs, inpfile_pattern,
year, month)
# Save daily data
save_variable(monthly_cube,
var,
out_dir,
glob_attrs,
unlimited_dimensions=['time'])
# Calculate monthly mean
if 'Stderr' not in var:
logger.info("Calculating monthly mean")
iris.coord_categorisation.add_month_number(monthly_cube,
'time')
iris.coord_categorisation.add_year(monthly_cube, 'time')
monthly_cube = monthly_cube.aggregated_by(
['month_number', 'year'],
iris.analysis.MEAN)
monthly_cube.remove_coord('month_number')
monthly_cube.remove_coord('year')
mon_cubes.append(monthly_cube)
# Save monthly data
if 'Stderr' not in var:
yearly_cube = concatenate(mon_cubes)
glob_attrs['mip'] = vals['mip'][1]
save_variable(yearly_cube,
var,
out_dir,
glob_attrs,
unlimited_dimensions=['time'])
mon_cubes.clear()
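
The nested loop above applies `range(start_date.month, end_date.month + 1)` within every year, which covers all twelve months for the default January-to-December period. For a requested period that starts and ends in different months of different years, one possible alternative is to iterate over `(year, month)` pairs directly. The sketch below is an illustration of that idea under this assumption, not the method used in this pull request; `iter_months` is a hypothetical helper.

```python
from datetime import datetime


def iter_months(start_date, end_date):
    """Yield (year, month) pairs from start_date to end_date, inclusive."""
    year, month = start_date.year, start_date.month
    while (year, month) <= (end_date.year, end_date.month):
        yield year, month
        month += 1
        if month > 12:
            # Roll over into January of the following year.
            year, month = year + 1, 1


# Example: a period crossing a year boundary.
pairs = list(iter_months(datetime(2000, 11, 15), datetime(2001, 2, 1)))
```

Grouping the resulting pairs by year would still allow one monthly-mean file to be saved per year, as the code above does.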