I’ve been using the WOA23 (World Ocean Atlas 2023) fields to generate several oceanographic figures, and it got me thinking that it would be extremely useful to have some of these variables available directly through GMT’s remote-datasets (and, naturally, accessible via the PyGMT dataset interface as well).
At first glance, it seems reasonably straightforward to implement. However, given the workflow, and since this doesn’t look like something commonly added through regular issues/pull requests, I’m not entirely sure whether this is the intended or feasible pathway for contributing new datasets.
So my question is: Is there community or core-team interest in expanding the set of remote datasets—specifically to include something like WOA23 fields?
If so, I’d be happy to help outline the structure or even assist with the initial preparation.
I think it’s a good dataset to add, and I’m willing to help. But first (and fundamentally), we need to make sure that the data has a license that allows free distribution. Then we would need to see whether there would be any problem with uploading the data to the GMT data servers (this would only be a problem if the dataset were very large, several GB).
Hi @Esteban82
Thanks for your comment.
According to the official documentation (WOA23 Product Documentation, global NetCDF attribute license), the data are “openly available to the public” and redistribution is permitted as long as proper acknowledgment of NOAA/NCEI is included (see page 20).
Regarding dataset size: the full WOA23 collection is indeed large, given multiple variables, depth levels, grid resolutions (1° and 0.25°), and time spans. However, individual NetCDF files, especially annual climatologies for temperature or salinity, are typically modest in size (tens to a few hundred MB). If GMT chooses to include only a carefully selected subset (e.g., annual T/S climatologies), hosting them on the GMT servers should not be a problem. I can help identify and prepare the most appropriate subset for integration.
Let me know the preferred format and granularity, and I’ll assist with preprocessing and metadata adaptation.
First, thank you, André, for offering to dive into a sinuous path. I think that at least part of this dataset (I’m thinking immediately of the temperature) would be interesting to have accessible via our download machinery.
The data size does not seem big enough to cause problems; it’s the large number of variables and layers that raises some concern for me. Even the highest resolution will not need to be mosaicked (we start mosaicking at 5 arc min, and the highest resolution of WOA23 is 15 arc min), so that simplifies the process. But we need to expand the naming convention (that @name_res_...) to be able to accommodate the new product, variables, and depths of this dataset.
Finally, it must be checked whether that naming-convention extension needs to be supported by code changes.
That makes sense. To keep things simple and avoid an explosion of combinations, I’d suggest a very small first step focused only on annual surface fields:
annual SST (sea surface temperature)
annual SSS (sea surface salinity)
for one or two resolutions (e.g. 1° and 0.25°). These could be exposed through very simple aliases such as:
@woa23_annual_sst_1deg, @woa23_annual_sst_0.25deg
@woa23_annual_sss_1deg, @woa23_annual_sss_0.25deg
That would give users immediate value without requiring them to think about the full WOA23 naming scheme.
In a second step, if this works well, we could prototype how a more general convention might look for multi-depth access, something along the lines of:
@woa23_t_0.25deg_depth10 (temperature, 0.25°, ~10 m level)
@woa23_s_1deg_depth100 (salinity, 1°, ~100 m level)
This would serve as a mock/testbed to see what needs to change in the @name_res_... parsing and whether GMT/PyGMT need any code updates to support that extra “depth” dimension.
If this phased approach sounds reasonable, I can help identify the exact WOA23 files for the annual SST/SSS fields and sketch a more concrete mapping from the WOA23 filenames to these GMT aliases.
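To make that mapping concrete, here is a minimal Python sketch. The alias table and the decav source filenames are assumptions for illustration (following the woa23_[DECA]_[v][tp]_[gr].nc pattern discussed in this thread), not confirmed WOA23 paths:

```python
# Hypothetical mapping from the proposed GMT aliases to WOA23 source files.
# The 'decav' decade code, the t/s variable codes, the '00' annual code,
# and the '01'/'04' grid codes are assumptions based on the WOA23 docs.
ALIAS_TO_WOA23 = {
    "woa23_annual_sst_1deg":    "woa23_decav_t00_01.nc",
    "woa23_annual_sst_0.25deg": "woa23_decav_t00_04.nc",
    "woa23_annual_sss_1deg":    "woa23_decav_s00_01.nc",
    "woa23_annual_sss_0.25deg": "woa23_decav_s00_04.nc",
}

def source_file(alias):
    """Resolve a proposed GMT alias (with or without the leading '@') to its WOA23 file."""
    return ALIAS_TO_WOA23[alias.lstrip("@")]

print(source_file("@woa23_annual_sst_0.25deg"))  # -> woa23_decav_t00_04.nc
```

A plain lookup table like this would keep the first phase trivial: no new parsing logic, just four files and four names.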
First and foremost, thanks for bringing WOA 2023 to my attention! My 5 cents are below.
Besides the number of layers and variables (depths, years, months, parameters: temperature, salinity, dissolved oxygen, dissolved nutrients), WOA has quite an update history (a new release roughly every 5 years). It is based on data from ongoing measurements by profiling floats, so it will be updated regularly in the future. While temperature and salinity are quite stable, new parameters are also added; dissolved nutrients are very new. It might be unwise to stick to a specific release or version.
It could be used to teach students how to retrieve and work with multilayered data, rather than being yet another remote dataset like all the DEMs.
Even without looking at the code (and I would need to find out where it is), our naming convention is (for example)
@earth_synbath[_rru[_reg]]
and my strong guess is that the last two optional bits, _rru and _reg, must remain as is. That means all new name variability must be set at the beginning of the name (the earth_synbath part). So, your examples for SST & SSS would become
@woa23_sst_annual_01d, @woa23_sst_annual_15m
and similar for other variables. But we need to be able to distinguish what “annual” means: they have several annual climatologies, depending on the periods used to compute them.
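A tiny parser sketch can show how little the alias machinery would need to recognize for a first subset; the variable, period, and resolution codes here are assumptions for illustration, not GMT's actual grammar:

```python
import re

# Hypothetical alias grammar: @woa23_<var>_<period>_<res>
# The allowed codes below are assumptions for a minimal first subset.
ALIAS_RE = re.compile(r"@?woa23_(?P<var>sst|sss)_(?P<period>annual)_(?P<res>01d|15m)$")

def parse_alias(alias):
    """Split a proposed WOA23 alias into its variable/period/resolution parts."""
    m = ALIAS_RE.match(alias)
    if m is None:
        raise ValueError(f"unrecognized WOA23 alias: {alias}")
    return m.groupdict()

print(parse_alias("@woa23_sst_annual_15m"))
# -> {'var': 'sst', 'period': 'annual', 'res': '15m'}
```

If the "annual" token later needs to carry the climatological period (decav, decav81B0, ...), only the period alternation in the pattern would have to grow.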
An issue is that I cannot make sense of the WOA23 naming convention. Really, I cannot see how woa23_[DECA]_[v][tp]_[gr].[form_end] can give a file name like woa23_decav81B0_t00_04.nc
Thanks for the great points and I fully agree. Given how frequently WOA is updated, and how many layers/variables it includes, it probably makes sense to keep things simple.
From my perspective, the annual SST and SSS fields are already more than enough for most quick-look maps (the classic Figure 1 in many papers). They’re stable, lightweight, and very useful for teaching and rapid visualization. That’s the main reason for having such datasets available.
Still, I’d be happy to collaborate. I already have some teaching material in Portuguese using PyGMT + WOA, and it would be easy to adapt it to English. We could place everything in a small GitHub repo with clear examples of how to retrieve and plot these fields.
If you think this is a good direction, I can share an initial structure and contribute the first examples.
Just a quick clarification about the WOA23 naming scheme.
The pattern used in the documentation is simple: woa23_decav81B0_t00_04.nc means:
decav81B0 → the 1981–2010 climatological period
t → temperature
00 → annual
04 → 0.25° grid
.nc → NetCDF file
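As a sanity check, the decoding above can be written as a small parser (a hypothetical helper; only the t/s variable codes and the 01/04 grid codes mentioned in this thread are filled in):

```python
import re

# Field tables for the pattern woa23_[DECA]_[v][tp]_[gr].nc; only codes
# mentioned in this thread are filled in, the rest are left as raw codes.
VARIABLES = {"t": "temperature", "s": "salinity"}
GRIDS = {"01": "1 degree", "04": "0.25 degree"}

def decode_woa23(filename):
    """Decode a WOA23 filename such as 'woa23_decav81B0_t00_04.nc'."""
    m = re.match(r"woa23_(?P<deca>\w+?)_(?P<v>[a-z])(?P<tp>\d{2})_(?P<gr>\d{2})\.nc$", filename)
    if m is None:
        raise ValueError(f"not a WOA23 filename: {filename}")
    period = "annual" if m["tp"] == "00" else f"period code {m['tp']}"
    return {
        "decade": m["deca"],  # e.g. decav81B0 = the 1981-2010 climatology
        "variable": VARIABLES.get(m["v"], m["v"]),
        "period": period,
        "grid": GRIDS.get(m["gr"], m["gr"]),
    }

print(decode_woa23("woa23_decav81B0_t00_04.nc"))
# -> {'decade': 'decav81B0', 'variable': 'temperature', 'period': 'annual', 'grid': '0.25 degree'}
```

The DECA field is the one part a parser cannot fully enumerate from the documentation alone, which matches the earlier point that those codes are the least intuitive piece.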
It looks cryptic at first, but it is consistent once you decode the pieces (the DECA codes are the least intuitive part). That said, I completely agree that we do not need the full WOA23 dataset inside GMT. For most use cases, especially general mapping and teaching, the only fields people regularly ask for are annual SST and annual SSS. These are the “Figure-1” type products everyone uses for quick visualizations and cruise planning.
So a very simple GMT naming such as:
@woa23_sst_annual_01d
@woa23_sss_annual_01d
would already cover the vast majority of needs (I would say 99% haha), without importing all variables, depths, months, decades, nutrients, oxygen, etc. If useful, we can later add one or two more fields, but keeping this minimal, stable subset seems the most practical path.
Yes, I could decode it too, but how can we know what DECA should contain (the site says only: [DECA] - decade)? In this case it seems to be decav81B0, but “cryptic” is indeed a good description.
Regarding Mikhail’s remark, I also thought that the best option would be to skip hosting the data entirely and only, in a way, translate their naming convention to ours. The problem is that netCDF files cannot be partially accessed (without OPeNDAP); by this I mean requesting sub-regions, layer names, etc.
They apparently provide OPeNDAP access. I have never used it and don’t know what it is, but here’s an example link to the dissolved silicate data I’ve been looking at: Catalog Services
They list HTTP and OPeNDAP as the methods of data access, as well as NetcdfSubset (no idea what that is).
I want to get dissolved silicate at 1000 m depth. Using gdal_translate, the variable is i_mn (i for silicate; they decided to use single letters instead of meaningful names, and s was already taken by salinity; i_mn is the mean value). Band 47 is the 1000 m depth (band 1 is the surface).
I’m not surprised that they provide OPeNDAP access, but the problem is that we cannot use it from GMT. It would need a netCDF library built with OPeNDAP support, and I don’t know how much more.
Absolutely. What I meant is that a parameter is a dataset inside a WOA .nc data file; depths are fixed and represented as bands, at least for dissolved silicate.