Various weather plots made using GMT

Thanks for the precipitation info.

That I understand perfectly, but they have that friendly info on the page describing the tool to read/convert GRIB data.

Maybe this info on GDAL’s GRIB driver is of interest to you (to eventually avoid the 180E/W issue):

GRIB2 files without projection on lon/lat grids have the peculiarity of using longitudes in the [0,360] range and wrapping around the antimeridian as opposed to the usual wrapping around the prime meridian of other raster datasets. Starting with GDAL 3.4.0, when reading such files, a transparent conversion of the longitudes to [-180,180] will be performed and the data will be rewrapped around the prime meridian - the split&swap mode. This behavior can be disabled by setting the GRIB_ADJUST_LONGITUDE_RANGE configuration option to NO.
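In a shell, that configuration option can be passed straight to any GDAL utility. A minimal sketch (the file names are just illustrative):

# Keep the native [0,360] longitudes instead of the split&swap rewrap (GDAL >= 3.4.0)
gdal_translate --config GRIB_ADJUST_LONGITUDE_RANGE NO \
    20250506000000-24h-oper-fc.grib2 out.tif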

My scripts to do all the above operations are a real mess.

You need a clean GMT.jl solution :smile: And Colab now supports Julia as well.


There’s nothing stopping a GRIB file from wrapping around anywhere, or from being global. The format dates back to the late 1980s; there have been two major versions since then, and GRIB-3 is on the horizon. It’s very efficient at transmitting weather data, and it’s heavily standardised. But it’s used virtually nowhere other than in meteorology.

Another note regarding precipitation. There is a variable in the open data called tprate, which is the precipitation rate. This is an instantaneous variable that can be plotted without differencing accumulations. But it produces strange-looking rainfall fields because it’s the precipitation rate at an instant in time, and over even short periods rainfall can vary enormously.

This is how I now convert the GRIB data (using an ecCodes tool, plus the NetCDF operators) to NetCDF, and then extract the region of interest.

# Convert GRIB to netCDF-4 (-k 4) with deflation level 1 and 32-bit floats
grib_to_netcdf "${grb_file}" -k 4 -d 1 -D NC_FLOAT -o "${nc_file}"
# Cut out the region of interest; --msa stitches the two longitude
# hyperslabs together across the antimeridian
ncks -O --msa -d longitude,140.,179.75 -d longitude,-180.,-140. \
     -d latitude,-40.,0. "${nc_file}" "${nc_file}"
# Move the negative longitudes back into the [0,360] range
ncap2 -O -s 'where (longitude < 0) longitude=longitude+360' \
     "${nc_file}" "${nc_file}"

After these operations, the rest is done with GMT.

I added to GMT.jl#master an era5() function that downloads reanalysis datasets (after registering and creating the ~/.cdsapirc file as explained elsewhere). It is as simple as going to an ERA5 dataset page (like this one, picked at random), filling in the check boxes, clicking “Show API request code”, clicking “Copy”, and then just doing

era5(cb=true)

and it will get that file for us.


Hi there, I’m quite interested in this topic.
If you happen to converge on a full GDAL/GMT way to deal with GRIB2, I’ll take it!

Simple. Extract the last band.

# =gd forces the GDAL driver; +b156 selects the last band (band numbers start at 0)
grdconvert 20250502000000-102h-oper-fc.grib2=gd+b156 -Gv.grd

But that doesn’t cover the full file.
Last time I had to deal with GRIB2, most of the script was to deal with metadata to organize the bands and create a somewhat well structured 3D (2D+time) matrix that I could use.

The message (dis)order of GRIB files is so messed up that I started to check whether zonal and meridional winds with the same forecast time were actually in consecutive bands or not.

Plus, it is not obvious to know with certainty that GDAL has completely read the file: in the example above there are 157 layers, but there’s no information to tell you “you are expected to read 157 bands”. Right?

I have no previous experience with GRIB files so I mostly do not understand what you are asking.

I guess that since each forecast file is for a single time, that requires downloading many of those 120 MB files.

Sorry, I don’t get this. You read whatever band you want. And I would like to know if all of those GRIBs always have 157 bands and are written in the same order.

@timhume will correct me if I am wrong:

The layers (or “bands”, or 2D fields) are appended in the GRIB files one after another, regardless of chronological order or z-level.

Let’s say you are interested in the Sea-Surface Temperature at H+24.

The file will have a lot of metadata, among which the usual coordinate system, the variable description (here SST) and the area of interest (say, global).

In that scenario, you will have one band with:

  • an integer : absolute ref time (today)
  • an integer : relative forecast time (+24H)
  • a matrix : SST 2D field.

Now, if I want two forecasts, there’s no telling in which order they will appear. For example, H+24 can be band 2 and H+12 band 1. This is messed up, but manageable.

The trick is, if I am providing you a file, you have no way to know for sure how many bands there are. You just “hope” that GDAL has read the file completely (which is not a given).
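One way to at least see what GDAL thinks is in the file is to dump the per-band metadata its GRIB driver exposes. A sketch (the grep pattern merely trims the output to the interesting keys):

# List every band GDAL sees together with its GRIB metadata, so that
# bands can be counted and matched to variables/forecast times
gdalinfo 20250502000000-102h-oper-fc.grib2 | \
    grep -E '^Band |GRIB_ELEMENT|GRIB_FORECAST_SECONDS|GRIB_VALID_TIME'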

For some reason, ecCodes seems to correctly handle the number of bands and to reorganise the forecasts in chronological order (at least within an array in Python).

It would be nice if grdconvert were able to do the same when converting to NetCDF.

The important thing to note about the GRIB files from ECMWF (or anywhere else for that matter) is they are a collection of GRIB records.

The GRIB record is the important thing. The files are simply a collection of records (usually valid at a particular time) that have (and I’m not kidding) been concatenated together (cat record1 record2 … recordn > grib_file, or something equivalent). There is no requirement for the records to be concatenated in any particular order.

So if you’ve got a file containing more than one GRIB record, the first thing one needs to do is ascertain what each of these records are. Typically I do it with something like this:

grib_ls gribfile

(using ecCodes from ECMWF) or

wgrib2 gribfile

(using wgrib2 from NCEP).

But there’s another way. If you look at many places disseminating GRIB, they often include index files that specify the contents of each file. But the format of said index files is not standard.

I need to examine what grdconvert can do - I was totally unaware it could handle GRIB. Now I’m really curious and will experiment and report what I see.

Do you happen to know if it is possible to make the ecmwf.opendata Python script do just a dry run and show only what the HTTP request(s) would be?

And also, abusing your patience: WTF is the step? The more I read about it, the more confused I get.

I don’t know if ECMWF.opendata python code can do a dry run and show the HTTP requests.

Regardless of how ECMWF.opendata works under the hood, there are two ways to progress on this matter. I’ll explain the first way (the easiest) here, and if time permits, the second way in a subsequent post.

Firstly, if you look at the Open Data directory you’ll see each GRIB file has a corresponding index file:

Name                                  Creation date             Size            Id
20250506000000-0h-oper-fc.grib2       06-05-2025 08:34     120768148     143092820
20250506000000-0h-oper-fc.index       06-05-2025 08:34         34689     143092825
20250506000000-102h-oper-fc.grib2     06-05-2025 08:34     127154353     143093911
20250506000000-102h-oper-fc.index     06-05-2025 08:34         34453     143093914
20250506000000-105h-oper-fc.grib2     06-05-2025 08:34     127594309     143093928
20250506000000-105h-oper-fc.index     06-05-2025 08:34         34454     143093931
20250506000000-108h-oper-fc.grib2     06-05-2025 08:34     127710533     143093950

These .index files are just text files which list the individual records in the GRIB file. Here’s an example from the 20250506000000-24h-oper-fc.index file:

/Users/tim/Downloads> head 20250506000000-24h-oper-fc.index
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "24", "levtype": "sfc", "param": "tcwv", "_offset": 0, "_length": 693873}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "levtype": "sfc", "step": "24", "param": "ssr", "_offset": 693873, "_length": 1308109}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "24", "levelist": "3", "levtype": "sol", "param": "vsw", "_offset": 2001982, "_length": 496249}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "24", "levelist": "150", "levtype": "pl", "param": "q", "_offset": 2498231, "_length": 593105}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "levtype": "sfc", "step": "24", "param": "str", "_offset": 3091336, "_length": 1315885}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "24", "levelist": "1", "levtype": "sol", "param": "sot", "_offset": 4407221, "_length": 628449}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "24", "levelist": "4", "levtype": "sol", "param": "vsw", "_offset": 5035670, "_length": 425364}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "levtype": "sfc", "step": "24", "param": "ttr", "_offset": 5461034, "_length": 1233636}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "levtype": "sfc", "step": "24", "param": "10fg", "_offset": 6694670, "_length": 1414980}
{"domain": "g", "date": "20250506", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "levtype": "sfc", "step": "24", "param": "ewss", "_offset": 8109650, "_length": 356723}

In particular, the _offset and _length fields tell the reading software where in the file this particular record begins, and how long it is. For example, parameter 10fg (wind gust speed at 10 m above the surface) starts at offset 6,694,670 bytes, and the record is 1,414,980 bytes long.

With this information, one can slice particular fields out of the big files, and save on downloads too.
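A hypothetical sketch of doing exactly that with jq and curl (the URL is a placeholder for the real Open Data one; jq and curl availability is assumed):

idx=20250506000000-24h-oper-fc.index
url=https://example.com/20250506000000-24h-oper-fc.grib2

# Pull _offset and _length of the 10fg record from the JSON-lines index
read -r off len < <(jq -r 'select(.param == "10fg") | "\(._offset) \(._length)"' "$idx")

# Request exactly that byte range; the result is a standalone, valid GRIB record
curl -s -H "Range: bytes=${off}-$((off + len - 1))" "$url" -o 10fg_24h.grib2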

The second method requires a more intimate understanding of GRIB records.

Each GRIB record is broken up into a number of sections. The first section (marking the beginning of the record) begins with the four octets G R I B. Then there are sections containing various metadata describing the properties of the data being transmitted. Finally, there is a big section with the actual data, usually compressed using one of a number of methods. Well, actually there is also an end section, the octets 7 7 7 7, after the data section.

Most sections of a GRIB record also contain octets which tell how long the section is. So one can seek through sections rapidly without having to read all the octets in them if one doesn’t need to.

With this knowledge, a GRIB decoder can first search for the beginning of a new GRIB record. In a file the first record usually begins at the first byte. But sometimes one will find a few extra bytes preceding the first GRIB record which have been added as the message is routed through the Internet or other communications network.

Having found G R I B, the decoder can then proceed to read the metadata sections (or just enough of the metadata sections as is required). The decoder can then decide “is the user interested in this record?”. If so, it will then proceed to read the data section, otherwise it can seek over the current record and start looking for the next GRIB record in the file or data stream (which usually is straight after the preceding record).
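As a crude illustration of that first search step, grep can report the byte offset of every occurrence of the ASCII string GRIB in a file. This is only a heuristic, since those four octets can also occur by chance inside a data section:

# Byte offsets of every "GRIB" occurrence; true record starts are a
# subset of these hits
grep -abo GRIB 20250506000000-24h-oper-fc.grib2 | head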

But you can see how this task rapidly becomes bigger than Ben Hur. Fortunately ECMWF have done the hard work by providing the .index files. NCEP also provide index files, but in a different format. These index files are not standardised in any way that I am aware of.


step is the model time step. In the case of the ECMWF data, each step usually refers to hours since the model’s beginning time. It’s not usually the actual time step at which the numerical integrations are done (those are usually just a few seconds long), but the step at which data products are disseminated - typically hourly (but other frequencies do exist).

And then there are complications (did you really expect it to be as simple as above?). For example, some products are accumulations or temporal averages or similar. And not always accumulations from t=0. Sometimes accumulations reset periodically (e.g. every 24 hours).
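For example, if tp (total precipitation) is accumulated from t=0, the rainfall between steps 23 and 24 is the step-24 accumulation minus the step-23 one. A hypothetical sketch with the NCO tools used earlier (the file names are assumed; each holds one step’s accumulation already converted to netCDF):

# 23h-to-24h rainfall = accumulation at step 24 minus accumulation at step 23
ncdiff -O tp_24h.nc tp_23h.nc tp_23h_to_24h.nc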

If all this seems horrendously complicated, it is, but it was done for good reasons. In the dark ages we didn’t have rapid retrieval of data files off SSD or whatever. And networks were s.l.o.w.

When I first started using ECMWF data, the data were sent to us in boxes of 9-track tapes. From memory, one could seek forwards or backwards on the tape faster than reading the whole contents (my memory may be wrong - relying on tape is something one is grateful to forget about).

I still remember ECMWF telling us that if we didn’t return the boxes they sent the tapes in, they were going to charge some horrendous amount for non-return of the boxes! Returning the boxes from New Zealand to Britain was a non-trivial exercise, but ECMWF’s penalty fee for non-return was considerably more than international postage of empty boxes (and 9-track tape boxes were not small affairs).

Thanks a lot for all of that (+ the history). My gripe with the step is how to test what is a valid request and what is not. I now have minimal testing, but it certainly needs more thought.
I had learned about the byte-range request and actually managed to implement it for the surface variables.

If you want to test the GMT.jl master (] add GMT#master), there is an ecmwf() function that downloads both ERA5 and Forecast data. The Forecast part is more rudimentary and currently only works when one declares one or more surface variables. It then downloads them to separate GRIB files. I tested this and it worked:

ecmwf(:forecast, date="20250506", var=["10u", "2t"])

(need to update the date, ofc). It should also work if date is not provided, but I don’t know how to test whether the data is already available on the servers.

Typing ? ecmwf on the Julia command line prints online help (a bit rudimentary right now).

The byte-range solution is powerful but I’m still quite lost on how to deal with all the possibilities (comments are welcome). One dream possibility is to make it work with GDAL’s /vsicurl. If that is possible, then it should be possible to save files directly in netCDF and even be able to use a -R.
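Something along these lines might work, though I have not verified how efficiently the GRIB driver reads over /vsicurl (the URL, band number and region are all illustrative):

# Read one band of a remote GRIB2 over HTTP, cut a sub-region
# (-projwin ulx uly lrx lry) and write netCDF
gdal_translate -of netCDF -b 156 -projwin 140 0 180 -40 \
    /vsicurl/https://example.com/20250506000000-24h-oper-fc.grib2 out.nc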

What you’re trying to do is quite difficult. For the three-dimensional fields you need to stack the layers in the correct order. In the ECMWF index files there are fields called levtype and levelist which provide the information you need.
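For instance, a hypothetical jq one-liner over the index file pulls out all the pressure-level records of one parameter, sorted numerically by level and ready to be fetched and stacked:

# Level, offset and length of the specific-humidity (q) records on
# pressure levels (levtype "pl"), sorted by level
jq -r 'select(.levtype == "pl" and .param == "q") | "\(.levelist) \(._offset) \(._length)"' \
    20250506000000-24h-oper-fc.index | sort -n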

I fully support the idea of minimising dependencies. But I then gave up and reverted to using the ecCodes grib_to_netcdf utility:

grib_to_netcdf "${grb_file}" -k 4 -d 1 -D NC_FLOAT -o "${nc_file}"

because that solved my immediate problems, at the expense of ugliness.