Replacing GDAL with GMT commands due to performance issues

Dear all,

I was wondering if a combination of GMT commands could be used instead of the following code that uses GDAL:

levels="1 2 3 4 5 6 7 8 9 10"

for type in var1 var2 var3
do
  for level in $levels
  do
    # Extract one depth level of one variable into its own GeoTIFF
    # (output name includes $type so the variables do not overwrite
    # each other's files)
    gdalmdimtranslate $dirnc/mync.nc4 tmp_${type}_${level}.tif -subset 'depth('$level')' -array $type > /dev/null
  done
done

For instance, say I have 10 levels stored in a NetCDF file that are sliced for different variables (i.e. var1, var2, var3). The problem is performance: with the real files (5 variables, 50 levels per NetCDF file) this process takes approximately 2 hours per file. I am running this under Linux.

Thanks for any suggestions.

I am not an expert on GMT or GDAL, but I think GMT programs call GDAL library functions to read and write NetCDF files that are not in one of the standard GMT formats. This means it may not be any faster to use GMT programs than the GDAL “gdalmdimtranslate” program.

Since you are calling the “gdalmdimtranslate” program many times in a loop, each time it has to open the file and go to the layer you request. Doing a similar loop with GMT programs would incur the same performance penalty. I expect you could get better performance by using GDAL functions directly from a custom Python or C/C++ program that opens the file once and then extracts the subsets. This might allow the program to keep the file in memory for all the extractions if you have a lot of memory available.
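Eric's open-once idea really needs a programmatic API (Python or C/C++). At the shell level, a related thing worth timing is gdal_translate on the classic subdataset path, which skips the multidim machinery — an untested sketch, assuming GDAL's netCDF driver exposes the depth levels of each variable as bands (as the subdataset listing later in this thread suggests):

```shell
# Sketch: extract each depth level as one band of the classic
# NETCDF:"file":variable subdataset instead of calling
# gdalmdimtranslate with a -subset expression.
for type in var1 var2 var3; do
  for level in $(seq 1 10); do
    # -b selects a single band (here: one depth level), 1-based
    gdal_translate -q -b "$level" NETCDF:"$dirnc/mync.nc4":"$type" \
        "tmp_${type}_${level}.tif"
  done
done
```

Whether this is faster depends on the file's internal chunking; it still opens the file once per call.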

It also depends to some extent on the internal structure of the NetCDF file. We have been working on optimizing the internal structure of the NISAR data products that are NetCDF/HDF5 files, and it can make a big difference if the structure is optimized.

This looks more like a question for the gdal-dev list.
But no, GMT does not use GDAL to read netCDF files unless it is explicitly instructed to do so with the =gd file name suffix or via the grdgdal module. The CookBook has examples of how to read more elaborate netCDF file structures, though for complex structures things can be challenging.
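For reference, the =gd suffix looks like this (the file name here is hypothetical, just to show the syntax):

```shell
# Force GMT to read a grid through GDAL rather than its native
# netCDF reader by appending =gd to the file name
gmt grdinfo "mygrid.tif=gd"
```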

Thanks Eric and Joaquim for your answers and ideas. As Eric said, this is the actual issue:

I have been searching for something neat like grdtrack; I think it is one of the nicest tools GMT has. Simply with grdtrack profile.d -Gmync.nc?var1 I can get what I am looking for, but only for the first depth level; the goal is to get all 50 levels at once, “magically”. I am doing some tests with xarray (it seems ncks/cdo have some tricks for this). I thought I could get good results with grdcut and/or grdinterpolate, but so far I don’t get what I need.

Please if you have further ideas it’d be great to know them, thanks again.
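Since grdinterpolate is mentioned above: it is GMT's module for 3-D data cubes, and it may do exactly this in one pass. An untested sketch, assuming a recent GMT reads mync.nc?var1 as a cube:

```shell
# Sketch: slice every depth level of a 3-D cube in one pass.
# -Z with no argument selects all levels present in the cube;
# a C-format code (%g) in -G writes one 2-D grid per level
# rather than a single output cube.
gmt grdinterpolate "mync.nc?var1" -Z -Gvar1_depth_%g.grd
```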

I don’t think the time taken to open a file plays any relevant role here, but without a file to test no one can help debug this delay.

See the docs (again, the CookBook has relevant info): ?var1[9] opens the 10th layer of var1.

I didn’t spot that in the grdtrack manual for v6.4.0:

-G<ingrid>[=<ID>|?<varname>][+b<band>][+d<divisor>][+n<invalid>][+o<offset>][+s<scale>]
     Name of grid to sample. Optionally append =<ID> for reading a specific file format or ?<varname> for a specific netCDF variable, and add any modifiers:
       +b Select a band (for images only) [0]
       +d Divide data values by the given <divisor> [1]
       +n Replace data values matching <invalid> with a NaN.
       +o Offset data values by the given <offset> [0].
       +s Scale data values by the given <scale> [1].
     Note: Any offset is added after any scaling.
     Note: Repeat -G for as many grids as you wish to sample, or use -G+l<list> to pass a list of grid file names. If the file is a Sandwell/Smith Mercator grid (IMG format) then more information is required: <imgfile>,<scale>,<mode>[,<maxlat>].
     Give the filename and append comma-separated scale, mode, and optionally max latitude. The scale (typically 0.1 or 1) is used to multiply after read; give mode as follows:
       0: img file with no constraints coded, interpolate to get data at track.
       1: img file with constraints coded, interpolate to get data at track.
       2: img file with constraints coded, gets data only at constrained points, NaN elsewhere.
       3: img file with constraints coded, gets 1 at constraints, 0 elsewhere.
     For mode 2|3 you may want to consider the -n+t<threshold> setting.

Or is that option only available in the latest version, 6.6.0? Using ?var1[9] gives 0 values. Please find the files for testing here: Unique Download Link | WeTransfer

Modifiers for COARDS-compliant netCDF files

Thanks for the link; yes, it was my mistake, I was not searching in the right place. Still, 0 is the output of ?var1(9) or ?var1[9] with the mync.nc file. Does it work with 6.4.0? It would be nice to know before upgrading to the latest version.

I can read that file easily (and very fast) with Mirone. Either Mirone or gdalinfo shows that the variables are 4D:

Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"mync.nc":uo
  SUBDATASET_1_DESC=[1x50x241x361] eastward_sea_water_velocity (32-bit floating-point)

This works fine and instantaneously:

grdconvert mync.nc?uo[0,9] -Guo_10.grd
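Generalizing that command, all 50 depth levels can be pulled out with a plain loop (untested sketch; the time index stays 0 since the file has a single time step):

```shell
# Extract every depth level of variable uo into its own GMT grid.
# The indices in ?uo[time,depth] are 0-based; the quotes keep the
# shell from treating ? as a glob character.
for level in $(seq 0 49); do
    gmt grdconvert "mync.nc?uo[0,${level}]" -Guo_$((level + 1)).grd
done
```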

Cool! So values at the 10th depth index of a variable are obtained with ?var[0,9]; that’s nice, thanks a lot Joaquim. I tried using actual depth values instead of indices, but it seems the parentheses are not escaped correctly:

grdconvert mync.nc?uo(0.494024991989,11.404999733) -Guo_10.grd
bash: syntax error near unexpected token `('

It seems that single or double quotes are not useful here.

it’s gmt grdconvert:

gmt grdconvert "mync.nc?uo(0.494024991989,11.404999733)" -Guo_10.grd
ls  -l uo_10.grd 
-rw-rw-r-- 1 mkononets mkononets 188066 Apr 19 02:39 uo_10.grd

well spotted!! some more coffee is needed around here, thanks a lot!!

Why quotes? It works fine for me without any quotes

Quotes are needed because unescaped round parentheses in bash are subshell syntax.
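A minimal way to see this, with plain echo standing in for gmt grdconvert:

```shell
# Unquoted, bash tries to parse ( as the start of a subshell and
# aborts with: syntax error near unexpected token `('
# Quoted, the argument reaches the program untouched:
echo "mync.nc?uo(0.494024991989,11.404999733)"
```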
