PyGMT grdfilter: calculations will be distributed over 1 threads

Many of us run lots of things simultaneously (think of the 1200 tests that cmake runs in parallel), and it does not help to turn on multithreading when your jobs already occupy all your cores. Since only a tiny subset of modules benefits from threads, we decided to make it an optional setting.

We could consider a GMT_PARALLEL = on|off setting: if on, -x is added by default, otherwise not. The default would be off. Or, maybe better, an integer which, if not 0, gets appended to -x. Then you could use -4 to use all but 4 cores.
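To make the integer variant concrete, here is a rough sketch of how such a value could map to a thread count. GMT_PARALLEL is only the setting proposed in this thread, and the function names resolve_threads and x_option are made up for illustration; the clamping rules are an assumption modelled on how -x[[-]n] is described, not GMT's actual code.

```python
import os
from typing import Optional


def resolve_threads(gmt_parallel: int, available: Optional[int] = None) -> int:
    """Map a hypothetical GMT_PARALLEL integer to a thread count.

    0  -> setting is off, so no -x is added and one thread is used
    n  -> use n cores, clamped to what the machine has
    -n -> use all but n cores, but never fewer than one
    """
    if available is None:
        available = os.cpu_count() or 1
    if gmt_parallel == 0:
        return 1
    if gmt_parallel < 0:
        return max(1, available + gmt_parallel)
    return min(gmt_parallel, available)


def x_option(gmt_parallel: int) -> str:
    """Return the option string that would be appended to a module call."""
    return "" if gmt_parallel == 0 else f"-x{gmt_parallel}"


if __name__ == "__main__":
    for value in (0, 4, -4):
        print(value, repr(x_option(value)), resolve_threads(value, available=12))
```

With this reading, a value of -1 on an n-core machine gives the n-1 behaviour suggested further down the thread.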

That would be nice. GMT_PARALLEL = on|off (or similar) would be a setting in gmt.conf, I assume?

Anyway, there seems to be more to it. gmtinit_parse_x_option is under an #ifdef GMT_MP_ENABLED condition, but grdfilter threads are controlled by HAVE_GLIB_GTHREAD, so on a quick look I don't see how grdfilter -x works at all.

Well, grdfilter is currently keeping 12/12 cores busy on my machine, so something is working.

Then the best option would be to set it to ON by default, and the test suites (which only we use) would set it to OFF. But the default should be n-1.

This will be post-6.5, but I will see if I can fix the MP vs THREAD stuff. But if @Andreas is running it now, then perhaps that is not a showstopper. @Andreas, please run it twice via time:

time gmt grdfilter ...... no -x
time gmt grdfilter ...... -x

to see if it really works.

Will do.

Results are in:

Without -x:

$ time gmt grdfilter klipp.tif -D0 -Fg400+h -V -Gfilt.nc
[...]
grdfilter [INFORMATION]: Calculations will be distributed over 1 threads.
[...]
real	8m56.352s
user	8m56.288s
sys 	0m0.028s

With -x:

$ time gmt grdfilter klipp.tif -D0 -Fg400+h -V -Gfilt.nc -x
[...]
grdfilter [INFORMATION]: Calculations will be distributed over 12 threads.
[...]
real	2m16.788s
user	24m4.359s
sys 	0m1.048s
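As a quick sanity check on those two runs, the snippet below converts the time output to seconds and derives the wall-clock speedup, the efficiency per thread, and the average number of busy cores (user time divided by real time). The timings are copied from the output above; the helper name to_seconds is just for this sketch.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '8m56.352s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the two runs above.
serial_real = to_seconds("8m56.352s")     # without -x
parallel_real = to_seconds("2m16.788s")   # with -x
parallel_user = to_seconds("24m4.359s")   # CPU time summed over all threads
threads = 12

speedup = serial_real / parallel_real
efficiency = speedup / threads
busy_cores = parallel_user / parallel_real

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"efficiency per thread: {efficiency:.0%}")
print(f"average busy cores: {busy_cores:.1f}")
```

So the threads are clearly doing work, but the total CPU time goes up a lot compared with the single-threaded run, which is why the wall-clock gain is well below 12x.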

We have this actually:

#if defined(HAVE_GLIB_GTHREAD) || defined(_OPENMP)
/* This means we should enable the -x+a|[-]<ncores> common option */
#define GMT_MP_ENABLED
#endif

which is why it works.

On my Mac, filtering the entire SRTM15 grid to 1x1 degree with -x gives

real 0m52.513s
user 2m13.858s
sys 0m2.611s

while no -x gives

real 1m59.279s
user 1m56.418s
sys 0m2.740s

That is not very impressive. While we don't have an OpenMP grdfilter to compare with, I am pretty sure other -x modules have given much higher speedups, like 4x with 10 cores. Of course, this is a huge file, so it may be related to other things.

I remember that I had a lot of work parallelizing grdfilter. I doubt it could have been achieved with simple OpenMP pragmas. The parallel code cuts the grid into parallel stripes, with padding determined by the filter width, processes each stripe in parallel, and joins the filtered chunks, dropping the padded zones as needed.
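For readers who want to see the stripe idea in miniature, here is a toy sketch. It is not the grdfilter code: it uses a 1-D list of values as a stand-in for grid rows, a plain moving average as the filter, and Python threads purely to show the structure (they will not give a real speedup). All function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_rows(rows, half_width):
    """Moving average over a list of values, truncating the window at the ends."""
    n = len(rows)
    out = []
    for i in range(n):
        window = rows[max(0, i - half_width):min(n, i + half_width + 1)]
        out.append(sum(window) / len(window))
    return out


def plan_stripes(n_rows, n_stripes, pad):
    """Return (read_start, read_stop, keep_start, keep_stop) for each stripe.

    The 'keep' ranges tile [0, n_rows) exactly; the 'read' ranges extend each
    stripe by `pad` rows on both sides (clipped at the grid edges) so the
    filter sees the neighbours it needs.
    """
    stripes = []
    for k in range(n_stripes):
        keep_start = k * n_rows // n_stripes
        keep_stop = (k + 1) * n_rows // n_stripes
        if keep_start == keep_stop:
            continue  # more stripes than rows: skip empty ones
        read_start = max(0, keep_start - pad)
        read_stop = min(n_rows, keep_stop + pad)
        stripes.append((read_start, read_stop, keep_start, keep_stop))
    return stripes


def filter_in_stripes(rows, half_width, n_stripes):
    """Filter padded stripes independently, then join them without the padding."""
    def work(stripe):
        read_start, read_stop, keep_start, keep_stop = stripe
        padded = smooth_rows(rows[read_start:read_stop], half_width)
        return padded[keep_start - read_start:keep_stop - read_start]

    with ThreadPoolExecutor(max_workers=n_stripes) as pool:
        pieces = list(pool.map(work, plan_stripes(len(rows), n_stripes, half_width)))
    return [value for piece in pieces for value in piece]


if __name__ == "__main__":
    data = [float((i * 7) % 13) for i in range(50)]
    print(filter_in_stripes(data, 3, 4) == smooth_rows(data, 3))
```

The key point is that the padding equals the filter half-width, so every kept row sees exactly the same window it would see in a single pass over the whole grid.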

Yep, trickier than most row-by-row calls for sure. But I was not impressed with my test. Would be fun if your Dell could run

time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -x -V

and compare with

time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gt.grd -V

Where is it?

Well, you cannot use @earth_relief_15s since that would stitch the planet. Our earth_relief.recipe says

# SRC_FILE=https://topex.ucsd.edu/pub/srtm15_plus/SRTM15_V2.5.5.nc

The comparison we want should not be done with time, because it includes the time to read the 6.2 GB file, which is non-negligible. What we want is the output of -Vt, and here the difference is substantial.

$ time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -x -Vt
Elapsed time 00:00:18.125 | (grdfilter) |

real    1m2.701s
user    0m0.015s
sys     0m0.000s

j@dell-from-hell MINGW64 /c/v
$ time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -Vt
Elapsed time 00:01:42.665 | (grdfilter) |

real    2m27.335s
user    0m0.000s
sys     0m0.031s

Very good point, and that is much better!

Unfortunately, gotta read the file though and yes that is slow and requires lots of memory.
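To put numbers on that, the snippet below takes the -Vt elapsed times and the real times from the two Dell runs above and separates the reported elapsed time from everything else. It assumes, as suggested above, that the -Vt elapsed figure excludes reading the file; the helper names are just for this sketch.

```python
def hms_to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS.sss' (as printed by -Vt) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def real_to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '1m2.701s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Values copied from the two runs above.
elapsed_x = hms_to_seconds("00:00:18.125")      # with -x
elapsed_serial = hms_to_seconds("00:01:42.665") # without -x
real_x = real_to_seconds("1m2.701s")
real_serial = real_to_seconds("2m27.335s")

filter_speedup = elapsed_serial / elapsed_x
wall_speedup = real_serial / real_x
other_x = real_x - elapsed_x                 # time outside the -Vt span, with -x
other_serial = real_serial - elapsed_serial  # same, without -x

print(f"speedup on -Vt elapsed time: {filter_speedup:.2f}x")
print(f"speedup on wall-clock time:  {wall_speedup:.2f}x")
print(f"time outside -Vt span: {other_x:.1f} s (-x), {other_serial:.1f} s (no -x)")
```

The time outside the -Vt span comes out nearly the same in both runs, which fits the idea that it is a fixed cost unrelated to threading and that it is what drags the wall-clock speedup down.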