Many of us run lots of jobs simultaneously (think of the 1200 tests that cmake runs in parallel), and turning on multi-threading does not help when you are already running jobs on all your cores anyway. Since only a tiny subset of modules benefit from threads, we decided to make that an optional setting.
We could consider a GMT_PARALLEL = on|off setting: if on, -x is added by default, otherwise not. The default would be off. Or maybe better, an integer which, if not 0, gets passed to -x. Then you could use -4 to use all but 4 cores.
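To make the integer variant concrete, here is a minimal Python sketch of how such a value might be turned into a thread count. Nothing here is GMT code: `resolve_threads` is a hypothetical helper, GMT_PARALLEL is only the setting proposed above, and treating a positive value as "use at most that many cores" is an assumption that goes beyond what the proposal states.

```python
import os


def resolve_threads(setting, n_cores=None):
    """Map a proposed GMT_PARALLEL-style integer to a thread count.

    Assumed semantics (hypothetical, not taken from GMT):
      0   -> threading off, i.e. one thread
      n>0 -> use n cores, capped at the number available
      n<0 -> use all cores minus |n|, but never fewer than one
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1  # cpu_count() may return None
    if setting == 0:
        return 1
    if setting > 0:
        return min(setting, n_cores)
    return max(1, n_cores + setting)


# On a 12-core machine, -4 leaves four cores free for other jobs.
print(resolve_threads(-4, n_cores=12))
```

Clamping to at least one thread keeps a setting such as -16 usable on a smaller machine instead of producing zero or a negative count.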
Anyway, there seems to be more to it. gmtinit_parse_x_option is under an #ifdef GMT_MP_ENABLED condition, but the grdfilter threads are controlled by HAVE_GLIB_GTHREAD, so at a quick look I don't even see how grdfilter -x works at all.
This will be post 6.5, but I will see if I can fix the MP vs THREAD stuff. But if @Andreas is running it now then perhaps that is not a stopper. @Andreas, please run it twice via time:
time gmt grdfilter ...... no -x
time gmt grdfilter -x
$ time gmt grdfilter klipp.tif -D0 -Fg400+h -V -Gfilt.nc
[...]
grdfilter [INFORMATION]: Calculations will be distributed over 1 threads.
[...]
real 8m56.352s
user 8m56.288s
sys 0m0.028s
With -x:
$ time gmt grdfilter klipp.tif -D0 -Fg400+h -V -Gfilt.nc -x
[...]
grdfilter [INFORMATION]: Calculations will be distributed over 12 threads.
[...]
real 2m16.788s
user 24m4.359s
sys 0m1.048s
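The two runs above can be reduced to a speedup and a parallel efficiency. The sketch below only does the arithmetic on the `real` values quoted in the output; the helper names are made up for illustration, and efficiency is taken here as speedup divided by the 12 threads reported.

```python
import re


def time_to_seconds(stamp):
    """Convert a shell `time` stamp such as '8m56.352s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup_and_efficiency(serial_real, parallel_real, n_threads):
    """Return (wall-clock speedup, speedup per thread)."""
    speedup = time_to_seconds(serial_real) / time_to_seconds(parallel_real)
    return speedup, speedup / n_threads


# Wall-clock ("real") values from the two grdfilter runs above.
speedup, efficiency = speedup_and_efficiency("8m56.352s", "2m16.788s", 12)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# prints "speedup 3.92x, efficiency 33%"
```

The `user` time also rises from about 9 to about 24 minutes, so the threaded run spends considerably more total CPU time to reach that wall-clock gain.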
On my macOS, filtering all of SRTM15 grid to 1x1 degree with -x gives
real 0m52.513s
user 2m13.858s
sys 0m2.611s
while no -x gives
real 1m59.279s
user 1m56.418s
sys 0m2.740s
That is not very impressive. While we don't have an OpenMP grdfilter to compare with, I am pretty sure other -x modules have given much higher speedups, like 4x with 10 cores. Of course, this is a huge file, so it may be related to other things.
I remember that parallelizing grdfilter took a lot of work. I doubt it could have been achieved with simple OpenMP pragmas. The parallel code cuts the grid into parallel stripes, with padding determined by the filter width, processes each in parallel, and joins the filtered chunks, dropping the padded zones as needed.
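The stripe-and-padding idea can be illustrated with a small Python sketch. This is not grdfilter's code: the filter is a stand-in (a running mean over rows), all function names are hypothetical, and Python threads are used only to show the decomposition, since they will not speed up pure-Python arithmetic. The point is that a halo of `half_width` rows on each side lets every stripe be filtered independently and still match the serial result once the halo rows are dropped.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_rows(grid, half_width):
    """Serial reference: running mean over rows, clipped at the grid edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for i in range(n_rows):
        lo, hi = max(0, i - half_width), min(n_rows, i + half_width + 1)
        out.append([sum(grid[k][j] for k in range(lo, hi)) / (hi - lo)
                    for j in range(n_cols)])
    return out


def stripe_bounds(n_rows, n_stripes, half_width):
    """Yield (pad_start, core_start, core_stop, pad_stop) per stripe."""
    base, extra = divmod(n_rows, n_stripes)
    start = 0
    for s in range(n_stripes):
        stop = start + base + (1 if s < extra else 0)
        if stop > start:  # skip empty stripes when n_stripes > n_rows
            yield (max(0, start - half_width), start,
                   stop, min(n_rows, stop + half_width))
        start = stop


def filter_stripe(grid, bounds, half_width):
    """Filter one padded stripe, then drop the padded (halo) rows."""
    pad_start, core_start, core_stop, _pad_stop = bounds
    filtered = filter_rows(grid[pad_start:_pad_stop], half_width)
    return filtered[core_start - pad_start:core_stop - pad_start]


def filter_rows_striped(grid, half_width, n_stripes):
    """Filter each stripe in its own task and join the core rows in order."""
    bounds = list(stripe_bounds(len(grid), n_stripes, half_width))
    with ThreadPoolExecutor(max_workers=n_stripes) as pool:
        parts = list(pool.map(lambda b: filter_stripe(grid, b, half_width),
                              bounds))
    return [row for part in parts for row in part]


grid = [[float((3 * i + j) % 7) for j in range(5)] for i in range(23)]
assert filter_rows_striped(grid, 2, 4) == filter_rows(grid, 2)
```

The padding costs some redundant work at every stripe boundary, and that cost grows with the filter width relative to the stripe height.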
The comparison we want should not be done with time, because that includes the time to read the 6.2 GB file, which is non-negligible. What we want is the output of -Vt, and here the difference is substantial.
$ time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -x -Vt
Elapsed time 00:00:18.125 | (grdfilter) |
real 1m2.701s
user 0m0.015s
sys 0m0.000s
j@dell-from-hell MINGW64 /c/v
$ time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -Vt
Elapsed time 00:01:42.665 | (grdfilter) |
real 2m27.335s
user 0m0.000s
sys 0m0.031s
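To separate the filtering time from everything else, the elapsed time reported with -Vt can be subtracted from the wall-clock `real` time of each run. The sketch below does only that arithmetic on the numbers above; the helper names are made up, and reading the remainder as mostly file I/O follows the post's interpretation rather than anything measured directly.

```python
def hms_to_seconds(stamp):
    """Convert 'HH:MM:SS.sss' (the 'Elapsed time' format) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def min_sec_to_seconds(stamp):
    """Convert a shell `time` stamp such as '1m2.701s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def summarize(runs):
    """Per run: elapsed (from -Vt), real, and the time outside 'elapsed'."""
    summary = {}
    for label, run in runs.items():
        elapsed = hms_to_seconds(run["elapsed"])
        real = min_sec_to_seconds(run["real"])
        summary[label] = {"elapsed": elapsed, "real": real,
                          "outside": real - elapsed}
    return summary


# Values copied from the two runs above.
runs = {
    "with -x": {"elapsed": "00:00:18.125", "real": "1m2.701s"},
    "without -x": {"elapsed": "00:01:42.665", "real": "2m27.335s"},
}
stats = summarize(runs)
for label, s in stats.items():
    print(f"{label}: elapsed {s['elapsed']:.1f} s, outside {s['outside']:.1f} s")
print(f"elapsed-time speedup: "
      f"{stats['without -x']['elapsed'] / stats['with -x']['elapsed']:.2f}x")
print(f"wall-clock speedup:   "
      f"{stats['without -x']['real'] / stats['with -x']['real']:.2f}x")
```

Both runs spend roughly 45 s outside the reported elapsed time, which is why the wall-clock ratio (about 2.3x) understates the ratio of the elapsed times (about 5.7x).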