Pygmt grdfilter: calculations will be distributed over 1 threads

pwessel · September 20, 2023, 10:10am

Many of us are running lots of stuff simultaneously (think the 1200 tests that cmake runs in parallel), and it makes no sense to turn on multi-threading when you are already running jobs on all your cores. So since only a tiny subset of modules benefit from threads, we decided to make that an optional setting.

We could consider a GMT_PARALLEL = on|off setting: if on, -x is added by default, otherwise not. Default would be off. Or maybe better, an integer which, if not 0, gets passed to -x. Then you could use -4 to use all but 4 cores.
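To make the integer variant concrete, here is a minimal Python sketch of how such a value might map to a thread count. GMT_PARALLEL does not exist in GMT; the function name resolve_threads and the clamping rules are assumptions for illustration only, modelled on the [-]<ncores> form of -x.

```python
import os

def resolve_threads(setting, n_cores=None):
    """Map a hypothetical GMT_PARALLEL integer to a thread count.

    0   -> 1 thread (i.e. -x would not be added)
    > 0 -> that many threads, capped at the number of cores
    < 0 -> all cores minus abs(setting), but never fewer than 1
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1  # cpu_count() may return None
    if setting == 0:
        return 1
    if setting > 0:
        return min(setting, n_cores)
    return max(1, n_cores + setting)

if __name__ == "__main__":
    for value in (0, 4, -4, -1):
        print(value, "->", resolve_threads(value, n_cores=12))
```

With 12 cores, a value of -4 would resolve to 8 threads and -1 to 11 under these assumed rules.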

Andreas · September 20, 2023, 10:22am

That would be nice. GMT_PARALLEL = on|off (or similar) would be a setting in gmt.conf I assume?

Joaquim · September 20, 2023, 10:28am

Anyway, there seems to be more to it. gmtinit_parse_x_option is under an #ifdef GMT_MP_ENABLED condition, but grdfilter threads are controlled by HAVE_GLIB_GTHREAD, so at a quick look I don’t even see how grdfilter -x works at all.

Andreas · September 20, 2023, 10:32am

Well, grdfilter is currently keeping 12/12 cores busy on my machine, so something is working.

Joaquim · September 20, 2023, 10:33am

Then the best would be to set it to ON by default, and the test suites (which only we use) would set it to OFF. But the default should be n-1.

pwessel · September 20, 2023, 10:44am

This will be post 6.5 but I will see if I can fix the MP vs THREAD stuff. But if @Andreas is running it now then perhaps that is not a stopper. @Andreas, please run it twice via time

time gmt grdfilter ...... no -x
time gmt grdfilter ...... -x

to see if it really works.

Andreas · September 20, 2023, 10:59am

Will do.

Andreas · September 20, 2023, 11:17am

Results are in:

Without -x:

$ time gmt grdfilter klipp.tif -D0 -Fg400+h -V -Gfilt.nc
[...]
grdfilter [INFORMATION]: Calculations will be distributed over 1 threads.
[...]
real	8m56.352s
user	8m56.288s
sys 	0m0.028s

With -x:

$ time gmt grdfilter klipp.tif -D0 -Fg400+h -V -Gfilt.nc -x
[...]
grdfilter [INFORMATION]: Calculations will be distributed over 12 threads.
[...]
real	2m16.788s
user	24m4.359s
sys 	0m1.048s
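For reference, the two runs above can be turned into a speedup figure with a short Python sketch. The timings are copied from the output above; the helper name to_seconds is just for illustration, and it only handles the XmY.YYYs form shown here.

```python
import re

def to_seconds(stamp):
    """Convert a bash `time` stamp such as '8m56.352s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Timings copied from the two runs above
real_serial = to_seconds("8m56.352s")
real_threaded = to_seconds("2m16.788s")
user_threaded = to_seconds("24m4.359s")
n_threads = 12

speedup = real_serial / real_threaded        # wall-clock speedup
efficiency = speedup / n_threads             # fraction of ideal scaling
busy_cores = user_threaded / real_threaded   # average cores kept busy

print(f"speedup    {speedup:.2f}x")
print(f"efficiency {efficiency:.2f}")
print(f"busy cores {busy_cores:.2f}")
```

So -x gives a wall-clock speedup of roughly 3.9x on 12 threads, while the user time shows that about 10.5 cores were kept busy on average.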

pwessel · September 20, 2023, 11:30am

We have this actually:

#if defined(HAVE_GLIB_GTHREAD) || defined(_OPENMP)
/* This means we should enable the -x+a|[-]<ncores> common option */
#define GMT_MP_ENABLED
#endif

which is why it works.

pwessel · September 20, 2023, 12:17pm

On my macOS machine, filtering the full SRTM15 grid to 1x1 degree with -x gives

real	0m52.513s
user	2m13.858s
sys	0m2.611s

while no -x gives

real	1m59.279s
user	1m56.418s
sys	0m2.740s

That is not very impressive. While we don't have an OpenMP grdfilter to compare with, I am pretty sure other -x modules have given much higher speedups, like 4x with 10 cores. Of course, this is a huge file, so it may be related to other things.

Joaquim · September 20, 2023, 12:32pm

I remember that parallelizing grdfilter was a lot of work. I doubt it could have been achieved with simple #pragma omp directives. The parallel code cuts the grid into parallel stripes, with padding determined by the filter width, processes each stripe in parallel, and joins the filtered chunks, dropping the padded zones as needed.
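The stripe idea can be illustrated with a small Python sketch. This is not the grdfilter code: the filter is a plain vertical box mean, and the names filter_rows and filter_striped are made up for the example. It only shows why padding each stripe by the filter half-width lets the chunks be filtered independently and then joined without seams.

```python
from concurrent.futures import ThreadPoolExecutor

def filter_rows(grid, half):
    """Mean over a vertical window of +/- half rows, clipped at the grid edges."""
    n = len(grid)
    out = []
    for i in range(n):
        window = grid[max(0, i - half): min(n, i + half + 1)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def filter_striped(grid, half, n_stripes):
    """Filter the grid in row stripes, each padded by `half` rows on both sides."""
    n = len(grid)
    step = -(-n // n_stripes)  # ceiling division: rows per stripe
    bounds = [(r0, min(n, r0 + step)) for r0 in range(0, n, step)]

    def work(bound):
        r0, r1 = bound
        # Extend the stripe by the filter half-width, stopping at the grid edges
        p0, p1 = max(0, r0 - half), min(n, r1 + half)
        filtered = filter_rows(grid[p0:p1], half)
        # Drop the padded rows so only rows r0..r1-1 remain
        return filtered[r0 - p0: r1 - p0]

    with ThreadPoolExecutor(max_workers=n_stripes) as pool:
        parts = list(pool.map(work, bounds))
    # Join the filtered stripes back into one grid
    return [row for part in parts for row in part]

if __name__ == "__main__":
    grid = [[float((i * j) % 7) for j in range(5)] for i in range(23)]
    assert filter_striped(grid, 3, 4) == filter_rows(grid, 3)
    print("striped result matches the serial result")
```

Python threads will not speed up this pure-Python loop because of the interpreter lock; the sketch is about the decomposition, not the timing.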

pwessel · September 20, 2023, 2:16pm

Yep, trickier than most row-by-row calls for sure. But I was not impressed with my test. Would be fun if your Dell could run

time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -x -V

and compare with

time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gt.grd -V

Joaquim · September 20, 2023, 2:19pm

Where is it?

pwessel · September 20, 2023, 2:31pm

Well, you cannot use @earth_relief_15s since that would stitch the planet. Our earth_relief.recipe says

# SRC_FILE=https://topex.ucsd.edu/pub/srtm15_plus/SRTM15_V2.5.5.nc

Joaquim · September 20, 2023, 2:57pm

The comparison we want should not be done with time, because that includes the time to read the 6.2 GB file, which is not negligible. What we want is the output of -Vt, and here the difference is substantial.

$ time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -x -Vt
Elapsed time 00:00:18.125 | (grdfilter) |

real    1m2.701s
user    0m0.015s
sys     0m0.000s

j@dell-from-hell MINGW64 /c/v
$ time gmt grdfilter SRTM15_V2.5.5.nc -Fg330 -rg -D1 -I1 -Gtx.grd -Vt
Elapsed time 00:01:42.665 | (grdfilter) |

real    2m27.335s
user    0m0.000s
sys     0m0.031s
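A short Python sketch to separate the two effects in the numbers above. The values are copied from the two runs; treating "real minus the -Vt elapsed time" as read/write and startup overhead is an assumption, since the output does not say exactly which interval -Vt covers.

```python
def hms_to_seconds(stamp):
    """Convert an 'HH:MM:SS.sss' stamp (as printed by -Vt) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Values copied from the two runs above
with_x = {"real": 62.701, "module": hms_to_seconds("00:00:18.125")}
without_x = {"real": 147.335, "module": hms_to_seconds("00:01:42.665")}

module_speedup = without_x["module"] / with_x["module"]
wall_speedup = without_x["real"] / with_x["real"]

# Assumption: whatever -Vt does not count is mostly file I/O and startup
overhead_with = with_x["real"] - with_x["module"]
overhead_without = without_x["real"] - without_x["module"]

print(f"module speedup     {module_speedup:.2f}x")
print(f"wall-clock speedup {wall_speedup:.2f}x")
print(f"overhead with -x   {overhead_with:.1f} s")
print(f"overhead no -x     {overhead_without:.1f} s")
```

Under that assumption both runs carry about 45 s that -x does not touch, which is why the wall-clock speedup (about 2.3x) is so much lower than the speedup of the filtering itself (about 5.7x).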

pwessel · September 20, 2023, 4:35pm

Very good point, and that is much better!

pwessel · September 20, 2023, 4:36pm

Unfortunately, gotta read the file though and yes that is slow and requires lots of memory.