PyGMT grdfilter: calculations will be distributed over 1 threads

G’Day Everyone, I’m running the following in a script on a NetCDF (.nc) grid produced by pygmt.surface.

pygmt.grdfilter(
    grid=outpath1,
    outgrid=outpath2,
    filter="c100",
    distance="0.003",
    region=[xmin, xmax, ymin, ymax],
    spacing="0.0003",
    verbose=True
)

However, it runs very slowly. When I set verbose to True I got this message: “grdfilter [INFORMATION]: Calculations will be distributed over 1 threads.” I guess it’s slow because it’s running on one thread. Is it possible to increase the number of threads grdfilter uses to speed things up?

Thanks

Hmm, usually you can use cores (or -x in GMT) to use multiple CPU cores, but grdfilter doesn’t seem to support it, unfortunately.

That said, I found a note in the GMT 5.2 changelog at https://docs.generic-mapping-tools.org/6.4/changes.html#new-common-options-in-gmt-5-2 that mentions grdfilter supports -x, and there’s even a comment in the source code at https://github.com/GenericMappingTools/gmt/blob/01a5215286a4236bed1b268f4f072484937f3a3f/src/grdfilter.c#L18-L45 (dating back to 2010) that suggests this was possible. @pwessel @Joaquim, do you know how to enable this?

I think the issue is that a long time ago, @Joaquim parallelized it using gthreads from GLIB since OpenMP on Windows is a laugh. So GMT needs to be built accordingly.

Is there an overview of which gmt modules can use parallelization?

I’ve enabled OpenMP and threads for many years (why not utilize the stuff you’ve spent money on), but I rarely (maybe never?) see more than one CPU under (much) load during GMT work.

-DGMT_ENABLE_OPENMP=ON
-DGMT_USE_THREADS=ON

No overview, I think. You can tell that a module uses OpenMP if it has the -x option. That option is only available if GMT was built with OpenMP (except for movie, which does its own parallelisation)…

One of the problems is that OMP is a laugh anywhere. I implemented the gthreads only in grdfilter and gmt|grdokabe. Parallelization, if even possible, is a hard job. We also need a preliminary step of detecting the bottlenecks. The Linux prof tool seems appropriate.

Great. Thanks both of you.

FWIW, we’ve enabled OpenMP builds for the GMT package on conda-forge (see https://github.com/conda-forge/gmt-feedstock/issues/262), and if you install build 14 or after for GMT 6.4.0, or get the GMT 6.5.0 devel version, that should allow you to use multiple cores. If you’re on Linux, that would mean an installation like this:

conda install gmt=6.4.0=hb5fd6f7_14

To double-check, run gmt-config --has-openmp on the command-line, and it should print yes.
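The same check can be scripted from Python, which is handy before launching a long PyGMT job. This is only a minimal sketch: it assumes gmt-config is on the PATH and prints yes or no for --has-openmp as described above, and the helper names parse_yes_no and gmt_has_openmp are made up for illustration.

```python
import shutil
import subprocess


def parse_yes_no(text):
    """Interpret the 'yes'/'no' printed by gmt-config as a boolean."""
    return text.strip().lower() == "yes"


def gmt_has_openmp(config_cmd="gmt-config"):
    """Return True/False for OpenMP support, or None if gmt-config is not found.

    Hypothetical helper: it only wraps the `gmt-config --has-openmp`
    call mentioned above.
    """
    exe = shutil.which(config_cmd)
    if exe is None:
        # No gmt-config on the PATH, so nothing can be said either way.
        return None
    result = subprocess.run(
        [exe, "--has-openmp"], capture_output=True, text=True, check=True
    )
    return parse_yes_no(result.stdout)
```

Calling gmt_has_openmp() in the environment that PyGMT uses should return True after installing one of the OpenMP-enabled builds.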

With that, pygmt.grdfilter should work with the -x option. E.g.:

import pygmt

pygmt.grdfilter(
    grid="@earth_relief_30m_g",
    filter="m600",
    distance="4",
    region=[150, 250, 10, 40],
    spacing=0.5,
    outgrid="filtered_pacific.nc",
    x=8,  # set to number of cores
)

This is the same as the GMT command

gmt grdfilter @earth_relief_30m_g -Fm600 -D4 -R150/250/10/40 -I0.5 -Gfiltered_pacific.nc -x8
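To make the correspondence explicit, here is a small sketch that maps the keyword arguments used above onto the flags in that command. The FLAGS table and the to_gmt_command helper are written only for illustration; they are not how PyGMT builds its argument list internally.

```python
# Illustrative mapping from the keyword names in the example to GMT flags.
FLAGS = {
    "filter": "-F",
    "distance": "-D",
    "region": "-R",
    "spacing": "-I",
    "outgrid": "-G",
    "x": "-x",
}


def to_gmt_command(grid, **kwargs):
    """Build the equivalent `gmt grdfilter` command line as a string."""
    parts = ["gmt", "grdfilter", grid]
    for name, value in kwargs.items():
        if isinstance(value, (list, tuple)):
            # Sequences such as region become slash-separated: 150/250/10/40
            value = "/".join(str(v) for v in value)
        parts.append(f"{FLAGS[name]}{value}")
    return " ".join(parts)


cmd = to_gmt_command(
    "@earth_relief_30m_g",
    filter="m600",
    distance="4",
    region=[150, 250, 10, 40],
    spacing=0.5,
    outgrid="filtered_pacific.nc",
    x=8,
)
print(cmd)
```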

We should probably update the documentation of grdfilter in GMT to mention that multi-core -x can be used, because it doesn’t seem to be documented, though gmt grdfilter --help shows it.

Why would one ever want not to compile with OpenMP? Buggy?

Thank you @weiji14 that’s amazing! Really appreciate the quick response and fix :+1::pray:

Think it was just an oversight, didn’t even realize we hadn’t enabled it in the build config :sweat_smile:

Cool, let us know how fast it’s going now (e.g. using -x1 vs -x8), haven’t had time to benchmark this properly yet!
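For anyone who wants to try that comparison, a rough timing sketch could look like the following. It assumes PyGMT is installed against a GMT build with multi-core support and that x= is passed through as -x, as in the example above; time_call and benchmark_grdfilter are hypothetical helper names.

```python
import time


def time_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark_grdfilter(core_counts=(1, 8)):
    """Time the grdfilter example above once for each core count."""
    import pygmt  # imported here so time_call is usable without PyGMT

    timings = {}
    for n in core_counts:
        _, seconds = time_call(
            pygmt.grdfilter,
            grid="@earth_relief_30m_g",
            filter="m600",
            distance="4",
            region=[150, 250, 10, 40],
            spacing=0.5,
            x=n,  # number of cores, i.e. -x1 vs -x8
        )
        timings[n] = seconds
    return timings
```

The first call may include the download of the remote grid, so it is worth running the example once beforehand and only then comparing the numbers from benchmark_grdfilter().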

Aha, so it is supposed to be ON by default.
It could be it is - not sure - but I don’t think so.

Yep, unclear. In the cmake/ we have ConfigUserAdvancedTemplate.cmake which says

#set (GMT_ENABLE_OPENMP TRUE)

Note it is commented out so defaults to FALSE

Building the bundle has

ConfigReleaseBuild.cmake:set (GMT_ENABLE_OPENMP TRUE)

in the build-release.sh script.

I will make a PR to change the template so that folks who build will build with OpenMP if available.

Does the same apply to gthreads?

It’s also OFF by default I think;

#set (GMT_USE_THREADS TRUE)

Yes, and it is also not set in the build-release.sh for the macOS bundle. Will need @Joaquim to advise on this. I agree that prof etc. might be needed to determine where to try -x, but where we use it (“embarrassingly simple parallel places”), it is extremely simple to apply OpenMP. We do this a lot in modules and internal functions where we loop over rows. Just a special pragma comment before the loop and the compiler will split that code into N threads. I think gthreads is much more complicated, as Joaquim did in grdfilter. On my endless back burner list I want to make a grdfilter version with OpenMP instead, since it would be easier to maintain.
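As an analogy for the row-loop pattern described above, here is the same decomposition sketched in Python. This is neither GMT's code nor OpenMP: it uses a thread pool from the standard library to hand independent rows to N workers, and the 3-point running mean is an arbitrary stand-in for the per-row work.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_row(row):
    """3-point running mean along one row; edges use the available neighbours."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def smooth_grid(grid, n_threads=4):
    """Process each row independently, spread over n_threads workers."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() preserves row order, so the output lines up with the input.
        return list(pool.map(smooth_row, grid))


grid = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
print(smooth_grid(grid, n_threads=2))
```

Because no row depends on another, the result is identical to a plain serial loop over the rows. In pure Python the threads will not give a real speed-up for this kind of work; the point is only to show why such loops are easy to split.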

The OpenMP change has been merged into master

Thanks.

Just for reference; PR #7709.

All I know (:grinning_face_with_smiling_eyes:) is that I have set (GMT_USE_THREADS TRUE) in my ConfigUser.cmake and that it works

*  GLIB GTHREAD support       : enabled (2.38.2)

and that example filtertest.sh does not detect the presence of gthreads

C:\v\build>ctest -R filtertest
Test project C:/v/build
    Start 418: test/grdfilter/filtertest.sh
1/1 Test #418: test/grdfilter/filtertest.sh .....   Passed   11.43 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  11.49 sec

but if I force it in the script

C:\v\build>ctest -R filtertest
Test project C:/v/build
    Start 418: test/grdfilter/filtertest.sh
1/1 Test #418: test/grdfilter/filtertest.sh .....   Passed    2.62 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   2.66 sec

Will try on macOS after dinner. But note the comments:

# Set location of GLIB component gthread [auto].  This is an optional (and
# experimental) option which you need to enable:

Another point here;

Just started a grdfilter run, and it’s only using 1 thread:

grdfilter [INFORMATION]: Calculations will be distributed over 1 threads.

Strange, the default is to use all available cores, ref. -x.

I then ^c'ed, and tried adding -x to the command, just to see if it makes a difference.
And voila:

grdfilter [INFORMATION]: Calculations will be distributed over 12 threads.

Point: Should -x be implicitly set? I don’t buy a computer with >> 1 core just to see them idle (as in: it’s easy to forget to add -x to your command).

Shouldn’t the use of ~all cores be opt-out, and not opt-in?
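Until that changes, one way to get opt-out behaviour on the PyGMT side is a tiny wrapper that adds the option unless the caller sets it. This is a sketch with a hypothetical helper name, and it assumes that x=True is passed to GMT as a bare -x, which the post above reports as using all available threads.

```python
def with_all_cores(**kwargs):
    """Return kwargs with x=True added unless the caller already set x.

    Hypothetical helper: x=True is assumed to end up as a bare -x.
    """
    kwargs.setdefault("x", True)
    return kwargs


params = with_all_cores(grid="@earth_relief_30m_g", filter="m600", distance="4")
print(params["x"])               # True: all cores unless told otherwise
print(with_all_cores(x=1)["x"])  # 1: an explicit choice is kept
```

It would then be used as pygmt.grdfilter(**with_all_cores(grid=..., filter=..., distance=...)), with x=1 passed explicitly on the occasions when a single thread is wanted.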