Performance issue with pygmt

I’m creating a fairly complex figure with contours, various grids with transparencies, missing values, text… This is an example:

I’m using pygmt for the first time. The figure takes much longer to produce than I expected. I tried to recreate it with the GMT CLI, and indeed that takes about half the time. I want to use this in a time-critical application, so this lack of performance is a problem.

I was wondering if I’m doing anything wrong from Python, or if instead this lack of performance is something you cannot avoid and a price to pay for using the Python API.

I have left a simple, yet runnable, example at this link. Running plot_class.sh takes about 3 seconds on my laptop, while running plot_class.py takes about 5 seconds, which is nearly 100% more. Also, although it is a bit harder to explain, I noticed that the Python run time is variable: sometimes it takes longer, sometimes shorter.
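For what it’s worth, run-to-run variability like this is easier to pin down by timing each script several times and comparing medians. A minimal sketch (the script names are the ones from the link; any command on PATH works):

```python
import statistics
import subprocess
import time

def wall_time(cmd, runs=5):
    """Median wall-clock time of a command over several runs,
    which smooths out the run-to-run variability described above."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

# Hypothetical usage with the two scripts being compared:
#   wall_time(["bash", "plot_class.sh"])
#   wall_time(["python", "plot_class.py"])
```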

Could it be a problem with the sessions, or some sort of leftover files created by Python? I do not fully understand this concept in the new version of GMT. Do you think there is a way to overcome this overhead from Python, or is this simply how pygmt works and there is nothing we can do about it?

We can likely make some performance improvements in PyGMT, in particular for functions that return grids/tables, but I do not think improved speed is a high priority for the developers right now compared to a more Pythonic UX and feature richness. @Joaquim, how does GMT.jl compare in speed to CLI GMT? My impression is that people who care a lot about performance often prefer Julia over Python.

GMT.jl takes almost the same time as plain GMT. There are several situations where data arrays need to be copied, but that won’t account for more than tenths of a second even for big arrays. A major source of delay is the fact that Julia is JIT compiled, and despite many of my efforts the time to first run (called TTFP, Time To First Plot, in Julia slang) still amounts to several seconds because it includes compilation time. Subsequent runs are roughly equal to the GMT CLI.

The problem with PyGMT (and this is from my very limited knowledge) is that, since it didn’t wrap the GMT internal structures, it resorts to writing and reading temporary files, and that is a real performance killer.

OK, I see. That’s a pity. I expected Python to be just a thin wrapper that calls the internal core, so that it would take the same time. I guess it’s more complicated than that.

Maybe this is a stupid follow-up question, but then why not just develop a library that relies on making system calls? It might be uglier, but surely it would also be faster. I’m used to writing Bash scripts and am now moving to Python, so I’m tempted to keep using GMT as a CLI, just making the system calls from Python.
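To make the question concrete, the system-call approach would look something like the sketch below, using Python’s standard `subprocess` module (the `gmt` invocation in the comment is an assumption that the executable is on PATH):

```python
import subprocess

def run_cli(cmd):
    """Run a command-line program as a subprocess and return its stdout.

    Each call spawns a new process, so anything the program produces or
    consumes has to round-trip through files on disk rather than memory.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

# Hypothetical usage, assuming the `gmt` executable is on PATH:
#   print(run_cli(["gmt", "--version"]))
```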

Is there a reason I’m not seeing for why this is such a bad idea?

Please don’t take me wrong. I’m not complaining about the current design. I’m sure there are very good reasons to be like that. I just would like to understand them out of my curiosity. Thanks!

I’m not much involved in this, but a good reason why system calls are uninteresting is: what is there to gain?

With system calls you are restricted to passing data via disk files, so why not just use plain GMT? The point of the wrappers is to be able to pass data in and out via memory transfers and to do further processing in the host environment. That is not possible with system calls.
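A toy illustration of the difference (pure standard library; the file round-trip stands in for what a system-call-based wrapper would have to do, while the in-memory path stands in for handing the same object to a library through its API):

```python
import os
import tempfile
import time

data = list(range(200_000))

# System-call style: serialize the data to a file on disk, then read it
# back, as a hypothetical external tool would have to do.
t0 = time.perf_counter()
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(map(str, data)))
    path = f.name
with open(path) as f:
    round_tripped = [int(line) for line in f]
os.unlink(path)
disk_time = time.perf_counter() - t0

# Wrapper style: the same object is handed over in memory.
t0 = time.perf_counter()
in_memory = data  # no serialization, no I/O
mem_time = time.perf_counter() - t0

print(f"disk round-trip: {disk_time:.4f}s  in-memory: {mem_time:.6f}s")
```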

I think the best resources for understanding the initial design decisions for PyGMT, including the avoidance of system calls, are Leo’s two SciPy talks; the recordings are listed in the presentations section of the PyGMT docs.

If the Python library were committed to system calls rather than using the GMT C API, it would never have any hope of being faster than the CLI, due to the overhead of setting up those system calls. Right now, PyGMT passes in-memory objects into GMT through the C API, but returning in-memory objects is not yet implemented, so we resort to temporary files. Because the current setup avoids system calls, we can implement that enhancement for performance improvements once we have the time and, ideally, the funding to do so.

Thanks for the explanation. It makes much more sense now :smiley:

I would like to know where these temporary files are written, and whether I can change the location via a setting; if I could use a RAM-backed filesystem, it would save me a lot of time.

These temporary files are created by the GMTTempFile class, which is a wrapper around Python’s standard tempfile.NamedTemporaryFile function. Thus, the exact location of these temporary files is platform-dependent; on Linux it is typically /tmp/.
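Since tempfile honors the TMPDIR environment variable, you can redirect the temporary files to a RAM-backed filesystem without any PyGMT setting. A sketch, assuming a Linux system where tmpfs is mounted at /dev/shm:

```python
import os
import tempfile

# tempfile picks its directory from TMPDIR, TEMP, or TMP, falling back to
# a platform default such as /tmp. Pointing TMPDIR at a RAM-backed
# filesystem (e.g. tmpfs at /dev/shm on Linux) makes
# NamedTemporaryFile, and hence GMTTempFile, write to RAM.
os.environ["TMPDIR"] = "/dev/shm"
tempfile.tempdir = None  # clear the cached default so the change takes effect

print(tempfile.gettempdir())
```

The simplest variant is to set the variable before Python even starts, e.g. `TMPDIR=/dev/shm python plot_class.py`, so no cache-clearing is needed.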

The good news is that the PyGMT team is working on getting rid of these temporary files, as mentioned in issue https://github.com/GenericMappingTools/pygmt/issues/2730 and implemented in PRs https://github.com/GenericMappingTools/pygmt/pull/2729 and https://github.com/GenericMappingTools/pygmt/pull/2398. It’s unclear when these PRs will be fully reviewed and merged, but I expect it should happen within the next one or two PyGMT releases (i.e., in less than half a year).