Psconvert [ERROR]: Cannot execute Ghostscript (gs)

I have a program which calls psconvert, and it has a hundred or so acceptance tests. I find that I get sporadic failures on those tests

psconvert [ERROR]: Cannot execute Ghostscript (gs).

when I run them in parallel. By sporadic, I mean about one in a hundred (one or two for each run of the test suite); a different test fails each time, seemingly at random. Any idea how to debug this further?

Please note

  • yes, I have gs on my path, it’s in /usr/bin/gs
  • gs version 9.55.0
  • gmt version 6.4.0
  • Linux Mint (Ubuntu variant)
  • each test is run in a different temporary directory

That message originates when this line in psconvert.c fails.

if (gmt_check_executable (GMT, Ctrl->G.file, "--version", NULL, GSstring)) {	/* Found Ghostscript */

And it probably fails when the popen call in gmt_support.c#L18543 fails.

	if ((fp = popen (cmd, "r")))	/* There was such a command */
		gmt_fgets (GMT, line, GMT_LEN256, fp);	/* Read first line */

So, this seems to point to an OS/popen issue.

Thanks – that’s really very strange, I’m seeing this locally and also on (GitLab) CI on multiple versions of Debian. I’ll investigate further …

I’ve done a bit of debugging on this, and it seems that popen() is not the issue: in these error cases it succeeds and does not modify errno. Instead it is the subsequent gmt_fgets(), or more accurately the fgets() therein, which fails, setting errno to 4 (EINTR, “Interrupted system call”) and leaving the line argument as “\0”.
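For anyone wanting to repeat this kind of check outside GMT, here is a minimal sketch that separates the two failure points. The function name probe_first_line and its return codes are hypothetical (they are not GMT's); it only assumes POSIX popen/pclose.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes, for illustration only. */
enum probe_result { PROBE_OK = 0, PROBE_POPEN_FAILED, PROBE_READ_FAILED };

/* Run cmd through popen() and read its first output line into line[].
 * Reports on stderr which step failed and with which errno. */
enum probe_result probe_first_line(const char *cmd, char *line, int size) {
    assert(cmd != NULL && line != NULL && size > 0);
    line[0] = '\0';

    errno = 0;
    FILE *fp = popen(cmd, "r");
    if (fp == NULL) {
        fprintf(stderr, "popen failed: errno=%d (%s)\n", errno, strerror(errno));
        return PROBE_POPEN_FAILED;
    }

    errno = 0;                          /* so errno reflects the read only */
    char *got = fgets(line, size, fp);
    int read_errno = errno;             /* save before pclose can change it */
    pclose(fp);

    if (got == NULL) {
        /* errno 0 here means plain end-of-file; EINTR means a signal arrived */
        fprintf(stderr, "fgets failed: errno=%d (%s)\n",
                read_errno, strerror(read_errno));
        errno = read_errno;
        return PROBE_READ_FAILED;
    }
    return PROBE_OK;
}
```

Note that popen() succeeding says little about the command itself: it only means the shell was started, so a missing or silent command also shows up as a failed read, with errno left at 0.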

I’m not sure where these signals are coming from; as far as I can see, there’s no use of them in GMT itself, right?

Yes, you are right that it can also fail in the fgets step. As for how the line contents are used, I see that movie.c L2704 uses them.

Could you try to trick that bit of code so that the gmt_fgets failure is not fatal, and see whether that has any consequences in your case? We could then think of a test that is milder when detecting Ghostscript.

Having read around a bit, it seems that this behaviour exists for signal handling. fgets blocks until it completes, so if something goes pear-shaped in the underlying system call it could stall the program. So fgets returns NULL and sets errno to EINTR if interrupted by a signal, giving the caller the opportunity to check flags set by the signal handler and handle recovery. In this case there is (I think) no handler in scope, so the “correct” response should be to retry: something along the lines of
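As an aside, the mechanism can be reproduced in isolation. The sketch below is not taken from GMT or from the failing tests; it assumes a POSIX system, and the helper name errno_from_interrupted_fgets is made up. It installs a SIGALRM handler without SA_RESTART, then blocks in fgets on an empty pipe until the timer fires.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: it only makes the signal "caught" rather than fatal. */
static void on_alarm(int sig) { (void)sig; }

/* Hypothetical demo helper. Returns the errno left behind by an fgets()
 * that a signal interrupted, or -1 if setup failed or data was read. */
int errno_from_interrupted_fgets(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    FILE *fp = fdopen(fds[0], "r");
    if (fp == NULL) { close(fds[0]); close(fds[1]); return -1; }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) != 0) {
        fclose(fp); close(fds[1]); return -1;
    }

    /* Repeating 50 ms timer, so a tick always lands while read() blocks. */
    struct itimerval tick = { .it_interval = {0, 50000}, .it_value = {0, 50000} };
    struct itimerval off  = { .it_interval = {0, 0},     .it_value = {0, 0} };
    setitimer(ITIMER_REAL, &tick, NULL);

    char line[64];
    errno = 0;
    char *got = fgets(line, sizeof line, fp);   /* nothing is ever written */
    int saved = errno;

    setitimer(ITIMER_REAL, &off, NULL);         /* stop the timer first */
    sigaction(SIGALRM, &old, NULL);             /* then restore the handler */
    fclose(fp);
    close(fds[1]);
    return got == NULL ? saved : -1;
}
```

Setting SA_RESTART in sa_flags would instead make the kernel restart the read transparently, which is one reason the problem only appears when some handler is installed without that flag.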

errno = 0;
while (fgets(...) == NULL)
  {
    if (errno != EINTR)
       return 1;
    errno = 0;
  }

possibly with some counter to limit the number of retries. Now I think it would be a bit ill-mannered of me to suggest that GMT implements this for all fgets calls (I see there are 50+ in the codebase), and I think I can fix my use case by doing essentially the same retry mechanism around the call to GMT_Call_Module, since errno will still be EINTR there on failures. So could I suggest that I make that fix, check that it works and provide a link to the change here? Then something similar could be implemented by other GMT wrappers if they are affected by the same issue (I’m thinking of PyGMT).
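A self-contained version of that loop with a retry limit might look like the following. This is only a sketch: fgets_retry_eintr is a hypothetical helper, not an existing GMT function, and the clearerr() call is a portability precaution (an assumption on my part that some C libraries keep the stream's error flag set after an interrupted read).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: read one line, retrying when a signal interrupts
 * the read. Returns 0 on success, 1 on end-of-file, on a real error, or
 * when the read is still being interrupted after max_retries retries. */
int fgets_retry_eintr(char *buf, int size, FILE *fp, int max_retries) {
    assert(buf != NULL && size > 0 && fp != NULL && max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        errno = 0;
        if (fgets(buf, size, fp) != NULL)
            return 0;               /* got a line */
        if (errno != EINTR)
            return 1;               /* EOF (errno 0) or a genuine I/O error */
        clearerr(fp);               /* drop the error flag before retrying */
    }
    return 1;                       /* gave up: interrupted every time */
}
```

The counter keeps a persistently interrupted read from spinning forever; callers can still inspect errno afterwards to tell "gave up on EINTR" apart from end-of-file.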

For a bit more context, the code here is a Ruby program which calls the GMT C API in a native extension, and that Ruby program is acceptance tested by BATS, which uses GNU parallel for parallelisation. I’m pretty sure (but not certain) it’s one of these latter two which is generating the problematic signals. It’s in the native extension that I plan to implement the fix; some more notes here.

The gmt_check_executable function (which calls the sometimes-failing code) is not called in many places, basically by psconvert and when plotting LaTeX expressions, so if it solves this case I think we could apply your patch directly in gmt_run_process_get_first_line. Can you find out how many times that while loop has to execute in order to get past the first failure?

I think that the failure occurring in gmt_check_executable is just bad luck: in my particular case it happens to be running at about the time a signal is delivered, and occasionally the signal hits during its execution. I think it is reasonable for the GMT library to error in this way; if the user chooses to use it in a signal-heavy environment, then they (I) should be expected to deal with the EINTRs. Arguably the GMT commands (psconvert etc.) could or should do something similar, since the user has no visibility of errno there and so cannot reasonably retry on failure. But this is obviously a rare failure mode (else there would be reports), so I guess it is low priority.

In any case, the fix I mentioned above does seem to do the trick – in this change I modify the GMT gem so that if any of the calls to GMT API functions fails and sets errno, then the Errno::<name> exception is raised (in Ruby), and in the Ruby code I have

begin
  gmt.psconvert(...)
rescue Errno::EINTR
  retry
end

and now I can run the acceptance tests in parallel numerous times without seeing any errors.

Thanks for the pointers which led to this fix :slight_smile:

As a sort of aside, with this new facility of making errno visible via the Ruby-C native extension, I find that all GMT programs set errno to ENOENT (no such file or directory) when called with -^, -? or -+. Not an issue, but a bit odd.

Further, amusingly

> gmt pscontour -? >& /dev/null
> echo $?
63

while

> strace gmt pscontour -? |& grep ENOENT | wc -l
63

well, I found it funny :slight_smile:

I have no idea why that happens. The only references to ENOENT I find in the GMT code are where it tests whether a directory exists, as in

GMT_LOCAL int psconvert_make_dir_if_needed (struct GMTAPI_CTRL *API, char *dir) {
	struct stat S;
	int err = stat (dir, &S);
	if (err && errno == ENOENT && gmt_mkdir (dir)) {	/* Does not exist - try to create it */

Looks like most come from access(2) calls which fail to find the file (according to strace)
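One plausible reading of the odd ENOENT, offered as a sketch rather than a diagnosis of GMT: a successful library call does not reset errno, so a failed access() early on can leave ENOENT visible long after the program has carried on successfully. The helper name stale_errno_demo is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical demo: probe a missing path, then an existing one, and
 * return whatever errno holds afterwards. */
int stale_errno_demo(const char *missing, const char *present) {
    assert(missing != NULL && present != NULL);
    errno = 0;
    (void)access(missing, F_OK);    /* fails and sets errno to ENOENT */
    (void)access(present, F_OK);    /* succeeds and leaves errno alone */
    return errno;                   /* still the value from the first call */
}
```

This is why errno is only meaningful immediately after a call that reported failure; reading it after an overall successful run mostly shows the last probe that missed.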

And I just found that on Win and a fast laptop the line

if (gmt_check_executable (GMT, Ctrl->G.file, "--version", NULL, GSstring)) {	/* Found Ghostscript */

takes an incredible ~0.13 sec to execute.

That’s quite a chunk! On Linux and a weak laptop

time /usr/bin/gs --version
9.55.0

real	0m0.041s
user	0m0.019s
sys	0m0.021s

I guess that would mostly be the fork? Win is (was?) notorious for slow forks, hence the popularity of threads …

I meant that line as executed from psconvert, which resorts to popen, but I guess it won’t take that long on Linux.

Sure, but popen calls fork and runs the shell against the arguments in the child process. One could avoid the shell by doing the fork and exec yourself, but that adds complexity (and I see that in a few places the shell is actually required, for wildcards etc.).
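For illustration, a shell-free variant could be sketched as below, assuming POSIX fork/execvp/pipe (so it does not apply to Windows). The function first_line_no_shell is hypothetical and is not how GMT does it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] directly (no shell) and read the first
 * line of its stdout into line[]. Returns 0 on success, -1 otherwise. */
int first_line_no_shell(char *const argv[], char *line, int size) {
    assert(argv != NULL && argv[0] != NULL && line != NULL && size > 0);
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                     /* child: stdout -> pipe, then exec */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }

    close(fds[1]);                      /* parent keeps the read end only */
    FILE *fp = fdopen(fds[0], "r");
    char *got = NULL;
    if (fp != NULL) {
        got = fgets(line, size, fp);
        fclose(fp);                     /* also closes fds[0] */
    } else {
        close(fds[0]);
    }
    /* Reap the child, retrying if a signal interrupts the wait. */
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR) { }
    return got != NULL ? 0 : -1;
}
```

Because no shell is involved, an argument such as "*" reaches the program literally instead of being expanded, which is exactly the behaviour that would break the call sites relying on wildcards.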

Sure, I don’t want to spend time on that. But, from memory, there is no fork on Windows. To build GMTSAR with MSVC I had to find some third-party implementation of fork.

My sympathy, looks like a deep rabbit hole

You might be interested in #8417 too.