Gmtmath across multiple files

On macOS 11.6.2, GMT version 6.4.0_19b77af_2021.12.15.

I love the utility of gmtinfo across lots of files. For example, if one needs to determine the min/max values of each column for a bunch of files (with a “data_” prefix) in a directory, this command works beautifully:

gmt gmtinfo -Aa data_*.txt

It returns the total number of records and the min/max of each column (as long as all data sets have the same number of columns).

Now, I’d like to get a common mean and standard deviation across multiple data sets. The command I’d like to use is gmtmath, e.g., to get the mean of column 2 across all data sets:

gmt gmtmath -Sl -i2 data_*.txt MEAN =

The above command results in a malloc error. Is there a way for gmtmath to accept wildcards in file names?

If not, then what would be an easy way to compute a mean & STD across lots of files in a directory?

Maybe you could try using cat to combine all your files into a single one. Something like this (not sure if the gmt math syntax is OK):

cat data_*.txt | gmt math -Sl -i2 MEAN =

When piping to gmt math one must specify the magic filename STDIN:

cat data_*.txt | gmt math -Sl -i2 STDIN MEAN =
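If GMT isn’t a hard requirement for this step, the same one-pass mean and standard deviation can also be sketched with plain awk (a hedged alternative, not what the thread uses; note awk’s 1-based $3 corresponds to GMT’s 0-based column 2):

```shell
# Sketch: mean and sample standard deviation of the third column
# (GMT's -i2) across all files, using awk instead of gmt math.
# Assumes whitespace-separated columns, as in the data_*.txt files.
cat data_*.txt | awk '
  { n++; s += $3; ss += $3 * $3 }
  END {
    if (n > 1) {
      mean = s / n
      printf "%g %g\n", mean, sqrt((ss - n * mean * mean) / (n - 1))
    }
  }'
```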

Great suggestion, Esteban, and that was sort of what I wound up doing. As Paul said, STDIN is needed.

I used gmtconvert to change each individual data set into binary, appending the output from each file with the “>>” redirection operator to pull everything together. Once the binary file was complete, I computed the mean & std. It was very fast that way, as the combined file could have more than 10 million entries.
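For reference, the append loop described above looks roughly like this (a generic sketch of the pattern; cat stands in for the actual gmt convert call, which is shown only in the comment). One gotcha with “>>”: truncate the combined file first, or reruns will double the data:

```shell
# Accumulate every data_*.txt into one combined file, then run a single
# computation on the result. "cat" stands in for the real conversion step
# (gmt convert "$f" -bo5d in the workflow described above).
: > combined.b                    # truncate first so reruns start clean
for f in data_*.txt; do
  [ -e "$f" ] || continue         # skip if the glob matched nothing
  cat "$f" >> combined.b
done
wc -l < combined.b                # sanity check: total record count
```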

Thanks for the replies. As usual, I seem to try to twist things in ways that weren’t meant to be!

Happy New Year!


John, how many data_*.txt files do you have in your case, and what is the total number of records? I am curious about the malloc error…

Paul, these directories involve satellite passes over specific inland water bodies.

The largest of these water bodies is the Caspian Sea, in which case there are 3770 individual data_*.txt files.

But I encountered the malloc error when trying to process data over much smaller Lake Tahoe, which only had 99 such files, and just shy of a total of 122,000 measurement locations on the lake.

Here is the exact error message (without -Vd) on Lake Tahoe:

> gmt gmtmath -Sl -i2 data_*.txt MEAN =
gmt(68630,0x11bb51e00) malloc: Incorrect checksum for freed object 0x7fb17e013a00: probably modified after being freed.
Corrupt value: 0x409da81460aa64c3
gmt(68630,0x11bb51e00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

The output of -Vd appears to indicate that it ran into trouble after reading in the third file on Lake Tahoe. Looking at the -Vd output for the previous two files, the malloc error occurred right before a “Calling nc_open” debug statement. Earlier in the -Vd output, you can see the expanded version of the wildcard. I have visually checked the third and fourth files and couldn’t spot any anomaly (or unexpected character) that might have caused the problem.

I’ve attached the first six files in data_02410101.zip (92.2 KB). Maybe you can confirm/test it.

FYI, the five columns of the .txt files contain this data: lon, lat, ht_ortho, geoid_egm2008, and ht_abv_wgs84.

Thanks John, I can probably simulate this. Unlike psxy, block, etc., which can read as many files as Unix allows, gmtmath and grdmath maintain a stack (in post-fix notation parlance), and that stack holds only 100 entries. That is a lot for a stack, since normally you would not put numerous files on the command line: they have to be operated on by operands. If you have more than 100 files on the command line, it should result in an error and the message

			GMT_Report (API, GMT_MSG_ERROR, "Stack overflow!\n");

but apparently you did not get that.
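A quick way to check whether a directory is anywhere near that limit before running gmt math is to count what the glob expands to (a shell sketch; the ~100-slot figure is from the description above):

```shell
# Count how many arguments data_*.txt expands to; each file on the
# gmt math command line occupies one stack slot. Caveat: if no files
# match, the unexpanded pattern itself counts as one argument.
set -- data_*.txt
echo "$#"
```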
