Given a table (matrix) of m (cols) by n (rows), is there an easy way, using gmtmath, to compute the mean (i.e., average) across each row?
The table is in binary form, or I would use awk to transpose rows to columns, then feed into gmtmath, which works by column.
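For reference, the awk transpose alluded to might look like this sketch (my reconstruction, not from the thread), assuming a plain-text table with equal-width rows; note it holds the whole table in memory:

```shell
# Transpose a whitespace-separated text table: rows become columns.
# Assumes every row has the same number of fields.
transpose () {
  awk '
    { for (c = 1; c <= NF; c++) cell[NR, c] = $c; if (NF > ncols) ncols = NF }
    END {
      for (c = 1; c <= ncols; c++) {
        line = cell[1, c]
        for (r = 2; r <= NR; r++) line = line "\t" cell[r, c]
        print line
      }
    }'
}
```

The transposed output could then be piped into gmtmath, which operates column-wise.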
gmtconvert could be used to put the table in text format and then transpose it, but that adds runtime. Actually, it's not that bad in terms of runtime, but gmtmath didn't like the width of the text table, which (after transposing) had 483 columns. I got lots of warnings:
gmtmath [WARNING]: Long input record (4898 bytes) was truncated to first 4094 bytes!
Ouch! Yes, that hurts! I’m afraid that I’ve never met Julia.
Paul, is there a simple way to increase the maximum byte size for the width of the input table? Or is this limited by the c-language?
To do what I was hoping to do will require table widths on the order of 500,000 bytes (or more!). That is, I want to compute mean values over ~25,000 columns, all with one command!
Would be just as simple in PyGMT or GMT/MEX, of course, but tricky in bash.
Sounds like your input file is wrong, i.e., the long direction should be the columns (same variable), and then it is easy with gmtmath, but presumably you have no control over that file format (?).
Unless you do this all day long and cannot wait for a slow GMT transposing step, I would convert the table to a dummy grid, transpose with grdedit, dump to a binary table via grd2xyz, then run gmtmath on those columns.
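A sketch of that grid detour, under some assumptions: the file name and the 710x483 dimensions are taken from later in the thread, and the -ZTLf/-Et flags are my reading of the xyz2grd, grdedit, and grd2xyz docs (gridline registration, unit spacing):

```shell
# Build the -R/-I settings for a unit-spaced, gridline-registered dummy grid.
region_for () {  # usage: region_for ncols nrows
  echo "-R0/$(($1 - 1))/0/$(($2 - 1)) -I1"
}

# Guarded so the sketch is harmless where GMT or the data file is absent.
if command -v gmt >/dev/null 2>&1 && [ -f t1m2.bin ]; then
  gmt xyz2grd t1m2.bin -ZTLf $(region_for 710 483) -Gtmp.grd   # binary table -> dummy grid
  gmt grdedit tmp.grd -Et -Gtmp_t.grd                          # transpose the grid
  gmt grd2xyz tmp_t.grd -ZTLf > transposed.bin                 # grid -> binary table
  gmt gmtmath -bi483f -Ca -Sl transposed.bin MEAN = means.txt  # per-column means
fi
```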
Alternatively, do as Joaquim did (or the equivalent in Matlab or Python): have your bash script create the MEX, Julia, or Python script, then call it, e.g.
In my case, I wanted to avoid NaN values, thus the need for the if-statement.
The awk command creates a single column of mean values computed for each row.
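The exact awk command wasn't shown in the thread; a minimal sketch of that row-mean idea, with the if-statement that skips NaN fields, could be:

```shell
# Print one value per input row: the mean of that row's non-NaN fields.
row_means () {
  awk '{
    sum = 0; n = 0
    for (c = 1; c <= NF; c++)
      if ($c != "NaN") { sum += $c; n++ }   # the if-statement: skip NaNs
    m = (n > 0) ? sum / n : "NaN"           # all-NaN rows stay NaN
    print m
  }'
}
```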
The input file is not exactly “wrong” but each row can vary by the number of non-NaN fields. Here’s the background story:
Each row contains values every 10 m. The whole row represents a “segment,” and rows can have up to 710 bins (extending to 7,100 meters). There can be thousands of these rows (segments) in the file I’m working from.
The 4094 limit is a GMT limitation when reading text files. No way around it, not even my dummy Julia example. But no problem if the file is in binary.
I think I found a work-around. I’ll convert the binary file to text and pass it through the awk command. No need to switch rows to columns, since it just works straight-through with data in rows.
Sorry to take up your time with my troubles. I just needed to search for alternatives.
Long boring flight from NEW to LAX gave me time to review a paper and implement this PR:
gmt convert -Z
which will transpose the single segment in the input; trailing text obviously gets lost. It seems to work, as I added a test in the PR. Give it a spin on your large files.
gmt convert -Z your.txt | gmt math -Ca -Sl STDIN MEAN =
But the command fails if I try to feed in the whole data set (likely because the input is binary while the output is text):
> gmt gmtconvert -bi710f -Z t1m2.bin | gmt gmtmath -Ca -Sl STDIN MEAN = tmean2.txt
gmtmath [WARNING]: Long input record (6433 bytes) was truncated to first 4094 bytes!
gmtmath [WARNING]: Long input record (6378 bytes) was truncated to first 4094 bytes!
However, given that the number of rows in my input table can vary (the number of columns is fixed at 710), the row count can first be ascertained with gmtinfo and then fed to the command to get it to complete successfully. This command sequence works:
This sequence yields a vector of mean values computed for each row of the original table. Seems very fast and very efficient. With binary input (to avoid hitting the 4094 byte limit), it is important to supply the number of rows found in the original table.
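The command sequence itself didn't survive into the post; a hedged reconstruction from the surrounding discussion might look like the following. The file name t1m2.bin and the 710-column width come from the thread; the awk field position (field 4 of gmt info's default `file: N = count ...` summary line) and the exact -bo/-bi bookkeeping are my assumptions:

```shell
# Pull the record count out of a gmt info summary line (assumed layout).
count_records () {
  awk '{print $4}'
}

# Guarded so the sketch is harmless where GMT or the data file is absent.
if command -v gmt >/dev/null 2>&1 && [ -f t1m2.bin ]; then
  # 1. Count the rows (483 here); after -Z they become the columns.
  nrows=$(gmt gmtinfo -bi710f t1m2.bin | count_records)
  # 2. Transpose in binary (dodging the 4094-byte text limit), then average.
  gmt gmtconvert -bi710f -bo${nrows}f -Z t1m2.bin |
    gmt gmtmath -bi${nrows}f -Ca -Sl STDIN MEAN = tmean2.txt
fi
```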
Yes, I forgot that for binary we must indicate the number of columns, since there is no way to figure that out otherwise. You could probably get rid of that ugly awk pipe by using -o3 in gmt info instead?
It might be useful to add the essence of your nrows step to the man page: for large binary files you need to know the number of rows, since it becomes the number of columns after transposition. In fact, perhaps gmt convert could report this under -Vi so it is available without that second call.
Now -V will report the dimensions of the transposed matrix.
Regarding your comment
I’m not entirely clear on what is meant by “single segment.” Perhaps this can be written so that most users will understand it.
Since the 1990s, GMT datasets can be composed of one or more data segments, separated by “> whatever comments” lines. Since transposing applies to a matrix, there can be only one “matrix” in the file, hence a single segment.
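As a toy illustration (not from the thread), this is what a two-segment table looks like; a file like this cannot be transposed with -Z, since the ">" header lines split it into two matrices:

```shell
# Emit a tiny two-segment GMT table: two 2-column segments
# separated by "> ..." segment-header lines.
make_two_segment_table () {
  printf '> segment 1\n1 2\n3 4\n> segment 2\n5 6\n'
}

# Count the segment headers (lines starting with ">").
count_segments () {
  grep -c '^>'
}
```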
Awesome, Paul. I was thinking along those lines: that the phrase “single segment” referred to tables of data (line) segments with the “>” separator. Yes, in my case I was working with a single-“segment” table.
WRT the usage of -V, can an example be provided? I tried it as follows and wonder whether this is how it should be used, where /dev/null is just a place to stuff the unimportant output:
> gmt gmtconvert -bi710f -V t1m2.bin > /dev/null
gmtconvert [INFORMATION]: Processing input table data
gmtconvert [INFORMATION]: Input 710 columns via binary records using format ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
gmtconvert [INFORMATION]: Reading Data Table from File t1m2.bin
gmtconvert [INFORMATION]: Write Data Table to <stdout>
gmtconvert [INFORMATION]: 1 tables concatenated, 483 records passed (input cols = 710; output cols = 710)
What would be the best way to trap the value for the number of records?
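One possibility (an assumption on my part, not confirmed in the thread): since -V messages go to stderr, swap the streams and let awk pick the count out of the "records passed" summary line:

```shell
# Extract the record count from gmtconvert's -V summary line,
# e.g. "gmtconvert [INFORMATION]: 1 tables concatenated, 483 records passed ..."
records_passed () {
  awk '/records passed/ {print $6}'
}

# Guarded so the sketch is harmless where GMT or the data file is absent.
if command -v gmt >/dev/null 2>&1 && [ -f t1m2.bin ]; then
  # 2>&1 >/dev/null sends stderr down the pipe and discards stdout.
  nrec=$(gmt gmtconvert -bi710f -V t1m2.bin 2>&1 >/dev/null | records_passed)
  echo "records: $nrec"
fi
```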
As for the crash, it is related to overly long output ASCII strings, but when I tried -bo I got the same result, so more work to be done there. Thanks for the data and example.
Need help from @Joaquim on this (needs branch mathcol-check): I updated gmtinfo to handle output summary strings longer than 4096 bytes by allocating the string, etc. However, I get odd crashes every few runs. The command is
gmt gmtinfo -bi710f tpart.bin
and @John_Geodesy’s file was attached in his earlier message. If you get any clues, please share.