Computing mean across rows with gmtmath?

The title pretty much states the problem.

Given a table (matrix) of m (cols) by n (rows), is there an easy way, using gmtmath, to compute the mean (i.e., average) across each row?

The table is in binary form; otherwise I would use awk to transpose rows to columns and then feed the result into gmtmath, which works by column.
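For reference, the transpose itself is easy enough in awk once the table is text (a minimal sketch, assuming every row has the same number of fields, so that NF in the END block equals the common width; infile is a placeholder name):

# Transpose a text table: collect column i of every row, then print each as a line
awk '{ for (i = 1; i <= NF; i++) col[i] = col[i] (NR == 1 ? "" : " ") $i }
     END { for (i = 1; i <= NF; i++) print col[i] }' infile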

Gmtconvert could be used to put the table in text format before transposing, but that adds to the runtime. Actually, the runtime turned out not to be that bad, but gmtmath didn’t like the width of the text table, which (after the transpose) had 483 columns. I got lots of warnings:

gmtmath [WARNING]: Long input record (4898 bytes) was truncated to first 4094 bytes!

How can I get past this? Any ideas?

That would be so simple with GMT.jl that it even hurts

julia> using GMT, Statistics   # GMT.jl for the I/O; Statistics for mean()

# Invent a table with 10 columns
julia> gmtwrite("lixo.dat", mat2ds(rand(5,10)))

# Load the pretend-to-be dataset
julia> D = gmtread("lixo.dat")
BoundingBox: [0.0581610695909, 0.980840753115, 0.072613296562, 0.866659845031, 0.107663139704, 0.908576186605, 0.354323700223, 0.678818573041, 0.0271959458778, 0.6064954132, 0.071393531721, 0.927347389372, 0.01304400113, 0.696648710976, 0.0417805833876, 0.827697720595, 0.211541809729, 0.986194363033, 0.0415487557649, 0.732005040647]
5×10 GMTdataset{Float64, 2}
 Row │     col.1      col.2     col.3     col.4      col.5      col.6     col.7      col.8     col.9     col.10
     │   Float64    Float64   Float64   Float64    Float64    Float64   Float64    Float64   Float64    Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.0581611  0.59417    0.107663  0.678819  0.606495   0.114512   0.113356  0.463102   0.439212  0.303902
   2 │ 0.892739   0.86666    0.696239  0.354324  0.540871   0.518044   0.122986  0.728437   0.644147  0.677205
   3 │ 0.460035   0.663536   0.908576  0.606943  0.541868   0.0713935  0.175379  0.596579   0.986194  0.0415488
   4 │ 0.827295   0.502392   0.662726  0.580983  0.0271959  0.927347   0.013044  0.0417806  0.211542  0.732005
   5 │ 0.980841   0.0726133  0.471736  0.531702  0.380023   0.82452    0.696649  0.827698   0.615786  0.704942

# Compute averages of each row
julia> mean(D,dims=2)
5×1 Matrix{Float64}:
 0.34793926271188996
 0.6041650133401001
 0.5052053096779902
 0.45263105080714006
 0.6106508736944

Ouch! Yes, that hurts! I’m afraid that I’ve never met Julia.

Paul, is there a simple way to increase the maximum byte size for the width of the input table? Or is this limited by the C language?

To do what I was hoping to do will require table widths on the order of 500,000 bytes (or more!). That is, I want to compute mean values over ~25,000 columns, all with one command!

Would be just as simple in PyGMT or GMT/MEX of course, but tricky in bash.

Sounds like your input file is wrong, i.e., the long direction should be the columns (same variable), and then it would be easy with gmtmath; but presumably you have no control over that file format (?).

Unless you do this all day long and cannot wait for a slow GMT transposing step, I would convert the table to a dummy grid, transpose it with grdedit, dump it back to a binary table via grd2xyz, and then run gmtmath on those columns, as sketched below.
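Something along these lines; an untested sketch where the file names and the 1000-row by 710-column dimensions are placeholders, and I am assuming grdedit’s -Et transpose option:

# table.b: hypothetical raw binary table, 1000 rows x 710 columns of doubles
# 1. Read the raw doubles as a 710 x 1000 node grid (top-left scanline order)
gmt xyz2grd table.b -ZTLd -R0/709/0/999 -I1 -Gtmp.grd

# 2. Transpose the grid so the original rows become grid columns
gmt grdedit tmp.grd -Et -Gtmp_t.grd

# 3. Dump raw binary again; gmtmath then averages each of the 1000 columns,
#    and -Sl keeps only the single record of means (one mean per original row)
gmt grd2xyz tmp_t.grd -ZTLd | gmt math -bi1000d -Ca -Sl STDIN MEAN = means.txt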

Alternatively, do as Joaquim did (or the equivalent in MATLAB or Python): have your bash script write the MATLAB, Julia, or Python script, then have it call that script, e.g.,

matlab -nodisplay -r "mymex; exit"

And then continue with whatever.
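For instance, a minimal sketch of that pattern along Joaquim’s GMT.jl lines (the script and file names are placeholders):

# Have bash write a small Julia script, then run it
cat > rowmeans.jl <<'EOF'
using GMT, Statistics
D = gmtread("lixo.dat")     # read the table as a GMTdataset
println(mean(D, dims=2))    # mean across each row
EOF

julia rowmeans.jl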

Thanks, Paul, for the reply. The problem can be solved fairly easily in awk.

From a post I discovered on StackExchange, something like the following can be used to compute the mean of each row of a table:

awk '{ sumrows = 0; lcol = 0
    for (i = 1; i <= NF; i++)
        if ($i !~ /NaN/) { sumrows += $i; lcol++ }    # skip NaN fields
    print (lcol > 0 ? sumrows / lcol : "NaN")         # guard against all-NaN rows
}' infile

In my case, I wanted to avoid NaN values, thus the need for the if-statement.

The awk command creates a single column of mean values, one computed for each row.

The input file is not exactly “wrong”, but each row can vary in the number of non-NaN fields. Here’s the background story:

Each row contains values sampled every 10 m. A whole row represents a “segment”, and there can be rows with 710 bins (extending up to 7,100 meters). There can be thousands of these rows (segments) in the file I’m working from.

The 4094 limit is a GMT limitation when reading text files. There’s no way around it, not even with my dummy Julia example. But it’s no problem if the file is in binary.

Thanks, Joaquim. Understand the limitation.

I think I found a work-around. I’ll convert the binary file to text and pass it through the awk command. There’s no need to switch rows to columns, since awk works straight through on data in rows, e.g.:
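Something like this, where the file name and the 710-column record width are placeholders:

gmt convert segments.b -bi710d | awk '{ sumrows = 0; lcol = 0
    for (i = 1; i <= NF; i++)
        if ($i !~ /NaN/) { sumrows += $i; lcol++ }
    print (lcol > 0 ? sumrows / lcol : "NaN")
}' > row_means.txt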

Sorry to take up your time with my troubles. I just needed to search for alternatives.

Hi @John_Geodesy -

Long boring flight from NEW to LAX gave me time to review a paper and implement this PR:

gmt convert -Z

which will transpose the single segment in the input; trailing text obviously gets lost. It seems to work (I added a test in the PR), so give it a spin on your large files:

gmt convert -Z your.txt | gmt math -Ca -Sl STDIN MEAN =
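Here -Ca applies MEAN to every column and -Sl writes only the last (summary) record, so the output is a single row holding the mean of each transposed column, i.e., the mean of each original row.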