Sample1d on gmt6.1 and gmt6.2

szahirovic · August 31, 2020, 11:56am

Hi team,

Hope you’re all doing well! I have a sample1d line that used to work, but now throws errors. I’m not sure if the syntax has changed so much that I have confused the flags (I’ve checked, and I’m pretty sure they’re OK), or there’s some sort of precision issue…

I have a file with four columns - lon, lat, z, cumulative_distance

I want to interpolate along that line using the fourth column, in this case 0.1 decimal degrees (not sure if this is meant to be in “time”, but it’s worked before).

This is the code I use:
gmt sample1d input.txt -Ar -Fl -N3 -T0.1 -o0,1,2 -V > output.txt

And the error I get is
sample1d [ERROR]: Failure in gmt_intpol near row 9!

The input file can be downloaded from:
https://www.dropbox.com/s/ueohv5cz0guulsd/input.txt?dl=0

Let me know if I’m doing something strange here… Thanks for having a look in advance, really appreciate it!

Sabin

szahirovic · August 31, 2020, 1:06pm

Hmmm I managed to get rid of most of the problems, but I can see I still have some duplicated z lines. I’ll post again later as I think I can get around it, and the solution may help others… Fingers crossed.

szahirovic · August 31, 2020, 1:34pm

Oh my goodness, the solution was quite convoluted. The problem were duplicated points at the origin that had zero distance… Here’s my hack below, just in case it helps anyone in the future… Yikes.

Example problem:

>
111.961609519	19.433004169	28.5703073519	0
111.961609519	19.433004169	28.415148969	0
111.961609519	19.433004169	24.5341996297	0
111.961609519	19.433004169	20.4528577786	0
111.961609519	19.433004169	17.6771346168	0
111.961609519	19.433004169	16.8710608079	0
111.961609519	19.433004169	13.9730055121	0
111.961609519	19.433004169	10.5867751914	0
111.961609519	19.433004169	9.96239246195	0
111.961609519	19.433004169	10.7793480341	0
111.961609519	19.433004169	11.7011659068	0
111.961609519	19.433004169	14.6546469797	0
111.961609519	19.433004169	16.5674624051	0
111.961609519	19.433004169	18.2760157006	0
111.961609519	19.433004169	15.5111926005	0
111.961609519	19.433004169	11.1377707716	0
111.961609519	19.433004169	4.82451194919	0
111.961609519	19.433004169	1.07042616578	0
111.961609519	19.433004169	3.47431415072	0
111.961609519	19.433004169	9.66939725796	0
111.961609519	19.433004169	20.6836665011	0
111.961609519	19.433004169	25.5667699716	0
111.961609519	19.433004169	28.3081772482	0
111.961609519	19.433004169	28.5703073519	0
111.961609519	19.433004169	27.7910588681	0
111.961609519	19.433004169	25.1986607434	0
111.961609519	19.433004169	17.4962789249	0
111.961609519	19.433004169	10.3389556163	0
111.961609519	19.433004169	10.3355917541	0
112.054205785	19.4381497266	8.79474380069	0.0875132773335
112.146807893	19.4432483193	7.25250671114	0.175026597796
112.226382334	19.447591837	7.35283206866	0.250223825109

# Simplifies output to three columns
gawk '{print $1, $2, $3}' input.txt > aaa

# Removes duplicate lines
gawk 'x !~ $0; {x=$0}' aaa > bbb

# Computes cumulative distances in Column 4
gmt mapproject bbb -Gd -m > ccc

# Computes the distance between points in Column 5
gmt mapproject ccc -Gd- -m > ddd

# Discards points where the distance is negative, to ensure monotonically increasing values
gawk '{if ($0 ~/>/) {print $0} else if ( $4 == "0" || $5 > "0" ) {print $1, $2, $3, $4}}' ddd > eee

# Problematic points occur where the point is the origin, and this point is duplicated once or more times
# Appends column with difference in X
awk 'NR>1{print $0, $1-p} {p=$1}' eee  > fff

# Appends column with difference in Y
awk 'NR>1{print $0, $2-p} {p=$2}' fff  > ggg

# Where difference in X AND Y are zero (i.e., duplicated origin points), discard
gawk '{if ($5 != 0 && $6 != 0) {print $1, $2, $3, $4}}' ggg > hhh

# And this seems to work!
gmt sample1d hhh -Ar -Fl -N3 -T0.1 -o0,1,2 -V > output.txt

pwessel · August 31, 2020, 5:41pm

I’ve been down that dark alley too. I think GMT should be able to help with this in two ways:

Option in sample1d to skip duplicates
Option in mapproject to skip output records if the distance increment is zero.

szahirovic · August 31, 2020, 8:35pm

Haha it took me all day to figure it out. The only thing that alerted me to where the problem was, was using GMT4 sample1d, which alerted me to a problem in line 404,104 of the original file. sample1d in GMT6 was reporting an issue with rows 6 and 9, but there were no issues there. So it’d be amazing if one of your suggested solutions were implemented, as I’ve seen this issue often enough that it’s caused issues - however, at least now I have a (wonky) workaround! Thanks for all that you guys do, I GMT!

pwessel · August 31, 2020, 11:56pm

Hi Sabin-

Could you just email me the original file (the dropbox says it is gone) and restate what you want. I think it is

1.Take input lon lat columns and add a distance column, output lon ,lat, z, dist
2.Sample the result at every 1 km.

szahirovic · September 1, 2020, 4:16am

Hi Paul,

Ah Dropbox must have assumed the file changed and probably broke the link. It should be easier with a new link that points to the entire folder - it has the input file, and my BASH script that has the workaround:

For me it’s great to have mapproject to identify negative “distance” (time?) increments, to ensure monotonically increasing values - but the duplicated points at the origin of a profile (0 distance) are tough, mainly because the SAMPLE1D error message doesn’t even point to them. To address it, one needs to run a convoluted set of steps as I have shown here, which might be harder on a Windows machine, although maybe there’s an easier/quicker way to do it with some Python/Pandas magic.

And yep, given x, y, z, and distance - resample using a regular increment (kms, decimal degrees, etc.).

Thanks again for having a look at it, hope all is well with you and the GMT team!

Sabin

pwessel · September 1, 2020, 7:19am

Thanks Sabin-

So your very first gawk command removes a single record from aaa:

73.1831511779 -5.19852241196 0

but it is not a true duplicate since the other matching record is

73.1831511779 -5.19852241196 0.382695603922

So I dont understand why gawk thinks they are the same record.

Joaquim · September 1, 2020, 9:49am

The magic is quite simple in Julia

using GMT
D = gmtread("input.txt");

for k = 1:length(D)
       ind = [true; diff(D[k].data[:,4]) .!= 0.];
       D[k].data = D[k].data[ind,:];
end

but the problem is that there some other problem in one of the segments and the error it causes kills julia (we can’t have crashes in GMT)

Dint = sample1d(D, A=:r, F=:l, N=3, T=0.1, o="0,1,2")

...
sample1d [WARNING]: Segment 10150 in table 0 has < 2 records - skipped as no interpolation is possible
sample1d [WARNING]: Segment 10156 in table 0 has < 2 records - skipped as no interpolation is possible
sample1d [ERROR]: Option T: (max - min) is <= 0

The same error occurs if I save the cleaned dataset to file and run from command line

 gmtwrite("no_dups.dat", D)

gmt sample1d no_dups.dat -Ar -Fl -N3 -T0.1 -o0,1,2 -V > output.txt

EDIT: PR #4137 fixes the crash and now both Julia and cmd commands work.

szahirovic · September 2, 2020, 12:29am

Hi Paul,

The X and Y points are duplicated, but you’re right, there are multiple unique Z values for the duplicated points. This is not ideal, but I was willing to sacrifice the duplicated points, as overall it wouldn’t make much difference to the subsequent interpolation - my main aim was to resample along the profile.

Sabin

szahirovic · September 2, 2020, 12:32am

Hi Joaquim,

Thanks so much for looking into this, hope you’ve been well.

Really appreciate your insights and your Julia/GMT wizardry!

Just checking, before I go down the path of recompiling, will I need to run those Julia lines, or will it be handled within gmt/sample1d? Either way, it’s a good solution - just wanted to ask the obvious silly question before I get myself confused.

Sabin

pwessel · September 2, 2020, 12:44am

Yes, if there actually were duplicate (x,y,) with different z values and you wanted to replace those by some mean z-value then that is a different problem.
gmt convert will soon be able to skip duplicate based on the columns you decide to include in the comparison, including trailing text. gmt sample will have a simpler option to skip duplicates solely based on the time-column value. These upgrades will appear in master (6.2) soon. They will not be in 6.1.1 which we are releasing in a day or two (bug-fixes only).

However, my question with your gawk was why it flagged those two records as duplicates when their z-values are different? I thought $0 refers to the entire record as a string, so would expect aaa == bbb?

szahirovic · September 2, 2020, 2:16am

Hi Paul,

That was my typo, that line finds “similar” lines, denoted by the tilde characters.

So the line
gawk ‘x !~ $0; {x=$0}’ aaa > bbb

should actually be
gawk ‘x != $0; {x=$0}’ aaa > bbb

And there shouldn’t be any duplicates to delete, I put it in as I assumed there could be cases in the future with identical XYZ entries.

pwessel · September 2, 2020, 2:49am

OK. So this is what enhanced gmt convert will do on your input file:

gmt convert -Td0:1 input.txt -o0-2 -V > t.txt
gmtconvert [INFORMATION]: Processing input table data
gmtconvert [INFORMATION]: Reading Data Table from File input.txt
gmtconvert [INFORMATION]: Write Data Table to <stdout>
gmtconvert [INFORMATION]: 1 tables concatenated, 1725937 records passed (input cols = 4; output cols = 4)
gmtconvert [INFORMATION]: Eliminated 494 duplicate records

As you can see, I am asking it to exclude duplicates defined as having the same repeat values in columns 0 and 1 (your lon, lat), but you can specify any column combination. This should eliminate your first two gawk commands.

I will see how things fare downstream with mapproejct and sample1d as well.

pwessel · September 2, 2020, 3:08am

With the duplicates gone, I encountered no problems:

gmt mapproject t.txt -Gd > ccc
gmt sample1d ccc -Ar -Fl -N3 -T0.1 -o0,1,2 -V > output.txt

Your script checks for negative distance increments. If I compute increments I do not get anything negative since distances are positive definite. While sample1d -A+d will soon be available as well to exclude duplicate records (only determined from time-column) it is not needed here since the gmt convert call handled that. So I think your script can collapse to just these lines:

gmt convert -Td0:1 input.txt -o0:2 -V > aaa
gmt mapproject aaa -Gd > bbb
gmt sample1d bbb -Ar -Fl -N3 -T0.1 -o0:2 -V > output.txt

The sample1d will complain about a bunch of segments with one point so it cannot interpolate and skips them.

szahirovic · September 2, 2020, 3:22am

Hi Paul,

That’s fantastic, thanks so much! I’ll download the code from Github and recompile in the coming days. Really appreciate all that you and the team do.

Sabin

pwessel · September 2, 2020, 6:27am

Hi Sabin, our new enhancements have been merged into master.

Joaquim · September 2, 2020, 11:08am

You need to rebuild GMT and run those Julia commands.