`gmt info` with `-di0.0` filters out small non-zero values?

jaltekruse · September 1, 2022, 9:45pm

I was doing some basic data validation after reformatting some text data (fewer decimal places, omit records with values of zero, …), expecting that the range of values in my reformatted files would match the range of non-zero values in the original data. However, it seems like gmt info misses the smallest non-zero value(s) if I tell it to ignore records with a value of 0 (-di0.0, which I think replaces data values equal to 0.0 with NaN).

cat << EOD > test.csv
-133.0,48.0,0e0
-140.5,47.2,1e-17
-140.6,47.3,1e-16
-132.9,48.0,1e-15
-132.9,48.0,1.1e-15
-132.9,48.0,1.2e-15
-132.9,48.0,2e-15
-132.9,48.0,3e-15
-130.2,50.6,1e-14
-130.2,50.6,1e-1
EOD

gmt info test.csv 
# test.csv: N = 10       <-140.6/-130.2> <47.2/50.6>     <0/0.1>
# don't exclude zeros - OK

gmt info test.csv -di0.0
# test.csv: N = 10       <-140.6/-130.2> <47.2/50.6>     <1.2e-15/0.1>
# misses all non-zero values below 1.2e-15

awk -F, '$3>0{print}' test.csv | gmt info
# <Standard Input>: N = 9 <-140.6/-130.2> <47.2/50.6>     <1e-17/0.1>
# if zeros are omitted first, we get the actual minimum non-zero value

The -di0.0 seems to filter out any values below about 1.2e-15 (somewhere between 1.1e-15 and 1.2e-15). Trying negative numbers suggests that -di0.0 filters out values with an absolute value less than 1.2e-15. What’s going on here?

Thanks!

pwessel · September 1, 2022, 10:40pm

Given how floating-point numbers are represented, we use the theory discussed on this page when comparing two floating-point values. One usually cannot test if (a == b) with floating-point numbers, since most values cannot be represented exactly. So we use 5 * DBL_EPSILON as the threshold here. For IEEE 754 doubles DBL_EPSILON is about 2.2e-16, so the threshold is about 1.1e-15, which matches the cutoff you see between 1.1e-15 and 1.2e-15. It is possible that 5 is too much, but remember that all numbers are stored as IEEE binary values, so there is no exact 1.0e-15 per se.
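To make the threshold concrete, here is a minimal sketch of that kind of tolerance test in awk. It assumes a plain absolute-difference comparison against 5 * DBL_EPSILON; the function name near_zero_report is hypothetical, and the actual comparison inside GMT may differ in detail.

```shell
# Hypothetical helper (not part of GMT): report which third-column values
# would count as "equal to 0.0" under an absolute tolerance of 5 * DBL_EPSILON.
near_zero_report() {
  awk -F, '
    BEGIN { tol = 5 / (2^52) }        # DBL_EPSILON = 2^-52 for IEEE 754 doubles
    {
      d = $3 - 0.0                    # difference from the value being matched
      if (d < 0) d = -d               # absolute value
      printf "%s %s\n", $3, (d < tol ? "treated-as-zero" : "kept")
    }' "$@"
}

# Three of the records from the test file above
printf '%s\n' '-133.0,48.0,0e0' '-132.9,48.0,1.1e-15' '-132.9,48.0,1.2e-15' | near_zero_report
```

Under this assumption 1.1e-15 lands just below the tolerance (about 1.11e-15) and 1.2e-15 just above it, which is consistent with the minimum of 1.2e-15 reported in the first post.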

pwessel · September 2, 2022, 2:53am

An alternative way that does not check for equality is something like this:

gmt info test.csv -qi1e-50/+c2
test.csv: N = 9 <-140.6/-130.2> <47.2/50.6> <1e-17/0.1>

jaltekruse · September 2, 2022, 3:22pm

as in “slowly backing out of the floating-point math black hole, closing all those tabs, and returning to productivity…” (almost)

Thanks for the -q tip, I haven’t used that GMT flag, er, swab before