How to read a list of coordinates?

KristofKoch · January 15, 2023, 8:20pm

The German AIP (Wikipedia) has lots of coordinates in the form of

B10  N 50 02 55.23  E 008 34 17.78
B20  N 50 02 52.90  E 008 34 19.84
B22  N 50 02 50.46  E 008 34 18.10
B23  N 50 02 49.70  E 008 34 20.35
B24  N 50 02 48.70  E 008 34 16.77
B25  N 50 02 47.48  E 008 34 15.73
V115  N 50 02 56.00  E 008 35 19.62
V116  N 50 02 55.48  E 008 35 19.92
V326  N 50 01 42.83  E 008 34 05.72
V328  N 50 01 45.41  E 008 34 04.24

A more easily readable form for the first entry would be

B10 N50°02'55.23" E008°34'17.78"

where B10 is the label text. Currently I extract that with some regex and a bit of math to get it into a GMT-friendly version:

8.571606 50.048675 B10

Is there an elegant way to read the original input with GMT directly? My convert-fu seems to be lacking as my experimentation led nowhere so far.

Thank you for your ideas & all the best,
Kristof

EJFielding · January 15, 2023, 9:14pm

That looks like an excellent project for Python and PyGMT.

Andreas · January 15, 2023, 9:34pm

It’s always nice to be able to read in data directly, but don’t forget the Rule of Clarity: Clarity is better than cleverness. If the regex works, and is easy to read, consider keeping it!

Joaquim · January 16, 2023, 12:04am

The wiki link does not have that table.

KristofKoch · January 16, 2023, 12:54am

@Andreas, that rule is true. The regex does work, but easy to read … not so much. I try to keep the scripts portable, which is why I strive for simple bash solutions. Currently I’m experimenting with

line=$'B10  N 50 02 55.23  E 008 34 17.78'
unset IFS
read -ra arr -d '' <<<"$line"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done

Output:

1 [B10]
2 [N]
3 [50]
4 [02]
5 [55.23]
6 [E]
7 [008]
8 [34]
9 [17.78]

This appears to be more robust than the regex. Now I got all the parts extracted and can manipulate them to my liking.

KristofKoch · January 16, 2023, 1:02am

Hi @Joaquim, that’s true – the Wikipedia link was meant to explain AIP. If you want the tables, have a look at this page for example. If you are interested in the full thing, try https://aip.dfs.de/basicIFR/ and have a look around.
Clicking through AD > AD 2 > Frankfurt Main > AD 2 EDDF 1-5 would lead you to one of those pages.

KristofKoch · January 16, 2023, 1:14am

Looks like a good time to brush up my Python skills! What would the advantages be over a bash/regex solution?

EJFielding · January 16, 2023, 1:36am

Python has many tools and functions for reading and parsing text files. Then you can do the math and plot with PyGMT without having to combine regex and bash scripts. Of course, some people would recommend Julia ;-).
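
As a rough illustration of that point, here is a minimal Python sketch (standard library only, no PyGMT calls) that converts one line in the format from the first post into "lon lat label". The function names dms_to_decimal and aip_line_to_gmt are made up for this example, and it assumes every line has exactly nine whitespace-separated fields.

```python
def dms_to_decimal(hemisphere: str, degrees: str, minutes: str, seconds: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def aip_line_to_gmt(line: str) -> str:
    """Turn 'B10  N 50 02 55.23  E 008 34 17.78' into 'lon lat label'."""
    # Assumption: exactly nine whitespace-separated fields per line.
    label, ns, lat_d, lat_m, lat_s, ew, lon_d, lon_m, lon_s = line.split()
    lat = dms_to_decimal(ns, lat_d, lat_m, lat_s)
    lon = dms_to_decimal(ew, lon_d, lon_m, lon_s)
    return f"{lon:.6f} {lat:.6f} {label}"


print(aip_line_to_gmt("B10  N 50 02 55.23  E 008 34 17.78"))
```

For the B10 line this prints 8.571606 50.048675 B10, which matches the value given in the first post.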

Joaquim · January 16, 2023, 1:43am

But those are images, not tables, so they are not scrapable.

EJFielding · January 16, 2023, 3:08am

Yes, it seems there must be a text version somewhere instead of only those images of the AIP report pages.

KristofKoch · January 16, 2023, 12:41pm

Gentlemen, my apologies. I didn’t point out to you that the pdf-version of the page hides behind the little printer icon in the upper right corner. Not sure if direct linking to it works.

Running it through my parser setup gives me the following text file: AD 2 EDDF 1-5.txt (17.6 KB)

There is even a nice XML version in AIXM 5.1 format, but unfortunately those things come with a price tag. We are talking in the range of tens of thousands of Euros. Just a little bit outside my budget.

So what I’m doing right now is getting the pdf files, parsing them and then preconditioning them for GMT usage. That preconditioning step is not as flexible as I hoped with my regex solution. That’s why I’m looking for alternatives.
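
One possible way to make such a preconditioning step more forgiving is sketched below in Python. It assumes the coordinate rows in the extracted text look like the ones in the first post and simply skips every line that does not match. The names ROW and extract_points are hypothetical, and the sample text, including its header line, is invented for illustration rather than taken from the actual file.

```python
import re

# Assumed row layout: label, N/S, deg, min, sec, E/W, deg, min, sec.
ROW = re.compile(
    r"^\s*(?P<label>\S+)\s+"
    r"(?P<ns>[NS])\s+(?P<lat_d>\d{2})\s+(?P<lat_m>\d{2})\s+(?P<lat_s>\d{2}(?:\.\d+)?)\s+"
    r"(?P<ew>[EW])\s+(?P<lon_d>\d{3})\s+(?P<lon_m>\d{2})\s+(?P<lon_s>\d{2}(?:\.\d+)?)\s*$"
)


def extract_points(text):
    """Return (lon, lat, label) tuples for every line that looks like a coordinate row."""
    points = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, footer or blank line: ignore it
        lat = int(m["lat_d"]) + int(m["lat_m"]) / 60 + float(m["lat_s"]) / 3600
        lon = int(m["lon_d"]) + int(m["lon_m"]) / 60 + float(m["lon_s"]) / 3600
        if m["ns"] == "S":
            lat = -lat
        if m["ew"] == "W":
            lon = -lon
        points.append((round(lon, 6), round(lat, 6), m["label"]))
    return points


# Invented sample text; the real PDF extract will contain other non-coordinate lines.
sample = """\
Some header line that is not a coordinate
B10  N 50 02 55.23  E 008 34 17.78
V326  N 50 01 42.83  E 008 34 05.72

"""

for lon, lat, label in extract_points(sample):
    print(lon, lat, label)
```

Because non-matching lines are dropped instead of being counted, a sketch like this does not depend on how many header or footer lines a given page has.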

Joaquim · January 16, 2023, 2:26pm

For obvious reasons

This needs the PDFIO.jl package (to install: ] add PDFIO).
Put the following in a .jl file:

using PDFIO, GMT
function squeeze_aip(file::String="AD 2 EDDF 1-5.pdf")
	# 'file' is the (full) name of the PDF to squeeze. If not provided, defaults to "AD 2 EDDF 1-5.pdf"
	doc = pdDocOpen(file)

	page = pdDocGetPage(doc, 1)
	io = IOBuffer();
	t = pdPageExtractText(io, page);
	tt = split(String(take!(io)), '\n')

	nc = length(tt) - 9		# Number of actual coordinates
	lonlat = zeros(nc, 2)	# Pre-allocate the lon,lat matrix 

	for k = 1:nc
		s = split(tt[k+6])	# ["B10", "N", "50", "02", "55.23", "E", "008", "34", "17.78"]
		lonlat[k,2] = (parse(Int, s[3]) + parse(Float64, s[4])/60 + parse(Float64, s[5])/3600) * (s[2] == "N" ? 1 : -1)
		lonlat[k,1] = (parse(Int, s[7]) + parse(Float64, s[8])/60 + parse(Float64, s[9])/3600) * (s[6] == "E" ? 1 : -1)
	end
	return mat2ds(lonlat, proj="geog")
end

Then run it (include("squeeze_aip.jl"); squeeze_aip())

julia> squeeze_aip()
BoundingBox: [8.52953888888889, 8.591800000000001, 50.03703055555555, 50.05206944444444]
PROJ: +proj=longlat +datum=WGS84 +units=m +no_defs
69×2 GMTdataset{Float64, 2}
 Row │     Lon      Lat
     │ Float64  Float64
─────┼──────────────────
   1 │ 8.57161  50.0487
   2 │ 8.57218  50.048
   3 │ 8.57169  50.0474
   4 │ 8.57232  50.0471
...

PlanetGus · January 16, 2023, 4:03pm

Hi @KristofKoch,

Not sure it is what you’re looking for but … maybe … :

cat test.txt
B10  N 50 02 55.23  E 008 34 17.78
V326  N 50 01 42.83  E 008 34 05.72

(just considering the latitude) :

awk '{print $3,$4,$5}' test.txt |
gmt math STDIN -C1 60 DIV -C2 3600 DIV = |
awk '{print $1+$2+$3}'

50.0487
50.0286

It can be adapted to your heart’s content.

I tried to sum the columns, but the math module seems to deal only with rows.
Good luck

KristofKoch · January 22, 2023, 10:16am

Thank you gentlemen for your helpful input!

Sometimes it is hard to see the wood for the trees, and outside input works wonders. In the end the combination of read and gmt math turned out to be the most robust solution for my level of expertise.

pwessel · January 22, 2023, 12:22pm

math is definitely row-oriented, but see the COL operator for doing column math.

PlanetGus · January 22, 2023, 4:14pm

I tried but failed to use it