Pattern matching with xyz2grd; help with an example?

jedokaplan · November 22, 2023, 1:46am

Consider the following ASCII table fragment:

2022/12/31,00:00:00.137822,  25.5796, -092.8005
2022/12/31,00:00:00.137828,  25.5142, -092.7933
2022/12/31,00:00:00.196525,  25.5239, -092.7855
2022/12/31,00:00:00.137898,  25.7603, -092.6680
2022/12/31,00:00:00.196527,  25.4814, -092.7792
2022/12/31,00:00:00.196526,  25.5268, -092.7614
2022/12/31,00:00:00.255014,  25.5596, -092.7512
2022/12/31,00:00:00.255030,  25.5970, -092.7089
2022/12/31,00:00:00.333076,  25.5067, -092.7766
2022/12/31,00:00:00.576707,  25.5335, -092.7463

The table consists of comma-separated fields containing date, time, lat, and lon. Each table covers one day (00:00-23:59). I am ingesting these data into xyz2grd to create hourly rasters. I have been pre-filtering the data in awk to extract only the lines of interest, but I think this is slowing me down and I have a lot of data to process.

I recently realized that xyz2grd has the -e option that should allow me to filter the table data directly. However, I’m struggling to construct a pattern or regular expression that would allow me to include only the lines of interest, covering one hour, e.g., [00:00:00, 01:00:00).

I would be grateful if anyone could help me with an example pattern or expression for the -e option that would allow me to get just one hour at a time. Many thanks!

Alpiner · November 22, 2023, 4:15am

I prefer awk and gmt select:
min=2022-12-31T00:00:00
max=2022-12-31T01:00:00
awk -F"[, ]+" '{print $4,$3,$1"T"$2}' file | gmt select -f2T -Z$min/$max | gmt xyz2grd …

jedokaplan · November 22, 2023, 6:32pm

Thanks for the tip. I’m trying to avoid invoking awk as I think it slows down the processing quite a bit and in the end I don’t need the timestamp, but I will give it a try.

jedokaplan · November 22, 2023, 7:57pm

The following two methods produce the desired result of extracting just a single hour’s data from the input ASCII table:

grep ".*,00:.*" inputfile.txt | gmt gmtinfo

gmt gmtinfo inputfile.txt -e"/.*,00:.*/"

Interestingly, the method using grep is significantly faster, up to 3x faster when working on a table with ~70,000 rows.

pwessel · November 22, 2023, 9:08pm

That is expected, since GMT converts the ASCII coordinates to internal double-precision representations and applies checks to those values (e.g., |lat| <= 90). grep does not care what the patterns mean, so a grep piped to xyz2grd is probably the faster option.
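
A minimal sketch of that grep-first approach, assuming the table layout shown in the first post (date, then a comma, then the time). The function name extract_hour and the loop in the comments are hypothetical, for illustration only. Matching the fixed string ",HH:" ties the match to the start of the time field, because the minutes and seconds follow a colon rather than a comma.

```shell
# extract_hour FILE HH: print only the rows whose time field starts with hour HH.
# grep -F treats ",HH:" as a fixed string, so no regex interpretation is needed.
# (extract_hour is a hypothetical helper name, not part of GMT.)
extract_hour() {
    grep -F ",$2:" "$1"
}

# Hypothetical usage: loop over all 24 hours of one daily table.
# for hh in $(seq -w 0 23); do
#     extract_hour inputfile.txt "$hh" | gmt gmtinfo
# done
```

The output still carries the date and time columns in their original order, so whatever reads it downstream has to be told which columns hold lon and lat.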