Plotting rows? Reading a 10,000-line block of text with awk, sending it to GMT psxy, and coloring by likelihood value

kmcd · March 28, 2020, 8:43pm

I want to use native GMT commands, awk, or something similar to read an output file from another program, search for the word CHAIN, and then read the lines that follow (converting the values to numbers) until I hit the word CHAIN again. The paired values are time and temperature, and I need to convert them from rows into columns. I will be generating plots in GMT and using this newly created file as plot input.

See the truncated example of the text file below (just a single line). The file is over 100,000 lines, so this may not be efficient. I need to ignore the “100000 100000 1” after the word CHAIN, then grab the -633.67 (likelihood) and the time-temperature values in the row and convert them to columns.

…
CHAIN 100000 100000 1
100001 -633.673715 -705.087080 5 1690.796631 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
…

The 100001 is the iteration number, the negative values are the likelihood and posterior probabilities, and the 5 indicates that there are 5 time-temperature point pairs, which are then listed. I’m not sure of the best way to tackle this, but each of these records needs to be plotted as a path: 1690.7 (time) and 399.0 (temperature) is point one, 1660.5 and 334.5 is point two, and so on. The -633.67 value will then set the color of the entire path from a color ramp when all paths are plotted together. I’m not sure whether GMT can plot rows rather than columns, but if it can, that may be the easier route.

pwessel · March 28, 2020, 8:59pm

This works for me:

awk '{printf "> %s %s %s %s\n", $1, $2, $3, $4}; {for (col = 9; col < NF; col +=2) printf "%s\t%s\n", $(col), $(col+1)}' yourfile > gmtfile

pwessel · March 28, 2020, 9:11pm

Well, the first printf should have the first 8 items if you care about carrying it all forward:

awk '{printf "> %s %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8}; {for (col = 9; col < NF; col +=2) printf "%s\t%s\n", $(col), $(col+1)}' yourfile > gmtfile
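
For anyone following along: the command above starts reading pairs at field 9, which fits a layout where the CHAIN header and the record share one line. If the record sits on its own line, as the example in the question displays it, the pairs would start at field 5 instead. Below is a minimal Python sketch of the same row-to-column step for that case. The function name `record_to_segment` is made up for illustration, and the sketch assumes every value after the fourth field belongs to a time-temperature pair.

```python
def record_to_segment(line):
    """Turn one whitespace-separated record into a GMT segment:
    a '>' header line followed by one tab-separated x/y row per pair."""
    fields = line.split()
    # Assumed layout: iteration, likelihood, posterior, point count, then pairs
    iteration, likelihood, posterior, npoints = fields[:4]
    values = fields[4:]
    # Pair consecutive values as (time, temperature)
    pairs = zip(values[0::2], values[1::2])
    rows = [f"> {iteration} {likelihood} {posterior} {npoints}"]
    rows += [f"{time}\t{temp}" for time, temp in pairs]
    return "\n".join(rows)


# Record copied from the example in the question
record = ("100001 -633.673715 -705.087080 5 1690.796631 399.020813 "
          "1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 "
          "449.261597 60.643616 0.000000 3.577390")
print(record_to_segment(record))
```

Note that the example record carries six pairs after a point count of 5, so the sketch reads to the end of the line rather than trusting the count.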

kmcd · March 28, 2020, 9:22pm

This file has text before and after the line I quoted. Would this work for 10,000 records in a file? I also need it to ignore all other text. I was trying to use the word “CHAIN” as a marker for where to start reading the file…

Here is the entire beginning of the file:

z4111_R2.0
1
/Applications/QTQt5.7/HUDSON_STRAIT/Salisbury/z4111-AFT-ZHe.txt : 0 : 0
0.011 2 2 0 1 0
HIERACHICAL 1
Setting tt points 1 = 3
894.000000 894.000000 225.000000 225.000000 0.000000
1725.000000 37.000000 400.000000 50.000000 0.000000
1688.000000 100.000000 350.000000 50.000000 0.000000
5.000000 0.000000
Max allowable dTdt = 5.000000 No reheating = 0 Rate Tolerance 5.000000
Age/VR rescaling ranges = Min : 1.000000 Max : 100.000000
Reference samples 0 0 1.000000 0
DoHyp 0 0
Gradient 0
Setting tt points 2 = 3 TOFFSET 0
0 894.000000 894.000000 225.000000 225.000000 -1
1 1725.000000 37.000000 400.000000 -50.000000 0
2 1688.000000 100.000000 350.000000 -50.000000 1
1688.000000 100.000000 350.000000 50.000000
3 0.000000 0.000000 5.000000 0.000000 2
0.000000 0.000000 5.000000 0.000000
Setting tt points 3 = 3
0 894.000000 894.000000 225.000000 225.000000 -1
1 1725.000000 37.000000 400.000000 -50.000000 0
2 1688.000000 100.000000 350.000000 -50.000000 1
1688.000000 100.000000 350.000000 50.000000
3 0.000000 0.000000 5.000000 0.000000 2
0.000000 0.000000 5.000000 0.000000
Initial model no of tt points = 3
0 894.000000 894.000000 225.000000 225.000000 -1
1 1725.000000 37.000000 400.000000 -50.000000 0
2 1688.000000 100.000000 350.000000 -50.000000 1
1688.000000 100.000000 350.000000 50.000000
3 0.000000 0.000000 5.000000 0.000000 2
0.000000 0.000000 5.000000 0.000000

EEK 0 894.000000 894.000000 225.000000 225.000000 0.000000 0.000000 0.000000 -1
EEK 1 1725.000000 37.000000 400.000000 -50.000000 0.000000 0.000000 0.000000 0
EEK 2 1688.000000 100.000000 350.000000 -50.000000 0.000000 0.000000 0.000000 1
EEK 1 3 0.000000 0.000000 5.000000 0.000000 0.000000 0.000000 0.000000 2
Initial LIKE = -9731.567249
CHAIN 100000 100000 1
100001 -633.673715 -705.087080 5 1690.796631 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100002 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100003 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100004 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100005 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100006 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100007 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100008 -633.724703 -705.138069 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100009 -633.478262 -704.891628 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 37.398453 464.256348 101.782127 449.261597 60.643616 0.000000 4.721468
100010 -633.478262 -704.891628 5 1756.631592 399.020813 1660.518921 334.510010 1161.580566 37.398453 464.256348 101.782127 449.261597 60.643616 0.000000 4.721468
100011 -633.490232 -704.903598 5 1756.631592 399.020813 1660.518921 334.510010 1200.190796 37.398453 464.256348 101.782127 449.261597 60.643616 0.000000 4.721468
100012 -633.220983 -704.634348 5 1756.631592 399.020813 1660.518921 334.510010 1200.190796 37.398453 464.256348 94.137177 449.261597 60.643616 0.000000 5.851962
100013 -633.220983 -704.634348 5 1756.631592 399.020813 1660.518921 334.510010 1200.190796 37.398453 464.256348 94.137177 449.261597 60.643616 0.000000 5.851962
100014 -633.578604 -704.991970 5 1756.631592 399.020813 1660.518921 334.510010 1200.190796 37.398453 459.904083 94.137177 449.261597 60.643616 0.000000 5.851962
100015 -632.108599 -715.733771 6 1756.631592 399.020813 1660.518921 334.510010 1312.814941 319.276215 1200.190796 37.398453 459.904083 94.137177 449.261597 60.643616 0.000000 5.851962
100016 -632.108599 -715.733771 6 1756.631592 399.020813 1660.518921 334.510010 1312.814941 319.276215 1200.190796 37.398453 459.904083 94.137177 449.261597 60.643616 0.000000 5.851962

kmcd · March 28, 2020, 9:25pm

As soon as I hit CHAIN, I need to keep reading for 10k lines until I hit CHAIN again; it should stop there and save everything to a file for GMT to read in. Here is an example of the output plot I need to create.

pwessel · March 28, 2020, 9:37pm

I see. This will require more programming to parse that file. It is best done in your favorite programming language, especially if there is more than one block of CHAIN … CHAIN sections. It might be a lot simpler to do in C or Python than in shell scripts, for instance.
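
A minimal Python sketch of that parsing, under assumptions not confirmed in the thread: the records are the lines between the first CHAIN line and the next one, each record has the layout shown in the question, and the likelihood goes into the segment header as `-Z<value>` so that GMT can look up the path color in a CPT. The function name `chain_block_to_gmt` and the closing CHAIN line in the sample are invented for the demo; the two records are copied from the file excerpt above.

```python
import io


def chain_block_to_gmt(lines, out):
    """Write the records between the first two CHAIN lines as GMT segments.

    Each segment gets a '> -Z<likelihood>' header followed by one
    time/temperature row per pair. Returns the number of segments written.
    """
    inside = False
    count = 0
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "CHAIN":
            if inside:
                break          # second CHAIN marker: stop reading
            inside = True      # first CHAIN marker: start reading
            continue
        if not inside or len(fields) < 6:
            continue           # skip preamble text and lines too short to hold a pair
        likelihood = fields[1]
        values = fields[4:]    # everything after the point count
        out.write(f"> -Z{likelihood}\n")
        for time, temp in zip(values[0::2], values[1::2]):
            out.write(f"{time}\t{temp}\n")
        count += 1
    return count


# Demo input: two records from the thread; the final CHAIN line is invented
sample = """Initial LIKE = -9731.567249
CHAIN 100000 100000 1
100001 -633.673715 -705.087080 5 1690.796631 399.020813 1660.518921 334.510010 1161.580566 30.221972 464.256348 101.782127 449.261597 60.643616 0.000000 3.577390
100015 -632.108599 -715.733771 6 1756.631592 399.020813 1660.518921 334.510010 1312.814941 319.276215 1200.190796 37.398453 459.904083 94.137177 449.261597 60.643616 0.000000 5.851962
CHAIN 100000 100000 1
"""

out = io.StringIO()
n = chain_block_to_gmt(io.StringIO(sample), out)
print(n, "segments")
print(out.getvalue(), end="")
```

For a real run, the same function can take an open file in place of the `io.StringIO` objects. The resulting multisegment file could then be passed to psxy together with a CPT via `-C`; the pen options needed for colored lines differ between GMT versions, so check the psxy documentation for yours. A file with several CHAIN … CHAIN blocks would need a loop around this logic.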