Overflow bug with pygmt.project?

Feva67 · June 13, 2022, 2:36am

Hello everyone. I’ve been trying to use pygmt.project to project a dataset I have, but I keep running into the same error. My data is in pandas DataFrames with valid values in the ‘lon’ and ‘lat’ fields, and the function runs fine when I use a small number of rows from each DataFrame. But as soon as I cross a certain threshold, the process grinds to a halt and I have to close the Jupyter notebook to stop it. When I run the function in verbose mode I get this:

project [WARNING]: Latitude (2.83351e+190) at line # 63 exceeds -|+ 90! - set to NaN
project [WARNING]: Latitude (8.51889e+223) at line # 64 exceeds -|+ 90! - set to NaN

I guess it’s some kind of overflow, judging by the magnitude of the numbers. The dataset doesn’t have anything weird in it. Also, I don’t know if the line numbers refer to rows of the dataframe, but I can run the function with dataframes of length 120, so it doesn’t make much sense that it would suddenly have a problem with rows 63 and 64 when I run it at length 140. For length 120 I get this output from verbose mode:

project [INFORMATION]: Processing input table data
project [INFORMATION]: Reading Data Table from Input memory location via vector
project [INFORMATION]: Writing Data Table to file /tmp/pygmt-vm9gv76b.csv
project [INFORMATION]: 120 read, 120 used
project [INFORMATION]: Processing input table data
project [INFORMATION]: Reading Data Table from Input memory location via vector
project [INFORMATION]: Writing Data Table to file /tmp/pygmt-1f0cynx7.csv
project [INFORMATION]: 120 read, 120 used

This is how I’m calling the function:

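# Reference track: the cycle that every other cycle is projected onto
dataref = data[data['cycle'] == cicloref]
# The projection line runs from the first to the last point of the reference track
datacenter = dataref[['lon', 'lat']].iloc[0]
datacenter = [datacenter['lon'], datacenter['lat']]
dataend = dataref[['lon', 'lat']].iloc[-1]
dataend = [dataend['lon'], dataend['lat']]
for ciclo in cycles:
    # head(120) works; the run hangs once this limit is raised to 140
    datacycle = data[data['cycle'] == ciclo][['lon', 'lat']].head(120)
    dataproj = pygmt.project(data=datacycle, center=datacenter, endpoint=dataend, verbose=True)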

I can give the dataset if someone needs it, but I’m guessing it’s a bug and not something specific to my data. Please let me know if it’s a known bug, or if it might be a problem with how I’m approaching the function. Thanks a lot!
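
For reference, the claim that the input has nothing weird in it can be checked directly before calling pygmt.project. The following is only a sketch: `check_lonlat` is a hypothetical helper, not part of pygmt or pandas, and the sample rows are made up to stand in for one cycle of the real dataset.

```python
import numpy as np
import pandas as pd


def check_lonlat(df: pd.DataFrame) -> pd.Series:
    """Summarise whether 'lon'/'lat' look sane before projecting (hypothetical helper)."""
    lonlat = df[["lon", "lat"]]
    values = lonlat.to_numpy(dtype=float)
    return pd.Series(
        {
            "rows": len(lonlat),
            "all_float64": bool((lonlat.dtypes == np.float64).all()),
            "all_finite": bool(np.isfinite(values).all()),
            "lat_in_range": bool((np.abs(values[:, 1]) <= 90).all()),
        }
    )


# Made-up sample rows standing in for one cycle of the real dataset
sample = pd.DataFrame({"lon": [-73.21, -73.22], "lat": [-46.53, -46.55], "cycle": [3, 3]})
report = check_lonlat(sample)
print(report)
```

If every flag is true for the rows passed in, values such as 2.83351e+190 are unlikely to originate from the dataset itself.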

seisman · June 13, 2022, 2:52am

@Feva67 It would be better if you can share your dataset with us, so that we can easily reproduce your issue and see if it’s a bug.

Feva67 · June 13, 2022, 3:06am

Sure. I have a ton of data, but I uploaded the data that was giving me trouble (data.csv) and the track onto which I want to project it (ref.csv). I’m using the first row of the ref dataframe as center and the last row as endpoint. When I run just that one projection once, it does work, but with major issues: the first values of the projected data have longitudes of around 1e-310. And when I rerun it, it stops working altogether.

Here’s the link with the data: https://we.tl/t-23jO4kGMAv

seisman · June 13, 2022, 12:18pm

Could you please provide a minimal script to reproduce your issue?

I also see something weird using the following script:

import pygmt
import pandas as pd
import numpy as np
data = pd.read_csv("data.csv")
# works
pygmt.project(data=np.array([data["lon"], data["lat"]]).T, center=(data["lon"][0], data["lat"][0]))
# doesn't work
# pygmt.project(data=np.array([data["lon"], data["lat"]]), center=(data["lon"][0], data["lat"][0]))

I expect [data["lon"], data["lat"]] or np.array([data["lon"], data["lat"]]) should work, but they don’t. I have to use np.array([data["lon"], data["lat"]]).T instead.
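
The difference between the two calls comes down to array shape, which can be seen without pygmt. A minimal numpy-only sketch with made-up coordinates follows; how pygmt.project treats a (2, N) matrix is not shown here and is only inferred from the behaviour described above.

```python
import numpy as np

lon = np.array([-73.21, -73.22, -73.23])
lat = np.array([-46.53, -46.55, -46.57])

stacked = np.array([lon, lat])      # two rows: [all lons], [all lats]
table = stacked.T                   # one (lon, lat) pair per row
same = np.column_stack([lon, lat])  # builds the (N, 2) layout directly

print(stacked.shape)  # (2, 3)
print(table.shape)    # (3, 2)
```

Without the transpose, each of the two rows would be read as a single record with N columns, not as N points with two columns.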

Feva67 · June 13, 2022, 12:40pm

I have the data inside my program from previous uses, but I’m trying to reproduce the problem with as minimal a script as I can:

import pygmt
import pandas as pd
import numpy as np
data = pd.read_csv("data.csv")
ref=pd.read_csv("ref.csv")

proj = pygmt.project(
    data=data,
    center=[ref["lon"].iloc[0], ref["lat"].iloc[0]],
    endpoint=[ref["lon"].iloc[-1], ref["lat"].iloc[-1]],
    verbose=True,
)

print(proj)

I’m getting seven columns instead of six, with the first one being negative integers, and the projection is clearly wrong (I guess it’s using that first column as the longitude):

     0     1          2          3          4            5          6
0   -38 -73.212460 -46.530087  28.743623  12.147509 -82.975708 -74.963838
1   -37 -73.212591 -46.530980  28.890962  12.397887 -83.114786 -75.106733
2   -36 -73.212617 -46.531158  29.042756  12.645793 -83.260823 -75.253860
3   -35 -73.215821 -46.554212  29.201824  12.889398 -83.416947 -75.407937
4   -34 -73.215846 -46.554390  29.362601  13.131979 -83.578058 -75.563561
..   ..        ...        ...        ...        ...        ...        ...
631 -32 -73.744973 -50.229856  30.140998  13.302399 -84.408714 -76.315362
632 -31 -73.745000 -50.230034  30.309917  13.529008 -84.600915 -76.478120
633 -30 -73.745027 -50.230213  30.483040  13.752715 -84.802665 -76.644773
634 -29 -73.745054 -50.230391  30.660329  13.973446 -85.014457 -76.815267
635 -28 -73.745082 -50.230570  30.841744  14.191129 -85.236818 -76.989544

[636 rows x 7 columns]

Not the same error, but still. I saw that your example avoids passing the dataframe as input data, but when I tried to turn it into a numpy array as you did, it stopped working. Maybe it has something to do with the center and endpoint I’m using (from the ref.csv dataframe instead of the data.csv one). Let me know if you can recreate any of the problems I’m getting. Thanks a lot.

seisman · June 13, 2022, 12:50pm

There are three columns in your data.csv. Is that really what you want?
Does the output make sense to you if you use the following code instead?

data = pd.read_csv("data.csv", usecols=(1,2))

Feva67 · June 13, 2022, 12:55pm

Sorry, since the first column is the index of the dataframe, I thought read_csv would interpret it as such. I’ll try again later and let you know. Thanks for your help so far.
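
The extra column can be reproduced with pandas alone: `to_csv` writes the index as an unnamed first column, and a plain `read_csv` reads it back as data. A small sketch with made-up values, using an in-memory buffer in place of data.csv:

```python
import io

import pandas as pd

df = pd.DataFrame({"lon": [-73.21, -73.22], "lat": [-46.53, -46.55]}, index=[38, 39])

buf = io.StringIO()
df.to_csv(buf)  # the index is written as an unnamed first column
csv_text = buf.getvalue()

naive = pd.read_csv(io.StringIO(csv_text))  # index comes back as a data column
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)  # index restored as the index

print(list(naive.columns))     # ['Unnamed: 0', 'lon', 'lat']
print(list(restored.columns))  # ['lon', 'lat']
```

Writing with `to_csv(..., index=False)` avoids the extra column in the first place, as does the `usecols=(1,2)` suggestion above.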

Feva67 · June 19, 2022, 2:38am

Hello, sorry for the long wait. I’ve been trying to work out a minimal version of my bug in order to share the data, but I don’t know how to do it. The problems seem to appear when I work with my whole dataset, and not one isolated example. Is there any reason why this shouldn’t work:

import pygmt
import numpy as np

# Cycle 3 is the data to project; cycle 8 is the reference track
data = dfsplit['gt1l'][11][dfsplit['gt1l'][11]['cycle'] == 3]
data = data[['lon', 'lat']]
data.to_csv('data.csv')
ref = dfsplit['gt1l'][11][dfsplit['gt1l'][11]['cycle'] == 8]
ref = ref[['lon', 'lat']]
ref.to_csv('ref.csv')
dataproj = pygmt.project(
    data=data,
    center=[ref['lon'].iloc[0], ref['lat'].iloc[0]],
    endpoint=[ref['lon'].iloc[-1], ref['lat'].iloc[-1]],
    verbose=True,
)

While this does:

import pygmt
import pandas as pd
import numpy as np

data = pd.read_csv("data.csv")
data = data[['lon', 'lat']]
ref = pd.read_csv("ref.csv")
ref = ref[['lon', 'lat']]

proj = pygmt.project(
    data=data,
    center=[ref["lon"].iloc[0], ref["lat"].iloc[0]],
    endpoint=[ref["lon"].iloc[-1], ref["lat"].iloc[-1]],
    verbose=True,
)

The data I load in the second example is exactly what I save in the first one. With the first implementation the call never stops running and I can’t even interrupt the kernel, while the second one works perfectly. I see no difference in the calculations between the two cases, so I assume it has something to do with how pygmt handles the data depending on where it comes from. Also, the first example works only the first time I run it, and even then it reads the data['lon'] column as practically zero (1e-310) for just the first four lines. If I try to run it again, it just dies.

I’m sorry I can’t give you a more localized example to play with. I’m loading tons of data in my program and it wouldn’t make much sense to send you all of it. I’ll keep trying to make a simpler example.

Thank you very much

Feva67 · June 19, 2022, 2:51am

Another example. This works:

import pygmt
import pandas as pd
import numpy as np

data11 = dfsplit['gt1l'][11]

# Round-trip through CSV, then work only with the re-read copy (data11b)
data11.to_csv('data11.csv')
data11b = pd.read_csv("data11.csv")

data = data11b[data11b['cycle'] == 3]
data = data[['lon', 'lat']]

ref = data11b[data11b['cycle'] == 8]
ref = ref[['lon', 'lat']]

dataproj = pygmt.project(
    data=data,
    center=[ref['lon'].iloc[0], ref['lat'].iloc[0]],
    endpoint=[ref['lon'].iloc[-1], ref['lat'].iloc[-1]],
    verbose=True,
)
print(dataproj)

While this doesn’t:

import pygmt
import pandas as pd
import numpy as np

data11 = dfsplit['gt1l'][11]

# Same round-trip, but the re-read copy (data11b) is never used below
data11.to_csv('data11.csv')
data11b = pd.read_csv("data11.csv")

# Filter the original in-memory dataframe (data11) instead
data = data11[data11['cycle'] == 3]
data = data[['lon', 'lat']]

ref = data11[data11['cycle'] == 8]
ref = ref[['lon', 'lat']]

dataproj = pygmt.project(
    data=data,
    center=[ref['lon'].iloc[0], ref['lat'].iloc[0]],
    endpoint=[ref['lon'].iloc[-1], ref['lat'].iloc[-1]],
    verbose=True,
)
print(dataproj)

The second implementation, as before, fails for the first four lines, giving me this output:

           0            1               2         3         4           5
0    6.950641e-310  6.950641e-310 -56.154731  69.097668 -68.794012   9.485766
1    6.950641e-310  6.950641e-310 -56.154731  69.097668 -68.794012   9.485766
2    4.650598e-310  4.650598e-310 -56.154731  69.097668 -68.794012   9.485766
3    4.650598e-310  4.650598e-310 -56.154731  69.097668 -68.794012   9.485766
4    -7.321585e+01  -4.655439e+01   0.024773  -0.000234 -73.215508 -46.554412
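
The values in the first four rows are below the smallest normal double (about 2.2e-308), so they are subnormal numbers and very unlikely to be real coordinates. One plausible but unconfirmed reading is that the input buffer was not read correctly for those rows. Under that assumption, the sketch below flags such rows in an output table and builds a fresh, C-contiguous float64 copy of the input. Both helper names are hypothetical, the second DataFrame is made up, and nothing in this thread shows that the copy fixes the problem.

```python
import numpy as np
import pandas as pd


def subnormal_rows(table: pd.DataFrame) -> np.ndarray:
    """Return positions of rows holding nonzero values below the smallest normal double."""
    values = np.abs(table.to_numpy(dtype=float))
    mask = (values > 0) & (values < np.finfo(np.float64).tiny)
    return np.flatnonzero(mask.any(axis=1))


def fresh_lonlat(df: pd.DataFrame) -> np.ndarray:
    """Copy 'lon'/'lat' into a new C-contiguous float64 array of shape (N, 2)."""
    return np.ascontiguousarray(df[["lon", "lat"]].to_numpy(dtype=np.float64, copy=True))


# Two rows taken from the output above: one suspicious, one plausible
out = pd.DataFrame([[6.950641e-310, 6.950641e-310], [-7.321585e+01, -4.655439e+01]])
print(subnormal_rows(out))  # [0]

# Made-up stand-in for a filtered slice of a larger dataframe
big = pd.DataFrame(
    {"cycle": [3, 8, 3], "lon": [-73.21, -73.30, -73.22], "lat": [-46.53, -46.60, -46.55]}
)
arr = fresh_lonlat(big[big["cycle"] == 3])
```

The resulting array could be passed as `data=arr` in place of the filtered dataframe; whether that changes the behaviour here is untested.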