Type error with pygmt.grdtrack - possible bug or intended?

Feva67 · September 27, 2023, 1:02am

Hello there! I was trying to use pygmt.grdtrack for a project, but kept getting an error when running it. As the points argument, I was passing a dataframe that has many extra columns besides the first two ('lon', 'lat'):

df=pygmt.grdtrack(grid=grid, points=df, newcolname="newcol")

and got the error

[TypeError: sequence item 1: expected string, int found]

I traced it to a ' '.join() call in one of the functions called by grdtrack. I then called the function again as

df[['lon','lat','newcol']]=pygmt.grdtrack(grid=grid, points=df[['lon','lat']], newcolname="newcol")

and this time it worked without issue. The columns in my dataframe held different types of data (float, str and datetime objects). I don't know whether grdtrack has an issue with any of these data types, or just with the size of the dataframe (about 100k rows by 10 columns), but if it does, I think it would be good to state that in the documentation.
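The workaround above can be wrapped in a small helper. This is only a sketch: `sample_with_coords_only` and `fake_sampler` are hypothetical names, and `fake_sampler` is a stand-in for the real grdtrack call so that the example runs without a grid file. The new column is attached by position, which assumes the sampler returns one row per input point, in the same order.

```python
import pandas as pd


def sample_with_coords_only(df, sampler, newcolname, coords=("lon", "lat")):
    """Pass only the coordinate columns to `sampler` and attach the new column to a copy of `df`."""
    sampled = sampler(df[list(coords)])
    # Attaching by position is only safe if no rows were dropped or reordered.
    if len(sampled) != len(df):
        raise ValueError("sampler returned a different number of rows")
    out = df.copy()
    out[newcolname] = sampled[newcolname].to_numpy()
    return out


def fake_sampler(points):
    # Hypothetical stand-in for the grdtrack call: returns the points plus one new column.
    result = points.reset_index(drop=True).copy()
    result["newcol"] = result["lon"] + result["lat"]
    return result


df = pd.DataFrame(
    {
        "lon": [-73.5, -72.3],
        "lat": [-47.4, -47.1],
        "label": ["a", "b"],
        "when": pd.to_datetime(["2020-01-01", "2021-06-15"]),
    }
)
result = sample_with_coords_only(df, fake_sampler, "newcol")
```

With PyGMT, the sampler would be something like `lambda pts: pygmt.grdtrack(grid=grid, points=pts, newcolname="newcol")`, so the str and datetime columns never reach grdtrack.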

I didn’t look too hard into trying to replicate the issue in a sample that can be shared, but I can try if someone from the development team is interested.
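For reference, the same class of error is easy to trigger in plain Python, because str.join only accepts strings. The snippet below is an illustration only; the list `row` is made up, and which items PyGMT was actually joining is not known from this thread.

```python
# str.join raises TypeError as soon as it meets a non-string item.
row = ["lon", 1, "lat"]  # the item at index 1 is an int

try:
    " ".join(row)
    join_failed = False
except TypeError:
    join_failed = True

# Converting every item to str first avoids the error.
joined = " ".join(str(item) for item in row)
```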

Edit: Adding a simple example. Here it has problems with the timedelta object. It is not the same error, but you can see that if you turn the column into a datetime object (by uncommenting the end of the trashdate line), it also shows some odd behaviour: the column turns into float after grdtrack is called.

import pygmt
import pandas as pd
import numpy as np
import datetime

df = pd.DataFrame(columns=['lon','lat','trash','trashdate'])
df['lon']=np.random.default_rng().uniform(low=-74,high=-71,size=[500])
df['lat']=np.random.default_rng().uniform(low=-51,high=-46,size=[500])
df['trash']=np.random.default_rng().uniform(low=-51,high=-46,size=[500])
df['trashdate']=(np.random.default_rng().uniform(low=0,high=5,size=[500]))*datetime.timedelta(seconds=3e7)#+datetime.date(2018,1,1)

spacing=0.1 # arc degrees
region=[-74.0, -71.0, -51.0, -46.0]
n_lon=round((region[1]-region[0])/spacing)+1
n_lat=round((region[3]-region[2])/spacing)+1
lat,lon=np.mgrid[region[2]:region[3]:n_lat*1j,region[0]:region[1]:n_lon*1j]

grd = pygmt.xyz2grd(x=lon.flatten(),y=lat.flatten(),
                    z=np.random.default_rng().uniform(low=-500,high=500,size=len(lon.flatten())),
                    region=region, spacing=spacing, projection='m6c',
                    outgrid='/home/ejemplo.grd')
df=pygmt.grdtrack(grid='/home/ejemplo.grd', points=df, newcolname="hsup3d")

Once again, I don't know if it's intended or a side effect of decisions made for efficiency's sake, but I think it would be useful to document.

yvonnefroehlich · September 29, 2023, 9:11am

Hello @Feva67,

Thanks for writing this detailed post!
Actually, the data types of the columns do not change; instead, grdtrack mixes up the column names. Interestingly, this seems to happen only for datetime objects.
I slightly changed your code example:

import pygmt
import pandas as pd
import numpy as np
import datetime

df = pd.DataFrame(columns=['lon','lat','trash','trashdate'])
df['lon']=np.random.default_rng().uniform(low=-74,high=-71,size=[500])
df['lat']=np.random.default_rng().uniform(low=-51,high=-46,size=[500])
df['trash']=np.random.default_rng().uniform(low=-51,high=-46,size=[500])
df['trashdate']=(np.random.default_rng().uniform(low=0,high=5,size=[500]))*datetime.timedelta(seconds=3e7) + datetime.date(2018,1,1)

spacing=0.1 # arc degrees
region=[-74.0, -71.0, -51.0, -46.0]
n_lon=round((region[1]-region[0])/spacing)+1
n_lat=round((region[3]-region[2])/spacing)+1
lat,lon=np.mgrid[region[2]:region[3]:n_lat*1j,region[0]:region[1]:n_lon*1j]

grd = pygmt.xyz2grd(x=lon.flatten(),y=lat.flatten(),
                    z=np.random.default_rng().uniform(low=-500,high=500,size=len(lon.flatten())),
                    region=region, spacing=spacing, projection='m6c',
                    outgrid='ejemplo.grd')
df_new = pygmt.grdtrack(grid='ejemplo.grd', points=df, newcolname="hsup3d")

When comparing the pandas DataFrames df and df_new, one can see that the dates which were initially listed under trashdate are now listed under the new column name hsup3d, and the extracted grid values are listed under trashdate:

df
           lon        lat      trash   trashdate
0   -73.568697 -47.459938 -49.356210  2022-03-19
1   -72.313962 -47.150966 -47.851923  2019-06-20
2   -72.152289 -48.838568 -46.270479  2020-01-01
3   -73.958346 -50.504973 -47.712409  2020-02-29
4   -73.830190 -49.592392 -48.526433  2021-11-17
..         ...        ...        ...         ...
495 -72.896381 -48.944317 -49.268924  2020-08-15
496 -71.394546 -46.051909 -47.649541  2021-11-06
497 -72.712320 -50.024669 -48.255074  2020-12-01
498 -71.000018 -50.159363 -49.492276  2020-11-21
499 -71.973386 -48.036988 -48.933350  2018-02-04

[500 rows x 4 columns]

df_new
           lon        lat      trash   trashdate      hsup3d
0   -73.568697 -47.459938 -49.356210 -438.733107  2022-03-19
1   -72.313962 -47.150966 -47.851923   46.840081  2019-06-20
2   -72.152289 -48.838568 -46.270479  215.213289  2020-01-01
3   -73.958346 -50.504973 -47.712409   32.759593  2020-02-29
4   -73.830190 -49.592392 -48.526433 -409.640038  2021-11-17
..         ...        ...        ...         ...         ...
495 -72.896381 -48.944317 -49.268924   -3.834141  2020-08-15
496 -71.394546 -46.051909 -47.649541  -94.446515  2021-11-06
497 -72.712320 -50.024669 -48.255074 -384.853777  2020-12-01
498 -71.000018 -50.159363 -49.492276 -450.903007  2020-11-21
499 -71.973386 -48.036988 -48.933350  230.477535  2018-02-04

[500 rows x 5 columns]
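A swap like this can be spotted by comparing dtypes column by column. The following is a minimal sketch; `dtype_changes` is a hypothetical helper, and the two small frames only mimic the layout of the output above rather than coming from grdtrack.

```python
import pandas as pd


def dtype_changes(before, after):
    """Return {column: (old_dtype, new_dtype)} for shared columns whose dtype differs."""
    changes = {}
    for col in before.columns:
        if col in after.columns and before[col].dtype != after[col].dtype:
            changes[col] = (str(before[col].dtype), str(after[col].dtype))
    return changes


dates = pd.to_datetime(["2022-03-19", "2019-06-20"])
before = pd.DataFrame({"lon": [-73.5, -72.3], "trashdate": dates})
# Toy frame with the same mix-up: grid values under trashdate, dates under hsup3d.
after = pd.DataFrame(
    {"lon": [-73.5, -72.3], "trashdate": [-438.7, 46.8], "hsup3d": dates}
)
changes = dtype_changes(before, after)
```

Here only trashdate is reported, since its name now labels float values instead of dates.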

Feva67 · September 29, 2023, 2:29pm

That is very interesting, thank you for sharing! Did you manage to run it with timedelta objects as in my previous example?

I am not at all familiar with how Python packages are built, but maybe if grdtrack were called as in my example (using only the relevant columns from the dataframe), these issues could be avoided.

yvonnefroehlich · October 2, 2023, 2:18pm

I have reported this issue to the PyGMT GitHub repository at https://github.com/GenericMappingTools/pygmt/issues/2703.
@Feva67 can you please check if the solution suggested by @seisman works for your use case?

Feva67 · October 2, 2023, 7:24pm

It does work now, although the GMT behaviour regarding text that @seisman explains makes it useless for my application. I will keep using the workaround I posted before for any future work; it is good to know how it works. Thank you!