Can PyGMT remove duplicates when going from xyz to grid?

Hi,

I want to convert lon, lat, value (xyz) data to a non-overlapping, evenly spaced grid and count the number of unique values in each grid cell. I can get the total count of values for each grid cell with either xyz2grd (setting duplicate="n") or blockmean (setting summary="n"), but I want just the count of the unique values. Is that possible in PyGMT?

As an example, I’m using the blockmean gallery example with minor changes. I added an ID column in which several locations share the same ID. Instead of just getting the number of earthquakes in each cell, I’d like to know how many unique IDs are in each cell, if that’s possible.

import numpy as np
import pygmt

# Load sample data
data = pygmt.datasets.load_sample_data(name="japan_quakes")
# Select only needed columns
data = data[["longitude", "latitude", "depth_km"]]
# Add an ID column (random integers 1-4) so that several locations share an ID
rng_seeded = np.random.default_rng(seed=42)
data.insert(3,'ID',rng_seeded.integers(1,5,size=115))

# Set the region for the plot
region = [130, 152.5, 32.5, 52.5]
# Define spacing in x- and y-directions (150x150 arc-minute blocks)
spacing = "150m"

fig = pygmt.Figure()

# Count the total number of locations within each 150x150 arc-minute bin.
# duplicate="n" counts points per node; duplicate="z" would sum the ID values instead.
grd = pygmt.xyz2grd(data=data.drop('depth_km', axis=1), region=region, spacing=spacing, duplicate='n')

fig.grdimage(
    grid=grd,
    region=region,
    frame=["af", "+tNumber of points inside each block"],
    cmap="batlow",
)
fig.coast(land="darkgray", transparency=40)
fig.plot(x=data.longitude, y=data.latitude, style="c0.3c", fill="white", pen="1p,black")
fig.colorbar(frame="x+lcount")

fig.show()

Thank you!

Hi Johanna. Welcome to the GMT forum!

You could try blockmean -Sn (summary="n" in PyGMT) to count the total number of values in each cell. Then you can check how many nodes have a count of 1: use grd2xyz to extract the values to a table, and count the rows whose value is 1.

I think it is impossible to get a count of unique values per grid cell using (py)GMT.
I thought binstats could possibly count unique values, but no, it counts the total number of values per cell, just like xyz2grd or blockmean.

That would be a reasonable feature request.

Thanks so much for your replies! I’ll put in a feature request and will figure out some other way to do this. I’m able to do it in R, but I haven’t found an equally simple way in Python yet.
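
For anyone landing here later: one possible workaround outside GMT is to bin the points yourself with pandas and count the distinct IDs per cell. This is only a sketch under some assumptions. `count_unique_per_cell` is a made-up helper (not part of PyGMT), the spacing is given in degrees (150 arc-minutes = 2.5 degrees), and the cells start at the lower-left corner of the region (pixel-style bins), which is not the same as the gridline-registered nodes xyz2grd uses by default. Points exactly on the east or north edge are dropped here.

```python
import numpy as np
import pandas as pd


def count_unique_per_cell(df, region, spacing_deg, id_col="ID"):
    """Hypothetical helper: number of distinct IDs per lon/lat cell.

    Assumes columns named "longitude" and "latitude" and a spacing in degrees.
    Returns a 2-D integer array with row 0 at the southern edge.
    """
    west, east, south, north = region
    ncols = int(round((east - west) / spacing_deg))
    nrows = int(round((north - south) / spacing_deg))

    # Integer cell index of every point, counted from the lower-left corner
    col = np.floor((df["longitude"].to_numpy() - west) / spacing_deg).astype(int)
    row = np.floor((df["latitude"].to_numpy() - south) / spacing_deg).astype(int)

    # Keep only points that fall inside the region
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    binned = pd.DataFrame(
        {"row": row[inside], "col": col[inside], "id": df[id_col].to_numpy()[inside]}
    )

    # Distinct IDs per (row, col) cell; cells without points stay at 0
    counts = binned.groupby(["row", "col"])["id"].nunique().reset_index(name="n")
    grid = np.zeros((nrows, ncols), dtype=int)
    grid[counts["row"].to_numpy(), counts["col"].to_numpy()] = counts["n"].to_numpy()
    return grid


# Small made-up table in the same layout as the example above
demo = pd.DataFrame(
    {
        "longitude": [130.5, 131.0, 131.2, 140.0],
        "latitude": [33.0, 33.5, 34.0, 40.0],
        "ID": [1, 1, 2, 3],
    }
)
unique_counts = count_unique_per_cell(demo, [130, 152.5, 32.5, 52.5], 2.5)
print(unique_counts)
```

To plot the result, it should be possible to wrap the array in an xarray.DataArray with the cell-center longitudes and latitudes as coordinates and pass that to fig.grdimage. I have not checked how the registration is handled in that case, so treat that part as untested.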