Blockmedian output issue - multiple points per grid block

G’Day All,

I posted a few days ago about some gridding issues I was having, however I think the question was maybe too in depth. I’ve stripped the issue back to what I think the core issue is, which is the blockmedian method output. I’m just a bit unsure about how the output is supposed to look. My blockmedian output, when imported into GIS looks the below. I’m getting multiple points per grid block. My understanding was that I am supposed to be getting 1 point per block, which is the median of all the points that fall within that block. You can see on the left of the image, there is only 1 point per grid block but on the right, where I have multiple data sources, it has multiple points per grid block.

Can someone please advise what the blockmedian output should look like and if this is unusual output?

Thanks in advance.

Not possible to get multiple output per block from the block* modules. Must be confusion on your part of what -R and -I are in your original coordinates. Please zoom in on a small patch of your write side, plot those points, and overlay gridlines using the spacing selected with -I. If I recall you are doing geographic data but specifying the increment in meter, which means the increment is some decimated version of degrees from converting meters to degrees.

Hi @pwessel, thanks for your response, I modified my code based on your feedback and I’ve gopt the output with 1 point per grid block. I’m still getting weird positioning of the points within the block, where I have multiple data sources, in comparison to the areas where I have one data source. See attached image, you can see to the right compared to the fishnet grid of 0.0003 degree spacing, the points are randomly offset compared to the grid. The blockmedian code I’m using is below. Is there something I’m missing that’s causing this issue?

import duckdb
import geopandas as gpd
import pandas as pd
import pygmt

#-----Variables-----#
con = duckdb.connect(r"D:\QLD\DB\qld_test_albers_merge.db")
existing_table_name = "Victas_hr_B_1"
out_file_name = "qld_test_pygmt_forum"
geom_col = 'geom'
min_x = 150  #west
max_x = 151.2  #east
max_y = -20 #north
min_y = -21.2  #south
region = min_x, max_x, min_y, max_y
spacing = 0.0003
search_radius = spacing*3
filter = spacing*2
distance = spacing*3

#----Load duckdb spatial-----#
# Install and load the 'spatial' extension
con.execute("INSTALL 'spatial';")
con.execute("LOAD 'spatial';")

#----Read duckdb as GDF----#
# Read database as GeoDataFrame
try:
    gdf = gpd.GeoDataFrame.from_postgis("SELECT * FROM "+existing_table_name, con, geom_col=geom_col)
except:
    gdf = pd.read_sql("SELECT * FROM "+existing_table_name, con)
gdf = gdf.astype(float)
gdf_convert = gpd.GeoDataFrame(gdf, geometry=gpd.points_from_xy(gdf.X, gdf.Y), crs="EPSG:4326")
gdf_convert = gdf_convert.rename(columns={"geometry": "geom"})
gdf = gdf_convert
if 'X' not in gdf.columns:
    gdf['X'] = gdf['geom'].x
if 'Y' not in gdf.columns:
    gdf['Y'] = gdf['geom'].y
gdf = gdf.reset_index()
gdf = gdf[['geom','X','Y','Z']]

#----Create grid----#

grid = pygmt.blockmedian(
    x=gdf.X,
    y=gdf.Y,
    z=gdf.Z,
    spacing=f"{spacing}",
    region=region,
)`Preformatted text`

By default, blockmedian positions the points using the median XY position of all the points in each block, rather than as an evenly spaced grid. You could try to use pygmt.blockmedian(..., C=True) to get the points to be in the centre position instead perhaps. See https://docs.generic-mapping-tools.org/6.5/blockmedian.html#c

By the way, nice to see DuckDB being used here with PyGMT! We’ve got a semi-long-term plan to integrate with PyArrow at https://github.com/GenericMappingTools/pygmt/issues/2800, and it would be cool to pass in GeoArrow data directly (so you wouldn’t need to parse the XYZ data out manually). Watch this space :wink:

Hi @weiji14, thank you for your response! I’ll see if the center position option helps! That’s interesting about PyArrow, I’ll definitely be watching that space :grinning:

Just so it is clear, blockmean, blockmedian, and blockmode return a single point per cell that has at least one data point and the coordinates is the mean, median, or mode location. Those locations can be overwritten by using -C which just places the point in the middle of the cell. If you don’t care about aliasing and that the point is shifted arbitrary to the mid-point then use -C. Most of us use blockm* to reduce the data sets to only have one (or no) representative value per bin and then grid that data set with surface, greenspline, or nearneighbor.

Are you using blockm* to preprocess data for use with xyz2grd?

Ok thanks Paul, that is noted.

Hi @Andreas, I did attempt to use the following workflow: blockmedian > xyz2grd > nearneighbor. However I was getting an error feeding the xyz2grd out into the nearneighbour, as the xyz2grd output isn’t compatible as an input for nearneighbour. I didn’t pursue that as a solution but if you know something i don’t I’d love to hear it. Thanks.

Using nearneighbor after xyz2grd doesn’t make sense I think. Why are you doing that? xyz2grd basically reformats the input data to another format, e.g. xyz (ascii) -> netCDF. It expects a grid as input.

Thanks @Andreas after I commented I went back to look at the documentation and realised I had misinterpreted the method originally.