I posted a few days ago about some gridding issues I was having, however I think the question was maybe too in depth. I’ve stripped the issue back to what I think the core issue is, which is the blockmedian method output. I’m just a bit unsure about how the output is supposed to look. My blockmedian output, when imported into GIS looks the below. I’m getting multiple points per grid block. My understanding was that I am supposed to be getting 1 point per block, which is the median of all the points that fall within that block. You can see on the left of the image, there is only 1 point per grid block but on the right, where I have multiple data sources, it has multiple points per grid block.
Can someone please advise what the blockmedian output should look like and if this is unusual output?
Not possible to get multiple output per block from the block* modules. Must be confusion on your part of what -R and -I are in your original coordinates. Please zoom in on a small patch of your write side, plot those points, and overlay gridlines using the spacing selected with -I. If I recall you are doing geographic data but specifying the increment in meter, which means the increment is some decimated version of degrees from converting meters to degrees.
Hi @pwessel, thanks for your response, I modified my code based on your feedback and I’ve gopt the output with 1 point per grid block. I’m still getting weird positioning of the points within the block, where I have multiple data sources, in comparison to the areas where I have one data source. See attached image, you can see to the right compared to the fishnet grid of 0.0003 degree spacing, the points are randomly offset compared to the grid. The blockmedian code I’m using is below. Is there something I’m missing that’s causing this issue?
By default, blockmedian positions the points using the median XY position of all the points in each block, rather than as an evenly spaced grid. You could try to use pygmt.blockmedian(..., C=True) to get the points to be in the centre position instead perhaps. See https://docs.generic-mapping-tools.org/6.5/blockmedian.html#c
By the way, nice to see DuckDB being used here with PyGMT! We’ve got a semi-long-term plan to integrate with PyArrow at https://github.com/GenericMappingTools/pygmt/issues/2800, and it would be cool to pass in GeoArrow data directly (so you wouldn’t need to parse the XYZ data out manually). Watch this space
Hi @weiji14, thank you for your response! I’ll see if the center position option helps! That’s interesting about PyArrow, I’ll definitely be watching that space
Just so it is clear, blockmean, blockmedian, and blockmode return a single point per cell that has at least one data point and the coordinates is the mean, median, or mode location. Those locations can be overwritten by using -C which just places the point in the middle of the cell. If you don’t care about aliasing and that the point is shifted arbitrary to the mid-point then use -C. Most of us use blockm* to reduce the data sets to only have one (or no) representative value per bin and then grid that data set with surface, greenspline, or nearneighbor.
Hi @Andreas, I did attempt to use the following workflow: blockmedian > xyz2grd > nearneighbor. However I was getting an error feeding the xyz2grd out into the nearneighbour, as the xyz2grd output isn’t compatible as an input for nearneighbour. I didn’t pursue that as a solution but if you know something i don’t I’d love to hear it. Thanks.
Using nearneighbor after xyz2grd doesn’t make sense I think. Why are you doing that? xyz2grd basically reformats the input data to another format, e.g. xyz (ascii) -> netCDF. It expects a grid as input.