HDF5 Access by GMT?

Is there a way to access HDF5 files directly via a GMT command?

Right now, I run a data staging script that makes h5dump calls to save data into binary files, then use GMT commands to convert them all to the same type (double precision), then use gmtconvert to form a binary table. Then, I can access the resulting multi-column binary file with my GMT plotting scripts, as needed. Just a lot of steps.

I guess what I was thinking of is something akin to gmtconvert. But I suspect it is a very ugly prospect to actually implement, since the user would need to know, beforehand, the structure, group, and variable names inside the HDF5 file.

Here's a little code snippet to show you what I do to create a multi-column binary file:

# indsn is an HDF5 file, defined as an input parameter to the script
beam="gt1l"
p_param=${beam}"/ssh_segments"
echo -n " delta_time"
h5dump -y -A 0 -o t0.bin -b LE -d ${p_param}/delta_time $indsn >& /dev/null
echo -n " heights/h"
h5dump -y -A 0 -o t1.bin -b LE -d ${p_param}/heights/h $indsn >& /dev/null
echo " heights/h_var"
h5dump -y -A 0 -o t2.bin -b LE -d ${p_param}/heights/h_var $indsn >& /dev/null
# h5dump initially stages the data
# the -y option is to: Do not print array indices with the data
# the -A 0 option shuts off all the attribute screen dump
# the -o option routes the data to a file
# the -b LE option sets the output as little endian binary
# the -d option selects the specified dataset in the hdf5 file
# gather the data, suppress h5dump messages, too

# convert single precision parameters to double precision
gmt gmtconvert -bi1f -bo1d t2.bin > t2a.bin; mv t2a.bin t2.bin

# gmtconvert command to pull data together into one data set
gmt gmtconvert -bi1d -A -bo3d t0.bin t1.bin t2.bin > outdata.bin

Arrays, yes (see my comment in Improving user documentation · GenericMappingTools/gmt · Discussion #4678 · GitHub). Vector data, maybe some. There were recent improvements in vector access to variables in groups, but I did not try it yet.

Thanks for the reply, Joaquim. I looked at your comment on the GitHub page. It implies that GDAL might be able to assist. I haven't tried using GDAL for that purpose, and am using HDF5 tools instead. I'll look over the GDAL docs to see what command might be applicable.

Maybe someday, a GMT command can be created or modified to access HDF5 files directly. Doing what I'm doing (forming multi-column binary files) in one step might be awful, however!

John, if you can point me to one of those HDF files, I can see if I can access the data with GMT only.

And my comment was in the sense that if GDAL can access it so can GMT (via GDAL)

NSIDC hosts thousands of HDF5 files from the ICESat and ICESat-2 missions.

As an example, I tried to upload a small one, but the system wouldn't permit .h5 files to be uploaded. So I zipped it. Still too big. So here's a link (available for ten days starting today) to the file:

Test HDF5 file

This is a test HDF5 file from ICESat-2. The top-level groups include the six beams, named gtxa, where x is a beam number and a is (l) left or (r) right.

Inside each gtxa beam are a bunch of parameters. One can use HDFView to get a sense of the structure. Pick a parameter and determine whether you can pull it out.
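If HDFView isn't handy, the command-line tools that ship with the HDF5 library can also reveal the structure. A quick sketch (the filename here is just a placeholder):

```shell
# Recursively list every group and dataset in the file
h5ls -r test_file.h5

# Or print only header/structure information, with no data values
h5dump -H test_file.h5
```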

Good luck, Joaquim!

That file seems to have vector data, not arrays/grids. Probably the changes added in this PR are what you are looking for.

I donā€™t see a docs update but the examples in the PR should be a good starting point.
To use this youā€™ll need to build GMT from source.

That example file I provided may only contain vector data, but other publicly available ICESat-2 files (from NSIDC) can have grids and arrays, depending on the product downloaded.

My GMT installation is based on git, and I try to keep up with interim releases, so maybe I can try this.

Thanks again, Joaquim!

The grids can be accessed as explained in the CookBook.
It works.
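For anyone landing here later, a minimal sketch of that CookBook-style grid access, assuming GDAL's HDF5 driver and a hypothetical group/dataset path:

```shell
# Inspect an HDF5 grid through GDAL (quote the subdataset name for the shell)
gmt grdinfo 'HDF5:"indsn.h5"://some_group/some_grid'

# Convert it to a regular GMT/netCDF grid file
gmt grdconvert 'HDF5:"indsn.h5"://some_group/some_grid' -Gout.grd
```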

Oh my goodness! I just tried it. Now, that is amazing! Bye bye, h5dump commands!

I'd suggest that HDF5 be added to the cookbook documentation so that folks will know they can access HDF5 files just as easily as NetCDF files.

What a way to end a Friday! Woohoo!

One last comment. In the Cookbook, section 8.1.3, it says:

Currently, GMT only reads, but does not write, netCDF tabular data.

That's fine, in that a user can't create a new NetCDF file, but the user can write out the data as text or as a binary file.

What is particularly useful is being able to specify the format of the output binary file:

gmt gmtconvert indsn.h5?group1/group2/param -bo1d > outdsn.bin

Really nice work, Remko - if all of this was something he added.

In some real-world runtime tests on a higher-level ICESat-2 product (ATL12, the ocean product), using gmtconvert commands instead of h5dump commands to stage data was slightly slower.

I haven't tried the photon data this way yet. That will be the real test, as some of those scripts take more than a minute to run using h5dump commands.

@John_Geodesy, just one side note: netCDF4 is HDF5 (except if no compression). So, yes, it would be good to mention in the docs that GMT can read HDF5 files as well.
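One quick way to see this relationship is the ncdump utility from the netCDF tools, which reports the underlying format of a file (the filename here is just a placeholder):

```shell
# -k reports the file format kind: "classic", "64-bit offset",
# "netCDF-4", or "netCDF-4 classic model"; the netCDF-4 kinds are HDF5 underneath.
ncdump -k some_file.nc
```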

Hi @Joaqui, might you be able to check what is written and what may need to be added? You can ping @remkos if you need input.

Thank you for this thread. I successfully read an HDF5 into GMT this week, but ran into a few things along the way that might help other users.

Problems: if there is a Dimension Scale on an arbitrary dataset in HDF5, I found that it doesn't get into the grdfile (unless the HDF5 file has a very specific structure of only one layer with only x, y, z). I also found that pixel-node grids were being read as gridline-node, despite the settings of the HDF5's node_offset. Not so important, but I suspect both behaviors have to do with defaults or hard-coded conventions in the GDAL-to-GMT interface.

Solution: So, I used grdedit to extract an arbitrary grid from a big HDF5 file and give it the right headers (with help from the cookbook).

gmt grdedit $hdf_filename=gd?HDF5:"$hdf_filename"://path/to/dataset -R$gmt_desired_range -Gtemp.grd   # send desired layer out to grdfile, using GDAL. Edit the header to have the proper geographic range. 
gmt grdedit temp.grd -T    # turn the file to pixel node registration if needed. Must be done in second step to preserve increment.

There are probably other ways, but this worked for me. Apologies if this post just comes from my HDF5 ignorance.


Thanks, kmaterna, that is an excellent post.

I had been handling HDF5 grids via a two-step method: using gmtconvert to dump a grid as a single column binary file, then applying an xyz2grd command. Your grdedit approach pretty much does the job directly, with a minor manipulation to get the registration right. Bravo!
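For completeness, my old two-step method looked roughly like this (the group path, region, and increment are illustrative):

```shell
# Dump the grid values as a single column of doubles
gmt gmtconvert 'indsn.h5?some_group/grid_param' -bo1d > col.bin

# Rebuild a GMT grid from the column; limits and increment must be known beforehand
gmt xyz2grd col.bin -Gout.grd -R-180/180/-60/60 -I0.25 -r -ZBLd
```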

A slightly different topic…

Lately, I've been working out how to access an H5 data set that is provided in grid form with each grid cell containing a histogram. This is a 3-dimensional structure, where each grid cell (i,j) has a 601-element third dimension that makes up a histogram. Using h5dump, I am able to specify the grid cell (i,j) of interest and obtain a binary dump (into t.bin) of the 601-element histogram, via a command like:

h5dump -y -A 0 -o t.bin -b LE -d "path_to_dataset[$i,$j,0;;1,1,601]" $hdf_filename >& /dev/null

The items in the brackets provide the starting point (i,j,0) and the "count" (1,1,601) for the specific grid-cell histogram data to dump. The command above applies the short form for hyperslab subsetting.

I'd like to learn what the appropriate GMT command would be to do this same task. Any ideas are most welcome.

Slight revival of this thread, because a year later I encountered a similar situation where I needed to access a specific level of a 3D grid contained in an HDF5 file. I found that the subsetting features of h5dump were terribly slow when extracting a 1440x480 grid from a five-level HDF5 file.

Instead, I found that h5dump is far faster when it dumps all levels of a 3D grid. I spent a little time understanding how h5dump orders the values by dumping them into a text (non-binary) file and comparing the output to what HDFView showed for the same parameter.

Upon understanding how the data was dumped, I simply saved the dump as a binary file, then fed that into gmtconvert to isolate the specific level of interest. Finally, the output column (of the selected level) can be fed into xyz2grd to form the required GMT-style (NetCDF) grid file.

Here is an example.

Given an HDF5 file, called indsn.h5, and given a parameter name that is a 3D array (called: our_param), you can use h5dump to extract the whole 3D parameter, saving it as a Little Endian binary file (t.bin) with a command like:

h5dump -y -A 0 -o t.bin -b LE -d our_param indsn.h5 >& /dev/null

The above command will create a file with all levels (0 to (number of levels)-1) represented as individual columns. Next, gmtconvert can be used to select the column of interest; you need to know the number of levels in the 3D parameter (5 in this example) and the output format. To select column 1 (corresponding to level 1) and save it into t1.bin (converting a 4-byte float to an 8-byte float):

gmt gmtconvert -bi5f -bo1d -o1 t.bin > t1.bin

Finally, the xyz2grd command comes into play. One needs to know, ahead of time, the grid limits and resolution of the grid (one way to ascertain that is with HDFView). The final GMT-style grid is called tg.grd (-di converts invalid values to NaN, and -Vq quiets GMT messages).

gmt xyz2grd t1.bin -Gtg.grd -R-180/180/-60/60 -I0.25 -r -ZBLd -di${inval}+c0 -Vq

The execution of these three commands takes place in a matter of seconds. When I tried to use h5dump subsetting options, it was taking nearly 80 minutes!
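Wrapped into a tiny script, the whole extraction looks something like this (the level count, column choice, region, and fill value are all specific to my file and would need adjusting):

```shell
#!/bin/bash
# Extract level 1 of a 5-level 3D parameter from an HDF5 file and grid it.
indsn="indsn.h5"      # input HDF5 file (placeholder name)
inval=3.4028235e38    # invalid/fill value in the source data (file-specific assumption)

# 1) Dump the full 3D parameter as little-endian binary (fast; no subsetting)
h5dump -y -A 0 -o t.bin -b LE -d our_param $indsn >& /dev/null

# 2) Pull out column 1 (level 1), converting 4-byte floats to 8-byte doubles
gmt gmtconvert -bi5f -bo1d -o1 t.bin > t1.bin

# 3) Grid the column; -di converts the fill value to NaN
gmt xyz2grd t1.bin -Gtg.grd -R-180/180/-60/60 -I0.25 -r -ZBLd -di${inval}+c0 -Vq
```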

Hope future readers find this additional post to be helpful.
