Ken,

> ... That sounds good to me, but your web site also puts a disclaimer
> that "they probably should not be" read with the HDF5 tools. Can
> you explain that disclaimer?

In addition to what Ed wrote, data providers will enhance
interoperability if they restrict the HDF5 features they use to those
identified in the Common Data Model that netCDF-4 implements, because
then either HDF5 tools or netCDF-4 tools can be used on the same file.
A data provider who decides to use HDF5 References should understand
the implication: the data may not map easily into netCDF-4 (or
OPeNDAP) abstractions that support access through another interface.

> Also, would you perhaps provide a comment on my leaning one way or
> the other between netCDF-4 and HDF5? The data sets I deal with are
> typically either regularly gridded or at least geo-referenced with
> lat/lon coords for each satellite observation. HDF4's
> tiling/chunking is very important to me, especially for the big
> global grids (serving subsets via OPeNDAP is much faster than when
> the data are in netCDF-3 and the whole file needs to be
> decompressed, even if the user wants only one pixel), so I am glad
> to see that feature is part of netCDF-4/HDF5. Also, I feel the new
> parallel I/O features will become ever more important, especially
> as we move into the NPOESS era.

I'd say there are tradeoffs, and you should try to preserve some
flexibility for the people who will access the data in the future. If
most of the users of the data, or the developers of applications that
will access it, are already familiar and happy with HDF5, that's
important. At this point HDF5's support for features such as parallel
I/O and chunking is more mature than netCDF-4's; we aren't yet sure
that even the chunking parameters we use as defaults are optimal for
any particular use.

Some other considerations are:

- your judgment of the importance of simplicity versus power among
  users and developers
- the likely future size and funding of the development and support
  groups: The HDF Group, Inc. and UCAR's Unidata Program
- the size of the external developer community providing additional
  tools and language interfaces
- the likely future user communities: the NPP/NPOESS operational and
  research community, HPC communities, and climate and geoscience
  users in research and education
- the stance toward backward compatibility versus new features
- the importance of optimal performance in comparison with other
  characteristics

and so on. It's a difficult decision that involves some risks either
way. If you decide to use HDF5, I would advise you to be conservative
in your use of features not supported by other data models.

--Russ
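
As an illustration of the chunking and compression features discussed
above, here is a minimal sketch using the netCDF-4 C API. The file
name, variable, dimension sizes, and chunk sizes are all hypothetical;
as Russ notes, good chunking parameters depend on the access pattern
and should be tuned rather than taken from a sketch like this.

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    /* Hypothetical global grid dimensions; adjust for real data. */
    #define NLAT 4320
    #define NLON 8640

    #define CHECK(e) do { int s = (e); if (s != NC_NOERR) { \
        fprintf(stderr, "netCDF error: %s\n", nc_strerror(s)); \
        exit(1); } } while (0)

    int
    main(void)
    {
        int ncid, lat_dim, lon_dim, varid;
        int dimids[2];
        /* Chunk sizes are a guess; good defaults are still an open
           question, so tune these for the expected subset shapes. */
        size_t chunks[2] = {180, 360};

        /* NC_NETCDF4 selects the HDF5-based storage format. */
        CHECK(nc_create("sst_grid.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "lat", NLAT, &lat_dim));
        CHECK(nc_def_dim(ncid, "lon", NLON, &lon_dim));
        dimids[0] = lat_dim;
        dimids[1] = lon_dim;
        CHECK(nc_def_var(ncid, "sst", NC_FLOAT, 2, dimids, &varid));

        /* Chunked storage lets a server read just the tiles that
           intersect a requested subset instead of decompressing the
           whole variable. */
        CHECK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));
        /* Shuffle filter plus zlib deflate at level 1. */
        CHECK(nc_def_var_deflate(ncid, varid, 1, 1, 1));

        CHECK(nc_close(ncid));
        return 0;
    }

With chunked, deflated storage, only the chunks overlapping a
requested subset need to be read and decompressed, which is what makes
serving one pixel of a big global grid fast compared with a netCDF-3
file that was compressed as a whole.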