Hi Ed,

On 08/24/2017 08:16 PM, Ed Hartnett wrote:
You can turn on HDF5 checksums with nc_def_var_fletcher32() (see: https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf-c/nc_005fdef_005fvar_005ffletcher32.html). Is this what you want?
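For reference, a minimal sketch of what turning this on looks like at variable-definition time; the file, dimension, and variable names below are illustrative:

```c
/* Minimal sketch: enable Fletcher32 checksums on a variable at
 * definition time. File, dimension, and variable names are
 * illustrative. */
#include <stdio.h>
#include <netcdf.h>

#define CHECK(e) do { int s = (e); if (s != NC_NOERR) { \
    fprintf(stderr, "netCDF error: %s\n", nc_strerror(s)); return 1; } } while (0)

int main(void)
{
    int ncid, dimid, varid;

    CHECK(nc_create("checksummed.nc", NC_NETCDF4, &ncid));
    CHECK(nc_def_dim(ncid, "x", 100, &dimid));
    CHECK(nc_def_var(ncid, "data", NC_DOUBLE, 1, &dimid, &varid));

    /* Turn on per-chunk Fletcher32 checksums; the library verifies
     * them automatically on every subsequent read. Must be called
     * in define mode. */
    CHECK(nc_def_var_fletcher32(ncid, varid, NC_FLETCHER32));

    CHECK(nc_enddef(ncid));
    CHECK(nc_close(ncid));
    return 0;
}
```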
If I understand the purpose of fletcher32() correctly, it is meant as an internal integrity check: the library verifies the data it reads from disk against a checksum that was created at write time?
What I am aiming at is a way of telling whether, assuming the files are not corrupted, the actual data contained in two datasets are identical, without re-hashing every time I want to know this.
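To make the goal concrete, here is a rough sketch of hashing a variable's contents rather than its bytes on disk, following the CRC32 suggestion quoted below. It assumes, purely for illustration, a single 1-D double variable and zlib's crc32; a real tool would also have to canonicalise traversal order, data types, byte order, and attribute handling across all variables:

```c
/* A rough sketch, not a finished tool: compute a CRC32 over a
 * variable's in-memory values, so the digest depends on the data
 * rather than on format, deflation, or chunking. Assumes a 1-D
 * double variable; "path" and "varname" are caller-supplied. */
#include <stdlib.h>
#include <netcdf.h>
#include <zlib.h>

int content_crc32(const char *path, const char *varname, unsigned long *out)
{
    int ncid, varid, dimid, status;
    size_t len;
    double *buf;

    if ((status = nc_open(path, NC_NOWRITE, &ncid)) != NC_NOERR) return status;
    if ((status = nc_inq_varid(ncid, varname, &varid)) != NC_NOERR) goto done;
    if ((status = nc_inq_vardimid(ncid, varid, &dimid)) != NC_NOERR) goto done;
    if ((status = nc_inq_dimlen(ncid, dimid, &len)) != NC_NOERR) goto done;

    if (!(buf = malloc(len * sizeof *buf))) { status = NC_ENOMEM; goto done; }

    /* nc_get_var_double delivers native doubles, so the hash sees the
     * values, not their storage layout. (Byte order across platforms
     * would still need canonicalising in a real tool.) */
    if ((status = nc_get_var_double(ncid, varid, buf)) == NC_NOERR) {
        uLong crc = crc32(0L, Z_NULL, 0);
        crc = crc32(crc, (const Bytef *)buf, (uInt)(len * sizeof *buf));
        *out = (unsigned long)crc;
    }
    free(buf);

done:
    nc_close(ncid);
    return status;
}
```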
Cheers Willi
Thanks,
Ed Hartnett

On Thu, Aug 24, 2017 at 12:04 PM, dmh@xxxxxxxx wrote:

A small note. Since the goal is equality testing rather than security, you should be able to get by with CRC32 or CRC64 checksums. SHA256 is overkill.
=Dennis Heimbigner
Unidata

On 8/24/2017 12:00 PM, Willi Rath wrote:

Hi all,

I'd like to find a way to verify the contents of a given netCDF dataset across different representations on disk. (Think of the dataset as being defined by its CDL, with different on-disk representations realised by different choices of format, deflation, chunking, etc., but with identical CDL.)

There are tools that compare the contents of two netCDF files, such as cdo's diff or nccmp. These tools do, however, rely on both files being present on the same file system at the same time. A hash-based approach that calculates checksums from the contents rather than from the binary representation of the dataset would be a nice solution to this problem.

I've collected all the attempts at verifying netCDF files I could find in: https://github.com/willirath/netcdf-hash (The most successful of these centred on including the functionality in `ncks`, and led to a pair of tools that calculate and verify MD5 checksums of netCDF files, storing the checksums within the files themselves.) There is also a demo outlining an approach that digests different representations of the same netCDF dataset into a SHA256 hash and stores the hex value of this hash in a global attribute of the respective file.

I'd be very happy about any pointers to additional ideas (or perhaps existing tools) solving the problem of netCDF content verification, and about suggestions, remarks, etc.

Cheers
Willi
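A hedged sketch of the store-the-digest-in-the-file idea from the demo mentioned above; the global attribute name "content_checksum" and the error-handling conventions are illustrative only:

```c
/* A hedged sketch of storing/checking a digest in the file itself.
 * The attribute name "content_checksum" is illustrative, not an
 * established convention. */
#include <string.h>
#include <netcdf.h>

int store_checksum(int ncid, const char *hex)
{
    int status = nc_redef(ncid);   /* enter define mode to add an attribute */
    if (status != NC_NOERR && status != NC_EINDEFINE) return status;
    status = nc_put_att_text(ncid, NC_GLOBAL, "content_checksum",
                             strlen(hex), hex);
    return status != NC_NOERR ? status : nc_enddef(ncid);
}

/* Returns NC_NOERR on match, a netCDF error code on failure,
 * and -1 on mismatch. */
int verify_checksum(int ncid, const char *expected_hex)
{
    char stored[65] = {0};         /* room for a 64-char hex digest */
    size_t len;
    int status = nc_inq_attlen(ncid, NC_GLOBAL, "content_checksum", &len);
    if (status != NC_NOERR) return status;
    if (len >= sizeof stored) return -1;   /* unexpectedly long: treat as mismatch */
    /* nc_get_att_text does not null-terminate, hence the zeroed buffer. */
    status = nc_get_att_text(ncid, NC_GLOBAL, "content_checksum", stored);
    if (status != NC_NOERR) return status;
    return strcmp(stored, expected_hex) == 0 ? NC_NOERR : -1;
}
```

Note that writing the attribute modifies the file, so the digest has to be computed over the data (and any other metadata one chooses to cover) while excluding the checksum attribute itself.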
--
Willi Rath
Theorie und Modellierung
GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
Duesternbrooker Weg 20, Raum 422
24105 Kiel, Germany
------------------------------------------------------------
Tel. +49-431-600-4010
wrath@xxxxxxxxx
www.geomar.de