
Re: [netcdfgroup] Patch for netCDF4 file bit-for-bit reproducibility

Hi Rimvydas,

> Recently I started working with netCDF in Fortran, mainly switching from the
> f77 interface to the more flexible f90 one.
> And I love it! Fantastic API.
> 
> I am dealing with code that has a huge test suite for regression testing,
> so I am trying to find a compromise between size and speed.
> The code was intended to output lots of diagnostics (~1 GB) for every test.
> 
> The lack of an ncdiff tool made me write my own, but while trying to
> optimize it for time I learned that I spend half the time in my
> comparison loops and the other half in swap8b...
> netCDF-4 features like compression and native endianness are very appealing,
> but the lack of bit-for-bit (BFB) reproducibility (even with nccopy), just
> because of internal timestamping, is sad.
> 
> http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2008/msg00003.html
> Is this still valid?

It looks like the fix you've developed and tested (first suggested by
Quincey Koziol in January 2008) would make bit-for-bit reproducibility
possible for netCDF-4 files.

We're testing turning off tracking times for HDF5 objects to determine
if there are any undesirable side effects.  If not, we'll incorporate it
into the next release.

Thanks for bringing this to our attention!

--Russ

> I am attaching a small patch that I made against netcdf-4.1.3.
> With just this patch and these configure options I can successfully
> reproduce identical files using nccopy, without depending on the system
> time or having to rely on hooks to get the Unix time.
> CPPFLAGS="-I${hdf}/include"
> CFLAGS="-DBFB_MODE"
> LDFLAGS="-L${hdf}/lib -ldl"
> ./configure --prefix=${netcdf} --enable-netcdf-4 --disable-hdf4
> --disable-pnetcdf --enable-cdmremote=no --disable-dap --disable-v2
> --disable-shared --with-pic
> 
> There are no other H5P_[A-Z]+_CREATE calls in the source code (except for
> the ones in tests).
> 
> Is it safe enough to be used for reproducibility checks, at least with
> the netcdf3/netcdf4 classic formats?
> All I need is to be able to use md5sum on repeated runs, with the same
> netcdf/hdf lib, to speed up the process.
> 
> Best regards,
> Rimvydas
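
The md5sum check on repeated runs mentioned in the quoted message could look roughly like the Python sketch below. It is only an illustration and not part of the attached patch: the helper names `file_md5` and `files_identical` and the example file names are hypothetical, and it assumes both runs were written with the same netCDF/HDF5 libraries and with HDF5 object time tracking turned off, since otherwise the embedded timestamps would make the digests differ.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which stops the iteration
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Return True when both files have the same size and the same MD5 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap early exit before hashing large files
    return file_md5(a) == file_md5(b)
```

A regression script might then call `files_identical("run1/diag.nc", "run2/diag.nc")` (hypothetical paths) and fall back to a slower variable-by-variable comparison only when the digests differ.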