Hi,

A netCDF bug first reported by Joerg Henrichs and described under "Known Problems with NetCDF Distributions":

  http://www.unidata.ucar.edu/netcdf/docs/known_problems.html#lustre

is apparently more serious than we first thought, although it appears to occur only rarely. The bug is fixed in the upcoming netCDF-4.1.3-beta release, which is currently undergoing final testing. This "nofill bug" has been in netCDF-3 releases since at least 1999.

In summary: under some circumstances, writing data in nofill mode that crosses a disk block boundary more than one disk block beyond the end of the file can zero out previously written data that hasn't yet been flushed to disk.

The following conditions are necessary for the nofill bug to occur:

1. Writing data to a netCDF classic format file or 64-bit offset file using nofill mode (not the library default, but used in some utility software).

2. Writing data that crosses the boundary between one and two disk blocks beyond the last block in the file, as might happen when writing a multidimensional variable by slices in reverse order.

The above conditions are necessary, but not sufficient. Occurrence of the bug also depends on the amount of data, where the data is written in terms of disk block boundaries, and the current state of a memory buffer. These additional conditions make the bug unlikely, but more likely on file systems with large disk I/O block sizes. The bug was first reported on a high-performance file system that uses 2 MB disk blocks.

The result of the bug is a corrupt file, with data earlier in the file overwritten with zeros. The earlier data is overwritten with no indication that an error occurred, so the user may think the data is correctly stored.

We've verified that the bug exists in all previous versions of C-based netCDF-3 releases, but not in netCDF-Java. Perhaps it wasn't noticed until recently because all the systems on which we were testing have small disk block sizes (8 KB or less), and the bug is more likely with large disk blocks. Also, most of our tests don't write the last variable in a file backwards, which leaves file system "holes" when written in nofill mode.

Writing data in nofill mode requires a call such as one of the following for the C, Fortran-77, Fortran-90, and C++ APIs, respectively:

  nc_set_fill(ncid, NC_NOFILL, &old_fill_mode)
  nf_set_fill(ncid, NF_NOFILL, old_fill_mode)
  nf90_set_fill(ncid, NF90_NOFILL, old_fill_mode)
  file->set_fill(NcFile::NoFill)

More information about nofill mode is available here:

  http://www.unidata.ucar.edu/netcdf/docs/netcdf-c.html#nc_005fset_005ffill

Some widely used software, such as NCO, has used nofill mode as a default for better performance, so a user might not be aware that files are being written in nofill mode. A separate announcement of a new release of NCO that doesn't use nofill mode will appear soon.

Although the nofill bug is fixed in netCDF version 4.1.3, to which we recommend upgrading, other workarounds include:

- avoiding use of nofill mode
- enabling share mode (NC_SHARE)
- not writing large variables at the end of a file in reverse of the order in which their values are stored
- using netCDF-4

More details are available in our bug database, if you're interested:

  http://www.unidata.ucar.edu/jira/browse/NCF-22

A separate patch to netCDF 4.1.2 that fixes just this bug is also available here:

  http://www.unidata.ucar.edu/netcdf/patches/nofill-bug.patch

If you're interested in how to determine your disk I/O block size, see, for example:

  http://www.linfo.org/get_block_size.html
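For concreteness, here is a minimal C sketch of the kind of nofill, reverse-order write pattern described in conditions 1 and 2 above. This is my own illustration, not code from the bug report: the file name, dimension sizes, and variable are made up, error checking is omitted, and whether the bug actually fires for a pattern like this still depends on the disk block size and buffer state, as noted:

  #include <netcdf.h>

  #define NRECS 4
  #define NX    (1024 * 1024)    /* illustrative slice size: 4 MB of floats */

  int main(void)
  {
      int ncid, dimids[2], varid, old_fill_mode, rec;
      static float slice[NX];    /* zero-initialized; contents don't matter here */
      size_t start[2] = {0, 0}, count[2] = {1, NX};

      /* classic-format file; switch to nofill mode (condition 1) */
      nc_create("demo.nc", NC_CLOBBER, &ncid);
      nc_set_fill(ncid, NC_NOFILL, &old_fill_mode);

      nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimids[0]);
      nc_def_dim(ncid, "x", NX, &dimids[1]);
      nc_def_var(ncid, "var", NC_FLOAT, 2, dimids, &varid);
      nc_enddef(ncid);

      /* write record slices in reverse order, so each write lands past
         the current end of file and leaves a "hole" behind it
         (condition 2) */
      for (rec = NRECS - 1; rec >= 0; rec--) {
          start[0] = rec;
          nc_put_vara_float(ncid, varid, start, count, slice);
      }

      nc_close(ncid);
      return 0;
  }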
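Of the workarounds above, enabling share mode is a small change to existing code: just add NC_SHARE to the mode flags when the file is created or opened. Again, an illustrative fragment rather than a recipe:

  /* NC_SHARE reduces the library's buffering so writes reach disk
     promptly, avoiding the stale-buffer condition the bug needs,
     at some cost in write performance */
  nc_create("demo.nc", NC_CLOBBER | NC_SHARE, &ncid);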
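On POSIX systems, one quick way to query the block size programmatically is via statvfs(3); a sketch, assuming f_bsize reports the file system's preferred I/O block size on your platform:

  #include <stdio.h>
  #include <sys/statvfs.h>

  int main(void)
  {
      struct statvfs vfs;
      if (statvfs(".", &vfs) == 0)   /* current directory's file system */
          printf("I/O block size: %lu bytes\n", (unsigned long) vfs.f_bsize);
      return 0;
  }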
--Russ