NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.
NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Russ,

> For some light reading last night, I was reading the HDF5 FAQs (I
> know, I've got to get out more :-), and came across a possible :-)
> show-stopper:
>
> http://hdf.ncsa.uiuc.edu/hdf5-quest.html#grdwt
>
> As background, users of netCDF sometimes have one writer process and
> one or more reader processes opening and accessing the same file
> concurrently, using nc_sync() or NC_SHARE to make sure the readers and
> writer see a consistent version of the file. The way concurrent
> access is handled is explained here in about seven paragraphs:
>
> http://www.unidata.ucar.edu/packages/netcdf/guidec/guidec-10.html#HEADING10-322
>
> under the nc_sync() description.
>
> Note that there are two different levels of concern for
> synchronization:
>
> 1. data, that is values of variables that are changed and new data
>    added, including new records as the result of the unlimited
>    dimension being increased by the writer process
>
> 2. schema changes, such as adding new dimensions, variables, or
>    attributes, changing the names of things, or even changing the
>    value of an attribute.
>
> NetCDF provides good support for multiple readers and one writer for
> changes of the first type, to the data, by either using nc_sync() or
> (preferred) by using the NC_SHARE flag on open.
>
> NetCDF provides almost no support for concurrent changes of the second
> type, which involve a writer changing the schema (header) information
> for a file, implying that the cached in-memory header information
> would all have to be reread.
>
> So for the fairly uncommon second kind of change (to the schema), we
> recommend that some external form of communication be used to inform
> the readers of a need to close and reopen the file to see the changes
> made by the writer. However the more common first kind of change is
> handled without needing any communication between writer and readers
> and without requiring closing and reopening the file.
> If my reading of the HDF5 FAQ answer is right, this common kind of
> data concurrency is not supported in HDF5, so systems that make data
> changes with a concurrent writer and one or more readers won't work
> unless we provide some new communication among the processes doing I/O
> to make sure readers close and then reopen the file after *any* write.
> Is this right, or am I taking the HDF5 FAQ answer too literally?
>
> We're currently not doing all this stuff in our netCDF-4 prototype
> if a file is open with the NC_SHARE flag or on nc_sync() calls. If we
> have to add code on reads to close and then reopen the file if it's
> been modified, this will require some rework and have performance
> implications.
>
> On the other hand, maybe everything is OK and the above is not really
> necessary to assure that the reader gets a consistent, if not
> absolutely up-to-date, view of the file (which is all that the netCDF
> implementation needs).
>
> Comments?

This sort of concurrency is not supported by default, but it should be
possible to achieve it with sufficient tweaking of the caching
parameters. You can use H5Pset_sieve_buf_size() to turn off raw data
caching, and you can use H5Pset_cache() to turn off metadata caching as
well. Obviously, performance is not great in these scenarios, but I
think it will work.

If we want to recover some of the performance given up by these sorts
of tweaks, we could change the internal caches to allow write-through
instead of write-back caching, which would probably recover a
significant chunk of the slowdown.

Quincey