Howdy all!

Here in (normally) sunny Boulder, Colorado, we have been having some very cold weather. As we huddle around the iron stove in the rough-hewn log cabin that houses the netCDF programming team (wishing we had more coal for our fire), we fell to talking about how to set chunk sizes for netCDF-4/HDF5 data.

Good chunk sizes depend on how the data will be read, but they must be chosen when the data are written. For those out there who are also interested in increasing performance with good chunk sizes in netCDF-4/HDF5 files, I can offer some information.

New Documentation:
------------------

I have added a section on chunking to the NetCDF Users Guide. The latest version can be found here:

http://www.unidata.ucar.edu/software/netcdf/docs_snapshot/netcdf.html#Chunking

Use the Chunk Cache:
--------------------

The chunk cache is important for chunking. It is (by default) 1 MB for netCDF-4.0, and the default was increased to 32 MB for netCDF-4.0.1. (The chunk cache can also be set at run-time with the nc_set_chunk_cache function; the default can be set at configure time.)

You must set the chunk cache to be larger than one chunk, obviously. How much larger depends on your access pattern. Note that this is the one aspect of chunking that can be controlled by the data reader.

Test Performance with the bm_file Program:
------------------------------------------

There is a program called "bm_file" which comes with the netCDF distribution (you must configure with --enable-benchmarks). It can be used to test different chunk/deflation/shuffle settings (with or without parallel I/O) to guide your selections. It is described in the new section of the manual.

Default NetCDF-4 Chunking:
--------------------------

The default chunking of netCDF is to assign a chunk size of 1 for unlimited dimensions, and a chunk size matching the full dimension length for fixed dimensions, unless those fixed dimensions are very large.
This works well for small data sets, or data sets which will be read in one "record" at a time. A complete discussion of the default chunking is in the Users Guide.

I am certainly very open to suggestions as to better default chunk size choices.

Thanks!

Ed

--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx
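
To make the default rule and the cache-size point above concrete, here is a minimal sketch in plain C. It is an illustration only, not netCDF library code. The function names default_chunk_shape, chunk_bytes, and chunks_that_fit are made up for this example. It ignores the "very large fixed dimension" case (no threshold is given above), and it assumes uncompressed chunks and that 1 MB means 1048576 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the netCDF API): fill chunk[] using the
 * simple default rule described in the message: 1 along unlimited
 * dimensions, the full dimension length along fixed dimensions.
 * The "very large fixed dimension" exception is NOT handled here. */
void default_chunk_shape(size_t ndims, const size_t *dimlen,
                         const int *is_unlimited, size_t *chunk)
{
    assert(dimlen != NULL && is_unlimited != NULL && chunk != NULL);
    for (size_t d = 0; d < ndims; d++)
        chunk[d] = is_unlimited[d] ? 1 : dimlen[d];
}

/* Bytes occupied by one uncompressed chunk of the given shape. */
size_t chunk_bytes(size_t ndims, const size_t *chunk, size_t elem_size)
{
    size_t n = elem_size;
    for (size_t d = 0; d < ndims; d++)
        n *= chunk[d];
    return n;
}

/* Number of whole chunks that fit in a cache of cache_bytes.
 * A result of 0 means the cache is smaller than a single chunk. */
size_t chunks_that_fit(size_t cache_bytes, size_t one_chunk_bytes)
{
    assert(one_chunk_bytes > 0);
    return cache_bytes / one_chunk_bytes;
}
```

For example, a 4-byte float variable with dimensions (time unlimited, 180, 360) gets a chunk shape of 1 x 180 x 360 under this rule, or 259200 bytes per chunk. A 1 MB cache then holds 4 such chunks and a 32 MB cache holds 129. Whether that is enough depends on the access pattern; a cache size chosen this way is the kind of value one would then hand to nc_set_chunk_cache.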