Howdy Ed!

Thank you for the new documentation! As it happens, I am refining the user interface and writing docs for GrADS on the very same subject ... I sure would have benefitted from joining that fireside chat.

If you are open to suggestions for default chunk size settings, I would like to lobby for chunks a bit smaller than your proposal, something more along the lines of GRIB records, which are 2-dimensional, varying only in longitude and latitude. In general terms, chunks should have size > 1 only for the fastest- and second-fastest-varying dimensions. I can't speak for all the software out there, but in GrADS, I/O requests vary only in 1 or 2 dimensions, so being forced to read three-dimensional chunks will be really costly (in memory terms) and will also slow performance to the point of being unusable as grid resolution increases.
On a related note, I would also like to lobby to reduce these parameters

    #define NC_LEN_TOO_BIG 65536
    #define NC_LEN_WAY_TOO_BIG 1048576

by a couple of orders of magnitude. A chunk that is 65536 on a side would never fit into the 32 MB default cache. The cache must be big enough to hold at least ~50-100 chunks. And the cache is allocated on a per-variable basis, so if you are forced to set the cache size large because the chunk size is large, then you're in danger of running out of memory (unless memory is an unlimited resource on your system, which is not the usual case). Chunks that are too small do a lot less harm than chunks that are too big.
Respectfully submitted,
Jennifer
--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma@xxxxxxxxxxxxx

On Dec 8, 2009, at 3:54 PM, Ed Hartnett wrote:
Howdy all!

Here in (normally) sunny Boulder, Colorado, we have been having some very cold weather. As we huddle around the iron stove in the rough-hewn log cabin that houses the netCDF programming team (wishing we had more coal for our fire), we fell to talking about how to set chunk sizes for netCDF-4/HDF5 data.

The setting of good chunk sizes depends on how the data will be read, but it must be decided when the data are written. For those out there who are also interested in increasing performance with good chunk sizes in netCDF-4/HDF5 files, I can offer some information.

New Documentation:
------------------
I have added a section on chunking to the NetCDF Users Guide. The latest version can be found here:
http://www.unidata.ucar.edu/software/netcdf/docs_snapshot/netcdf.html#Chunking

Use the Chunk Cache:
--------------------
The chunk cache is important for chunking. It is (by default) 1 MB for netCDF-4.0, and the default was increased to 32 MB for netCDF-4.0.1. (The chunk cache can also be set at run-time with the nc_set_chunk_cache function; the default can be set at configure time.) You must set the chunk cache to be larger than one chunk, obviously. How much larger depends on your access pattern. Note that this is the one aspect of chunking that can be controlled by the data reader.

Test Performance with the bm_file Program:
------------------------------------------
There is a program called "bm_file" which comes with the netCDF distribution (you must configure with --enable-benchmarks), and can be used to test different chunk/deflation/shuffle settings (with or without parallel I/O) to guide your selections. It is described in the new section of the manual.

Default NetCDF-4 Chunking:
--------------------------
The default chunking of netCDF is to assign a chunk size of 1 for unlimited dimensions, and a chunk size matching the full dimension length for fixed dimensions, unless those fixed dimensions are very large. This works well for small data sets, or data sets which will be read in one "record" at a time. A complete discussion of the default chunking is in the Users Guide.

I am certainly very open to suggestions as to better default chunk size choices.

Thanks!

Ed
--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx

_______________________________________________
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/