Hi Chris,

I wrote a couple of blogs on netCDF-4 chunking that might provide additional guidance:

http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters
http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes

You're right that there's no single chunking strategy that fits all access patterns, which makes it important not to rely on default chunking if you know how the data will be accessed.

> float time(time) ;
>         time:_Storage = "chunked" ;
>         time:_ChunkSizes = 16384 ;
>
> This is the 1-d array -- similar to our case. How did you come up with the
> 16384 (2^14)? Is there a benefit to base-2 numbers here -- I tend to do
> that, too, but I'm not sure why.

Disk blocks are typically powers of 2 bytes, for example 4096 or 1048576 bytes. For netCDF-4 and HDF5, a chunk is the atomic unit of I/O access, so a chunk size that is a multiple of the disk block size, or slightly under it, makes sense when you access only a few chunks at a time. It's less important if a large number of chunks are typically accessed at once. Providing a chunk cache large enough to avoid reading the same chunk from disk repeatedly is also important.

> But my minimal tests have indicated that performance isn't all that
> sensitive to chunk sizes within a wide range.

In the blogs above, I present some cases where chunk sizes and shapes can make a significant difference in performance. An HDF5 white paper also demonstrates this point:

http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/

Thanks for offering some new ideas for a chunking strategy. We hereby resolve to try to improve chunking in the New Year!

--Russ
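To make the advice above concrete, here is a minimal sketch using the netCDF-4 C API that defines a chunked 1-d time variable with the 16384-element chunk size from the quoted CDL and enlarges its chunk cache. The file name example.nc, the 4 MiB cache size, and the 521 hash slots are illustrative assumptions, not values from the message; pick sizes to suit your own access pattern.

    /* Minimal sketch: explicit chunking plus a larger chunk cache
       for a 1-d coordinate variable, per the discussion above. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define CHECK(err) do { \
        if ((err) != NC_NOERR) { \
            fprintf(stderr, "netCDF error: %s\n", nc_strerror(err)); \
            exit(1); \
        } \
    } while (0)

    int main(void)
    {
        int ncid, time_dimid, time_varid;
        /* 16384 floats = 64 KiB, a multiple of a 4096-byte disk block. */
        size_t chunksize[1] = {16384};

        CHECK(nc_create("example.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));

        /* Unlimited dimensions require chunked storage in netCDF-4. */
        CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dimid));
        CHECK(nc_def_var(ncid, "time", NC_FLOAT, 1, &time_dimid, &time_varid));

        /* Override the library's default chunk shape with an explicit one. */
        CHECK(nc_def_var_chunking(ncid, time_varid, NC_CHUNKED, chunksize));

        /* Make the per-variable chunk cache large enough to hold several
           chunks, so repeated accesses don't go back to disk each time:
           4 MiB cache, 521 hash slots, default 0.75 preemption. */
        CHECK(nc_set_var_chunk_cache(ncid, time_varid, 4 * 1024 * 1024,
                                     521, 0.75f));

        CHECK(nc_close(ncid));
        return 0;
    }

Compile with something like "cc -o example example.c -lnetcdf". Running ncdump -s on the resulting file should show the same _Storage and _ChunkSizes attributes as in the quoted CDL.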