Hi Leon,

> Thanks for mentioning chunk sizing; that's not something I had thought
> about. I've got one unlimited dimension, and it sounds like that means an
> inefficient default chunk size
> <http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Default-Chunking.html#Default-Chunking>.
> ("For unlimited dimensions, a chunk size of one is always used." What's
> the unit? One DEFAULT_CHUNK_SIZE? Maybe it'll become clear as I read
> more.)

It means that if you have a variable with an unlimited dimension, such as

  float var(time, lon, lat)

where time is unlimited, then the default chunks will have shape
1 x clon x clat, measured in values (not bytes), where clon and clat are
integers computed to be smaller than but proportional to the sizes of the
lon and lat dimensions, resulting in a default chunk size close to but
less than 4 MB (so in this case each chunk holds about 1 million values).
These default chunks are not necessarily good for some kinds of access. A
good chunk size and shape may depend on anticipated access patterns as
well as the disk block size of the file system on which the data is
stored.

I've started a series of blog postings about chunk shapes and sizes, but
so far have only posted the first part:

  http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters

Eventually, with feedback on these, better guidance and software defaults
for chunking may result. I'll try to post the second installment next
week.

> I guess I've got some reading ahead of me. For resources, I see the
> PowerPoint presentation
> <http://hdfeos.org/workshops/ws13/presentations/day1/HDF5-EOSXIII-Advanced-Chunking.ppt>
> that's linked to and the HDF5 page on chunking
> <http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/>. Do you have any
> other recommendations?

I liked these papers, though they get a bit technical:

  Efficient Organization of Large Multidimensional Arrays
  http://cs.brown.edu/courses/cs227/archives/2008/Papers/FileSystems/sarawagi94efficient.pdf

  Optimal Chunking of Large Multidimensional Arrays for Data Warehousing
  http://www.escholarship.org/uc/item/35201092

--Russ

> Thanks.
> -Leon
>
> On Wed, Feb 20, 2013 at 4:31 PM, Russ Rew <russ@xxxxxxxxxxxxxxxx> wrote:
> >
> > Large chunk sizes might mean a lot of extra I/O, as well as extra CPU
> > for uncompressing the same data chunks repeatedly. You might see if
> > lowering your chunk size significantly improves network usage ...
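To make the chunking discussion above concrete, here is a minimal sketch
in C that overrides the default chunking when defining a record variable
like the one Russ describes, using the netCDF-4 function
nc_def_var_chunking(). The file name "chunked.nc", the lon/lat sizes
(256 x 128), and the chunk shape 100 x 32 x 32 are illustrative
assumptions, not values or recommendations from the thread.

    /* Sketch (assumptions noted above): define float var(time, lon, lat)
     * with an explicit chunk shape instead of the default 1 x clon x clat. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    /* Abort with a readable message on any netCDF error. */
    #define CHECK(e) do { int s = (e); if (s != NC_NOERR) { \
        fprintf(stderr, "%s\n", nc_strerror(s)); exit(1); } } while (0)

    int main(void) {
        int ncid, time_dim, lon_dim, lat_dim, varid, dimids[3];
        /* Chunk shape in values per dimension: 100 records per chunk and
         * 32 x 32 spatial tiles (hypothetical, for illustration only). */
        size_t chunks[3] = {100, 32, 32};

        CHECK(nc_create("chunked.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim));
        CHECK(nc_def_dim(ncid, "lon", 256, &lon_dim));
        CHECK(nc_def_dim(ncid, "lat", 128, &lat_dim));
        dimids[0] = time_dim; dimids[1] = lon_dim; dimids[2] = lat_dim;
        CHECK(nc_def_var(ncid, "var", NC_FLOAT, 3, dimids, &varid));
        /* Chunking must be set while the file is still in define mode. */
        CHECK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));
        CHECK(nc_close(ncid));
        return 0;
    }

Compiled with something like cc -o chunked chunked.c -lnetcdf, this writes
a netCDF-4 file whose variable will be stored in 100 x 32 x 32 chunks. As
the quoted exchange at the end of the message notes, whether a larger or
smaller chunk shape actually helps depends on how the data will be read.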