
Re: [netcdfgroup] NF90_SYNC question

Hi Leon,

> Thanks for mentioning chunk sizing; that's not something I had thought
> about. I've got one unlimited dimension, and it sounds like that means an
> inefficient default chunk size
> <http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Default-Chunking.html#Default-Chunking>.
> ("For unlimited dimensions, a chunk size of one is always used." What's
> the unit? One DEFAULT_CHUNK_SIZE? Maybe it'll become clear as I read more.)
> 

It means that if you have a variable with an unlimited dimension, such
as 

   float var(time, lon, lat) 

where time is unlimited, then the default chunks will be of shape 

   1 x clon x clat

values (not bytes), for integers clon and clat computed to be smaller
than but proportional to the sizes of the lon and lat dimensions, so
that the default chunk size is close to but less than 4 MB (in this
case, each chunk holds about 1 million values).  These default chunks
are not necessarily good for some kinds of access.  A good chunk size
and shape may depend on anticipated access patterns as well as the
disk block size of the file system on which the data is stored.
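For what it's worth, here is a minimal Fortran 90 sketch (not from the
original exchange; the file name, dimension lengths, and chunk shape are
made-up examples) of how you might override the default chunking for a
record variable with nf90_def_var_chunking when the file is created:

  ! Sketch only: sizes and chunk shape below are illustrative assumptions,
  ! and error checking of the returned status values is omitted for brevity.
  program chunking_example
    use netcdf
    implicit none
    integer :: ncid, t_dimid, lon_dimid, lat_dimid, varid, status

    status = nf90_create("example.nc", nf90_netcdf4, ncid)
    status = nf90_def_dim(ncid, "time", nf90_unlimited, t_dimid)
    status = nf90_def_dim(ncid, "lon", 1440, lon_dimid)
    status = nf90_def_dim(ncid, "lat", 720, lat_dimid)

    ! Fortran dimension order is fastest-varying first, so this is the
    ! CDL variable float var(time, lon, lat) with time the record dimension.
    status = nf90_def_var(ncid, "var", nf90_float, &
                          (/ lat_dimid, lon_dimid, t_dimid /), varid)

    ! Instead of the default 1 x clon x clat chunks, ask for chunks that
    ! span 10 records over a 180 (lat) x 360 (lon) tile; the chunksizes
    ! array is given in the same order as the dimension IDs above.
    status = nf90_def_var_chunking(ncid, varid, nf90_chunked, &
                                   (/ 180, 360, 10 /))

    status = nf90_enddef(ncid)
    status = nf90_close(ncid)
  end program chunking_example

Whether a shape like that is any good depends entirely on how the data
will be read later, which is what the blog series below is about.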

I've started a series of blog postings about chunk shapes and sizes,
but so far only posted the first part:

  http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters

Eventually, with feedback on these, better guidance and software
defaults for chunking may result.  I'll try to post the second
installment next week.

> I guess I've got some reading ahead of me. For resources, I see the
> PowerPoint presentation
> <http://hdfeos.org/workshops/ws13/presentations/day1/HDF5-EOSXIII-Advanced-Chunking.ppt>
> that's linked to and the HDF5 page on chunking
> <http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/>. Do you have
> any other recommendations?

I liked these papers, though they get a bit technical:

  Efficient Organization of Large Multidimensional Arrays
  http://cs.brown.edu/courses/cs227/archives/2008/Papers/FileSystems/sarawagi94efficient.pdf

  Optimal Chunking of Large Multidimensional Arrays for Data Warehousing
  http://www.escholarship.org/uc/item/35201092

--Russ  

> Thanks.
> -Leon
> 
> On Wed, Feb 20, 2013 at 4:31 PM, Russ Rew <russ@xxxxxxxxxxxxxxxx> wrote:
> >
> > Large chunk sizes might mean a lot of extra I/O, as well as extra CPU
> > for uncompressing the same data chunks repeatedly.  You might see if
> > lowering your chunk size significantly improves network usage ...


