Re: [netcdfgroup] Alternate chunking specification

On Tue, May 23, 2017 at 2:30 PM, Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
wrote:

> On a related note, many users have complained of very poor performance on
> files with a chunksize of 1 in the record dimension, when they are using
> the data in other ways than reading one lat-lon grid at a time. Naturally,
> this is understandable. To even get one value in the level, the entire
> lat-lon grid must be read.
>

This is the inherent problem with chunking -- a good chunking strategy
completely depends on the access pattern.
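As a rough illustration of how much the access pattern matters, the Python sketch below estimates the bytes read from disk for a hyperslab under a given chunk shape. It assumes uncompressed 4-byte floats, that every chunk a read touches is read in full, and a hypothetical 100-step record dimension. `chunks_touched` and `bytes_read` are illustrative helpers, not part of the netCDF or HDF5 API.

```python
from math import prod


def chunks_touched(chunks, start, count):
    """Count the chunks that a hyperslab read [start, start + count) intersects."""
    total = 1
    for c, s, n in zip(chunks, start, count):
        first = s // c            # index of the first chunk along this dimension
        last = (s + n - 1) // c   # index of the last chunk along this dimension
        total *= last - first + 1
    return total


def bytes_read(chunks, start, count, itemsize=4):
    """Bytes read, ASSUMING each touched chunk is read whole and uncompressed."""
    return chunks_touched(chunks, start, count) * prod(chunks) * itemsize


# (time, lat, lon) chunking: chunksize 1 in the record dimension,
# full extent in lat and lon (hypothetical 10k x 10k grid).
chunks = (1, 10_000, 10_000)

# One full lat-lon grid at a single time step: exactly one chunk.
grid = bytes_read(chunks, start=(0, 0, 0), count=(1, 10_000, 10_000))

# A 100-step time series at a single point: every step pulls in a whole grid.
series = bytes_read(chunks, start=(0, 5_000, 5_000), count=(100, 1, 1))

print(f"one grid:     {grid:,} bytes")
print(f"point series: {series:,} bytes for {100 * 4} useful bytes")
```

With a chunksize of 1 in the record dimension and full-extent lat-lon chunks, the single-point series costs 100 full grids, which is the slowdown described in the quoted message.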


> So perhaps having all the non-1 dimensions use a chunksize of their
> fullest extent is not such a good idea.
>

Exactly -- for defaults, I think it's better that full-extent chunks NOT be
used.

I did some experiments a while back, and wildly too-small or too-large
chunks had a big impact on performance, but performance was not that
sensitive to mid-size chunks.

So if, for example, you have a 10k x 10k lat-lon grid, you probably don't
want to use (1, 10k, 10k) chunks.

Better to use (1, 1k, 1k) chunks. I'd bet that would be almost as fast
when accessing the full grid at a given time, but much faster when
accessing only a small part of the grid.

Or maybe (10, 100, 100) would be best -- much better for a time series at a
single point, and still probably not too slow for the whole grid (I found
1k chunks not too bad on that particular machine, anyway...).
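The same back-of-the-envelope arithmetic can compare the three chunk shapes mentioned here across three access patterns. This is a sketch under stated assumptions: uncompressed 4-byte floats, whole chunks read for every chunk touched, and a hypothetical (100, 10k, 10k) variable. `bytes_read` is an illustrative helper, not a library call.

```python
from math import prod


def bytes_read(chunks, start, count, itemsize=4):
    """Bytes read for a hyperslab, ASSUMING each touched chunk is read whole (uncompressed)."""
    n_chunks = 1
    for c, s, n in zip(chunks, start, count):
        # chunks spanned along this dimension: last index - first index + 1
        n_chunks *= (s + n - 1) // c - s // c + 1
    return n_chunks * prod(chunks) * itemsize


# (time, lat, lon) chunk shapes discussed above
shapes = {
    "1, 10k, 10k": (1, 10_000, 10_000),
    "1, 1k, 1k": (1, 1_000, 1_000),
    "10, 100, 100": (10, 100, 100),
}

# (start, count) for three illustrative access patterns
patterns = {
    "full grid, one time": ((0, 0, 0), (1, 10_000, 10_000)),
    "100x100 box, one time": ((0, 5_000, 5_000), (1, 100, 100)),
    "point, 100 times": ((0, 5_000, 5_000), (100, 1, 1)),
}

results = {
    (name, pat): bytes_read(ch, st, ct)
    for name, ch in shapes.items()
    for pat, (st, ct) in patterns.items()
}

for (name, pat), nbytes in results.items():
    print(f"({name:>12}) {pat:<22} {nbytes / 1e6:>10,.1f} MB")
```

By this measure (1, 1k, 1k) matches (1, 10k, 10k) on the full grid and reads far less for the small box and the point series, while (10, 100, 100) is cheapest for the box and the point series but reads 10x the data for a single full grid, since each chunk carries 10 time steps. Bytes alone don't capture per-chunk overhead (index lookups, seeks, filter calls), which is what makes very small chunks slow, so this models only one side of the trade-off.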

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx