Chris, FWIW, here is a chunk size recipe that works well for some rather
large gridded files that I work with. It is optimized for writing
sequentially along the time dimension, and for reading whole lat/lon grids
either sequentially or randomly.

Selected metadata:

ncdump -hst uwnd.1979-2012.nc
...
	level = 37 ;
	lat = 256 ;
	lon = 512 ;
	time = UNLIMITED ; // (49676 currently)

	int level(level) ;
		level:_Storage = "contiguous" ;
	float lat(lat) ;
		lat:_Storage = "contiguous" ;
	float lon(lon) ;
		lon:_Storage = "contiguous" ;
	float time(time) ;
		time:_Storage = "chunked" ;
		time:_ChunkSizes = 16384 ;
	float uwnd(time, level, lat, lon) ;
		uwnd:_Storage = "chunked" ;
		uwnd:_ChunkSizes = 1, 1, 256, 512 ;

This scheme depends on good chunk caching with adequate buffers for both
reading and writing. I think it is a good idea to design chunking on a
per-variable basis, not per-dimension: think of chunks as small hyperslabs,
not dimension steps. Note in particular the successful use of two very
different chunk sizes for two different variables along the same unlimited
time dimension.

I do not have answers for your specific questions right now; hopefully
someone else will respond.

--Dave

On Fri, Dec 27, 2013 at 2:15 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
> Hi all,
>
> We're having some issues with unlimited dimensions and chunking. First, a
> few notes:
>
> I'm using the netCDF4 Python wrappers, and seeing different symptoms on
> Windows and Mac, so this could be an issue in the Python wrappers, the
> netCDF library, the HDF library, or how one of those is built...
>
> If I try to use an unlimited dimension and do NOT specify any chunking, I
> get odd results:
>
> On Windows: it takes many times longer to run, and produces a file that
> is 6 times as big.
>
> On OS X: it crashes if I try to use an unlimited dimension and not
> specify chunking.
>
> This page:
>
> http://www.unidata.ucar.edu/software/netcdf/docs/default_chunking_4_0_1.html
>
> does indicate that the default is a chunksize of 1, which seems insanely
> small to me, but should at least work. Note: does setting a chunksize of
> 1 mean that HDF will really use chunks that small? Perusing the HDF docs,
> it seems HDF needs to build up a tree structure to store where all the
> chunks are, and there are performance implications to a large tree -- so
> a chunksize of 1 guarantees a really big tree. Wouldn't a small, but far
> from 1, value make more sense? Like 1k or something?
>
> In my experiments with a simple 1-d array with an unlimited dimension,
> writing a MB at a time, dropping the chunksize below about 512MB started
> to affect write performance.
>
> Very small chunks really made it crawl, and explicitly setting size-1
> chunks made it crash (on OS X with a malloc error). So I think that
> explains my problem.
>
> With smaller data sets it works, but runs really slowly -- with an 8MB
> dataset, going from a chunksize of 1 to a chunksize of 128 reduced write
> time from 10 seconds to 0.1 seconds. Increasing it to 16k reduced that to
> about 0.03 seconds -- larger than that made no noticeable difference.
>
> So I think I know why I'm having problems with unspecified chunksizes,
> and a chunksize of 1 probably shouldn't be the default!
>
> However, if you specify a chunksize, HDF does seem to allocate at least
> one full chunk in the file -- that makes sense, so you wouldn't want to
> store a very small variable with a large chunk size. But I suspect:
>
> 1) if you are using an unlimited dimension, you are unlikely to be
> storing VERY small arrays, and
>
> 2) netCDF4 seems to have about 8k of overhead anyway.
>
> So a default of 1k or so seems reasonable.
>
> One last note: from experimenting, it appears that you set chunksizes in
> numbers of elements rather than number of bytes.
> Is that the case? I haven't been able to find it documented anywhere.
>
> Thanks,
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959 voice
> 7600 Sand Point Way NE   (206) 526-6329 fax
> Seattle, WA 98115        (206) 526-6317 main reception
>
> Chris.Barker@xxxxxxxx
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
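Regarding Chris's last question: as I understand it, chunk sizes in
netCDF-4/HDF5 are indeed given as element counts per dimension, not bytes.
A quick back-of-the-envelope sketch (plain Python, no netCDF required;
shapes and the 4-byte float size taken from the ncdump output above) to
see what those element-count chunk shapes work out to in bytes:

```python
def chunk_bytes(chunk_shape, dtype_size):
    """Size in bytes of one chunk: product of per-dimension
    element counts times the element size in bytes."""
    n = 1
    for dim in chunk_shape:
        n *= dim
    return n * dtype_size

# uwnd(time, level, lat, lon) chunked as 1, 1, 256, 512 -- one full
# lat/lon grid per chunk, float (4 bytes):
print(chunk_bytes((1, 1, 256, 512), 4))  # 524288 bytes = 512 KiB

# time(time) chunked as 16384, float (4 bytes):
print(chunk_bytes((16384,), 4))          # 65536 bytes = 64 KiB
```

This also shows why an element-count chunksize of 1 is pathological: for
the 1-d case it means one 4- or 8-byte value per chunk, with a separate
B-tree entry for every chunk.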