Oops -- that should have been a "reply all". Minor rant: all mailing lists should be set to reply to the list by default! (Yes, I know there are arguments otherwise -- carry on.)

---------- Forwarded message ----------
From: Chris Barker - NOAA Federal <chris.barker@xxxxxxxx>
Date: Thu, Oct 23, 2014 at 8:40 AM
Subject: Re: [netcdfgroup] nccopy should use 1 as default-chunksize for unlimited dimension
To: Ed Hartnett <edwardjameshartnett@xxxxxxxxx>

On Oct 23, 2014, at 7:23 AM, Ed Hartnett <edwardjameshartnett@xxxxxxxxx> wrote:

> This gives very poor performance when the number of timesteps in the file is large.

Well, that's the trick with chunking -- appropriate chunk sizes depend on the shape/size of the array, hardware specs, and access patterns. The code that determines defaults can only know about the array shape, but we need to make sure it at least accounts for that.

Using a small chunk on the time dimension is fine IF the other dimensions are large, AND you want most of a chunk's worth of data at each access. A common use case is a 3- or 4-d array (say T x X x Y) where the user needs to access all of X and Y, one time step at a time. In this case, a chunk size of one in the T dimension makes sense, regardless of how large the T dimension is.

In my (pretty limited) experimentation, I found that performance is not very sensitive to chunk size within "reasonable" bounds. Very small chunks (on the order of 10 bytes) give horrible performance in both file size and speed, and really large chunks (maybe > 10s of MB) can give bad performance, depending a bit on access patterns.

The goals are to (a) not have very small chunks, and (b) have most of a chunk used for each access. But without knowing the access patterns, there is no way to optimize for (b) in defaults. For example, for a (t, x, y) array, a 1 x 1024 x 1024 chunk size would work well if the user typically wanted the entire x/y domain for each time step. But if they wanted the entire time series at one point, it would be pretty bad (accessing 1 MB in order to get a single value).

It seems reasonable to me to assume that an unlimited dimension is the one least likely to be accessed all at once, and thus should get the smallest chunk size. But another approach would be to make no assumptions about access patterns and create "square" chunks by default; that would yield equally good (and bad) performance for any access pattern. In either case, default chunks should never be tiny or huge.

-Chris
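[To make the tradeoff concrete, here is a minimal nccopy sketch of the two strategies, reusing the illustrative T x X x Y array above; the file names, dimension names, and chunk numbers are assumptions for illustration, not from the thread:

$ # Chunks of 1 along time: fast for whole-map access one step at a time,
$ # slow for reading a long time series at a single point
$ nccopy -k 4 -c "time/1,X/1024,Y/1024" in.nc maps.nc

$ # "Squarer" chunks: a compromise when the access pattern is unknown
$ nccopy -k 4 -c "time/32,X/128,Y/128" in.nc balanced.nc

Both specs stay inside the "reasonable" bounds described above: for 4-byte floats, the first gives 4 MB chunks and the second 2 MB chunks, far from both the ~10-byte and the 10s-of-MB extremes.]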
On Thu, Oct 23, 2014 at 5:25 AM, Heiko Klein <Heiko.Klein@xxxxxx> wrote:

> Hi,
>
> When chunking files with an unlimited dimension, the unlimited dimension
> must be given explicitly in nccopy, and will usually be set to one.
>
> In a file with time as the unlimited dimension, and X and Y as further
> dimensions, it is currently required to use
>
> $ nccopy -k 4 -c "time/1,X/100,Y/100" in.nc out.nc
>
> When running without time, it does not work:
>
> $ nccopy -k 4 -c "X/100,Y/100" in.nc out.nc
> NetCDF: Invalid argument
> Location: file nccopy.c; line 637
>
> Only for unlimited dimensions does one need to give the dimension
> explicitly; for all other dimensions a useful default (the full
> dimension size) is used. I think a useful default for unlimited
> dimensions is 1.
>
> Heiko
>
> --
> Dr. Heiko Klein                       Tel. + 47 22 96 32 58
> Development Section / IT Department   Fax. + 47 22 69 63 55
> Norwegian Meteorological Institute    http://www.met.no
> P.O. Box 43 Blindern                  0313 Oslo NORWAY
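[As a usage footnote to Heiko's report, a minimal sketch of the workaround, assuming his file layout; the dimension name and the current length shown in the ncdump output are made-up values:

$ # Find the unlimited dimension (flagged UNLIMITED in the header dump)
$ ncdump -h in.nc | grep UNLIMITED
        time = UNLIMITED ; // (8760 currently)

$ # Name it explicitly in the chunk spec, since nccopy has no default for it
$ nccopy -k 4 -c "time/1,X/100,Y/100" in.nc out.nc]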
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx