Hi Dennis,

I agree that your proposed slicing strategy is the one we use most often. Since our input data is usually weather data and therefore GRIB-related, the "GRIB strategy" with c = m-1, i.e. 2-D arrays, is our default. We have a few exceptions to this strategy:

a) Global high-resolution datasets: tiling strategy. Since most of our reads are only interested in our region, we chunk the world into a few (4x3 or 5x2) tiles, which usually gives us 4x faster I/O on gzipped chunks.

b) Timeseries strategy (this is not operational, just testing). For serving point timeseries of weather data to the public, we rechunk the files into 2x2 or 4x4 tiles in the x/y directions and make the time chunk as large as possible.

In most cases, we don't chunk per variable but per dimension.

About the usefulness of your approach: it is not as flexible as the old approach, so points a) and b) aren't covered. It would be a nice simplification if one could easily set a chunking strategy as in netcdf-java, e.g. GRIB_CHUNK_STRATEGY or "COMPLETE_RIGHT_DIMENSIONS_CHUNK_STRATEGY, 2". I prefer to count c from the right rather than the left, since I often have (time,y,x), (time,z,y,x), and (time,ensemble,z,y,x) variables in the same file, and it is the rightmost part that is the same.

Best regards,
Heiko

On 2017-05-15 21:29, dmh@xxxxxxxx wrote:
> I am soliciting opinions about an alternate way to specify chunking
> for netcdf files. If you are not familiar with chunking, you can
> probably ignore this message.
>
> Currently, one specifies a per-dimension decomposition that
> determines how the data for a variable is decomposed into chunks.
> For example, if I have the variable (pardon the shorthand notation)
>     x[d1=8,d2=12]
> and I say d1 is chunked by 4 and d2 is chunked by 4, then x will be
> decomposed into 6 chunks (8/4 * 12/4).
>
> I am proposing this alternative.
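[Editor's note: the per-dimension chunk arithmetic quoted above can be checked with a small plain-Python sketch. The helper name is mine, not part of any netCDF API; real netCDF chunking is set per variable, e.g. via nc_def_var_chunking in the C library.]

```python
from math import ceil, prod

def n_chunks(dim_sizes, chunk_sizes):
    """Number of chunks for a per-dimension chunking spec.

    Each dimension contributes ceil(dim/chunk) chunks; the total is the
    product over all dimensions (edge chunks may be partial when the
    chunk size does not divide the dimension evenly).
    """
    return prod(ceil(d / c) for d, c in zip(dim_sizes, chunk_sizes))

# Dennis's example: x[d1=8, d2=12] with chunk sizes (4, 4)
print(n_chunks((8, 12), (4, 4)))  # -> 6
```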
> Suppose we have
>     x[d1,d2,...,dm]
> and we specify a position 1 <= c < m.
> The idea is that we create chunks of size
>     d(c+1) * d(c+2) * ... * dm
> There will be d1 * d2 * ... * dc such chunks.
> In other words, we split the set of dimensions at some point (c)
> and create the chunks based on that split.
>
> The claim is that in many situations the leftmost dimensions are
> what we want to iterate over (e.g. time), and we then want to read
> all of the rest of the data associated with that time.
>
> So, my question is: is such a style of chunking useful?
>
> If this is not clear, let me know and I will try to clarify.
> =Dennis Heimbigner
> Unidata

--
Dr. Heiko Klein                  Norwegian Meteorological Institute
Tel. + 47 22 96 32 58            P.O. Box 43 Blindern
http://www.met.no                0313 Oslo NORWAY
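[Editor's note: Dennis's split-point rule and Heiko's right-counting variant can be sketched as plain-Python helpers. The function names are mine, not part of any netCDF API; the dimension sizes are made up for illustration. The sketch shows why counting from the right gives variables of different rank the same (y, x) chunking.]

```python
def split_chunk_shape(dim_sizes, c):
    """Chunk shape for the proposed split-point strategy: the first c
    dimensions are iterated one index at a time (chunk size 1) and the
    remaining dimensions are kept whole inside each chunk, so there are
    d1 * d2 * ... * dc chunks of size d(c+1) * ... * dm."""
    return tuple([1] * c + list(dim_sizes[c:]))

def split_chunk_shape_right(dim_sizes, k):
    """Heiko's variant: keep the rightmost k dimensions whole, so that
    variables of different rank share the same rightmost chunking."""
    return split_chunk_shape(dim_sizes, len(dim_sizes) - k)

# (time,y,x), (time,z,y,x), (time,ensemble,z,y,x): counting k=2 from
# the right gives every variable whole (y, x) slices per chunk.
for dims in [(24, 100, 200), (24, 10, 100, 200), (24, 5, 10, 100, 200)]:
    print(split_chunk_shape_right(dims, 2))
```

Counting from the left instead (a fixed c) would keep whole slices of different shapes for these three variables, which is exactly the problem Heiko points out.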