Re: [netcdfgroup] Alternate chunking specification

Hi Dennis,

I agree with you that your proposed slicing strategy is what we most
often use. Since our input data is usually weather data and therefore
GRIB-related, the 'GRIB strategy' with c = m-1, i.e. 2-D arrays, is our
default.
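
As a rough illustration (the file, dimension names and sizes below are
made up, not taken from any real setup), that default corresponds to
something like the following with the current per-dimension API:

    /* Sketch of the "GRIB strategy" (c = m-1): one full 2-D field per
     * chunk for a (time, y, x) variable.  Names and sizes are
     * illustrative only; error checking is omitted for brevity. */
    #include <netcdf.h>

    int main(void) {
        int ncid, dimids[3], varid;
        size_t ny = 949, nx = 1069;     /* hypothetical grid size */
        size_t chunks[3];

        nc_create("fields.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]);
        nc_def_dim(ncid, "y", ny, &dimids[1]);
        nc_def_dim(ncid, "x", nx, &dimids[2]);
        nc_def_var(ncid, "air_temperature", NC_FLOAT, 3, dimids, &varid);

        chunks[0] = 1;    /* iterate over time...            */
        chunks[1] = ny;   /* ...but keep each 2-D field      */
        chunks[2] = nx;   /* ...together in a single chunk   */
        nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);

        nc_close(ncid);
        return 0;
    }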

We have a few exceptions to this strategy; both are sketched in code below.
a) Global high-resolution datasets: tiling strategy.
Since most of our reads are only interested in our region, we chunk the
world into a few (4x3 or 5x2) tiles, which usually gives us 4x faster
I/O on gzipped chunks.

b) Timeseries strategy (not operational yet, just testing).
For serving point timeseries of weather data to the public, we rechunk
the files to 2x2 or 4x4 tiles in the x/y direction and make the time
chunk as large as possible.
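
In code, and again only as a sketch for a hypothetical (time, y, x)
variable (the tile counts and time-chunk length are indicative, not our
exact settings), the two exceptions look roughly like this:

    /* Chunk shapes for exceptions a) and b); ncid/varid would come
     * from a file defined as in the previous sketch. */
    #include <netcdf.h>
    #include <stddef.h>

    /* a) tiling: ~4x3 tiles over the globe, one time step per chunk
     *    (keeping the time chunk at 1 is an assumption here). */
    static void set_tiling_chunks(int ncid, int varid, size_t ny, size_t nx)
    {
        size_t chunks[3] = {1, (ny + 2) / 3, (nx + 3) / 4};
        nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
    }

    /* b) timeseries: small 4x4 spatial tiles, time chunk as large as
     *    practical (ntime here is a hypothetical number of steps). */
    static void set_timeseries_chunks(int ncid, int varid, size_t ntime)
    {
        size_t chunks[3] = {ntime, 4, 4};
        nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
    }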

In most cases, we don't chunk per variable but per dimension.

About the usefulness of your approach: it is not as flexible as the old
approach, so points a) and b) above aren't covered. It would be a nice
simplification if one could easily set a chunking strategy as in
netcdf-java, e.g. GRIB_CHUNK_STRATEGY or
"COMPLETE_RIGHT_DIMENSIONS_CHUNK_STRATEGY, 2". I prefer to set c from
the right rather than the left, since I often have (time,y,x),
(time,z,y,x) and (time,ensemble,z,y,x) variables in the same file, and
it is the rightmost part that is the same.
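
To make that concrete, here is a hypothetical helper (neither the
function nor the strategy name exists in the netCDF-C API; it only
illustrates what such a right-counting convenience would compute): the
rightmost c dimensions are kept whole and everything to their left gets
chunk size 1.

    #include <stddef.h>

    /* Fill 'chunks' (length ndims) so that the rightmost c dimensions
     * are complete and all others have chunk size 1.  Purely a sketch
     * of a "complete rightmost dimensions" strategy. */
    static void right_split_chunks(size_t ndims, const size_t *dimlens,
                                   size_t c, size_t *chunks)
    {
        for (size_t i = 0; i < ndims; i++)
            chunks[i] = (i >= ndims - c) ? dimlens[i] : 1;
    }

With c = 2 this yields {1, ny, nx} for (time,y,x), {1, 1, ny, nx} for
(time,z,y,x) and {1, 1, 1, ny, nx} for (time,ensemble,z,y,x), so a
single setting covers all three variable shapes in the same file.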

Best regards,

Heiko


On 2017-05-15 21:29, dmh@xxxxxxxx wrote:
> I am soliciting opinions about an alternate way to specify chunking
> for netcdf files. If you are not familiar with chunking, then
> you probably can ignore this message.
> 
> Currently, one specifies a per-dimension decomposition that
> determines how the data for a variable is decomposed
> into chunks. So, e.g., if I have a variable (pardon the shorthand notation)
>   x[d1=8,d2=12]
> and I say d1 is chunked 4 and d2 is chunked 4, then x will be decomposed
> into 6 chunks (8/4 * 12/4).
> 
> I am proposing this alternative. Suppose we have
>     x[d1,d2,...dm]
> And we specify a position 1<=c<m
> Then the idea is that we create chunks of size
>    d(c+1) * d(c+2) *...dm
> There will be d1*d2*...dc such chunks.
> In other words, we split the set of dimensions at some point (c)
> and create the chunks based on that split.
> 
> The claim is that in many situations the leftmost dimensions
> are what we want to iterate over (e.g. time), and we then want
> to read all of the rest of the data associated with that time.
> 
> So, my question is: is such a style of chunking useful?
> 
> If this is not clear, let me know and I will try to clarify.
> =Dennis Heimbigner
>  Unidata

-- 
Dr. Heiko Klein                   Norwegian Meteorological Institute
Tel. + 47 22 96 32 58             P.O. Box 43 Blindern
http://www.met.no                 0313 Oslo NORWAY


