Re: [netcdfgroup] unlimited dimensions and chunking??

To: Charlie Zender <zender@xxxxxxx>
Subject: Re: [netcdfgroup] unlimited dimensions and chunking??
From: Chris Barker <chris.barker@xxxxxxxx>
Date: Thu, 9 Jan 2014 15:44:19 -0800

On Thu, Jan 9, 2014 at 3:28 PM, Charlie Zender <zender@xxxxxxx> wrote:

> I read with interest your discussion on chunking.
> I added Chris's suggestion to NCO's supported chunking options.
>

> NCO 4.4.0 now implements six "chunking maps"
> http://nco.sf.net/nco.html#cnk
>

nice!

One small doc comment:

"""
Unchunking

Definition: Unchunk all variables possible. The HDF5 storage layer requires
that record variables (i.e., variables that contain at least one record
dimension) must be chunked. Also variables that are compressed or use
checksums must be chunked.
"""
Variables with unlimited dimensions must be chunked as well. Not sure if NCO
preserves those.

And some thoughts:

I may be mis-interpreting some of this (and not totally sure what a "record
dimension" is), but

"""
Chunksize Equals Dimension Size except Record Dimension

Definition: Chunksize equals dimension size except record dimension has
size one. Explicitly specify chunksizes for particular dimensions with
‘--cnk_dmn’ option.
cnk_map key values: ‘rd1’, ‘cnk_rd1’, ‘map_rd1’
Mnemonic: Record Dimension size 1
"""

If you had a 1-d variable of records, would that mean each chunk holds a
single record? 'Cause that would be way too small in the common case.
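
To make the concern concrete, here is a minimal sketch of the chunk shape the
quoted 'rd1' definition implies. It is one reading of the documentation text,
not NCO's implementation; the function names rd1_chunks and chunk_bytes are
hypothetical, and the shapes and the 8-byte item size are made-up examples.

```python
from math import prod

def rd1_chunks(shape, record_dims):
    """Chunk shape under one reading of the quoted 'rd1' definition:
    full dimension size everywhere, except length 1 on record dimensions.
    (Hypothetical helper, not NCO code.)"""
    return tuple(1 if i in record_dims else n for i, n in enumerate(shape))

def chunk_bytes(chunks, itemsize):
    """Bytes held by one chunk of the given shape."""
    return prod(chunks) * itemsize

# 1-d record variable of float64 values: each chunk holds a single value.
one_d = rd1_chunks((1000,), {0})
print(one_d, chunk_bytes(one_d, 8))

# 3-d (time, lat, lon) variable: one full lat/lon slab per chunk.
three_d = rd1_chunks((1000, 180, 360), {0})
print(three_d, chunk_bytes(three_d, 8))
```

Under this reading the 1-d case gets a chunk of one element (8 bytes for a
float64), while the 3-d case gets a reasonably sized slab.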

"""
Chunksize Lefter Product Matches Scalar Size Specified

Definition: The product of the chunksizes for each variable (approximately)
equals the size specified with the ‘--cnk_scl’ option. This is accomplished
by using dimension sizes as chunksizes for the rightmost (most rapidly
varying) dimensions, and then “flexing” the chunksize of the leftmost
(least rapidly varying) dimensions such that the product of all chunksizes
matches the specified size. All dimensions to the left of and including the
first record dimension define the left-hand side. This map was first
proposed by Chris Barker.
cnk_map key values: ‘lfp’, ‘cnk_lfp’, ‘map_lfp’
Mnemonic: LeFter Product
"""

That sounds good -- and thanks for the credit!
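
A rough sketch of how the quoted 'lfp' definition could work out, under some
loud assumptions: the record dimension is the leftmost one (so it is the only
dimension being "flexed"), the size given to '--cnk_scl' is treated as an
element count, and the name lfp_chunks is hypothetical. This is not NCO's
code, and how NCO spreads the remainder across several left-hand dimensions
is not covered here.

```python
from math import prod

def lfp_chunks(shape, target_elems):
    """Sketch of the 'lfp' idea for a variable whose record dimension is
    leftmost (assumption): keep full sizes on the right-hand dimensions and
    pick the leftmost chunk length so the product is close to target_elems."""
    right = tuple(shape[1:])
    right_prod = max(1, prod(right))          # guard against zero-length dims
    left = max(1, target_elems // right_prod)  # never smaller than 1
    return (left,) + right

# 1-d unlimited variable: the whole target goes to the record dimension.
print(lfp_chunks((0,), 8192))

# Nx3 variable: the small trailing dimension stays whole.
print(lfp_chunks((0, 3), 8192))

# (time, lat, lon): one slab already exceeds the target, so time gets 1.
print(lfp_chunks((10, 180, 360), 8192))
```

The leftmost length is not capped at the current dimension size here, on the
assumption that an unlimited dimension is expected to grow.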

It's not clear to me from the time I've spent reading that, but what would be
the default chunking for a 1-d unlimited variable? Or for a 2-d one with one
dimension very small (Nx3, for instance)?

Those were the use cases where the default chunking in netcdf4 killed us.
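
For a sense of scale, the sketch below counts how many chunks a variable
splits into for a given chunk shape. The chunk length of 1 along the unlimited
dimension is an assumption about what the library default produced in these
cases, and the one-million-record size, the 1024 alternative, and the name
n_chunks are made up for illustration.

```python
from math import prod

def n_chunks(shape, chunks):
    """Number of chunks needed to cover `shape` with chunk shape `chunks`
    (integer ceiling division per dimension). Hypothetical helper."""
    return prod(-(-n // c) for n, c in zip(shape, chunks))

records = 1_000_000

# 1-d unlimited variable: chunk length 1 (assumed default) vs 1024.
print(n_chunks((records,), (1,)))
print(n_chunks((records,), (1024,)))

# Nx3 variable: chunk shape (1, 3) (assumed default) vs (1024, 3).
print(n_chunks((records, 3), (1, 3)))
print(n_chunks((records, 3), (1024, 3)))
```

With a chunk length of 1 there is one chunk per record, so every record
carries its own chunk lookup and I/O overhead.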

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx
