Re: [netcdfgroup] nccopy -c does not rechunk properly (4.3.1.1)

Hi Russ,

On Thu, Feb 27, 2014 at 2:38 PM, Russ Rew <russ@xxxxxxxxxxxxxxxx> wrote:

>   #define CHUNK_THRESHOLD (8192)   /* variables with fewer bytes don't get
> chunked */
>
> The intent of the CHUNK_THRESHOLD minimum is to not create chunks
> smaller than a physical disk block, as an I/O optimization, because
> attempting to read a smaller chunk will still cause a whole disk block
> to be read.
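
The threshold logic Russ describes can be sketched roughly like this (a simplified illustration only; the real nccopy logic lives in the netCDF C sources, and `should_chunk` is my own name, not a library function):

```python
CHUNK_THRESHOLD = 8192  # bytes; roughly one physical disk block

def should_chunk(shape, itemsize):
    """Return True if a variable of this shape and element size
    exceeds the threshold and is therefore worth chunking."""
    nbytes = itemsize
    for dim in shape:
        nbytes *= dim
    return nbytes > CHUNK_THRESHOLD

print(should_chunk((1000,), 8))  # 8000 bytes  -> False, stays contiguous
print(should_chunk((2000,), 8))  # 16000 bytes -> True, gets chunked
```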


So I take it 8k is a reasonable expectation for the physical disk block size these days?

But this is a great tidbit -- I'm working on code to write data in the
"new" UGRID standard:

https://github.com/ugrid-conventions/ugrid-conventions

And the code:
https://github.com/pyugrid/pyugrid

And I wanted to set some reasonable defaults for chunking. With UGRID you
tend to have a lot of large 1-d arrays, whereas most of the discussions
I've seen are about multi-dimensional arrays. It sounds like I should set a
minimum chunk size of 8k bytes, then.


>   However, I think for the next
> release, we should lower the default threshold to 512 bytes, and
> document the behavior.
>

Document -- of course, but why lower the threshold?

The thresholds make sense as defaults, but if a user explicitly asks for
smaller-than-optimal chunk sizes, maybe that's what they should get.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx